• 0 Posts
  • 222 Comments
Joined 1 year ago
cake
Cake day: June 7th, 2025

help-circle
  • As much as I would like to not let Nazis hijack certain things, the simple fact is they did, they have, and they continue to. If Nazi-ism were gone and a distant memory, which perhaps even a decade ago I might’ve believed we were nearly at, then we can start working on reclaiming what they have polluted and bringing back its true, full history.

    But that has all changed, the masks are coming off and we are starting to see how far these white supremacists have metastasized and how much power they are beginning to take control of with their fascism. Now is not the time to begin trying to reclaim these symbols, precisely when they are actively using them as dogwhistles and trial balloons, actively spreading their vile ideology and starting to signal to and coordinate with each other to make their ever-so-predictable moves against humanity once again.


  • Because they want at least some of our work to get done while we starve and fight over scraps and eventually die in poverty, conflicts, riots and violence while they hide in their bunkers and build wealthy, gated, libertarian tech-utopia fortresses where the survivors they deem “worthy” might seek refuge if they are willing to bend their knee and pledge fealty to their new tech-gods. When all is said and done, they expect to emerge from their bunkers and rule with an iron fist over the much diminished remnants of humanity, thereby having solved our sustainability problem by letting most of us get killed off, and they can then instruct their totally-dependent disciples to rebuild the world in their utopian image.

    This sounds like insane science fiction plot, but you have to understand these guys have read way too much science fiction and misunderstood most of it, and they believe they have the means to turn it into science reality, and I can’t predict whether they’re going to succeed but I can see the writing on the wall clearly telling me that they’re absolutely going to fucking try. And if we all die in the process (even including them) I’m not too sure they even care that much or are capable of thinking of those sort of consequences. These are not deeply thoughtful people, they think they are, and want us to believe they are, but they’re not.



  • I agree, I’ve been recommending people to try to develop some level of nuance on the topic. I understand the fear, hatred, and loathing of AI; especially the way it’s currently being implemented and used. I really do, and I share 99% of the concerns. But there is room for nuance in the understanding of how it’s being used and what it’s being used for and who is using it, and when nuance leaves the room, we’re blind. And blind hatred is never a good thing and it does not lead to good places.


  • I’m not an expert by any means I’m just a dabbler, but my understanding is: In theory, more parameters make richer, wider, and deeper model knowledge possible, and with extensive enough training, those parameters could all be important. That said, there is a lot of megapixel-like inflation and there is no guarantee that any of those parameters are actually useful so in practice, really “advanced” models tend to do a better job of maximizing the usefulness of the limited parameters they do have to run on smaller devices. In general, I tend towards the highest parameter size of a particular model that I can reasonably run. My typical target range is between 8GB up to maybe 20GB, which depending on model might be in the 9b to 30b parameters range, and I might even be erring on the wrong side of this and maybe I’d even be better off with smaller parameter models.

    There’s also a lot of models nowadays that use “active” parameters, so the model itself will have X parameters, but then it will determine which of those parameters are most relevant to the task or query at hand, and prune off all but the most relevant ones, so you might have a 30B model, but as soon as you run it, it turns itself into a specialized 4B model. You still need to load the whole model into some kind of RAM typically so it can decide which parameters are relevant, but once it does, it will run much faster. This is another way you can try to run larger models on more limited hardware. Older “dense” models that don’t use this technique with all parameters always active are still typically preferred for some tasks like coding, but YMMV.

    Either way, it’s still sort of a crapshoot, there’s a lot of randomness and subjectiveness, and very small parameter models often seem to realistically be able to outperform much bigger models when they are “good”, “well-trained” advanced models, and they will typically be much faster, so if you don’t like the response, it’s much easier to just ask again or retry. I tend to trust the community wisdom when it comes to this, although I also think there’s a lot of cargo-culting and herd-following going on, I don’t know enough to do anything too much different from the herd myself, other than be willing to experiment a little. Latest is not always greatest, but in a field as quickly moving as this it often is. Don’t be afraid to try older models, or less popular models. You’ll often be disappointed, but not always.

    Quantization is a form of compression, basically instead of using floating point precision to weigh the “strengths” of the various parameters (default is typically F16 or 16 bits per parameter weight), they get quantized down to smaller groups of bits. Q4 means you’re using 4 bits (essentially ranking each parameter on an integer scale from 0 to 15 instead of a floating point from 0 to 1) and in practice this is usually almost as good. Q8 would be even closer to the original full-size model, but smaller quants like Q2 and Q3 start losing quality. Other quantization-related techniques like i-Matrix (imat) map these values non-linearly and situationally, which is particularly helpful on quantizations Q3 and smaller, which are then called IQ3. The community has adopted Q4 as pretty much the go-to quantization level as the best available compromise between having more parameters being squeezed into less memory without destroying the inherent accuracy of those parameters.



  • For chat usage (which is strictly a more efficient way to generate code on the LLM’s part, although you have to keep carefully guided and compartmentalized otherwise it typically requires a lot more testing and sometimes back-and-forth iteration on your part) 12GB is plenty to run many decent LLMs, you’ll typically want to use a Q4 quantization to make models with larger parameter fit into smaller memory, sometimes an IQ2 or IQ3 if you really want a particular model.

    For agentic usage (where the LLM is trained and optimized to use a harness like this to start requesting tool calls and getting their results and using the results of the tool calls to inform what it’s trying to do) it’s quite a bit more challenging to do on consumer hardware at a tolerable speed. The tools often generate large amounts of output which then take a long time to process, and the models and harnesses are both typically quite a bit stupider about using your limited resources efficiently. If you’re using to commercial “frontier” agentic models like Claude Code you’re going to have a bad time.

    That said, it is absolutely possible to do agentic AI on consumer hardware (just the GPU you have, not 6 of them), as long as you’re reasonably patient, using a harness properly tuned for efficiency. Out-of-the-box, many if not most are designed for remote API usage, even the “open source, local” ones realistically rely on free tier APIs and are inherently wasteful in terms of them not really caring how many tokens you burn in these remote datacenters and they’re expecting to just be able to iterate over and over again until they get it right. You don’t have that luxury when you’re getting slow tokens.

    Is PewDiePie’s any better or more efficient? I don’t know, I haven’t tried it yet. I prefer more minimal harnesses personally, OpenCode is about the most usable I’ve found personally, although I’m starting to experiment with Pi-mono (called Pi, but that’s unsearchable) which seems very promising, and I know quite a few people who have had good successful agent usage with Hermes Agent.

    I’m not going to pretend it’s going to be easy or that you’ll necessarily have very good results. I am pretty lukewarm on AI as a whole, but I am personally deeply invested in making sure I have fully local access to it in as much capacity as is currently technologically possible, as a personal digital sovereignty issue.

    As for hardware, I have a 12GB card myself and you don’t really need to fit everything into VRAM these days. I have an AMD X3D CPU which allows me to offload some of the model to system RAM with pretty decent performance, maybe it’s prohibitive on different architectures or configurations I don’t know but it’s worth a try. glm-4.7-flash:Q4_K_M from ollama is the model I’ve had the most consistent success with and with ollama running it with the context window set to 50,000 (context should also be set to be quantized to Q4_K_M), I end up with almost half of it offloaded to system RAM and it still runs quite fast thanks to the flash attention feature. I’ve worked with gemma4 quite a lot too and it’s definitely really fast but it’s also a bit unstable/weird at times, at least the heretic version hf.co/Stabhappy/gemma-4-26B-A4B-it-heretic-GGUF:Q4_K_M I’m running is. Still, if you really do need to fit everything into a smaller set of RAM you might try the gemma4 E4B models which clock in around 9GB when quantized. Qwen3.6 is I guess supposed to be really good too and should fit nicely on your 12GB card, but I haven’t had much opportunity to play with it yet. Qwen3 and 3.5 felt rather disappointing to me for agentic use but YMMV.

    You’re not completely going to outsource all software and all code you write to AI using a local model, the way companies are doing with those commercial models. But I consider that an advantage, not a flaw. I find it’s much more useful to have it help, suggest and advise, not to completely replace everything I’m doing. Yes, sometimes it’s slow and sometimes it’s wrong, but so are other people when I ask them sometimes. I’m prepared for it, and you should be too. Don’t get complacent.






  • cecilkorik@piefed.catoFuck AI@lemmy.worldListen to JC Denton
    link
    fedilink
    English
    arrow-up
    24
    ·
    10 days ago

    He only merges with an AI if you choose the wrong ending.

    Given that we know the AI techbros are obviously sociopaths with endless greed, and the old world order of illuminati seem to be a bunch of creepy pedos, I am forced to the conclusion that Tracer Tong knows what’s up and he’s the one we should be listening to, nevermind JC Denton.

    Cutscene: Area 51 is destroyed by the massive blasts, cutting world communication networks and destroying both Bob Page and Helios. This thrusts the world into a technological blackout, but also frees it from media control by the Illuminati.

    “Yesterday we obeyed kings and bent our necks before emperors. But today we kneel only to truth…” — Kahlil Gibran


  • I know everyone has different learning styles, but you actually want the learning to be rote, like a book, despite using an inherently interactive media like a game? I would consider that a flaw, and perhaps even the wrong tool for the job.

    If you want a textbook there are lots of textbooks available, lots of videos on youtube that explain topics in depth and in various levels from surface level to tedious detail, what do you even need the game part for? It’s way more work to make something like a game, and what is the game part doing besides getting in the way?

    The point of a game is to be fun and allow you to experiment more creatively with different ideas to see how they work in practice so you can learn a thing by doing the thing. It’s a form of hands-on learning, and that’s great.

    Something like Kerbal Space Program or Factorio or Cities: Skylines will force you to learn about things like orbital mechanics, telemetry, logic, data organization, observability, and traffic optimization just to proceed meaningfully in the game without ever explicitly telling you that you’re learning. A flight simulator or a farm simulator or a racing simulator or an offroad simulator allows you to experiment with and be challenged with (sometimes in very accurate detail) real issues that the real people in those fields have to deal with and with additional learning materials (again, books, videos) you will be able to learn the same way they do. That’s about as close as games get to being the kind of learning material you seem to be looking for. Maybe you need to be explicitly told what you’re learning, I get it, but if you’re not then allowed to play around with those concepts in real-time, why are you bothering with the game? Games are really not an ideal medium for that kind of education.

    If you want book-learning, use a book? Use game-learning for the kind of things gaming is good at.




  • Yes. mine is exposed publicly (with fail2ban) on a VPS with a public IP and a public DNS name and it’s fine. Use a minimal configuration that meets your needs, use secure passwords like you would for any public service and keep it up to date, and stay aware of any potential news that might make you aware of any severe and widespread vulnerabilities in the future (there haven’t been any in Nextcloud so far). It is not nearly as terrifying as people make it out to be to share public services on the public internet. Most decent software is secure-by-default. Yes vulnerabilities and attacks can happen but they are the exception not the rule.