Guide to Self Hosting LLMs Faster/Better than Ollama

brucethemoose@lemmy.world · edited 9 hours ago

Not just them. GLM, Qwen, Kimi, Stepfun, Baidu’s models. Z-Image. Small finetuners, Huawei’s prototype. There’s even a Chinese fast food chain that trains a ridiculously good audio/text mixed model (Longcat).

I actually thought the recent Deepseek preview was a little underwhelming and “deep fried” compared to competition, though maybe it’s just underbaked. And the architecture is interesting.

Gemma is great, too, if Google would actually unrestrain it and give it Gemini’s architecture.

Europe is struggling, though. Mistral (and everyone else) basically can’t do anything because the EU left regulation ambiguous; however strictly they regulate AI (and it should be pretty strict), anything is better than “we have no idea if we’ll get litigated, the law is clear as mud and might change?” They have at least one communal training project too, but everything I’ve seen is weirdly dated, architecture-wise, like they’re living two years in the past.

brucethemoose@lemmy.world · edited 9 hours ago

A constriction on GPUs is literally the best thing to ever happen to Chinese ML dev.

It made them thrifty, it made them focus, it forced them to go open weights, it made them build proper ASICs, research new techniques, and pay engineers to implement them. Now their models are supremely efficient, dirt cheap, running Nvidia-free on Huawei NPUs, and close to being better tools than the US models.

Meanwhile, US models are all (except maybe Google) enshittifying and getting benchmaxxed. Engineers are wasting man hours hopelessly trying to scale training, which does not scale like people think, and are literally giving GPUs busywork to meet utilization quotas. They’re trying to scale data and parameter count, without improving architecture or data quality or even basic problems like random token sampling, and it’s not working anymore.

At the same time, the big US AI houses have squashed nearly every bit of “garage innovation” I’ve seen. Cool teams, hero devs with proven work on a budget, they all just disappear into the maw of Microsoft or whomever like it’s a black hole, their work never integrated into anything.

US AI is GOING to collapse because we gave all the money to tech bros so they can poison the well. The ML research community has been screaming this since like 2022. And apparently before, as Aaron Swartz allegedly identified Altman as a sociopath right before he died by suicide.

Sorry to rant.

Not that China doesn’t have significant dev issues, to be clear.

Europe, too.

But this is a sensitive point for me. Hobbyist machine learning has been a passion of mine for a decade, and it makes me sick to hear people quote Altman, like throwing GPUs at tech bros is going to fix this. That. Is. A. LIE.

I don’t have a solution either. In the AI space, I do not even see a path back to moonshot-style cooperative innovation like the US has repeatedly pulled off before.

brucethemoose@lemmy.world · edited 11 hours ago

Yeah. Or SRWare Iron, IIRC. Or DuckDuckGo or Orion on mobile. Cromite. Firefox, Zen, whatever.

There are tons of good options, certainly more than I know. But it’s a hard thing for the average person to research, especially when forks get abandoned or whatever.

brucethemoose@lemmy.world · edited 11 hours ago

Can we agree that Brave:

Is scummy.
Has a shady CEO, and a shady history.
Is possibly a security risk.
Is still orders of magnitude better than using Google Chrome.

And that:

This headline is both true and clickbait-ish.
You can turn these things off in Brave’s settings, for free.
That doesn’t make this feature not scummy.

Basically no one should be using Brave, but no one should be using Google Chrome either, yet here we are.

And the revolving door of “best unabandoned Chromium fork to use” (Helium for the moment, or Ungoogled Chromium if you don’t mind some broken features, just to name two) is buried under so much SEO that it’s legitimately difficult to research.

So… I’m not gonna go out of my way to flame Brave users. If they’re trying to do better than Chrome, good! Not-Google is good. They can pay for this I guess. I’m not installing Brave, though, I’m not recommending it, and this certainly isn’t making me want to.

brucethemoose@lemmy.world · edited 12 hours ago

Yeah.

At the same time, I think people don’t really understand how toxic Facebook is. It’s a physical health hazard for one relative of mine, a mental one for some others. I’m not exaggerating when I say it would be better for their health if they drank and smoked cigarettes.

brucethemoose@lemmy.world · edited 12 hours ago

Even worse: nested API calls to ever-changing black boxes with literally randomized outputs.

brucethemoose@lemmy.world · edited 13 hours ago

They’re calling this out because Anthropic is afraid of dirt cheap, “good enough” open weights models undercutting them. Probably very afraid now that even Nvidia is on that boat, with huge Nemotron models.

The real battle isn’t pro AI vs anti AI. It’s closed weights answers-as-a-premium-service vs open weights, hackable tools. It’s Huggingface vs OpenAI. It’s akin to Lemmy vs Reddit.

Why would anyone use Anthropic once people figure out LLMs are configurable tools, not “AGI,” and efficient ones cost like 2 orders of magnitude less to run?

So they want to squash open research. Businesses are asking about costs now. They don’t yet realize they can just host assistants on-prem or through dirt cheap competing providers, but they’re starting to figure it out.

brucethemoose@lemmy.world · edited 13 hours ago

It’s both smarter in many ways, and extremely dumb in others, even in a really good and secure tool harness (which is rare).

It’s a great assistant and task finisher, but the architecture is just fundamentally not suited to being handed any significant responsibility.

brucethemoose@lemmy.world · 13 hours ago

They do when it’s in their business interest, though. They absolutely spend effort to skew opinions.

brucethemoose@lemmy.world · edited 15 hours ago

The propaganda is heavily controlled via a few companies. Google and Meta, primarily.

I’d argue it’s worse, as now it’s intensely personalized, so everyone gets their own little shard of reality instead of one “averaged” version. That, and even with slants, the old networks did follow journalistic standards to some extent, while modern influencers are under no such obligation.

brucethemoose@lemmy.world · 16 hours ago

Not sure what your long form video was about, but if it’s a specialized topic, it’s probably more rewarding to find where the community hangs out and bring it up there.

brucethemoose@lemmy.world · 17 hours ago

Zuckerberg FOMO, at his finest.

Even among tech bros, it really is amazing how insecure he is.

brucethemoose@lemmy.world · edited 20 hours ago

In response to:

Because lemmy.ml federates with the broader Lemmyverse. So if instances like .world is blocked, that same content is visible on .ml so it’s blocked too. Defederate from us and we might have a chance.

lemmy.ml was not blocked because of lemmy.world. Or other federated instances, as far as I can tell through testing a few.

brucethemoose@lemmy.world · 21 hours ago

Because that’s what industries are pushing for?

For home use, tons of people don’t even own laptops anymore. Microsoft is working on Windows thin clients (albeit under a different name) for business. Nvidia and others are pushing subscription gaming, as they constrict consumer hardware.

brucethemoose@lemmy.world · 1 day ago

This future, unfortunately, appears to be rented thin clients and smartphones/tablets, for most people.

So… technically, yes, the future is ARM? As those things will often use that.

brucethemoose@lemmy.world · 1 day ago

Less if you buy used.

Unfortunately, Valve can’t stick an eBay AMD 6800 + a DDR4 CPU in there; they gotta take the hit and buy everything new.

brucethemoose@lemmy.world · 1 day ago

Check for yourself. Go to any IP tester or HTTP resolver, or try your own VPN.

brucethemoose@lemmy.world · edited 1 day ago

As of this post, Lemmy.world is not blocked in China. Neither is piefed.social.

But lemmy.ml is blocked in China.

brucethemoose@lemmy.world · 2 days ago

Don’t really see how that matters.

In other words, it’s… unfortunate, but necessary?

brucethemoose@lemmy.world · edited 2 days ago

Every time I see such memes, I remind myself that lemmy.ml is blocked in:

China (including Hong Kong)
Iran
North Korea

It is currently unblocked in:

The United States
Israel (and Palestine)
Ukraine
EU
Pretty much the rest of the world
Russia

Though I worry about Russia, as they’ve already blocked Bluesky and at least one part of the fediverse: https://lemmy.world/post/14156170

(And FYI, I checked these regions myself, with a VPN, and then with another external tool, just now.)

brucethemoose@lemmy.world · edited 2 years ago

Guide to Self Hosting LLMs Faster/Better than Ollama