How to stop spending money on US LLMs.

Kkk2237pl@szmer.info · 1 month ago

@lemmy.ca · 1 month ago

So, self-hosting is still not great.

The big problem is the trade-off. You can get lots of memory but slow prompt processing, which limits how much context you can actually use. Or you can get a semi-fast GPU with little memory, which caps the size of the models you can run.
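To make the memory side of that trade-off concrete, here is a rough back-of-envelope sketch. It assumes memory is dominated by the quantized weights plus an fp16 KV cache (one K and one V vector per layer, KV head, and token), and it ignores runtime overhead. The model shape (30B params, 48 layers, 8 KV heads, head_dim 128) is a made-up placeholder, not the real config of any model mentioned here, and the helper names are hypothetical.

```python
# Back-of-envelope memory estimate for running an LLM locally.
# All model shapes below are hypothetical placeholders, not the real
# configs of Gemma 4, Qwen 3.6, or any other model.

GIB = 2**30


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone (ignores runtime overhead)."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_tokens: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache memory: one K and one V vector per layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem / GIB


if __name__ == "__main__":
    # Hypothetical dense 30B model, 4-bit weights, fp16 KV cache.
    w = weights_gib(30, 4)
    kv = kv_cache_gib(n_layers=48, n_kv_heads=8, head_dim=128, n_tokens=60_000)
    print(f"weights ~{w:.1f} GiB, KV cache at 60k tokens ~{kv:.1f} GiB")
    for vram in (16, 24, 128):
        fits = "fits" if w + kv <= vram else "does not fit"
        print(f"{vram:>4} GiB: {fits}")
```

With these made-up numbers the total comes to roughly 25 GiB, so it misses a 24 GiB card but fits easily in a large unified-memory box. The catch is that a big pool of slower memory is exactly where prompt processing becomes the bottleneck.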

Sometimes I run pi agent in a container with Gemma 4 or Qwen 3.6, but even on Strix Halo the quadratic slowdown past 60k tokens is brutal.
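The "quadratic slowdown" comes from attention: every token attends to every earlier token, so that part of the prompt-processing cost grows with the square of the context length, while the weight matmuls grow only linearly. Here is a rough sketch of that scaling. It assumes the common approximations of about 2·P FLOPs per token for the weights (P = active parameters) and about 4·n²·d FLOPs per layer for attention (Q·Kᵀ plus softmax·V). It ignores causal masking, optimized attention kernels, and MoE routing details. The model shape and function name are hypothetical.

```python
# Rough prefill-cost model showing why long contexts get disproportionately slow.
# Assumptions (not measured): ~2*P FLOPs per token for the weight matmuls, plus
# ~4*n^2*d FLOPs per layer for attention scores (Q@K^T and softmax@V).
# Ignores causal masking, kernel tricks, and MoE routing (use active params for P).


def prefill_flops(n_tokens: int, active_params: float,
                  n_layers: int, d_model: int) -> tuple[float, float]:
    """Return (linear_flops, attention_flops) to process a prompt of n_tokens."""
    linear = 2 * active_params * n_tokens           # grows linearly with n
    attention = 4 * n_layers * n_tokens**2 * d_model  # grows with n squared
    return linear, attention


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    P, L, D = 30e9, 48, 4096
    base = sum(prefill_flops(8_000, P, L, D))
    for n in (8_000, 30_000, 60_000, 120_000):
        lin, att = prefill_flops(n, P, L, D)
        total = lin + att
        print(f"{n:>7} tokens: {total / base:6.1f}x the 8k cost, "
              f"attention share {att / total:.0%}")
```

Going from 8k to 60k tokens multiplies the linear part by 7.5 but the attention part by (60/8)² = 56.25. That is why the attention share of the work keeps climbing as an agent's context fills up.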

We aren’t there yet for complex agentic workflows run locally, and that’s primarily a hardware issue.

Performance improvements are shipping regularly, but they’re incremental.