Has anyone in an organization tried using self-hosted LLMs for agentic programming?

I'm curious whether it makes any sense. My organization spends a fortune on tokens from US companies. I want to recommend an alternative… I think it would be cheaper to run models on our own machines instead…

  • [object Object]@lemmy.ca
    1 month ago

    So, self-hosting is still not great.

    The big problem is that you can get lots of memory but slow prompt processing, which limits your usable context window, or you can get a semi-fast GPU with little memory, where you’re capped on model size.

    Sometimes I run pi agent in a container with Gemma 4 or Qwen 3.6, but even on Strix Halo the quadratic slowdown is brutal after 60k tokens.
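
    To put a rough number on that quadratic slowdown, here is a back-of-the-envelope sketch. It assumes a plain dense transformer with full attention, and the model dimensions in `MODEL` are hypothetical, not taken from Gemma, Qwen, or any specific release. It uses the common approximation of about 2 FLOPs per parameter per token for the weight matmuls, plus about 4 × layers × hidden size × n² for the attention scores. It ignores causal masking, prompt caching, and sliding-window or hybrid attention, all of which reduce the real cost.

```python
def prefill_flops(n_tokens, n_params, n_layers, d_model):
    """Rough forward-pass FLOPs to process a prompt of n_tokens from scratch.

    Returns (linear_part, attention_part) so the two growth rates can be compared.
    """
    # Weight matmuls: about 2 FLOPs per parameter per token, grows linearly.
    linear = 2 * n_params * n_tokens
    # QK^T plus attention-weighted V: about 4 * d_model * n^2 per layer, grows quadratically.
    attention = 4 * n_layers * d_model * n_tokens ** 2
    return linear, attention


# Hypothetical dense model used only for illustration.
MODEL = dict(n_params=30e9, n_layers=48, d_model=5120)

if __name__ == "__main__":
    for n in (4_000, 16_000, 60_000, 120_000):
        linear, attention = prefill_flops(n, **MODEL)
        share = attention / (linear + attention)
        print(f"{n:>7} tokens: attention is {share:.0%} of prefill FLOPs")
```

    Under these assumptions the attention term is a small fraction of the work at a few thousand tokens and grows to a large share by tens of thousands, so slow prompt processing hurts more the longer an agent session runs.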

    We aren’t there yet for complex agentic workflows locally, and it’s primarily a hardware issue.

    Though innovations in performance are being shipped regularly, they’re incremental.