• Dran@lemmy.world
    7 hours ago

    You can’t just write off capital expenditure though. The hardware, even for “efficient” MoE inference, is still very expensive to buy, house, run, and cool. Even assuming open-weight model serving at $0 R&D for the models themselves, high-prefill workloads don’t batch well with decode-heavy concurrency (or with other prefill-heavy jobs). The moment you do anything nontrivial, you start running into architectural problems that are very complicated to solve efficiently at scale.

    Hardware that is useful for 5-10 years at most, plus development and support for the inference workflows, doesn’t leave a lot of margin on the table.

    My gut, along with basically everything I read, suggests that most shops (even pure-inference ones) are not profitable and are still floating on loans or VC money.

    • MangoCats@feddit.it
      6 hours ago

      At a 10-year lifetime, it sounds like the hardware costs as much to buy as it does to run, not factoring in the time value of money…
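
A rough way to sanity-check that comparison is to ask what annual running cost would make lifetime running costs equal the purchase price, with and without discounting. The sketch below is only illustrative: the purchase price, the 8% discount rate, and the function names are all made-up assumptions, not figures from this thread or from any vendor.

```python
def present_value(annual_cost, years, discount_rate):
    """Present value of a fixed cost paid at the end of each year."""
    return sum(annual_cost / (1 + discount_rate) ** t for t in range(1, years + 1))


def breakeven_annual_opex(purchase_price, years, discount_rate=0.0):
    """Annual running cost whose present value over `years` equals the purchase price."""
    annuity_factor = present_value(1.0, years, discount_rate)
    return purchase_price / annuity_factor


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
purchase_price = 100_000.0   # assumed up-front hardware cost
years = 10                   # the 10-year lifetime discussed above
discount_rate = 0.08         # assumed cost of capital

flat = breakeven_annual_opex(purchase_price, years)
discounted = breakeven_annual_opex(purchase_price, years, discount_rate)

print(f"break-even running cost, no discounting: {flat:.2f} per year")
print(f"break-even running cost, discounted:     {discounted:.2f} per year")
```

With any positive discount rate the break-even running cost comes out higher than the undiscounted one, because later years of running cost are worth less today while the purchase is paid in full up front.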

    • Joe@discuss.tchncs.de
      6 hours ago

      If you assume they are unprofitable, the only question becomes whether they are more or less unprofitable by serving the older models for longer.