• 2 Posts
  • 505 Comments
Joined 2 years ago
Cake day: October 23rd, 2024

  • Very difficult project to track data for. The major difficulty is that capex budgets are like share-buyback budgets: the implementation speed is subject to change. Revenue and profitability are hard to measure because AI credits tend to be bonus perks for enterprise customers, and in Meta’s case they use it for better brainwashing/ad revenue.

    The best way to measure Bubbliciousness is trends in spot GPU rentals. For over a year, Nvidia has been selling more datacenter GPUs than there are completed datacenters to put them in. Those GPUs filter down to tier 2 providers, because they cost $1.30 to $2.70 per hour just to sit in a warehouse. Anthropic’s deal to replace Grok at Colossus 1 has significantly reduced tier 2 provider usage, and those spot rates have collapsed below their running costs. https://lemmy.ca/comment/23443450
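    To put the hourly figures above in perspective, here is a minimal sketch that converts the $1.30 to $2.70 per hour holding cost into a per-GPU-year cost and checks the sign of the margin when a spot rate sits below running cost. The spot rate and running cost used in the second part ($1.50 and $2.00) are hypothetical placeholders, not market data.

```python
# Sketch: per-GPU-year holding cost from the comment's hourly figures,
# plus a sign check of spot rate vs. running cost (placeholder inputs).
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_holding_cost(cost_per_hour: float) -> float:
    """Cost of one GPU sitting unused for a full year at a given hourly cost."""
    return cost_per_hour * HOURS_PER_YEAR


def hourly_margin(spot_rate: float, running_cost: float) -> float:
    """Margin per rented GPU-hour; a negative value means the rental loses money."""
    return spot_rate - running_cost


if __name__ == "__main__":
    for cost in (1.30, 2.70):
        print(f"${cost:.2f}/h idle -> ${annual_holding_cost(cost):,.0f} per GPU-year")
    # Hypothetical spot rate and running cost, NOT market data:
    print(f"margin: {hourly_margin(1.50, 2.00):+.2f} $/h")
```

    At 8,760 hours a year, the quoted range works out to roughly $11k to $24k per idle GPU-year.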



  • It’s much more likely to play out like the internet fiber buildout: some of it was needed and used.

    Datacenters will always leverage scale, and AI is only economical at 16+ concurrent users, which delivers 3x the tokens/s of a single user. Current rental rates for H200s are below their running costs. Capacity is already too high in the US. Innovations in smaller, faster, cheaper models are providing significant value with less hardware. Gemini flash 3.5 is very small and fast, at a much lower cost than the top 2 US labs' models. Deepseek v4 has massive cost reductions that will filter down to the rest of the industry, especially context compression, which is what allows more users on a single GPU cluster. Qwen 3.6 does bring size down enough to run what was state of the art 3-4 months ago on consumer hardware, but again, the economics favor multi-user service, here on (pro rather than industrial) 96 GB of RAM.
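    The batching claim above (16 concurrent users yielding about 3x the aggregate tokens/s of one user) can be turned into cost-per-token arithmetic. This is a rough sketch that takes the 16-user / 3x figures from the comment; the 50 tokens/s single-user speed and the $2.00/hour GPU cost are hypothetical inputs chosen only for illustration.

```python
# Sketch of batching economics under the comment's claim that serving 16
# concurrent users yields ~3x the aggregate tokens/s of a single user.
# single_user_tps and gpu_cost_per_hour are hypothetical inputs.
from dataclasses import dataclass


@dataclass
class ServingStats:
    aggregate_tps: float           # tokens/s summed over all users
    per_user_tps: float            # tokens/s each user sees
    usd_per_million_tokens: float  # GPU cost spread over generated tokens


def serving_stats(single_user_tps: float, users: int,
                  aggregate_speedup: float, gpu_cost_per_hour: float) -> ServingStats:
    aggregate = single_user_tps * aggregate_speedup
    tokens_per_hour = aggregate * 3600
    return ServingStats(
        aggregate_tps=aggregate,
        per_user_tps=aggregate / users,
        usd_per_million_tokens=gpu_cost_per_hour / tokens_per_hour * 1_000_000,
    )


if __name__ == "__main__":
    solo = serving_stats(50.0, users=1, aggregate_speedup=1.0, gpu_cost_per_hour=2.0)
    batched = serving_stats(50.0, users=16, aggregate_speedup=3.0, gpu_cost_per_hour=2.0)
    print(solo)
    print(batched)
    ratio = solo.usd_per_million_tokens / batched.usd_per_million_tokens
    print(f"cost reduction: {ratio:.1f}x")
```

    Under these assumptions each user sees fewer tokens/s than when served alone, but the GPU cost per token falls by the same 3x factor as aggregate throughput rises, which is why serving at scale beats single-user deployments.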

    MTP (multi-token prediction) and Turboquant are other technologies that increase tokens/s delivered while using less RAM. Software stacks that make better use of GPUs are absorbing token-demand growth on their own, even as the exaggerated capacity comes online at a slower pace than the hardware investment values justified.
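    On the memory side, the "less RAM, more users" point can be illustrated with plain KV-cache arithmetic. This sketch shows generic KV-cache quantization (16-bit vs 4-bit values) and does not reproduce Turboquant's actual method; MTP, which speeds up decoding rather than saving memory, is not modeled. The model dimensions (48 layers, 8 KV heads, head size 128, 32k context) and the 48 GiB cache budget are hypothetical, and quantization scale/metadata overhead is ignored.

```python
# Sketch: KV-cache memory per user and how many users fit in a fixed memory
# budget at 16-bit vs 4-bit cache precision. Model dimensions are hypothetical;
# quantization scale/metadata overhead is ignored.


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bits_per_value: int) -> int:
    # Factor 2: one key vector and one value vector per token, per layer, per KV head.
    return 2 * layers * kv_heads * head_dim * context_tokens * bits_per_value // 8


def max_users(free_bytes: int, per_user_bytes: int) -> int:
    """Number of full-context users whose KV caches fit in the free memory."""
    return free_bytes // per_user_bytes


if __name__ == "__main__":
    free = 48 * 2**30  # hypothetical 48 GiB left for KV cache after weights
    for bits in (16, 4):
        per_user = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128,
                                  context_tokens=32_768, bits_per_value=bits)
        print(f"{bits:>2}-bit cache: {per_user / 2**30:.1f} GiB/user, "
              f"{max_users(free, per_user)} users")
```

    With these made-up dimensions, dropping the cache from 16 to 4 bits cuts per-user memory from 6 GiB to 1.5 GiB, so the same budget holds 32 full-context users instead of 8.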