Stuff like Cerebras API, anyone hosting Deepseek v4, self hosting Qwen 27B or RAG models or whatever all use less energy than your computer will use as you read this comment.
This is just not the case. A couple of minutes of an idling end-user device does not use as many joules as a few seconds of a self-hosted model. There are other tasks that will be as intensive, but reading static text in a browser won’t do it. That’s not to say it’s an unforgivable waste of resources on a personal level or anything, just that your comparison is a bit busted.
Hosted models chasing speed take disproportionately more energy, analogous to how an engine at redline burns way more fuel than one running at a modest load.
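
To put rough numbers on the joules comparison, here is a back-of-the-envelope sketch. The wattages and durations are my own illustrative assumptions, not measurements of any particular laptop or GPU, so treat the output as an order-of-magnitude check only.

```python
# Rough energy comparison: energy (J) = power (W) * time (s).
# Every figure below is an assumed, illustrative value, not a measurement.

def energy_joules(power_watts: float, seconds: float) -> float:
    """Return energy in joules for a constant power draw over a duration."""
    return power_watts * seconds

# Assumed: a laptop idling on a static page draws about 10 W for 2 minutes.
laptop_idle_j = energy_joules(power_watts=10.0, seconds=120.0)

# Assumed: a GPU running a self-hosted model draws about 350 W for 5 seconds.
gpu_inference_j = energy_joules(power_watts=350.0, seconds=5.0)

print(f"idle laptop, 2 min: {laptop_idle_j:.0f} J")
print(f"self-hosted model, 5 s: {gpu_inference_j:.0f} J")
```

Under those assumptions the five seconds of inference comes out ahead of the two minutes of reading. Plug in a power-hungry desktop with a big monitor, or a much smaller model, and the gap narrows or flips, which is why the actual wattages matter.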