I want to talk about what I see: economics in the AI age will be drastically different from economics in the internet age, and the difference boils down to the cost structure of compute.
The current AI company business model reminds me of a business disaster: MoviePass. It launched a subscription that let users visit cinemas across the U.S. and watch unlimited movies for a flat fee. It was a promising idea, right up until it turned into a disaster.
Unit Economics
The reason was simple: the power users, the people who loved and used the service the most, also cost the company the most. In other words, the more people loved the product, the faster the company bled money. That's a business built upside down.
Most AI companies are on the same path. They are reusing the subscription-funnel business model proven in the internet age: attract as many users as possible first, and figure out cost efficiency later. That playbook doesn't seem to work in the age of AI.
Cursor is a great example. It's the golden boy of vibe-coding culture, blessed with user love. There is only one problem: the users who rely on Cursor most, the ones who use it all day, also cost the company the most to serve. And when Cursor tried to restrict usage (so it wouldn't die), it faced backlash from loyal users who felt betrayed.
This foretells a fundamental shift in how business models work in the AI age. To fully grasp it, one must understand the oldest business concept, unit economics: do you make or lose money per unit of usage? In AI, that unit is an inference, each time the model generates an answer.
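To make this concrete, here is a minimal sketch of subscription unit economics. All the numbers are assumptions for illustration, not real figures from any company:

```python
# Hypothetical numbers, chosen only to illustrate the shape of the problem.
price_per_month = 20.00      # flat subscription fee
cost_per_inference = 0.01    # assumed blended serving cost per answer

def monthly_margin(inferences: int) -> float:
    """Per-user margin under a flat subscription:
    revenue is fixed, but cost scales with usage."""
    return price_per_month - cost_per_inference * inferences

print(monthly_margin(500))    # casual user: +15.0
print(monthly_margin(5_000))  # power user: -30.0 (the MoviePass dynamic)
```

The flat fee caps revenue per user while cost grows without bound, so the most devoted users are exactly the ones who flip the margin negative.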
Fundamental difference between Internet and AI
Unit economics didn't come up much during the internet age, because scaling software had low marginal cost. As Kevin Kelly puts it: "The Internet is a copy machine. At its most foundational level, it copies every action, every character, every thought we make while we ride upon it."
AI is different. Its underlying technology, the deep neural network, is all about matrix multiplications, and its costs come from two parts: the cost of training a model and the cost of inference. Training is expensive, but it's a one-off payment (for now) and the cost is predictable, so it's not the big issue. Inference cost is the issue, because in business terms it means cost rises proportionally with usage.
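A toy cost model makes the asymmetry visible. The figures below are assumptions, not real costs:

```python
# Toy cost model with assumed figures: training is a fixed one-off cost,
# while inference cost scales linearly with usage.
TRAINING_COST = 50_000_000     # paid once, predictable (assumed)
COST_PER_INFERENCE = 0.002     # marginal cost per generated answer (assumed)

def total_cost(num_inferences: int) -> float:
    return TRAINING_COST + COST_PER_INFERENCE * num_inferences

# Training dominates at low volume; inference dominates at scale.
for n in (10**6, 10**9, 10**11):
    print(f"{n:.0e} inferences -> ${total_cost(n):,.0f}")
```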
Now — how to solve this?
Quantization, computer architecture, and business model.
Quantization rocks.
With quantization, you can lower the cost of inference. Here is the math: a full-precision weight occupies 32 bits (FP32); quantized to 8 bits (INT8), the model's memory footprint shrinks 4x, and since moving weights around dominates inference cost, the cost shrinks with it. Now, would quantization lower precision? Yes, somewhat. But it seems clear now that the dominant AI use case will be specialized AI and multi-agent frameworks. Which means (a code sketch follows the list below):
- What matters is how well you coordinate and organize models around a goal, rather than maximizing the brilliance of any single model.
- Cost explodes if you plan to run multi-agent work on full-precision models rather than quantized ones.
- By fine-tuning on domain-specific data, a quantized model can perform on par with, or even better than, a much larger full-precision model.
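Here is a minimal sketch of the idea, using symmetric INT8 quantization. This is an assumed scheme for illustration; real deployments use per-channel scales, calibration data, and so on:

```python
import numpy as np

# Symmetric INT8 quantization: store 8-bit integers plus one FP32 scale.
def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0               # map the largest weight to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes // q.nbytes)                   # 4: INT8 is 4x smaller than FP32
print(np.abs(w - dequantize(q, scale)).max()) # the small rounding error you pay
```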
Of all these choices, the best combination would be the binary neural network (BNN). In a binary neural network, weights and activations are constrained to just two values, typically -1 and +1. The BNN seems to be the most scalable choice because (see the sketch after this list):
- Lower inference cost: BNNs cut the memory footprint by up to 32x (one bit per weight instead of 32). Memory footprint is the amount of data a model must store and move around. This directly addresses the number-one cost of AI: memory movement, rather than computation itself, consumes most of the energy, and thus the money, in AI inference.
- BNNs are built for edge usage. Edge means offloading computation from the cloud to the local device; for example, the recent M5 chip Apple released is designed for exactly that: inference locally on the device, offline.
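A small sketch of where the 32x figure comes from, and of why binary arithmetic is cheap. This is a toy illustration with an assumed {-1, +1} encoding (bit 1 = +1, bit 0 = -1), not a production kernel:

```python
import numpy as np

n = 1_000_000
w_fp32 = np.random.randn(n).astype(np.float32)   # full-precision weights: 4 MB
w_bits = (w_fp32 >= 0)                           # binarize: keep only the sign
w_packed = np.packbits(w_bits)                   # 8 weights per byte: 125 KB

print(w_fp32.nbytes / w_packed.nbytes)           # 32.0: the 32x reduction

# With binary values, multiply-accumulate collapses to XNOR + popcount:
# the product of two {-1, +1} values is +1 exactly when their bits agree.
a_packed = np.packbits(np.random.randn(n) >= 0)
agree = np.invert(np.bitwise_xor(w_packed, a_packed))  # XNOR
matches = int(np.unpackbits(agree)[:n].sum())          # popcount
dot = 2 * matches - n                                  # sum of {-1, +1} products
print(dot)
```

Less data to move and cheaper arithmetic per operation is exactly the combination that attacks memory movement, the dominant inference cost named above.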
As mentioned, inference cost is what matters most; you can't scale AI as a service if the more people love your product, the more money you bleed. BNNs attack the heart of this problem.
Perhaps the only downside is that BNNs, though cheaper at inference, are more expensive to train, which means higher CapEx. But this isn't a real issue: in business, a cost you can predict is far less of a problem than one you can't.
Computer Architecture choice.
Why is Sam Altman so obsessed with compute lately? Because OpenAI is bleeding money: users cost more than they pay, and compute is the make-or-break challenge to overcome. In other words, for OpenAI to be sustainable, it must drastically lower the cost of inference. And that means custom hardware-software co-design.
Google, on the other hand, co-designs Gemini with its custom TPU v5, and this integration cuts the cost per inference by an order of magnitude. Reportedly, Gemini's inference cost is much lower than ChatGPT's. Why? Because specialized chips can improve energy efficiency by 5 to 10 times, and energy is the biggest variable in inference cost, and ultimately in total cost. Seemingly, Apple is preparing to do the same.
As mentioned before, the bottleneck comes from memory, not compute. Co-designing custom architectures exploits the principle of locality: keeping data close to the processor, moving less of it, and shortening the distance it travels.
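You can feel the principle of locality even at the software level. Here is a toy illustration (timings vary by machine; this is an analogy for what chip designers optimize, not a hardware benchmark):

```python
import time
import numpy as np

x = np.random.randn(8000, 8000).astype(np.float32)  # row-major (C-order) layout

t0 = time.perf_counter()
_ = x.flatten(order="C")   # reads memory sequentially: good locality
t1 = time.perf_counter()
_ = x.flatten(order="F")   # jumps ~32 KB between consecutive reads: poor locality
t2 = time.perf_counter()

# Same data, same amount of work; only the access pattern differs.
print(f"sequential: {t1 - t0:.3f}s  strided: {t2 - t1:.3f}s")
```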
Business Model
In the future, AI companies will buy compute from specialized compute providers — much like how businesses buy large quantities from factories. When you buy in bulk, you get a lower unit price. Factories prefer large, predictable orders because they stabilize cash flow and reduce uncertainty, and in return, they offer discounts.
AI will likely work the same way: companies will negotiate long-term compute contracts at wholesale rates. In other words, economies of scale will emerge not only in model training but in compute purchasing itself.
On the user side, charging by usage makes far more sense than the current subscription model. It works like a coffee shop: each cup has a price. Users pay per inference, per generation, per unit of value. This aligns cost with consumption. The business stays profitable as usage scales, rather than bleeding faster as users love the product more.
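Revisiting the earlier subscription sketch with per-use pricing (the numbers are again assumptions) shows how this flips the unit economics:

```python
# Assumed prices: with per-use billing, the margin per inference is constant,
# so heavy usage grows profit instead of losses.
price_per_inference = 0.015
cost_per_inference = 0.010

def monthly_margin(inferences: int) -> float:
    return (price_per_inference - cost_per_inference) * inferences

print(monthly_margin(500))    # casual user: +2.5
print(monthly_margin(5_000))  # power user: +25.0 (more love, more profit)
```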
In the end, the future of AI economics may look less like the internet and more like manufacturing — stable supply chains underneath, predictable costs, and pricing models grounded in outcomes.