Tokenmaxxing was right about the destination and wrong about the direction. The premise was sound: getting maximum value from every AI interaction is exactly what you want.
But organizations heard “maximum” and thought more: more queries, more generation, more output.
That’s how tokenmaxxing became a race to increase interaction count. Because when you need to show progress, you measure what’s measurable. And what’s measurable, it turns out, is output. So that’s what got measured, rewarded, and gamed.
What the leaderboards actually measured
To measure progress, Uber ranked its engineers by token usage. More AI queries meant a higher score, so engineers ran more queries. By April, the company had burned through its entire 2026 Claude Code and Cursor budget in four months. CTO Praveen Neppalli Naga said he was “back to the drawing board” because the budget he’d forecast was “blown away already.” President and COO Andrew Macdonald was more measured but no more reassuring: the link between token consumption and shipped consumer features, he said, is “not there yet.”
Amazon ran the same experiment under a different name. KiroRank scored engineers on AI usage. The score went up when you ran more tasks, whether those tasks produced anything useful or not. Amazon SVP Dave Treadwell eventually stepped in: “Please don’t use AI just for the sake of using AI.” The leaderboard was scrapped.
Two companies, two names, the same mistake. But the mistake wasn’t the leaderboard; that was just the symptom. The mistake was the implicit theory underneath it: that AI value scales with output volume. That generating more is, by definition, getting more. Neither company asked what engineers were putting in before they hit generate. Neither measured the quality of the question, the sharpness of the framing, or the clarity of the context. They measured what came out and ignored everything that determined whether what came out was any good.
Generating more is not the same as getting more
If this problem hit Uber and Amazon, it’s safe to assume it’s everywhere. Over 70% of organizations say their AI investment is paying off, yet fewer than 1% can show returns above 20%. Enterprise AI spending averaged over $85,000 per month in 2025, up 36% year on year, and only half of organizations can say with confidence whether any of it is working. Around 30% of generative AI projects collapse after the proof-of-concept stage because costs keep climbing and no one can explain what the investment has bought.
McKinsey’s 2025 State of AI survey of nearly 2,000 organizations found 88% are now using AI in at least one business function, but only 39% report any EBIT impact at the enterprise level. BCG’s parallel study of 1,250+ firms found 60% reporting minimal gains despite substantial investment. Meanwhile, Gartner’s 2024 data puts the project abandonment rate at 30% after proof of concept. Regardless, AI spending keeps surging.
So what’s actually going wrong? Most organizations skipped a step. They deployed the tools, tracked usage, and assumed value would follow. What they didn’t ask was what their people were feeding the model: how well-defined the problems were, how rich the context was, how much judgment went into the prompt before anything was generated. Output volume filled that gap because it is easy to count.
But AI doesn’t fix fuzzy thinking. It scales it. A poorly framed prompt generates a plausible-sounding answer that wastes more time than no answer at all. A vague brief produces a first draft that has to be rebuilt from scratch. The speed of generation creates an illusion of progress that the actual work must then undo.
The Pragmatic Engineer’s 2026 survey of over 900 engineers puts a face on this. Junior engineers, according to director-level respondents, consistently fall into the top-spender category for AI tokens within their organizations. They’re not running more tasks. They spend more tokens per task because they lack the experience to frame a problem precisely or judge the output quickly. They prompt, check, re-prompt, and check again.
A senior engineer gets a usable answer in two exchanges. A junior engineer gets a plausible one in eight. The token count goes up; the output, not so much.
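To make the arithmetic concrete, here is a minimal sketch of how a volume leaderboard and a per-answer view rank the same two engineers. Every number and name in it is a hypothetical assumption for illustration: the 1,500 tokens per exchange, the `Session` class, and the generous assumption that the junior engineer's eighth exchange does end in a usable answer. None of it comes from the Uber, Amazon, or survey data.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One engineer working one task (all figures hypothetical)."""
    engineer: str
    exchanges: int            # prompt/response round trips on the task
    tokens_per_exchange: int  # assumed average, not a measured value
    usable_answers: int       # answers that needed no further rework

    @property
    def total_tokens(self) -> int:
        return self.exchanges * self.tokens_per_exchange

    @property
    def tokens_per_usable_answer(self) -> float:
        # A session with no usable answer has unbounded cost per answer.
        if self.usable_answers == 0:
            return float("inf")
        return self.total_tokens / self.usable_answers


# Two exchanges versus eight, as in the comparison above.
senior = Session("senior", exchanges=2, tokens_per_exchange=1500, usable_answers=1)
junior = Session("junior", exchanges=8, tokens_per_exchange=1500, usable_answers=1)

# A token leaderboard ranks by volume: highest spender first.
by_volume = sorted([senior, junior], key=lambda s: s.total_tokens, reverse=True)

# A value view ranks by cost per usable answer: cheapest first.
by_value = sorted([senior, junior], key=lambda s: s.tokens_per_usable_answer)

print("Top of token leaderboard:", by_volume[0].engineer)
print("Lowest cost per usable answer:", by_value[0].engineer)
```

Under these assumptions the junior engineer tops the token leaderboard with four times the spend, while both engineers deliver the same single answer. The ranking flips as soon as the metric is divided by what was actually produced.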
Highest-ROI enterprises are investing in capability
Amazon’s fix was to replace KiroRank with normalized deployments: measuring AI-generated code that actually gets merged and shipped. That moved the signal from activity to output. An improvement, but it still sits on the output side of the equation.
The organizations seeing the highest returns figured out something harder. The value lives in what goes in before anything is produced: the precision of the problem statement, the quality of the context, the human judgment that shapes the generation before it starts. Those things only develop through practice, structured habits of thought, and building internal capability that lets people use AI tools well, not just use them often.
Gartner and CIO survey data from late 2025 confirms the pattern: the highest-ROI enterprises are investing in training, not headcount cuts. There’s no measurable link between AI-driven layoffs and returns. The gains come from improving existing people, specifically in the upstream work that determines what AI is asked to do and how well it’s asked to do it.
That’s what structured AI transformation actually is, when it works: not a rollout, not a tooling decision, but a sustained investment in input quality. The organizations that get this will have something to show the board in two years. Those still optimizing for token volume will have spent a great deal of money learning what Uber learned in four months.