The Pulse: token spend breaks budgets – what next?

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of three topics from last week’s The Pulse issue. Full subscribers received the article below seven days ago. If you’ve been forwarded this email, you can subscribe here.

Last week, we covered the slightly perverse trend of “tokenmaxxing” across the industry, where devs run agents with the sole aim of boosting their personal “token stats” in an effort to rank higher on internal token leaderboards, and not be seen as a Luddite who doesn’t use AI tools enough compared to peers.

This week, I spoke with a software engineer at a large company and another at a seed-stage place. Both shared almost identical stories: at their latest all-hands, company leadership expressed concerns about the fast-rising costs of tokens. At both places, token spend has increased by ~10x in the last six months – with no signs of slowing down.

I wanted to find out more about this trend, so I talked to devs at 15 businesses. Below is what I learned about what’s happening in workplaces of all sizes. Names are anonymized.

Large companies

Setting the default model to a cheaper one: 10,000+ person SaaS company, offices on all continents

Inside a large SaaS company, most devs use an internal background coding tool. The tool defaults to Claude Sonnet, which is the cheaper Claude model. Model selection is not persisted, so devs who prefer working with Opus, for instance, must reselect it on every startup.

This tool supports all major frontier models such as Sonnet, Opus, GPT, and Gemini. Devs at the company whom I talked to are very heavy users of the tool and have not encountered usage limitations.

Fintech company, US, Series D, ~8,000 people. Staff engineer:

“The cost in token spend is off the charts – and leadership has shared this trend with us. They have not said anything beyond showing growth in spend, and mentioning that this won’t be sustainable. So, nothing specific yet, but my sense is that something will have to change. Limits or prioritizing cheaper models, cutting back on hiring? Who knows.”

Infra company, US, publicly traded, ~5,000 people. Engineering Director:

“We’re monitoring but not restricting. We are spot checking the heaviest users, but we are seeing the business cases working out.

We are offering some guidance on model selection - e.g., turn off the new high-effort setting in Claude. Some users are trying open source models – but open source model usage is a bottom-up initiative, not a top-down one.”

Information technology, US, 10,000+ people. Director of Engineering:

“We have already had to raise our API budget limits multiple times in April. We recently switched to a much higher-effort level for Claude, which significantly increased the cost per PR.

One reason for the cost spike is using state-of-the-art models for demanding tasks. We are using that high-effort setting even for fairly trivial tasks that could have been handled by much cheaper models, or even by lower-effort Claude loops. Despite a few of us pointing this out, leadership has basically said budget is not the concern right now.

I sense that the budget increase has not been forecasted, and we’re in for a reckoning. I suspect the attitude changes once finance and other cost-conscious parts of the org realize we are spending hundreds of dollars per day, per highly-engaged developer. For now, fear of missing out and not wanting to fall behind seems to be outweighing cost discipline.”

Games studio, US+Europe, ~5,000 people. Senior developer:

“What budget increase? It’s very hard to get a budget for AI here! Claude Code is still not rolled out because $200/month/dev is seen as too high a cost. I talk with people at startups where $1,000/month in spending is totally normal, and it’s night and day here.”

Fintech company, US+Europe, late stage, ~5,000 people. Staff engineer:

“Some developers are now spending $500 a day (!!) on Claude Code. Practically speaking, this means that employee costs have doubled. Productivity has increased, in my view, but now the bottleneck is code reviews. AI can spit out code quite quickly, but we still have human reviews in place. Leadership encourages using AI for code review, but my team will not blindly trust AI.

The push for AI is coming from the top. This year’s performance review had a section on AI, rating devs by how well they used AI, so this is another reason everyone just uses it as much as they can.”

Mid-sized companies

SaaS industry, US, ~2,000 people. Dev Productivity Lead:

“Model routing helped keep our costs growing less dramatically. For example, changing the default model reduced cost by 30%. This is our strategy with AI spend, summarized:

  • Short term: spend, spend, spend! Experiment and use whatever models make sense.
  • Measure the impact. Measure key outcomes and report on spend, monthly.
  • When spend vs results diverge: adjust. When our spend increases dramatically, but outcomes don’t follow: see what we can do to adjust the delta. More spend should mean better outcomes. If not, we are doing something wrong.”
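The "spend vs results" check in this strategy can be sketched in a few lines of Python. This is a minimal illustration, not the company's actual tooling: the `MonthlyReport` fields, the `outcome` metric, the 25% tolerance, and all the numbers are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class MonthlyReport:
    month: str
    spend_usd: float  # total AI/token spend for the month
    outcome: float    # hypothetical outcome metric, e.g. shipped work items


def growth(previous: float, current: float) -> float:
    """Relative month-over-month change; 0.5 means +50%."""
    return (current - previous) / previous


def diverging_months(reports: list[MonthlyReport], tolerance: float = 0.25) -> list[str]:
    """Return months where spend grew faster than outcomes by more than `tolerance`."""
    flagged = []
    for prev, cur in zip(reports, reports[1:]):
        gap = growth(prev.spend_usd, cur.spend_usd) - growth(prev.outcome, cur.outcome)
        if gap > tolerance:
            flagged.append(cur.month)
    return flagged


# Made-up numbers, for illustration only.
reports = [
    MonthlyReport("Jan", 10_000, 100),
    MonthlyReport("Feb", 15_000, 140),  # spend +50%, outcome +40%: within tolerance
    MonthlyReport("Mar", 30_000, 150),  # spend +100%, outcome about +7%: flagged
]
print(diverging_months(reports))
```

Which outcome metric to track is the hard part, and the sketch does not answer it. It only shows that once spend and an outcome are reported monthly, spotting divergence is simple.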

Finance industry, US, ~2,000 people. VP of AI:

“We have Cursor and Claude Desktop, both of which have around 800-1,200 total users. Token usage is growing somewhat unexpectedly. Estimates are being adjusted on the fly; the initial plan to have strict limits (say, $100 per user) is breaking when reality hits, and people exhaust them in 3-5 working days.

Using expensive models is a problem. In regards to Cursor, many devs are defaulting to the most expensive models without realizing that going with Opus gives single percentage gains in intelligence compared to Sonnet, for example, while exhausting their budgets almost immediately.

We are working on blocking/managing out the most expensive models [with Cursor], as going into thousands of dollars per user, per month is not sustainable on our scale. Cursor is a good partner and we’re working with them to switch to a “pooled spend” model where heavy users can tap into a pool of extra spend.

Claude is a similar story. We were at $100 of Claude Desktop limit for everyone, but as we are moving forward, I can see that we would need to go much higher, especially for business-critical use cases.”

Infra company, US, late-stage, ~700 people. Founder:

“We haven’t had much of an issue. Most folks police themselves for runaway costs; for example, we had someone hit like $10K in a week because they messed up caching, but it was caught and they corrected their harness.

For the most part, we don’t see our high-end folks spending more than ~$1K/week. Now, to be clear, this is not a small amount! BUT it’s already a small subset of the population.

We’re just factoring it into engineering costs at this point: if it’s, say, $2K/month per employee, that’s $24K per year.

Who cares, then, when engineers already cost $200-400K/year in cash comp? Okay, so what if it’s $5K/month. That’s $60K/year.

Our bet is that token costs will stabilize and we’ll eventually end up with local-ish models.

Now, it could be five years before they stabilize, but overall, spend today isn’t that insane to me.

There’s a lot of people who are just dumb about it, but most legit execs push back on this. Take the Ralph loops or other insanity where someone spends $1K/day, $5K/week or stuff like this. That’s all just people being fools thinking they’re doing “R&D,” or somehow that they’re smarter than everyone else, but they’re just producing junk that never ships or is not useful.

We saw a bit of “stupid overspend” in the first couple months, but that’s all gone now. Costs could go up even more if we would “crack the whip” in wanting to see even more output, but we’re not doing that.”
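The founder's back-of-the-envelope math above is easy to reproduce. The sketch below uses only the figures from the quote ($2K and $5K per month in tokens, $200-400K per year in cash comp); the function names are my own.

```python
def annual_token_cost(monthly_usd: float) -> float:
    """Yearly token spend for one engineer, given a monthly figure."""
    return monthly_usd * 12


def token_share_of_comp(monthly_usd: float, annual_comp_usd: float) -> float:
    """Token spend as a fraction of an engineer's annual cash compensation."""
    return annual_token_cost(monthly_usd) / annual_comp_usd


# Figures quoted by the founder: $2K or $5K per month, $200-400K/year cash comp.
for monthly in (2_000, 5_000):
    for comp in (200_000, 400_000):
        share = token_share_of_comp(monthly, comp)
        print(f"${monthly:,}/month vs ${comp:,}/year comp: {share:.0%}")
```

At $2K/month, tokens add 6-12% on top of cash comp. At $5K/month, they add 15-30%.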

Healthcare industry, US, ~500 people. Senior engineering manager:

“We are not holding back on spend, and have a monthly spend leaderboard. And we WANT devs to spend more on tokens! For example, one of my engineers spent $1,400 on a long Claude Code session in a single day.

We are seeing massive leverage, and we do more with the same number of people. This is why we are okay with our spending spiking. Our traffic is growing more than 10x, year-on-year, and we have managed to keep things running with the same team, and these AI tools.

Engineering is now blocked on Product and Design – which never happened before! This is how fast execution has become. We now have Staff+ engineers writing Product PRDs so we can move faster.

I’ve been in tech for close to 15 years and I never saw dramatic change like this. I just came back after a 3-month break, and every single thing is different in my day! I feel these AI agents are the biggest change in the industry since high-level languages became widespread.”

E-commerce company, US & Europe, ~2,000 devs. Head of Engineering:

“The increase in spend is INSANE. It’s about usage going up, with no signs of stopping. Usage is off the charts.

We currently do not have limits in place, and are not pausing now. Our CEO is AI-pilled and won’t let us slow down.

We do buy tokens at a discount. Discounts start from 5% and go up with usage, with the vendors we use (the usual suspects).

We don’t let devs use anything lower than Opus 4.7 for coding. Cheaper models might work better, but a slight error pushed to prod would result in hours of toil.”

Small companies

Series A, US, ~50 people. Principal Engineer:

“About 15 devs are heavy users of AI and costs are rising very fast. Almost everyone uses Claude and Claude Code. We are considering four potential options:

  1. Increase AI budget, and start measuring more. Continue doing what we are, but allow devs to use more tokens instead of hiring limits. The precise ROI is hard to quantify, but we’ll start to measure and track both AI adoption and impact.
  2. Optimize token consumption. Use cheaper models for simpler tasks, review token usage, and see where we can cut usage. Downside: this approach could become one with diminishing returns, fast.
  3. Integrate more AI providers in the company. Find wrappers to abstract LLMs. The problem is: how do you replace Claude Code, for instance?
  4. Pivot to local models, such as Kimi, Qwen, and so on. The problem is it’s a big investment in high-end hardware or cloud GPUs. Upside: it offers better long-term cost control, once done.

We are likely to go with option #1: increase spend BUT maintain momentum and put the right measurements in place. We can do #2, #3 and #4 later. But if we kill AI usage momentum inside the company, the outcome will probably be worse.”

AI infra, US, seed stage, ~15 people. Founder:

“We saw a 15x increase in 6 months:

  • Six months ago our spend per developer was ~$200/month
  • Today, it’s around $3,000/developer/month, for our seven devs

We’re not slowing usage, especially as we are building an AI infra product. The increase was much faster than expected, though.”
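To put the 15x figure in perspective, here is a small Python sketch that turns it into a month-over-month rate. It assumes smooth, compounding growth across the six months, which the founder did not claim; the real curve may have been lumpier.

```python
# Figures from the quote above.
start_per_dev = 200.0    # USD per developer per month, six months ago
today_per_dev = 3_000.0  # USD per developer per month, today
months = 6
devs = 7

multiple = today_per_dev / start_per_dev  # overall growth multiple

# Assumption: growth compounded evenly each month.
monthly_growth = multiple ** (1 / months) - 1

team_monthly_spend = today_per_dev * devs

print(f"Overall multiple: {multiple:.0f}x")
print(f"Implied month-over-month growth: {monthly_growth:.0%}")
print(f"Team spend today: ${team_monthly_spend:,.0f}/month")
```

Under that assumption, spend grew by roughly 57% every month, and the seven-dev team now spends about $21,000 per month.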

Small, bootstrapped company, Europe. Founding engineer:

“Our current strategy in dealing with the increase in costs is to switch to a cheaper model; unfortunately, from Opus to Sonnet in our case. That said, Sonnet is quite decent.”

How businesses manage token spend

Regardless of company size, there seem to be two strategies for how companies deal with increased spending. A summary:

Strategy #1: “let it rip and start measuring.” Around half of respondents say AI spend is rising dramatically, and they have decided to do nothing about it. They want devs to use AI as much as it makes sense to, and to help the work as much as possible.

However, because the cost is rising dramatically, these companies are now starting to measure usage and attempting to measure the impact of their AI tools.

There are a few companies where the impact already seems to be very positive. Smaller startups whose business is exploding in number of customers, load, and revenue see that they don’t need to hire more staff because existing engineers can keep supporting the growth with AI tools.

Strategy #2: curb spending. Commonly mentioned cost-saving approaches:

  • Use cheaper models for simpler tasks
  • Set default models to less capable ones
  • Set a spending cap and make it hard for engineers to exceed it, or require consent for doing so

Most companies using strategy #1 have briefly considered going with this approach, but threw it away, because they see it as optimizing for the wrong thing: cutting costs before the productivity impact of using state-of-the-art tools is even known!
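For teams that do go with strategy #2, the three approaches can be combined into one small policy. The Python sketch below is hypothetical: the model names, the `Budget` class, and both functions are placeholders of mine and not any vendor's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model tiers; substitute whatever your tooling offers.
DEFAULT_MODEL = "cheaper-model"
PREMIUM_MODEL = "premium-model"


@dataclass
class Budget:
    monthly_cap_usd: float
    spent_usd: float = 0.0


def choose_model(task_is_complex: bool, requested: Optional[str] = None) -> str:
    """Default to the cheaper model; go premium only for complex tasks or on request."""
    if requested is not None:
        return requested
    return PREMIUM_MODEL if task_is_complex else DEFAULT_MODEL


def authorize(budget: Budget, estimated_cost_usd: float, user_consented: bool = False) -> bool:
    """Allow spend within the cap; beyond it, require explicit consent."""
    within_cap = budget.spent_usd + estimated_cost_usd <= budget.monthly_cap_usd
    if within_cap or user_consented:
        budget.spent_usd += estimated_cost_usd
        return True
    return False


budget = Budget(monthly_cap_usd=100.0)
print(choose_model(task_is_complex=False))                 # cheaper default
print(authorize(budget, 60.0))                             # within cap
print(authorize(budget, 60.0))                             # would exceed cap
print(authorize(budget, 60.0, user_consented=True))        # allowed with consent
```

The consent flag keeps the cap soft. Devs can still exceed it, but only by making a deliberate choice, which matches the "make it hard to exceed, or require consent" approach above.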

Discounts exist when the spend is in the millions of dollars. I asked several people if they are getting discounts from vendors when buying tokens at scale. There were no exact numbers, but this is what I gathered in aggregate about possible custom agreements:

  • Cursor: open to discounts above a few million dollars in spend. Companies have negotiated discounts with Cursor after crossing $1M of spending. Some companies negotiated tiered discounts from this level, starting at 5% and going higher as their spend goes up.
  • Anthropic: no discounts. I talked with companies spending $5M+ per year on Claude which have received no discounts. If Anthropic offers discounts, it will likely be at a much higher tier.
  • All discounts are custom, so try to negotiate – it’s free! Pricing discounts are on a per-customer basis, and highly custom. The easiest way to see if a discount is available is to ask the vendors!

---

Read the full issue of last week’s The Pulse, or check out this week’s The Pulse. This week’s issue covers:

  1. Load from AI breaks GitHub – but why not other vendors? GitHub’s reliability is less than one nine, and getting worse. Prolific open source contributor Mitchell Hashimoto is quitting GitHub because he thinks it’s not suited for professional work. GitHub’s leadership blames a 3.5x increase in service load for the degradation – or it might be self-inflicted.
  2. Anthropic’s speedrun to destroy trust. Anthropic could do no wrong until recently, but in the past month, that’s all changed. Silently nerfing Claude Code, banning companies from Claude, and baffling price rises all add to a sense that Anthropic is in its “extraction” era of generating more revenue for the same or worse service.
  3. Industry pulse. Dramatic price increases at GitHub Copilot, explosive growth at Codex, Google scrambling to build a good coding model, Cursor might be bought by SpaceX, AI agent deletes car business, and more.
  4. Mitchell Hashimoto & the “building block economy.” Ghostty’s creator finds that open source “building blocks” are the best way to win massive adoption by software components – but it’s got harder to build a business on top of open building blocks.

Subscribe to my weekly newsletter to get articles like this in your inbox. It's a pretty good read - and the #1 software engineering newsletter on Substack.