When Your CFO Doesn't Get Your Token Burn

It’s Boston Tech Week! I’ve been heads down on a few projects and walked into a session at Venture Lane mostly because I already work out of the building, half expecting a startup pitch, half expecting slides I’d already seen.

I was so wrong! The first event I attended, Priced to Scale, a guided conversation on usage-based and AI pricing hosted by Limitr, was the real thing: operators, a whiteboard, and the messiness of who owns pricing when engineering, product, finance, and sales all touch it. It put labels on a question I keep hearing from fractional clients and from founders in the room: how the hell do we budget for AI?

I’m not saying your CFO is anti-AI. I mean you can’t put a reliable number on the spend yet — and when they push back, they’re doing their job.

If you can’t meter it, you’ll reach for something simpler. This isn’t only an enterprise SaaS problem — startups are feeling the pain now. Even companies that don’t call themselves SaaS are waking up to internal AI burn they didn’t plan for.

One solution a lot of SaaS teams are using is a credit model — you price tokens for the customer in bundles the founder can sell without explaining inference cost on every invoice. HubSpot, Whimsical, and Beehiiv are among the companies threading AI in that direction — the details differ, but the pattern is the same: offer AI features without a huge subscription step-change for the customer.

The hard part is converting token and credit spend into blocks of work you can actually price. Founders get it; their CFOs often don’t, because the unit economics still don’t close, and credits add their own accounting problem on top of the token bill. Credits repackage spend; they don’t make it go away. At scale, the CFO isn’t asking whether AI is useful. They want to know where the tokens went and what you charged the customer for.
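Here is a minimal sketch of that conversion, in Python. Every number and name in it is an assumption I made up for illustration: the bundle price, the credits per task, and the per-million-token rates are hypothetical, not any vendor’s real pricing. It only shows how one credit price can clear a margin on a short human-driven task and lose money on an agent-driven one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreditPlan:
    """Hypothetical credit bundle; all values are illustrative."""
    bundle_price_usd: float   # what the customer pays for the bundle
    credits_in_bundle: int    # credits included in the bundle
    credits_per_task: int     # credits charged for one block of work

    def revenue_per_task_usd(self) -> float:
        # Price of one credit times the credits a task consumes.
        return self.bundle_price_usd / self.credits_in_bundle * self.credits_per_task


def task_cost_usd(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Inference cost of one task, given per-million-token rates."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


def margin_per_task_usd(plan: CreditPlan, cost_usd: float) -> float:
    """What is left of the credit revenue after paying for the tokens."""
    return plan.revenue_per_task_usd() - cost_usd


# Assumed numbers: a $20 bundle of 1,000 credits, 5 credits per task.
plan = CreditPlan(bundle_price_usd=20.0, credits_in_bundle=1000, credits_per_task=5)

# Same feature, two consumers: the agent run is assumed to use 10x the tokens.
human_cost = task_cost_usd(8_000, 1_000, usd_per_m_input=3.0, usd_per_m_output=15.0)
agent_cost = task_cost_usd(80_000, 10_000, usd_per_m_input=3.0, usd_per_m_output=15.0)

print(f"human task margin: {margin_per_task_usd(plan, human_cost):+.3f} USD")
print(f"agent task margin: {margin_per_task_usd(plan, agent_cost):+.3f} USD")
```

With these made-up inputs, the human-driven task clears about six cents and the agent-driven task loses about 29 cents on the same five credits. That gap is the accounting problem the credit layer hides.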

It’s not just our bubble. A May 2026 Fortune report has Microsoft pulling back on internal Claude Code licenses, Uber burning its 2026 AI coding budget in four months, and companies pushing people to “tokenmaxx” while the total bill still climbs — especially as agents eat more per task.

I keep reaching for two imperfect analogies. Gym membership: everybody signs up; you only make money on the people who don’t show up. Healthcare: nobody can tell you what a unit actually costs, so you can’t meter, budget, or plan overages honestly. The point: when buyers can’t see the real cost of a resource, they have a hard time trusting the price — credits included.

Alexa Gjonca on the panel said:

“Your consumer of an AI budget is not always going to be a human. Humans have a very different comfort level with complexity, but AI agents are much happier with higher complexity data signals. And so they will consume your budget at a different rate.”

Two questions decide how fast you burn: Who is consuming — people or agents? How is the feature built — open or closed?

Open vs closed AI

I had to split this in two because I kept mixing up where I use AI with what I ship.

Where you run it: When I’m in Claude Code, Codex, or the big general models, I’m in open-ended problem solving. Token burn is brutal to forecast — the thread runs long, the context shifts, and a “quick question” turns into a spiral.

When I’m in HubSpot, a CRM workflow, or an MCP built for one job (Whimsical is a good example), the work stays narrower. I’m trying to get one or two things done. Usage stays in a box.

What you ship: Same words, different layer — and this is the conversation I think founders should be having with their teams.

Open in your product means the user (or an agent) can wander. Prompts fan out. They start at X and end up at Y or Z.

Closed means you hand the model a few known inputs and ask for one job — rewrite this paragraph, summarize this ticket — without letting the conversation leave the guardrails.

Label the features. Not as a vocabulary exercise. As a budget exercise. Where could something closed on paper still tip open in production? That’s where your forecast dies — and a single credit pool won’t save you if the feature keeps wandering.

That’s the same regret pattern as handing a junior developer unrestricted AWS: they spin up fifteen extra-large instances and the burn rate blows past the forecast. AI token consumption is no different from any other SaaS consumption. You need to break out the use case and know the blast radius when something tips open.
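One rough way to put a number on that blast radius is a worst-case ceiling per feature. The sketch below is my own illustration, not a standard method: the `Feature` class, the call volumes, the token caps, and the blended $5 per million tokens are all assumed values. A feature with no cap has no ceiling, which is the point.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical AI feature; the fields are illustrative."""
    name: str
    calls_per_month: int
    token_cap_per_call: Optional[int]  # None means no guardrail (open)


def worst_case_monthly_usd(feature: Feature, usd_per_m_tokens: float) -> float:
    """Upper bound on monthly spend if every call hits its token cap."""
    if feature.token_cap_per_call is None:
        # No cap means no ceiling: the forecast has nothing to stand on.
        return math.inf
    total_tokens = feature.calls_per_month * feature.token_cap_per_call
    return total_tokens * usd_per_m_tokens / 1_000_000


RATE = 5.0  # assumed blended USD per million tokens

closed = Feature("summarize ticket", calls_per_month=10_000, token_cap_per_call=2_000)
tipped = Feature("summarize ticket (tipped open)", 10_000, token_cap_per_call=200_000)
uncapped = Feature("open chat assistant", 10_000, token_cap_per_call=None)

for f in (closed, tipped, uncapped):
    print(f"{f.name}: worst case {worst_case_monthly_usd(f, RATE):,.0f} USD/month")
```

With these assumptions the closed feature tops out at 100 USD a month, the same feature with a 100x looser cap tops out at 10,000 USD, and the uncapped one reports infinity. The ratio between the first two is the blast radius to bring to the forecast conversation.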

When the CFO says it doesn’t make sense

CFOs like budgets and forecasts, for good reason. If a CFO says this doesn’t make any sense because there’s not enough data, they’re exactly right. The team needs to get better at understanding the risk of AI consumption.

That means thinking about their own product, or the products they bring to market, as open versus closed. Best practices for this just aren’t settled yet.

Open versus closed is already hard on your own roadmap. It gets a lot messier when you’re a SaaS founder selling to enterprise — big seat counts, departments, and a contract that has to say who pays when a closed feature tips open.

That’s the layer where Limitr — who hosted the session — makes sense as infrastructure: help you cost-share token use back to the customer, or build AI burn into the contract up front instead of eating it in one founder-sized credit pool. You don’t have to use them — but that’s when pass-through metering stops being theory and shows up in the deal.

What to do with this

You can’t explain token burn to your CFO with one lump-sum line on the P&L. You need a map — value prop by value prop — so you know what’s open, what’s closed, and what you’re willing to pay for.

Break it down. Split your product offering and the parts of your tech stack that touch AI into single value props — one line each: what the customer gets, not what the engineer built.
Label the risk. For each value prop, ask: open or closed AI consumption? Tie that to the customer need it serves.
Assign a budget. Give each value prop its own cap or credit pool — not one founder-sized bucket for the whole product.
Track each one. Report burn at that granularity so a spike shows up on a named feature, not “AI went up this month.”
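
The four steps above can be sketched as a small per-value-prop ledger. This is a minimal illustration under my own assumptions: `BurnLedger`, the value prop names, and the dollar figures are hypothetical, and a real setup would pull spend from your provider’s usage data rather than manual `record` calls.

```python
from dataclasses import dataclass, field


@dataclass
class BurnLedger:
    """Tracks AI spend against a separate budget for each value prop."""
    budgets_usd: dict[str, float]                       # one cap per value prop
    labels: dict[str, str]                              # "open" or "closed"
    spent_usd: dict[str, float] = field(default_factory=dict)

    def record(self, value_prop: str, usd: float) -> None:
        """Add spend to a named value prop; unknown names are rejected."""
        if value_prop not in self.budgets_usd:
            raise KeyError(f"no budget assigned for {value_prop!r}")
        self.spent_usd[value_prop] = self.spent_usd.get(value_prop, 0.0) + usd

    def report(self) -> list[tuple[str, str, float, float, bool]]:
        """Rows of (name, label, spent, budget, over_budget), worst first."""
        rows = []
        for name, budget in self.budgets_usd.items():
            spent = self.spent_usd.get(name, 0.0)
            rows.append((name, self.labels[name], spent, budget, spent > budget))
        # Sort by share of budget used so the spike is at the top.
        return sorted(rows, key=lambda r: r[2] / r[3], reverse=True)


ledger = BurnLedger(
    budgets_usd={"ticket summary": 100.0, "open chat assistant": 500.0},
    labels={"ticket summary": "closed", "open chat assistant": "open"},
)
ledger.record("ticket summary", 40.0)
ledger.record("open chat assistant", 350.0)
ledger.record("open chat assistant", 200.0)

for name, label, spent, budget, over in ledger.report():
    flag = "OVER BUDGET" if over else "ok"
    print(f"{name} ({label}): {spent:.2f} of {budget:.2f} USD [{flag}]")
```

The design choice that matters is rejecting spend that has no named budget. If a cost can’t be attached to a value prop, it ends up in the “AI went up this month” bucket you are trying to get rid of.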

Bottom line: when a low-value feature starts burning tokens, you want reporting that tells you where to cut, where to add guardrails, and where you’re actually willing to spend more — not a scramble in front of the CFO with no lever to pull.

Tech Stack Clarity Check (15 min) — Book a slot if you want a second pair of eyes on where AI features touch your product and how you’re budgeting token risk.