Most engineering leaders made the same bet in 2024. They licensed GitHub Copilot for the team, added Cursor for the power users, maybe rolled out Claude Code for a few senior engineers. The invoices added up fast. A mid-sized engineering organization with 100 developers can easily spend $400,000 to $600,000 per year across these tools before accounting for the API costs that accumulate quietly in the background.
The bet seemed obvious. The tools were impressive in demos. Every vendor had benchmarks showing dramatic productivity gains. And the competitive pressure to “enable developers with AI” made saying no feel reckless. So the tools went in, the credit cards got charged, and the organization moved on to the next priority.
Twelve months later, most of those organizations still cannot answer the most basic question their CFO will eventually ask: is this working?
The Benchmark Problem
The AI coding tool vendors are not shy about publishing productivity statistics. GitHub claims Copilot users are 55% faster at coding tasks. Cursor publishes testimonials from engineers who describe 10x output improvements. Anthropic’s data on Claude Code shows meaningful reductions in time-to-completion for well-defined tasks.
These numbers are real, in the sense that they come from controlled evaluations of specific tasks. But controlled evaluations are not engineering organizations. The gap between “this tool helped a developer complete an isolated coding challenge faster” and “this tool made our entire engineering organization more effective” is where most ROI analysis breaks down.
The industry research is more sobering. Jellyfish, which analyzes data from over 500 engineering organizations, puts the average cycle time improvement from AI coding tools at around 25%, with PR throughput gains of roughly 12%. Those are meaningful numbers for a well-run rollout. But Jellyfish also tracks adoption rates, and the data shows that AI-assisted PRs account for roughly half of all merged pull requests across their customer base, up from 14% just two years ago. That means roughly half of all merged work still has no AI involvement, even as paid licenses sit idle in the admin console.
McKinsey’s research on AI-enabled software engineering found that productivity gains are highly uneven across teams and functions, and that organizations with structured measurement programs capture three to four times more value from AI tools than those without. The tools don’t create value uniformly. Whether your organization captures that value depends almost entirely on whether anyone is paying attention to the data.
What “Paying Off” Actually Means
There is a version of this analysis that stops at cycle time. If your AI-assisted pull requests close 25% faster than non-AI PRs, and you can assign a dollar value to engineering time, you can construct a spreadsheet that shows a positive return. Many organizations do exactly this and call it done.
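To make that concrete, here is the kind of back-of-envelope calculation those spreadsheets encode. Every input below is an illustrative assumption (the loaded cost per developer, the share of time spent on PR-bound coding), anchored loosely to the figures earlier in this piece:

```python
# Back-of-envelope ROI math many teams stop at. All inputs are
# illustrative assumptions, not measured values.
developers = 100
loaded_cost_per_dev = 180_000   # assumed fully loaded cost, $/year
tool_spend = 500_000            # midpoint of the $400k-$600k range above
coding_share = 0.40             # assumed share of time on PR-bound coding
cycle_time_gain = 0.25          # the ~25% Jellyfish figure

# Treats cycle-time gains as engineering time recovered,
# which is exactly the leap of faith the spreadsheet makes.
recovered = developers * loaded_cost_per_dev * coding_share * cycle_time_gain
roi = (recovered - tool_spend) / tool_spend
print(f"recovered: ${recovered:,.0f}  ROI: {roi:.0%}")
# recovered: $1,800,000  ROI: 260%
```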
That math is not wrong, but it is incomplete in ways that matter. Three dimensions of ROI tend to get ignored.
Adoption is not uniform. Aggregate adoption rates hide the distribution underneath. In most engineering organizations, AI coding tool adoption follows a familiar pattern: a small cohort of power users who have integrated AI deeply into their workflow, a larger group of casual users who reach for the tool occasionally, a segment that has never meaningfully engaged, and new adopters still learning. These cohorts have entirely different productivity profiles. A 50% adoption rate built on casual usage delivers a fraction of the value of a 50% rate built on genuine depth. The aggregate metric obscures everything interesting.
Tool spending is not consolidated. The average engineering organization pays for multiple AI coding tools simultaneously. The same developers who hold GitHub Copilot licenses are also using Cursor, and some run Claude Code alongside. Each vendor's console reports usage for its own tool only. No single view shows cost per PR across all providers, which tool delivers the best return per dollar, or where licenses sit unused. Without that cross-vendor view, optimization is impossible.
Not all PRs are equal. AI coding tools deliver more value on some work than others. Boilerplate generation, documentation, test writing, and well-scoped feature additions tend to see strong AI contribution. Architecture decisions, complex debugging, and novel problem-solving tend to see less. If your metric is simply “AI code ratio,” the percentage of merged lines that originated from an AI tool, you may be measuring the wrong thing, or at least measuring it in a way that says nothing about whether the AI contribution landed on the work that matters most.
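A minimal sketch makes the distinction visible. The PR records and work-type labels below are hypothetical; the point is that a healthy-looking overall ratio can hide exactly this skew:

```python
# Naive overall AI code ratio vs. the same ratio split by work type.
# The PR records and work_type labels are hypothetical.
prs = [
    {"work_type": "boilerplate", "ai_lines": 900, "total_lines": 1000},
    {"work_type": "tests",       "ai_lines": 400, "total_lines": 500},
    {"work_type": "core_logic",  "ai_lines": 100, "total_lines": 1500},
]

overall = sum(p["ai_lines"] for p in prs) / sum(p["total_lines"] for p in prs)
print(f"overall AI code ratio: {overall:.0%}")  # 47% -- looks healthy

totals = {}
for p in prs:
    t = totals.setdefault(p["work_type"], [0, 0])
    t[0] += p["ai_lines"]
    t[1] += p["total_lines"]
for work_type, (ai, total) in totals.items():
    print(f"{work_type}: {ai / total:.0%}")
# boilerplate: 90%, tests: 80%, core_logic: 7% -- the AI contribution
# barely touches the hardest work, and the headline number hides it.
```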
What Measurement Actually Requires
Getting a real answer to the ROI question requires connecting three data sources that almost no organization has unified.
The first is GitHub data: PR volume, cycle time, AI commit detection, code contribution patterns by developer and team. This is where the before-and-after comparison lives. AI-assisted PRs versus non-AI PRs, by team, by developer cohort, by time period. Without this, you are estimating.
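A minimal version of that comparison, assuming PR records already fetched from the GitHub API and an ai_assisted flag produced by commit-level detection, could look like this:

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records; in practice these come from the GitHub API,
# and the ai_assisted flag from commit-level AI detection.
prs = [
    {"opened": "2025-06-01T09:00", "merged": "2025-06-02T17:00", "ai_assisted": True},
    {"opened": "2025-06-01T10:00", "merged": "2025-06-04T10:00", "ai_assisted": False},
    {"opened": "2025-06-03T08:00", "merged": "2025-06-03T20:00", "ai_assisted": True},
]

def cycle_hours(pr):
    fmt = "%Y-%m-%dT%H:%M"
    opened = datetime.strptime(pr["opened"], fmt)
    merged = datetime.strptime(pr["merged"], fmt)
    return (merged - opened).total_seconds() / 3600

ai = median(cycle_hours(p) for p in prs if p["ai_assisted"])
non_ai = median(cycle_hours(p) for p in prs if not p["ai_assisted"])
print(f"median cycle time: {ai:.0f}h AI vs {non_ai:.0f}h non-AI "
      f"({1 - ai / non_ai:.0%} faster)")
```

In practice the comparison would be segmented by team and cohort rather than pooled, since a single pooled delta hides the distribution problems described above.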
The second is provider cost data: per-user spend, token consumption, acceptance rates, and usage patterns by tool. This requires pulling from the admin APIs of each vendor — Anthropic, GitHub, Cursor, Windsurf, OpenAI — and normalizing the data into a single cost view. The math is not complicated, but the data integration work is non-trivial, and almost no engineering organization has done it.
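Once each vendor's spend and attributed PR counts sit in one schema, the cross-provider comparison itself is short. The figures below are hypothetical stand-ins for what the admin APIs would return after normalization:

```python
# Hypothetical monthly spend and AI-assisted PR attribution per provider;
# real values would be pulled from each vendor's admin API and normalized.
monthly_spend = {"copilot": 19_000, "cursor": 12_800, "claude_code": 9_400}
prs_attributed = {"copilot": 610, "cursor": 540, "claude_code": 480}

# Rank providers by cost per AI-assisted PR, cheapest first.
for provider, spend in sorted(monthly_spend.items(),
                              key=lambda kv: kv[1] / prs_attributed[kv[0]]):
    print(f"{provider}: ${spend / prs_attributed[provider]:.2f} per AI-assisted PR")
# claude_code: $19.58, cursor: $23.70, copilot: $31.15
```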
The third is the developer adoption dimension: who is in which cohort, which teams are getting deep value versus surface-level usage, and where the gaps are. This is where the improvement roadmap lives. If your power user cohort is 8% of your developers and your casual cohort is 42%, you have a very different problem than if those numbers are reversed.
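One plausible classification rule, using the Power, Casual, New, and Idle labels; the thresholds and input shape here are illustrative assumptions, not a standard:

```python
# Classify developers into adoption cohorts from usage telemetry.
# Thresholds (15 active days, 30-day onboarding window) are assumptions.
def cohort(active_days_last_30: int, days_since_first_use: int) -> str:
    if active_days_last_30 == 0:
        return "Idle"
    if days_since_first_use < 30:
        return "New"
    return "Power" if active_days_last_30 >= 15 else "Casual"

usage = {"ana": (22, 300), "ben": (4, 210), "chloe": (9, 12), "drew": (0, 400)}
for name, (active, since_first) in usage.items():
    print(f"{name}: {cohort(active, since_first)}")
# ana: Power, ben: Casual, chloe: New, drew: Idle
```

The exact cut-offs matter less than tracking movement between cohorts over time: a rollout is working when the Casual and New groups are flowing toward Power, not merely when the headline adoption rate ticks up.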
When these three data sources are unified, the analysis becomes tractable. Cost per PR by provider. Cycle time delta for AI-assisted versus non-AI work. Developer cohort distribution by team. Which providers are getting the most usage per dollar. Where idle licenses should be reassigned. These are the questions the CFO is eventually going to ask. The organizations that can answer them will have a very different conversation than those that cannot.
How Coding IQ Approaches This
This is the problem Olakai’s Coding IQ was built to solve. Rather than requiring engineering teams to build custom data pipelines or rely on fragmented vendor consoles, Coding IQ connects directly to your GitHub organization and your AI coding tool admin APIs — Anthropic, GitHub Copilot, Cursor, Windsurf, OpenAI — and pulls the data together automatically.
The result is a unified view: cycle time comparison between AI-assisted and non-AI PRs, provider cost breakdown, developer adoption cohorts (Power, Casual, New, Idle), team-level benchmarks, and a cost-per-PR metric by provider. Questions that previously required a data engineering project — “which coding tool gives us the best ROI?”, “which teams have the lowest AI adoption?”, “what is our AI code ratio trending toward?” — become answerable in seconds.
Coding IQ also surfaces what the vendor dashboards cannot. Shadow AI in engineering is real: developers using personal API keys, unapproved tools, or AI assistants outside the sanctioned stack. A developer who builds on Claude’s API with a personal account doesn’t show up in your GitHub Copilot analytics. Coding IQ detects AI contribution patterns from the code itself, not just from vendor data, so the picture is complete rather than bounded by what each vendor chooses to report.
For organizations already using a dedicated engineering intelligence platform, the question worth asking is whether that platform can show you governance, shadow AI exposure, and the full cross-vendor cost picture alongside your engineering metrics. For most, the answer is no. Coding IQ was built to provide that layer.
The Question Worth Asking Now
Engineering organizations are entering a moment where AI coding tool budgets are large enough to require accountability. The days of “it feels productive” as sufficient justification are ending. CFOs are starting to ask for the data. Boards are asking whether AI investments across the organization are generating returns.
The organizations that will be able to answer those questions are the ones that started measuring before the question was forced on them. Not because the tools are failing — many of them are genuinely delivering value — but because value without measurement is invisible. And invisible value does not survive budget season.
If you are spending $400,000 per year on AI coding tools and cannot answer what your cost per PR is, which teams are in which adoption cohort, or whether your investment would be better concentrated in one tool over another, the issue is not the tools. The issue is measurement.
You have the data. You are probably just not looking at it yet.
Schedule a demo to see how Coding IQ gives engineering leaders the full picture on AI coding tool ROI.