Why are AI companies shipping desktop apps instead of staying browser-based?

AI workloads need capabilities browsers cannot provide: filesystem access for reading codebases, long-running background processes for agents, screen and audio capture for meeting copilots, and sub-10ms latency for voice and autocomplete. OpenAI, Anthropic, and Perplexity all shipped native desktop apps in 2024 for these reasons.

What is the Model Context Protocol and why does it matter for enterprise AI?

The Model Context Protocol (MCP) is an open standard released by Anthropic in November 2024 that lets AI assistants connect to external tools and data sources, including local ones on a user's machine. It grew from 2 million to 97 million monthly SDK downloads in 16 months because browsers cannot give AI the filesystem access that agents need.

How does AI moving to the desktop affect enterprise measurement and governance?

Browser-based AI left a centralized trail your SaaS admin consoles could capture. Desktop AI apps and local inference bypass those centralized chokepoints, which means network logs and SSO reports no longer see the full picture. Enterprises need a visibility layer that works across browser, desktop, and on-device environments.

Can you still measure AI usage when it happens on local machines?

Yes, but only with a measurement approach designed for distributed AI. That means instrumenting at the application layer rather than the network layer, aggregating events from desktop, browser, API, and local inference into a unified view, and focusing on business outcomes rather than raw API call counts.

What is tokenmaxxing?

Tokenmaxxing is the practice of maximizing employee AI token consumption and using it as a proxy for AI adoption or productivity. Companies like Meta built internal leaderboards ranking employees by tokens processed. The term combines AI tokens with Gen Z slang for extreme optimization.

Why did Meta shut down Claudeonomics?

Meta shut down its internal tokenmaxxing dashboard, nicknamed Claudeonomics after Anthropic's Claude model, in April 2026 after news of the leaderboard leaked to the press. The dashboard ranked employees by tokens processed and generated, with rewards for top consumers.

Is token usage a good measure of AI productivity?

No. Token usage is an input metric, not an output metric. Even Reid Hoffman, who publicly defended the practice, conceded it is not a perfect measure of productivity. It tells you who is trying AI but not whether the work produced was useful, correct, or tied to a business outcome.

How should enterprises measure AI ROI instead?

Measure the business outcome the AI was deployed to change: cycle time for coding assistants, first-contact resolution for support copilots, win rate for sales tools. Tie AI usage to KPIs the executive team already tracks. Olakai's SEE, MEASURE, DECIDE, ACT framework walks through this end to end.

What is the governance risk of tokenmaxxing leaderboards?

Leaderboards that reward token consumption create an incentive to bypass governance controls. Employees who find sanctioned tools too slow or limited will reach for unsanctioned ones, accelerating shadow AI growth. Measuring AI ROI without measuring AI risk misses half the cost.

What is the ROI of AI in finance?

According to SVB's 2026 State of the VC-Backed CFO report, 42% of companies report measurable ROI from AI in their finance function, making it the second-highest ROI function behind Product and Engineering at 73%. AI is cutting the monthly financial close by 7.5 days and accelerating FP&A forecasts by 30-40%.

How are CFOs using AI in 2026?

McKinsey found that 44% of CFOs now use generative AI across five or more finance use cases, up from 7% the prior year. Top applications include cost analytics, accounts payable processing, FP&A forecasting, variance analysis, and fraud detection. Median AI spending is expected to double to $50K in 2026.

Can AI reliably handle financial calculations?

Early large language models struggled with basic math, but reasoning models, code execution capabilities, and structured outputs have closed this gap. An MIT/Stanford study found AI now cuts the monthly financial close by 7.5 days across 79 companies, demonstrating reliable performance in real financial operations.

How do you measure AI ROI in finance?

Start by auditing which AI tools your finance team is already using. Then pick one high-volume process like the monthly close or AP processing, establish a baseline, and track improvements. Only 51% of companies with AI budgets can currently demonstrate measurable ROI, so building measurement infrastructure early gives you a competitive advantage.

What percentage of companies see ROI from AI spending?

SVB's 2026 survey of 230 finance leaders at VC-backed companies found that 51% report measurable ROI from AI spending. The breakdown by function shows Product/Engineering at 73%, Finance at 42%, Marketing at 41%, Customer Support at 41%, Sales at 34%, and Legal/Compliance at 27%.

How do you measure AI ROI in the enterprise?

Measuring enterprise AI ROI requires connecting three layers: tool adoption data (who uses what), productivity impact (workflow changes), and business outcomes (revenue, margin, retention). Most tools only measure the first layer. Forrester found fewer than 1 in 3 AI decision-makers can tie AI value to P&L changes.

What does Microsoft Copilot Analytics measure?

Microsoft Copilot Analytics tracks M365 Copilot usage including prompts submitted, documents generated, meetings summarized, and emails drafted. It provides a 28-day aggregated view but does not correlate usage to business outcomes or per-user ROI. It measures activity, not impact.

What is GitHub Copilot's code acceptance rate?

GitHub Copilot's code suggestion acceptance rate averages 27-30%, meaning developers reject roughly 70% of suggestions. While Copilot reports saving developers about 3.6 hours per week, acceptance rate is a product metric, not a business metric — it doesn't measure whether accepted code shipped faster or reduced bugs.

Can you build an AI ROI dashboard in Tableau or Power BI?

Yes, but enterprise-grade AI dashboards take 3-6 months to build and cost $510K to $1.2M in the first year. They lack standardized AI usage schemas, have no external benchmarks, and break when vendor APIs change. Most become unsustainable when the analysts who built them move on.

What is the difference between AI usage tracking and AI ROI measurement?

AI usage tracking tells you who is using which tools and how often. AI ROI measurement connects that usage data to workflow productivity changes and ultimately to business outcomes like revenue, cost savings, or retention. Most enterprise tools only track usage and call it ROI.

How much do companies spend on AI coding tools?

A mid-sized engineering organization with 100 developers typically spends $400,000 to $600,000 per year across AI coding tools like GitHub Copilot, Cursor, and Claude Code, before accounting for additional API costs that accumulate in the background.

How do you measure AI coding tool ROI?

Measuring AI coding tool ROI requires unifying three data sources: GitHub data (PR volume, cycle time, AI commit detection), provider cost data (per-user spend, token consumption, acceptance rates across all vendors), and developer adoption cohorts (power users, casual users, idle licenses by team).

What is the average productivity gain from AI coding tools?

Jellyfish data from 500+ engineering organizations shows an average 25% cycle time improvement and 12% PR throughput gain from AI coding tools. However, AI-assisted PRs account for only about half of merged pull requests, meaning many developer licenses go underutilized.

What is shadow AI in software engineering?

Shadow AI in engineering refers to developers using personal API keys, unauthorized AI tools, or AI assistants outside sanctioned platforms. These contributions don't appear in vendor dashboards, creating blind spots in usage tracking, cost analysis, and governance compliance.

What is Olakai Coding IQ?

Coding IQ is Olakai's platform for measuring AI coding tool ROI. It connects to GitHub and AI tool admin APIs to provide unified views of cycle time impact, provider cost breakdowns, developer adoption cohorts, and shadow AI detection across all coding tools in one dashboard.

How do you measure AI coding tool ROI?

Measure AI coding tool ROI through cycle time delta (AI-assisted vs non-AI pull requests), incident rate on AI-authored code, cost per pull request by provider, and deployment frequency changes. BCG found 60% of companies lack defined financial KPIs for AI initiatives.
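As a rough illustration of the first metric, cycle time delta can be computed by splitting merged pull requests into AI-assisted and non-AI groups and comparing medians. This is a minimal sketch, not a description of any vendor's method: the `cycle_hours` and `ai_assisted` field names are hypothetical, and using the median rather than the mean is an assumption made to limit the effect of outlier PRs.

```python
from statistics import median


def cycle_time_delta(prs):
    """Compare median cycle time of AI-assisted vs non-AI pull requests.

    Each PR is a dict with hypothetical keys:
      "cycle_hours" (float): hours from first commit to merge
      "ai_assisted" (bool): whether the PR was flagged as AI-assisted
    """
    ai = [p["cycle_hours"] for p in prs if p["ai_assisted"]]
    non_ai = [p["cycle_hours"] for p in prs if not p["ai_assisted"]]
    if not ai or not non_ai:
        raise ValueError("need at least one PR in each group")

    ai_median = median(ai)
    non_ai_median = median(non_ai)
    # Negative delta means AI-assisted PRs merged faster than the baseline.
    delta_pct = (ai_median - non_ai_median) / non_ai_median * 100
    return {
        "ai_median_hours": ai_median,
        "non_ai_median_hours": non_ai_median,
        "delta_pct": delta_pct,
    }
```

A negative `delta_pct` only shows correlation. AI-assisted PRs may differ in size or complexity, so the comparison is more credible when restricted to similar teams and PR types.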

What percentage of code is now AI-generated?

Almost half of companies now have at least 50% AI-generated code, up from 20% at the start of 2025. Ninety percent of engineering teams use AI coding tools, and 4% of all public GitHub commits are generated by Claude Code alone, with projections reaching 20% by end of 2026.

Is AI-generated code less secure than human-written code?

Research cited by CIO.com found that developers using AI coding assistants wrote less secure code in 80% of tasks, yet were 3.5 times more likely to believe their code was secure. This confidence gap makes post-deployment quality tracking essential for AI-assisted development.

What is shadow AI in software development?

Shadow AI in development occurs when developers use personal accounts for AI coding tools like Cursor or Claude Code outside IT governance. The code generated becomes part of your product permanently, creating IP, licensing, and data handling risks that are harder to remediate than other forms of shadow AI.

What are AI developer adoption cohorts?

Developer adoption cohorts segment engineers by AI usage intensity: Power users (over 70% AI-assisted PRs), Casual users (20-70%), New adopters (first AI PR within 14 days), and Idle (under 20%). Cohort analysis reveals which developers get genuine value and which licenses go unused.
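The cohort thresholds above can be expressed as a small classification function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, "within 14 days" is read as "the developer's first AI-assisted PR was merged in the last 14 days", and new adopters are checked before the percentage bands so that a developer with only a few early PRs is not mislabeled.

```python
def classify_cohort(ai_prs, total_prs, days_since_first_ai_pr=None):
    """Assign a developer to an adoption cohort.

    ai_prs: number of AI-assisted PRs in the reporting window
    total_prs: total PRs in the same window
    days_since_first_ai_pr: days since the developer's first AI-assisted PR,
        or None if they have never opened one (hypothetical input)
    """
    if total_prs == 0:
        return "Idle"
    # Assumption: new adopters take precedence over the percentage bands.
    if days_since_first_ai_pr is not None and days_since_first_ai_pr <= 14:
        return "New adopter"

    share = ai_prs / total_prs
    if share > 0.70:       # over 70% AI-assisted PRs
        return "Power user"
    if share >= 0.20:      # 20-70% inclusive
        return "Casual user"
    return "Idle"          # under 20%
```

For example, a developer with 8 AI-assisted PRs out of 10 lands in the power-user cohort, while one with 1 out of 10 is counted as idle even though the license is technically in use.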

Why do AI pilots fail to scale to production?

Most organizations treat scaling as a deployment problem rather than a transformation problem. S&P Global found 46% of pilots are scrapped before production. Even those that scale often fail to redesign workflows, so the AI runs at production capacity with pilot-level business impact. Workflow redesign is 2.8 times more common among AI high performers.

How do you build a business case for scaling AI?

Present four elements to the CFO: operational cost structure at production scale, a counterfactual showing what the work costs without AI, scaling math with sensitivity analysis for different adoption scenarios, and 90-day stage gates with defined KPIs and decision points. Business cases with counterfactuals are far more defensible than efficiency claims.

What is the AI Cloning Playbook?

The Cloning Playbook replicates success DNA from your first AI deployment to subsequent ones. It identifies five transferable elements: business case structure, measurement infrastructure, governance framework, change management pattern, and executive sponsorship model. Organizations that clone see a 70 to 80 percent reduction in time-to-value for their second AI initiative.

How often should enterprises review AI performance?

Use a three-cycle operating rhythm. Monthly reviews track each AI initiative against business KPIs to catch performance issues early. Quarterly portfolio assessments evaluate all AI investments together for optimal resource allocation. Annual strategic resets align the AI portfolio with evolving business priorities and emerging technology capabilities.

What is the SEE MEASURE DECIDE ACT framework?

SEE, MEASURE, DECIDE, ACT is a four-step enterprise AI ROI playbook. SEE maps all AI tools across the organization. MEASURE connects AI activity to business outcomes like revenue and cost savings. DECIDE uses 30-day structured pilots to produce scaling decisions. ACT scales proven initiatives using CFO-ready business cases, success cloning, and monthly-quarterly-annual operating rhythms.

What is the difference between AI analytics and AI observability?

AI observability focuses on technical performance like latency and error rates. AI analytics extends beyond technical metrics to include business outcomes, ROI measurement, cost analysis, and governance. Observability tells you whether the system is running. Analytics tells you whether it is delivering value.

How do you measure AI ROI?

AI ROI is measured by comparing total AI costs (licensing, compute, API calls, implementation, human oversight) against measurable business value (time saved, revenue influenced, cost avoided, error reduction). The key is instrumenting AI systems to capture both sides continuously, not just during quarterly reviews.
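The comparison above reduces to simple arithmetic once both sides are expressed in the same currency and period. The sketch below uses the cost and value categories named in the answer; the class names, field names, and the standard (value - cost) / cost definition of ROI are illustrative assumptions, and the hard part in practice is assigning defensible dollar figures to each value field.

```python
from dataclasses import asdict, dataclass


@dataclass
class AICosts:
    """Total AI costs for one period, all in the same currency."""
    licensing: float = 0.0
    compute: float = 0.0
    api_calls: float = 0.0
    implementation: float = 0.0
    human_oversight: float = 0.0

    def total(self) -> float:
        return sum(asdict(self).values())


@dataclass
class AIValue:
    """Measurable business value for the same period, in currency terms."""
    time_saved: float = 0.0
    revenue_influenced: float = 0.0
    cost_avoided: float = 0.0
    error_reduction: float = 0.0

    def total(self) -> float:
        return sum(asdict(self).values())


def roi_pct(costs: AICosts, value: AIValue) -> float:
    """Return ROI as a percentage: (value - cost) / cost * 100."""
    total_cost = costs.total()
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (value.total() - total_cost) / total_cost * 100
```

With $100,000 in total costs and $150,000 in measured value, `roi_pct` returns 50.0. Recomputing it every period, rather than once per quarter, is what the continuous instrumentation described above makes possible.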

What is shadow AI and why does it matter for analytics?

Shadow AI refers to AI tools used by employees without IT approval. It matters because you cannot measure what you cannot see. If 30% of AI usage happens in unsanctioned tools, your analytics are incomplete, cost estimates are wrong, and security has blind spots. Shadow AI detection is typically the first step in building an AI analytics practice.

Do you need a dedicated platform for AI analytics?

For one or two AI tools, vendor dashboards may suffice. For enterprises using multiple AI tools across teams, vendor dashboards create fragmented views. A dedicated vendor-neutral AI analytics platform provides the unified perspective needed to make strategic decisions about the entire AI program.

What industries benefit most from AI analytics?

Every industry deploying AI at scale benefits, but urgency is highest in regulated industries. Financial services, healthcare, and government face regulatory requirements demanding continuous monitoring and audit-ready evidence. Technology companies benefit from ROI optimization, understanding which AI investments deliver the highest return.

What is AI pilot purgatory?

AI pilot purgatory is when enterprise AI projects run indefinitely without producing decision-quality data. MIT found 95% of AI pilots deliver zero financial return. The average enterprise loses $15-25 million annually on pilots that are too expensive to kill and too poorly measured to champion or scale.

How long should an AI pilot last?

Best practice is 30 to 45 days. This is long enough to generate meaningful business outcome data and short enough to maintain executive attention. Shorter pilots lack statistical significance. Longer pilots risk losing stakeholder engagement and drifting into purgatory without clear decision points.

Why do 95% of AI pilots fail?

Most pilots are designed to test technology, not prove business value. They lack baseline metrics, predefined success thresholds, and decision frameworks. Without these, pilots generate usage data but cannot answer the question that matters: should we invest more in this AI initiative? The fix is structured measurement from day one.

What should you measure during an AI pilot?

Track two types of metrics. Outcome metrics measure the business KPI the AI should impact, such as revenue, cost reduction, or resolution time. Diagnostic metrics measure operational factors like adoption, data quality, and workflow integration. McKinsey found that, among organizations that track defined KPIs, two-thirds meet or exceed their targets.

What is a scale, fix, or kill decision framework?

At the end of a structured pilot, use predefined thresholds to decide: scale the initiative if it exceeds the success KPI, fix specific issues if performance is in the middle range, or kill the initiative if it falls below the minimum threshold. Set thresholds before the pilot starts to prevent post-hoc bias.
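The decision rule can be written down before the pilot starts, which is the point of predefining thresholds. The following is a minimal sketch: the function name and parameters are hypothetical, and treating a result that exactly meets the success threshold as "scale" is an assumption, since the text only says "exceeds".

```python
def pilot_decision(observed, minimum, success, higher_is_better=True):
    """Return "scale", "fix", or "kill" for a pilot's outcome KPI.

    observed: the KPI value measured at the end of the pilot
    minimum: the threshold below which the initiative is killed
    success: the threshold at or above which the initiative is scaled
    higher_is_better: set False for KPIs like resolution time or cost
    """
    if not higher_is_better:
        # Flip signs so the same comparisons work for lower-is-better KPIs.
        observed, minimum, success = -observed, -minimum, -success
    if minimum > success:
        raise ValueError("minimum threshold must not be stricter than success")

    if observed >= success:   # assumption: meeting the target counts as success
        return "scale"
    if observed < minimum:
        return "kill"
    return "fix"              # middle range: address specific issues, re-measure
```

For a win-rate pilot with a minimum of 20% and a success threshold of 30%, an observed 25% returns "fix". For a resolution-time pilot where lower is better, pass `higher_is_better=False` and give the thresholds in the KPI's natural units.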

What is Future of Agentic?

Future of Agentic is a free, ungated research site for enterprise leaders navigating agentic AI. It includes a KPI library with 18 metrics, interactive ROI calculators, hundreds of enterprise use cases sortable by department and complexity, governance frameworks, an AI readiness quiz, and the Enterprise AI Unlocked podcast.

How do you measure AI agent success?

The Future of Agentic KPI library provides 18 metrics across agentic, chatbot, and AI application categories, including agent task completion rate, autonomous resolution percentage, and cost per automated decision. Each metric includes definitions, calculation methods, and benchmarks that connect directly to business outcomes.

How do you calculate the cost of AI agents vs. employees?

The Agent TCO vs. FTE calculator on Future of Agentic models real total cost of ownership (infrastructure, maintenance, monitoring, and iteration) against human equivalents over time. A companion Zombie Agent Cost calculator estimates the expense of deployed agents no longer delivering results, a problem most vendors do not discuss.

Why do 40% of agentic AI projects get canceled?

Gartner predicts over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear business value, and inadequate risk controls. Most agentic AI propositions today lack significant value because current models do not have the maturity to autonomously achieve complex business goals without proper planning and governance.

What is an AI readiness assessment?

An AI readiness assessment is a quick evaluation (about 30 seconds) that asks targeted questions about your organization's current AI maturity and produces a customized roadmap with recommended next steps. It helps executive teams align on where they actually stand versus where they think they stand before making investment decisions.

Category: AI Strategy

Strategic guidance for enterprise AI adoption and measurement

The Return of the Desktop App: And the AI Measurement Gap It Creates

For 25 years, the entire direction of travel in enterprise software was the same: everything moved to the browser. Siebel on CDs gave way to Salesforce in a tab, Office gave way to Google Docs, Sketch gave way to Figma, and every installer eventually got replaced by a URL. The logic behind that shift was airtight. Zero friction to distribute, one codebase across every operating system, native multiplayer, continuous deployment, and a subscription revenue model that buyers actually preferred. The web won so decisively that even Adobe capitulated to subscription pricing in 2013, and Microsoft declared itself “cloud-first” within 52 days of Satya Nadella taking over in 2014. If you were building software in 2020 and told a VC you were shipping a desktop app, you were laughed out of the room.

And then, somewhere in the last 18 months, every AI-native company that could have stayed browser-only started shipping desktop apps instead.

OpenAI released a ChatGPT Mac app in May 2024, before they had reached feature parity on mobile. Anthropic followed with Claude desktop in November, alongside the Model Context Protocol, which went from around 2 million to 97 million monthly SDK downloads in 16 months. The entire point of MCP is giving AI access to the local filesystem that browsers cannot reach. Perplexity shipped a native Mac app. Cursor, a desktop IDE you download and install the old-fashioned way, is reportedly in talks to raise at a $50 billion valuation, which is roughly the last price tag attached to a desktop-first software company when that company was Microsoft.

Meanwhile Ollama, which exists purely to run AI models locally on your laptop with no API call involved, went from around 100,000 monthly downloads in early 2023 to over 52 million in early 2026. That is a 520x increase in three years for a product whose defining feature is that it does not touch the cloud. And Microsoft, the same Microsoft that was cloud-first, now mandates that every Copilot+ PC ship with 40 trillion operations per second of on-device AI silicon. The company that spent a decade telling its customers to move everything to Azure is now re-engineering consumer PCs around local inference.

The Web Won on Five Things, and AI Wants All Five Reversed

Every piece of enterprise software that moved to the browser did so because the browser offered five structural advantages: zero-friction distribution, SaaS economics, native multiplayer collaboration, cross-device access, and continuous deployment. For most categories, those advantages were decisive. Desktop software only held on in the handful of places where GPU access, filesystem access, offline reliability, or sub-10-millisecond latency were in the critical path. Video editing stayed on the desktop. So did CAD, IDEs, gaming, and anything that needed to push pixels or bits in real time. Those constraints were not ideological. They were physics.

And every serious AI workload happens to sit squarely inside them. AI agents need to read your actual codebase, not whatever you remembered to paste into a chat window. They run for minutes or hours, not the lifetime of a browser tab that the operating system feels free to suspend the moment you switch windows. Meeting copilots need raw screen and audio access that browsers wall off by design, for good security reasons. Voice AI and autocomplete UX fall apart the moment you introduce a network round-trip, which is why Cursor feels instant and most browser-based AI tools feel laggy. The same constraints that kept Premiere on the desktop in 2005 are now shaping the entire AI application layer in 2026, which means for the first time in a generation the list of software categories that have to live on your machine is growing rather than shrinking.

And That Creates a Measurement Problem

Here is where this gets interesting for anyone trying to run an enterprise AI program.

When AI lived in the browser, you could measure it. Your employees logged into ChatGPT through a centralized account, or they used a SaaS tool whose admin console told you exactly who used what and when. Single sign-on, audit logs, API gateway usage reports, the entire governance stack that evolved for SaaS could be pointed at AI with a few configuration tweaks. The web’s centralization was a pain point for vendors in 2000 and a gift to CIOs in 2020. Everything flowed through a known endpoint, and everything left a trace.

The desktop renaissance is dismantling that model, category by category, in a matter of months.

A developer using Cursor is running an AI assistant against your codebase from their local machine, and your IT team cannot see what they are doing through any centralized log. A knowledge worker using Claude desktop is having conversations with a foundation model that may or may not touch your network. A sales leader using Granola is recording every meeting on their device, with no browser session to inspect. A product team experimenting with Ollama is pulling seventy-billion-parameter models down from Hugging Face and running inference entirely offline, with no API call that your network observability tools can capture. The shadow AI problem that was already keeping CISOs up at night is about to get qualitatively worse, because the new generation of AI tools is specifically engineered to bypass the centralized chokepoints that corporate governance depends on.

You cannot measure what you cannot see, and you cannot govern what you cannot measure. The measurement gap that enterprises are already struggling to close in their AI ROI programs is about to widen significantly, precisely at the moment when boards and CFOs are starting to demand proof of value.

What the Old Playbook Got Wrong

For years, the default AI governance playbook at most enterprises has been some version of: restrict access to sanctioned tools, route traffic through an approved gateway, and generate usage reports from the gateway logs. That playbook works reasonably well when the AI tool in question is a cloud-hosted chatbot that an employee reaches through a browser. It falls apart the moment the AI tool is a desktop app that talks directly to a foundation model provider, or worse, runs inference on the laptop itself.

The uncomfortable truth is that measurement and governance in an AI-first enterprise cannot be built from the network layer or from SaaS admin consoles alone. You need a visibility layer that works across cloud, browser, desktop, and local-inference environments, and that treats each AI interaction as an observable event regardless of where the compute happened. You need metrics that CFOs actually want to see rather than vanity counts of API calls. And you need a governance model that assumes AI usage is heterogeneous and distributed by default, not centralized and inspectable by default.
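One way to picture "each AI interaction as an observable event" is a single record shape that every environment emits, so that aggregation does not depend on where the compute ran. The sketch below is purely illustrative: the class, field, and function names are hypothetical and do not describe any specific product's schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Environment(Enum):
    CLOUD = "cloud"
    BROWSER = "browser"
    DESKTOP = "desktop"
    LOCAL_INFERENCE = "local_inference"


@dataclass(frozen=True)
class AIInteractionEvent:
    """One AI interaction, recorded the same way in every environment."""
    user_id: str
    tool: str                  # e.g. "Cursor", "Ollama"
    environment: Environment
    occurred_at: datetime
    sanctioned: bool           # whether the tool is approved by IT


def usage_by_environment(events):
    """Count events per environment for a unified view."""
    return Counter(e.environment for e in events)


def unsanctioned_share(events):
    """Fraction of events that came from unsanctioned tools (0.0 if empty)."""
    events = list(events)
    if not events:
        return 0.0
    return sum(1 for e in events if not e.sanctioned) / len(events)
```

Because the record carries no assumption about a network hop, an offline Ollama session and a browser chat can land in the same table and be counted side by side.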

What Leaders Should Do Now

The shift to AI-native desktop is not a reason to panic, and it is not a reason to try to block desktop AI apps. Every serious study on enterprise AI adoption points to the same conclusion: knowledge workers will use the tools that make them productive, and the companies that lean into that rather than fighting it capture disproportionate value. The question is not whether to allow your teams to use Cursor and Claude and Ollama. The question is whether you can see enough of what is happening across all of them to understand the true ROI of agentic AI, catch governance failures before they become incidents, and make informed decisions about where to invest next.

That starts with accepting that your AI measurement layer needs to extend into the desktop, the IDE, and the on-device inference runtime, not just the browser. It continues with building unified AI analytics that aggregate events from across environments into a single view. And it ends with a governance model that is resilient to heterogeneity, because the direction of travel for the next five years is more AI, in more places, running on more devices, against more models, not less.

The desktop is back. The browser is not going away. Most enterprises will run both, permanently. The organizations that win will be the ones that can see across both, measure across both, and make decisions grounded in that visibility.

If you are thinking about how to build that measurement layer inside your organization, we would love to talk.

April 21, 2026
Tokenmaxxing Is the New Lines of Code: Why Token Leaderboards Won’t Prove AI Value

Someone at Meta built a leaderboard called Claudeonomics. It ranked employees by the number of tokens their AI models processed and generated. Top spenders got rewards. Then it leaked to the press. Then Meta quietly shut it down.

That was earlier this month. This week, Reid Hoffman came out in measured defense of the practice at Semafor’s World Economy Summit. On the same day, an inference-infrastructure startup called Parasail raised $32 million on the thesis that “tokenmaxxing” will create the next compute giant. The company already generates 500 billion tokens a day.

If you run an AI program and you haven’t yet been asked by your CEO or your board why your engineers aren’t in the top quartile of token consumption, you will be soon. And when that conversation arrives, you need a better answer than a bigger number.

What tokenmaxxing actually measures

A token is a small chunk of text an AI model processes. Every prompt consumed, every response generated, every line of code auto-completed — they all add up to a token count. “Maxxing” is Gen Z slang for optimizing something to the extreme. Put them together and you get the idea: rank employees by how many tokens they burn, and call the top of the list your best AI adopters.

Meta built the internal dashboard. Shopify folded AI usage into performance reviews. Venture capital is now funding the picks and shovels. Hoffman’s defense, notable because he’s one of the more careful voices in the debate, was a cautious endorsement: “You should be getting people at all different kinds of functions actually engaging and experimenting [with AI].” He then immediately added that token tracking “doesn’t mean it’s a perfect example of productivity.”

Read that second sentence again. The strongest public defender of tokenmaxxing concedes, in the same breath, that it doesn’t measure productivity. Which raises the question everyone at Meta was too polite to ask before the leaderboard leaked: what exactly are we measuring, and why?

The new lines of code

If this pattern feels familiar, it should. For decades, engineering organizations tried measuring developer productivity by lines of code written. The metric was easy to count, easy to rank, and spectacularly broken. Engineers who wrote terse, elegant code scored poorly. Engineers who produced verbose, repetitive code scored well. Every competent engineering leader learned the lesson the hard way: when you turn an input metric into a target, people optimize for the metric, not the work.

Economists call this Goodhart’s Law. When a measure becomes a target, it ceases to be a good measure. Token consumption is lines of code with a fresh coat of paint. It’s an input. It’s easy to count. And it tells you almost nothing about whether the work the AI produced was useful, correct, or worth the compute bill that came with it.

The cynical version of tokenmaxxing plays out predictably. Employees pad their AI usage with throwaway prompts. Managers celebrate the chart going up. Finance sees the OpenAI and Anthropic invoices climbing and asks what changed. Nobody can tell them, because the leaderboard only measures spend. We covered this exact anti-pattern in AI Metrics That Matter — the gap between what’s easy to count and what a CFO actually wants to see.

Why it’s seductive anyway

Tokenmaxxing isn’t popular because executives are naive. It’s popular because real AI measurement is hard and token counts are sitting right there in the API billing dashboard. When a CEO asks the head of AI whether the organization is actually using its new tools, “we processed 4.2 billion tokens last quarter, up 340%” is a satisfying answer to give. It’s specific. It’s directional. It trends up and to the right.

It’s also, as NVIDIA’s recent survey of 3,200 enterprise leaders revealed, roughly the level of measurement most organizations have settled for. As we covered in our analysis of the NVIDIA State of AI report, 30% of enterprises still cannot measure the ROI of their AI investments at all. Token counts are what you reach for when you’ve given up on measuring the thing you actually care about.

The other reason tokenmaxxing spreads is that it pushes a real problem — AI adoption — through an easy pipe. In most enterprises, the gap between AI tool licenses purchased and AI tools actually used by employees is enormous. Licenses go unclaimed. Copilots go idle. Shadow AI proliferates in the gap. Counting tokens at least tells you who’s trying something. But “trying something” is a foundation for measurement, not its destination.

What outcome-based measurement looks like

The measurement you want isn’t on the API invoice. It’s in the business system the AI was supposed to change. If your developers are using AI coding tools, the question isn’t how many tokens they generated — it’s whether cycle time dropped, whether pull request quality held, whether production incidents stayed flat. If your sales team is using an AI assistant, the question is whether deal velocity improved, not whether reps sent more prompts.

This is the measurement layer missing from almost every tokenmaxxing dashboard we’ve seen. It’s also the layer that Coding IQ and the rest of the Olakai platform exist to provide. The question we ask our customers to answer isn’t “how much AI did you use?” It’s “what did your AI produce, for whom, and with what business outcome?” Those three questions are the ones a CFO will ask when the bill arrives, and the ones a CISO will ask when governance gets challenged.

We built an entire framework around this. We call it SEE → MEASURE → DECIDE → ACT. SEE surfaces every AI tool in use, not just the sanctioned ones. MEASURE ties usage to business KPIs the executive team already cares about. DECIDE gives you the evidence to scale, fix, or kill each pilot. ACT turns the answers into an operating rhythm instead of a once-a-quarter scramble. None of those steps begin with token counts. All of them produce numbers your board will actually recognize as value.

The governance blind spot

There’s a second problem with tokenmaxxing that rarely gets discussed. A leaderboard that rewards token spend creates an incentive to bypass governance controls to get more of it. Employees who find a sanctioned tool too slow, too throttled, or too narrow in capability will reach for something unsanctioned. Shadow AI already grew fast in the absence of measurement. Adding a scoreboard that rewards consumption accelerates it.

This is the worry that haunts every CISO we talk to, and it’s why the CFO view and the CISO view of AI can’t live in separate dashboards. You cannot measure AI ROI without measuring AI risk, because the risk is the other half of the cost. Tokenmaxxing, by design, only counts one side.

Getting started: audit what your AI produces, not what it consumes

If your organization is under pressure to show AI adoption and you’re being nudged toward tokenmaxxing, there’s a better first step. Pick the three most visible AI deployments in your organization — coding assistants, a customer support copilot, a sales enablement tool — and, for each one, write down the business outcome it was supposed to change. Cycle time. First-contact resolution. Win rate. Whatever it is, write it down. Then measure whether the outcome moved. Our AI ROI framework walks through this end to end.
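As a rough illustration of that audit, here is a minimal Python sketch. The Deployment record, its field names, and every number in it are hypothetical. The only idea taken from the steps above is to write down a baseline for each outcome and then check whether it moved in the right direction.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str              # the AI deployment being audited
    outcome: str           # the business outcome it was supposed to change
    baseline: float        # outcome value before the rollout
    current: float         # outcome value today
    lower_is_better: bool  # True for metrics such as cycle time


def outcome_change_pct(d: Deployment) -> float:
    """Percent improvement in the outcome; negative means it got worse."""
    change = (d.current - d.baseline) / d.baseline * 100
    return -change if d.lower_is_better else change


# Hypothetical numbers, for illustration only.
audit = [
    Deployment("Coding assistants", "Cycle time (days)", 5.0, 4.0, True),
    Deployment("Support copilot", "First-contact resolution (%)", 60.0, 66.0, False),
    Deployment("Sales enablement tool", "Win rate (%)", 20.0, 19.0, False),
]

for d in audit:
    pct = outcome_change_pct(d)
    print(f"{d.name}: {d.outcome} moved {pct:+.1f}% (positive = improved)")
```

A movement in the outcome is not proof that the AI tool caused it, but it is the number the rest of the conversation can start from.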

Do that for three deployments and you’ll know more about the real state of AI in your organization than any token leaderboard will tell you. You’ll also have the beginnings of a measurement system that survives the next wave of AI hype, whatever it gets called. Lines of code didn’t survive the last one. Tokenmaxxing won’t survive this one. Outcomes always do.

Olakai helps enterprises measure what their AI is actually producing — across every tool, every user, every workflow — and tie it back to the business KPIs executives already track. If tokenmaxxing is the conversation your board is having, we can help you lead a better one. Talk to an expert.

April 15, 2026
AI Can Do Math After All: Finance Is the #2 AI ROI Function and Nobody’s Talking About It

A year ago, the knock on AI in finance was simple: it can’t do math. And honestly, the critics had a point. A University of Waterloo study found that GPT-4o got basic multiplication wrong more than 70% of the time. The internet’s favorite example was even simpler than that: ask ChatGPT how many R’s are in “strawberry” and it would confidently tell you two. For CFOs and finance leaders watching from the sidelines, the message was clear. If this thing can’t count letters, it’s not touching our books.

That was twelve months ago. The tools caught up faster than almost anyone predicted. Reasoning models, code execution, structured outputs, and vertical-specific AI applications have closed the gap between “can’t do math” and “cuts your financial close by a week.” And now we have the data to prove it.

Finance Is the Second-Biggest AI ROI Story Nobody’s Talking About

Silicon Valley Bank just published their 2026 State of the VC-Backed CFO report, surveying 230 finance leaders at high-performing venture-backed companies. The headline finding on AI: 51% of companies that budgeted for AI tools last year report measurable ROI from that spending. But the more interesting number is the breakdown by function.

Product and Engineering leads at 73%, which surprises no one. The AI coding assistant market has been the loudest story in enterprise software for two years. But right behind it, at 42%, is Finance. Ahead of Marketing (41%), Customer Support (41%), Sales (34%), and Legal (27%). Finance teams are quietly generating more measurable AI returns than almost every other function in the company, and the conversation hasn’t caught up yet.

Most of the media coverage, the conference panels, and the vendor marketing around AI ROI have centered on engineering productivity. That makes sense — that’s where the tooling matured first. But the SVB data tells a different story. The CFO’s office is becoming one of the most productive proving grounds for AI in the enterprise, and the returns are showing up in places that directly affect the bottom line.

Where AI Is Delivering Real Returns in Finance

So where exactly is the 42% coming from? The gains are concentrated in a handful of core finance operations that share a common trait: they’re repetitive, data-heavy, and historically consumed enormous amounts of skilled human time.

The monthly close. A joint study from MIT Sloan and Stanford GSB, published in August 2025, analyzed hundreds of thousands of transactions across 79 companies and found that AI cuts the monthly financial close by 7.5 days on average. For anyone who’s lived through the close process, that number speaks for itself. A week back is a week of analysis, planning, and decision-making that finance teams didn’t have before.

FP&A and forecasting. Financial planning and analysis teams are running forecast cycles 30-40% faster with AI-assisted modeling. The FP&A function has historically been one of the most strategic roles in finance but also one of the most time-constrained. When your team spends less time building the model and more time interpreting what it says, the quality of the output changes. According to a 2025 FP&A Trends survey, 53% of organizations still don’t use AI in any FP&A process, which means the early movers have a significant head start.

Accounts payable and cost analytics. McKinsey found that 44% of CFOs now use generative AI across five or more finance use cases, up from just 7% the year before. AP processing, cost analytics, variance analysis, and fraud detection are among the most common deployments. These aren’t moonshot applications. They’re the blocking and tackling of corporate finance, automated at scale for the first time.

The SVB report adds another layer to this: companies that reported ROI from AI in customer service applications showed the highest median revenue per employee at $327K, followed by Marketing at $311K and Finance at $259K. Finance may not top that particular metric, but the breadth of its AI adoption across multiple sub-functions — close, FP&A, AP, audit, compliance — makes it one of the most versatile AI verticals inside any company.

The Spending Is Accelerating. The Measurement Isn’t.

The SVB report reveals just how aggressively companies are investing in AI. Median spending on AI platforms and tools jumped from $2K in 2024 to $20K in 2025, a 10x increase in a single year. CFOs expect it to grow another 2.5x, to $50K, in 2026. And 65% of the companies surveyed plan to spend more on AI this year than they spent on accounting software last year. That’s a striking data point. AI budgets are approaching parity with one of the most established categories in enterprise finance software.

But here’s the tension: while spending is multiplying, only about half of companies can actually demonstrate that the investment is working. The other 49% are spending without a clear picture of return. This is a familiar pattern in enterprise technology adoption. The budget moves faster than the infrastructure to measure what it’s actually producing.

Deloitte’s Q4 2025 CFO Signals survey reinforces this gap. Among 200 North American CFOs at companies with $1B+ in revenue, 87% said AI would be “extremely or very important” to finance operations in 2026. Technology transformation displaced enterprise risk management as CFOs’ top priority for the first time. Yet only 21% of active AI users in finance said it had delivered clear, measurable value. The ambition is there. The measurement infrastructure, for most companies, is not.

This is the core problem we’re building Olakai to solve. Not running the AI, but giving finance leaders — and every other function — visibility into whether their AI investments are actually delivering returns. When you can measure AI ROI across tools, teams, and use cases from a single platform, the conversation with the board changes from “we think AI is working” to “here’s exactly what it’s producing.”

Why This Matters for CFOs and Board Members Right Now

The SVB data carries an implication that goes beyond operational efficiency. Companies that have demonstrated ROI from AI implementation are half as likely to have raised a bridge round or extension round in the last 12 months compared to those that haven’t. AI isn’t just saving time in the back office — it’s becoming a signal of operational discipline that investors are watching for.

Meanwhile, 91% of the VC-backed companies surveyed now encourage employees to use AI at work, up from 68% last year. One in three companies is already hiring fewer junior-level employees because of AI. The workforce implications are real and accelerating, and they’re landing squarely on the CFO’s desk — headcount planning, budget reallocation, productivity benchmarking, all of it.

For CFOs and board members who haven’t yet engaged deeply with AI in their own function, the SVB report should be a catalyst. The question is no longer whether AI can handle finance work. The “strawberry” era is over. The question is whether your organization can measure the value it’s already generating — and whether you can build the framework to prove ROI before your next board meeting.

Getting Started: Three Steps for Finance Leaders

If the SVB data resonates and you’re thinking about where to start, the playbook is more straightforward than it appears. First, audit what your team is already using. Gartner’s 2025 data shows that 59% of finance functions have already adopted some form of AI, but in many cases leadership doesn’t have full visibility into what tools are deployed, who’s using them, and what they’re accomplishing. Start with a visibility audit — you can’t measure what you can’t see.

Second, pick one high-volume process and measure it. The monthly close is the most obvious candidate based on the data, but AP processing and FP&A forecasting are equally strong starting points. Define a baseline, deploy an AI tool, and track the delta. The finance teams behind the SVB survey’s 42% figure didn’t transform their entire finance stack overnight. They ran structured pilots, measured the results, and scaled what worked.

Third, build the measurement layer before you scale. The 49% of companies that can’t demonstrate AI ROI aren’t necessarily failing at AI — they’re failing at measurement. Put the infrastructure in place to track what your AI tools are doing across finance before you double the budget. That’s how you turn the SVB report’s 42% from a benchmark into a floor.

The CFO has always been the person in the room who measures everything — revenue, burn, margins, headcount efficiency. Now that same discipline needs to be applied to AI itself. The finance leaders who figure out how to measure their own AI investments are going to be the ones driving the next conversation with their boards.

Talk to an Expert about how Olakai gives finance leaders visibility into AI ROI across every tool and team.

April 13, 2026
5 Tools Enterprises Actually Use to Measure AI ROI — And What None of Them Get Right

Picture the quarterly board meeting at a Fortune 500 company. The CFO pulls up a slide: $12 million spent on AI tools in the past year. Copilot licenses. Cursor seats. ChatGPT Enterprise. A handful of custom agents. Three pilots that turned into “ongoing experiments.” Then the question: What did we get for it? Silence. Not because the tools aren’t being used — they are, more than anyone expected. Because nobody in the room can answer that question with a number. That’s the gap this post is about.

Enterprise AI measurement today exists at three layers: tool usage and adoption (who’s using what), workflow and productivity impact (are they faster), and business outcomes (did revenue, margin, or retention actually move). The problem is structural. Every measurement tool on the market lives at layer one or two — and calls it ROI. None of them connect to layer three. That’s not a product limitation. It’s a measurement philosophy problem.

1. Microsoft Copilot Analytics

Microsoft’s built-in Copilot Dashboard tracks M365 Copilot usage across the organization: prompts submitted, documents generated, meetings summarized, emails drafted. It’s native to the Microsoft ecosystem, which means zero integration effort and instant visibility for IT admins. For a 10,000-person org paying $30–60 per seat per month, that visibility matters — you’re looking at $3.6 to $7.2 million a year in Copilot licensing alone.
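The licensing figure above is simple arithmetic, and it is worth being able to rerun it for your own seat count. A minimal sketch follows; the function name is ours, not anything from Microsoft.

```python
def annual_license_cost(seats: int, price_per_seat_month: float) -> float:
    """Yearly licensing cost for a per-seat, per-month subscription."""
    return seats * price_per_seat_month * 12


low = annual_license_cost(10_000, 30)   # 10,000 seats at $30 per month
high = annual_license_cost(10_000, 60)  # 10,000 seats at $60 per month
print(f"${low:,.0f} to ${high:,.0f} per year")
# prints "$3,600,000 to $7,200,000 per year"
```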

Microsoft Viva Insights Copilot Analytics — tracks usage activity, not business outcomes.

The weakness is fundamental. The dashboard provides a 28-day aggregated view with no per-user ROI correlation and no connection to business outcomes. You know Copilot is being used. You know how often. You have no idea whether it’s helping. Microsoft also disclosed a metric computation bug that underreported email engagement data for nine months, a quiet reminder that vendor-reported metrics aren’t always reliable, even when the vendor is measuring its own product. Activity is not impact.

2. GitHub Copilot and GitLab Duo Metrics

GitHub Copilot reports code suggestion acceptance rates (averaging 27–30%), time saved per developer (roughly 3.6 hours per week), and suggestion frequency across your engineering org. GitLab Duo offers similar dashboards for its AI features. Developer teams love this data. Engineering leaders use it to justify expansion, track adoption curves, and identify which teams are getting the most value from AI-assisted coding.

GitHub Copilot Metrics — acceptance rates and usage charts, but no connection to business outcomes.

The limitation is scope. These tools measure developers — and only developers using that specific tool. Your marketing team running campaigns through ChatGPT? Invisible. Your finance team using Gemini for forecasting models? Invisible. Your legal team reviewing contracts with Claude? Invisible. And “acceptance rate” is a product metric, not a business metric. A 30% acceptance rate tells you developers kept 30% of suggestions. It says nothing about whether those suggestions shipped faster, reduced bugs, or moved a revenue number. Dev-only measurement in an enterprise where every department uses AI is a partial answer at best.

3. GetDX, Pluralsight Flow, and LinearB

These platforms measure developer productivity through DORA metrics, developer experience scores, PR cycle time, and deployment frequency. They’re legitimate engineering intelligence tools. But McKinsey’s 2025 State of AI report found that 88% of organizations have adopted AI while only 39% can report any EBIT impact. These developer platforms didn’t cause that gap, but they don’t close it either.

DX platform architecture — strong developer intelligence, but scoped to engineering teams. Image courtesy of DX.

The positioning is explicit: these are developer productivity tools, not AI ROI platforms. Some vendors have started rebranding DORA metrics as “AI measurement,” adding overlays that compare AI-assisted versus non-AI-assisted PRs. That’s useful context for an engineering VP. It’s not what the CFO means when she asks about AI ROI. DORA metrics existed before AI coding tools did. Relabeling them doesn’t make them an AI measurement strategy.

4. Workday and ServiceNow Built-In AI Analytics

Both Workday and ServiceNow — along with Salesforce Einstein, SAP Joule, and dozens of other enterprise platforms — now report on their own AI feature usage. Workday shows you AI-generated job descriptions and skills recommendations. ServiceNow tracks virtual agent deflection rates and case summarization usage. The strength is obvious: zero integration effort, immediate availability, and perfect accuracy within that vendor’s walls.

ServiceNow AI Control Tower — comprehensive within ServiceNow, but silent on every other AI tool in the stack. Image courtesy of ServiceNow.

The weakness is equally obvious: each platform is a silo. Workday tells you about Workday AI. ServiceNow tells you about ServiceNow AI. Salesforce tells you about Salesforce AI. Nobody tells you about all of them together. For an enterprise running AI across fifteen platforms, you’d need to log into fifteen dashboards, normalize fifteen different metric definitions, and somehow reconcile them into a single view. Most don’t try. The result is that enterprise AI measurement defaults to whoever shouts the loudest in the vendor review.

5. Custom BI Dashboards (Tableau, Power BI)

This one isn’t a product — it’s a pattern. Many enterprises, frustrated by the limitations above, decide to build their own AI measurement dashboard. Pull API data from each AI tool into a data warehouse, model it in dbt or Databricks, visualize it in Tableau or Power BI. The appeal is total customization: you define the metrics, you own the schema, you control the narrative.

The reality is expensive and slow. Enterprise-grade BI implementations take three to six months for multi-source deployments, and first-year costs for a 5,000-person org run between $510K and $1.2 million — often more than the AI tools being measured. There’s no standardized schema for AI usage data, no external benchmarks to compare against, and every API change from every vendor breaks something. Most custom dashboards become the responsibility of one or two analysts, and when they leave, the dashboard dies with them. You’ve built a measurement tool that costs more than what it measures.

The Real Problem: A Measurement Philosophy Gap

Each of the tools above measures AI in isolation. Microsoft measures Microsoft. GitHub measures GitHub. Workday measures Workday. The custom dashboard tries to stitch them together but creates a maintenance burden that’s unsustainable at enterprise scale. Meanwhile, the actual ROI question is cross-enterprise: which teams adopted which tools, what changed in their output, and did any of it move a business metric?

That question requires connecting three dots: adoption data (who’s using what), productivity signals (what changed in their work), and business outcomes (did it matter). Forrester’s 2026 Predictions report found that fewer than one in three AI decision-makers can tie AI value to P&L changes. Not because they aren’t trying — because their tools don’t connect those layers. That’s not a product gap. It’s a measurement philosophy gap. You can’t vibe-code accountability.
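To make “connecting three dots” concrete, here is a small Python sketch that joins the three layers on a shared key. Everything in it is an assumption for illustration: the team names, the numbers, and the idea that each layer can be exported as a simple per-team mapping.

```python
# Each layer usually lives in a different system. Hypothetical per-team data:
adoption_by_team = {"payments": 0.80, "search": 0.35}       # share of active AI users
productivity_by_team = {"payments": -0.22, "search": -0.05}  # change in PR cycle time
outcome_by_team = {"payments": 0.04, "search": 0.00}         # change in a business KPI


def connect_layers(adoption, productivity, outcomes):
    """Join the three layers on team, keeping only teams present in all three."""
    teams = adoption.keys() & productivity.keys() & outcomes.keys()
    return {
        team: {
            "adoption": adoption[team],
            "productivity_change": productivity[team],
            "outcome_change": outcomes[team],
        }
        for team in sorted(teams)
    }


for team, row in connect_layers(
    adoption_by_team, productivity_by_team, outcome_by_team
).items():
    print(team, row)
```

Teams that are missing from any one layer drop out of the result, which is itself a useful signal about where the measurement gap sits.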

What Olakai Does Differently

This is the problem we built Olakai to solve. Not another vendor-specific dashboard. Not another developer productivity overlay. A vendor-neutral analytics and governance platform that works across your entire AI stack — ChatGPT, Copilot, Gemini, Cursor, Claude, custom agents, and the AI features embedded in your SaaS applications — and connects what’s being used to what it’s actually producing.

Olakai is structured around three product lines, each covering a category that the tools above treat in isolation. Assistive IQ measures adoption, productivity, and shadow AI across chatbots and copilots — deployed through a Chrome extension that takes minutes, not months. Coding IQ connects to your GitHub org and AI coding tool providers to unify cycle time data, AI-assisted PR rates, developer adoption cohorts, and cost-per-PR across Copilot, Cursor, Claude Code, and Windsurf in a single view. Agent IQ tracks custom agentic workflows with execution metrics, success rates, and cost-per-execution tied to business KPIs you define. None of these exist in separate tools. They exist in one platform, measured against the same outcomes.

Olakai — unified AI analytics across assistive, coding, and agentic AI in a single platform.

The difference isn’t just breadth — it’s the connection between layers. Every tool in this article measures activity. Olakai connects that activity to business outcomes through custom KPIs that map AI usage to the metrics your CFO actually reports on: revenue influenced, cost avoided, time recaptured, risk reduced. When the board asks what $12 million in AI spend produced, Olakai is the platform that gives you the answer — not a usage chart, not an acceptance rate, but a number tied to a business result.

We’re not replacing the tools above — most of our customers use several of them. Microsoft Copilot Analytics still tells you how Copilot is being used. GitHub Copilot Metrics still shows acceptance rates. ServiceNow’s AI Control Tower still tracks its own AI features. What none of them do is answer the cross-enterprise question: across all of these tools, all of these teams, all of these investments — are we getting ROI, and where? That’s the layer Olakai provides. And with Kai, anyone on the team can ask that question in plain language and get a reasoned, data-backed answer in seconds — no analyst required, no dashboard to build.

Kai — ask “What’s my AI ROI this month?” and get a reasoned, data-backed answer in seconds.

See how Olakai connects AI adoption to business outcomes →

April 2, 2026
Is Your $500K AI Coding Tool Investment Paying Off? What the Data Shows

Most engineering leaders made the same bet in 2024. They licensed GitHub Copilot for the team, added Cursor for the power users, maybe rolled out Claude Code for a few senior engineers. The invoices added up fast. A mid-sized engineering organization with 100 developers can easily spend $400,000 to $600,000 per year across these tools before accounting for the API costs that accumulate quietly in the background.

The bet seemed obvious. The tools were impressive in demos. Every vendor had benchmarks showing dramatic productivity gains. And the competitive pressure to “enable developers with AI” made saying no feel reckless. So the tools went in, the credit cards got charged, and the organization moved on to the next priority.

Twelve months later, most of those organizations still cannot answer the most basic question their CFO will eventually ask: is this working?

The Benchmark Problem

The AI coding tool vendors are not shy about publishing productivity statistics. GitHub claims Copilot users are 55% faster at coding tasks. Cursor publishes testimonials from engineers who describe 10x output improvements. Anthropic’s data on Claude Code shows meaningful reductions in time-to-completion for well-defined tasks.

These numbers are real, in the sense that they come from controlled evaluations of specific tasks. But controlled evaluations are not engineering organizations. The gap between “this tool helped a developer complete an isolated coding challenge faster” and “this tool made our entire engineering organization more effective” is where most ROI analysis breaks down.

The industry research is more sobering. Jellyfish, which analyzes data from over 500 engineering organizations, puts the average cycle time improvement from AI coding tools at around 25%, with PR throughput gains of roughly 12%. Those are meaningful numbers for a well-run rollout. But Jellyfish also tracks adoption rates, and the data shows that AI-assisted PRs account for roughly half of all merged pull requests across their customer base, up from 14% just two years ago. That means roughly half of your developers’ output still has no AI involvement at all, while licenses you are paying for sit idle in the admin console.

McKinsey’s research on AI-enabled software engineering found that productivity gains are highly uneven across teams and functions, and that organizations with structured measurement programs capture three to four times more value from AI tools than those without. The tools don’t create value uniformly. Whether your organization captures that value depends almost entirely on whether anyone is paying attention to the data.

What “Paying Off” Actually Means

There is a version of this analysis that stops at cycle time. If your AI-assisted pull requests close 25% faster than non-AI PRs, and you can assign a dollar value to engineering time, you can construct a spreadsheet that shows a positive return. Many organizations do exactly this and call it done.
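A sketch of that spreadsheet in Python looks something like this. All inputs are hypothetical, and the calculation quietly assumes that a 25% faster cycle time translates one-for-one into engineering hours saved.

```python
def simple_cycle_time_roi(
    ai_prs_per_year: int,
    baseline_hours_per_pr: float,
    speedup: float,
    loaded_hourly_rate: float,
    annual_tool_cost: float,
) -> float:
    """Return ROI as a ratio: (value of time saved - tool cost) / tool cost."""
    hours_saved = ai_prs_per_year * baseline_hours_per_pr * speedup
    value_saved = hours_saved * loaded_hourly_rate
    return (value_saved - annual_tool_cost) / annual_tool_cost


# Hypothetical inputs: 6,000 AI-assisted PRs a year, 8 engineering hours each,
# 25% faster, $100 per hour loaded cost, $500,000 in annual tool spend.
roi = simple_cycle_time_roi(6_000, 8.0, 0.25, 100.0, 500_000.0)
print(f"ROI: {roi:.0%}")  # prints "ROI: 140%"
```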

That math is not wrong, but it is incomplete in ways that matter. Three dimensions of ROI tend to get ignored.

Adoption is not uniform. Aggregate adoption rates hide the distribution underneath. In most engineering organizations, AI coding tool adoption follows a familiar pattern: a small cohort of power users who have integrated AI deeply into their workflow, a larger group of casual users who pull the tool out occasionally, a segment who have never meaningfully engaged, and new adopters still learning. These cohorts have entirely different productivity profiles. A 50% adoption rate that is all casual usage delivers a fraction of the value compared to a 50% rate built on genuine depth. The aggregate metric obscures everything interesting.

Tool spending is not consolidated. The average engineering organization is paying for multiple AI coding tools simultaneously. The same developers who have GitHub Copilot licensed are also using Cursor, and some have Claude Code running in their IDE. The vendor consoles report usage for their own tool only. No single view shows you cost per PR across all providers, which tool is delivering the best return per dollar, or where licenses are sitting unused. Without that cross-vendor view, optimization is impossible.

Not all PRs are equal. AI coding tools deliver more value on some work than others. Boilerplate generation, documentation, test writing, and well-scoped feature additions tend to see strong AI contribution. Architecture decisions, complex debugging, and novel problem-solving tend to see less. If your metric is simply “AI code ratio” — the percentage of merged lines that originated from an AI tool — you may be measuring the wrong thing, or at least measuring it in a way that tells you nothing about whether the AI contribution was on the work that matters most.

What Measurement Actually Requires

Getting a real answer to the ROI question requires connecting three data sources that almost no organization has unified.

The first is GitHub data: PR volume, cycle time, AI commit detection, code contribution patterns by developer and team. This is where the before-and-after comparison lives. AI-assisted PRs versus non-AI PRs, by team, by developer cohort, by time period. Without this, you are estimating.

The second is provider cost data: per-user spend, token consumption, acceptance rates, and usage patterns by tool. This requires pulling from the admin APIs of each vendor — Anthropic, GitHub, Cursor, Windsurf, OpenAI — and normalizing the data into a single cost view. The math is not complicated, but the data integration work is non-trivial, and almost no engineering organization has done it.

The third is the developer adoption dimension: who is in which cohort, which teams are getting deep value versus surface-level usage, and where the gaps are. This is where the improvement roadmap lives. If your power user cohort is 8% of your developers and your casual cohort is 42%, you have a very different problem than if those numbers are reversed.
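For a sense of what cohort assignment can look like, here is a minimal sketch. The cohort names follow the Power, Casual, New, Idle split, but the thresholds (15 active days, a 30-day window for new adopters) and the input fields are assumptions for illustration, not a description of how any product does it.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def classify(active_days_last_30: int, days_since_first_use: Optional[int]) -> str:
    """Assign a developer to an adoption cohort. Thresholds are illustrative."""
    if days_since_first_use is None or active_days_last_30 == 0:
        return "Idle"    # never used the tool, or no activity this month
    if days_since_first_use <= 30:
        return "New"     # still in the first month of use
    if active_days_last_30 >= 15:
        return "Power"   # uses AI on most working days
    return "Casual"


def cohort_shares(developers: List[Tuple[int, Optional[int]]]) -> Dict[str, float]:
    """Share of developers in each cohort, as fractions of the whole group."""
    counts = Counter(classify(active, since) for active, since in developers)
    return {cohort: n / len(developers) for cohort, n in counts.items()}


# Hypothetical team of four: (active days in last 30, days since first use)
team = [(20, 200), (3, 200), (3, 90), (0, None)]
print(cohort_shares(team))
```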

When these three data sources are unified, the analysis becomes tractable. Cost per PR by provider. Cycle time delta for AI-assisted versus non-AI work. Developer cohort distribution by team. Which providers are getting the most usage per dollar. Where idle licenses should be reassigned. These are the questions the CFO is eventually going to ask. The organizations that can answer them will have a very different conversation than those who cannot.
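Once the data is in one place, cost per PR by provider is a short calculation. The sketch below assumes each merged PR has already been attributed to at most one tool through a hypothetical ai_tool field; real attribution is messier, and the spend figures are made up.

```python
from collections import defaultdict


def cost_per_pr(spend_by_provider, merged_prs):
    """Divide each provider's spend by the merged PRs attributed to its tool.

    Providers with spend but no attributed PRs map to None, which is a hint
    that licenses may be sitting unused.
    """
    pr_counts = defaultdict(int)
    for pr in merged_prs:
        if pr["ai_tool"] is not None:
            pr_counts[pr["ai_tool"]] += 1
    return {
        provider: (spend / pr_counts[provider] if pr_counts[provider] else None)
        for provider, spend in spend_by_provider.items()
    }


# Hypothetical monthly spend and merged PRs.
spend = {"copilot": 1000.0, "cursor": 600.0, "windsurf": 200.0}
prs = [
    {"ai_tool": "copilot"},
    {"ai_tool": "copilot"},
    {"ai_tool": "cursor"},
    {"ai_tool": None},  # no AI involvement detected
]
print(cost_per_pr(spend, prs))
```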

How Coding IQ Approaches This

This is the problem Olakai’s Coding IQ was built to solve. Rather than requiring engineering teams to build custom data pipelines or rely on fragmented vendor consoles, Coding IQ connects directly to your GitHub organization and your AI coding tool admin APIs — Anthropic, GitHub Copilot, Cursor, Windsurf, OpenAI — and pulls the data together automatically.

The result is a unified view: cycle time comparison between AI-assisted and non-AI PRs, provider cost breakdown, developer adoption cohorts (Power, Casual, New, Idle), team-level benchmarks, and a cost-per-PR metric by provider. Questions that previously required a data engineering project — “which coding tool gives us the best ROI?”, “which teams have the lowest AI adoption?”, “what is our AI code ratio trending toward?” — become answerable in seconds.

Coding IQ also surfaces what the vendor dashboards cannot. Shadow AI in engineering is real: developers using personal API keys, unauthorized tools, or AI assistants on personal accounts. A developer who builds on Claude’s API with a personal account doesn’t show up in your GitHub Copilot analytics. Coding IQ detects AI contribution patterns from the code itself, not just from vendor data, so the picture is complete rather than bounded by what each vendor chooses to report.

For organizations already using a dedicated engineering intelligence platform, the question worth asking is whether that platform can show you governance, shadow AI exposure, and the full cross-vendor cost picture alongside your engineering metrics. For most, the answer is no. Coding IQ was built to provide that layer.

The Question Worth Asking Now

Engineering organizations are entering a moment where AI coding tool budgets are large enough to require accountability. The days of “it feels productive” as sufficient justification are ending. CFOs are starting to ask for the data. Boards are asking whether AI investments across the organization are generating returns.

The organizations that will be able to answer those questions are the ones that started measuring before the question was forced on them. Not because the tools are failing — many of them are genuinely delivering value — but because value without measurement is invisible. And invisible value does not survive budget season.

If you are spending $400,000 per year on AI coding tools and cannot answer what your cost per PR is, which teams are in which adoption cohort, or whether your investment would be better concentrated in one tool over another, the issue is not the tools. The issue is measurement.

You have the data. You are probably just not looking at it yet.

Talk to an expert to see how Coding IQ gives engineering leaders the full picture on AI coding tool ROI.

March 26, 2026
Your AI Coding Tools Are Generating Code. Are They Generating Value?

Your engineering team just shipped 10,000 lines of code this sprint. Nearly half of it was written by AI. Do you know which half — and whether it was any good?

This isn’t a theoretical question anymore. According to the 2025 DORA Report, almost half of companies now have at least 50% AI-generated code, up from just 20% at the start of 2025. Ninety percent of engineering teams now use AI coding tools in their workflows. Cursor crossed $2 billion in annualized revenue by February 2026. Claude Code hit $2.5 billion. GitHub Copilot remains embedded in enterprises worldwide. The adoption question is settled.

The measurement question is not.

The Measurement Gap Nobody Talks About

Here’s what most engineering leaders are tracking: lines of code generated, completion acceptance rates, developer satisfaction surveys, and seat utilization. These are vanity metrics. They tell you that developers are using the tools. They don’t tell you whether the tools are making your organization better.

BCG found that 60% of companies have no defined financial KPIs for their AI initiatives — they’re counting pilots, celebrating deployments, and measuring model accuracy instead of actual business value. Bain’s 2025 Technology Report went further, finding that AI coding tools deliver only 10 to 15 percent productivity gains despite adoption by two-thirds of software firms. That’s a fraction of the 10x improvement vendors promised.

The gap between what companies measure and what actually matters is where millions disappear. Your board isn’t asking how many code completions your team accepted last quarter. They’re asking whether your $1.2 million in AI coding tool licenses is making your engineering organization faster, safer, and more competitive. If you can’t answer that question with data, you have a measurement problem — not a productivity problem.

What You Should Be Measuring Instead

The metrics that matter for AI coding tools aren’t about the tools themselves. They’re about what happens after the code ships.

Cycle time delta. How much faster do AI-assisted pull requests move from first commit to production compared to non-AI pull requests? This is the clearest signal of real productivity gain. Early data suggests AI-assisted PRs are 25 to 40 percent faster through the pipeline, but this varies wildly by team, codebase complexity, and tool. If you aren’t measuring the delta, you’re guessing.
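A minimal sketch of how that delta could be computed, assuming you can already label each pull request as AI-assisted or not and have timestamps for first commit and production deploy. The records, field order, and function names here are hypothetical and only illustrate the calculation.

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records: (ai_assisted, first_commit, deployed_to_production)
prs = [
    (True,  datetime(2026, 3, 2, 9), datetime(2026, 3, 3, 9)),   # 24 h
    (True,  datetime(2026, 3, 4, 9), datetime(2026, 3, 5, 21)),  # 36 h
    (False, datetime(2026, 3, 2, 9), datetime(2026, 3, 4, 9)),   # 48 h
    (False, datetime(2026, 3, 5, 9), datetime(2026, 3, 7, 9)),   # 48 h
]

def median_cycle_hours(records, ai_assisted):
    """Median hours from first commit to production for one group of PRs."""
    hours = [
        (deployed - started).total_seconds() / 3600
        for flag, started, deployed in records
        if flag == ai_assisted
    ]
    return median(hours)

def cycle_time_delta(records):
    """Fractional change in median cycle time, AI-assisted vs. non-AI.

    A negative value means AI-assisted PRs reach production faster.
    """
    ai = median_cycle_hours(records, True)
    non_ai = median_cycle_hours(records, False)
    return (ai - non_ai) / non_ai

delta = cycle_time_delta(prs)
print(f"Cycle time delta: {delta:+.1%}")
```

The median is used rather than the mean so that a handful of long-lived PRs does not dominate the comparison; that is a design choice, not something the figures above depend on.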

Incident rate on AI-authored code. A Stanford study cited by CIO.com found that participants using coding assistants wrote less secure code in 80% of tasks — yet were 3.5 times more likely to believe their code was secure. That confidence gap is dangerous. If your AI-generated code is creating more production incidents, more security vulnerabilities, or more hotfixes, the productivity gains are illusory. You need to track post-deployment quality by code origin.

Cost per pull request by provider. Your team is probably using three or four AI coding tools simultaneously — Copilot on some repos, Cursor on others, Claude Code for complex refactors. Each has different pricing, different token consumption patterns, and different value profiles. Without a unified cost-per-PR metric across providers, you can’t make rational decisions about which tools to standardize and which licenses are going unused.
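As a rough illustration, a unified cost-per-PR view can start as a simple division per provider, assuming you have monthly spend and a count of merged PRs attributed to each tool. All figures and provider keys below are made up for the example.

```python
# Hypothetical monthly spend per provider, in dollars.
monthly_cost = {
    "copilot": 3800.0,
    "cursor": 6000.0,
    "claude_code": 9000.0,
    "other_tool": 500.0,
}

# Hypothetical count of merged PRs attributed to each provider this month.
merged_prs = {"copilot": 190, "cursor": 120, "claude_code": 60}

def cost_per_pr(costs, pr_counts):
    """Return dollars per merged PR for each provider.

    Providers with spend but no attributed PRs map to None, which flags
    a license that may be going unused.
    """
    result = {}
    for provider, cost in costs.items():
        count = pr_counts.get(provider, 0)
        result[provider] = cost / count if count else None
    return result

for provider, value in cost_per_pr(monthly_cost, merged_prs).items():
    label = "no attributed PRs" if value is None else f"${value:.2f} per PR"
    print(f"{provider}: {label}")
```

A lower cost per PR is not automatically better: a tool used mainly for complex refactors may cost more per PR and still be worth it, which is why this number belongs next to cycle time and quality data rather than on its own.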

Deployment frequency. The DORA framework remains the gold standard for engineering performance, but AI introduces a wrinkle. Deployment frequency may rise slightly while lead times increase as review cycles grow longer to accommodate AI-generated code. Measuring deployment frequency in isolation misses this dynamic. You need to track it alongside review time and change failure rate to see the full picture.

The Shadow Coding Problem

There’s another dimension most CTOs haven’t confronted: developers using personal accounts for AI coding tools that your organization doesn’t manage, monitor, or govern.

A developer signs up for Cursor with a personal email. Another uses Claude Code through a personal API key. A third is running a locally hosted model for code generation. None of these show up in your IT asset inventory. None are covered by your data handling policies. And all of them are processing your proprietary source code through systems you don’t control.

This is shadow AI in the codebase — and it’s arguably more dangerous than shadow AI in other parts of the organization because the outputs become permanent parts of your software. Code generated through ungoverned tools gets committed, reviewed, merged, and deployed. It becomes your product. If that code was generated using a model that trained on GPL-licensed code, or if proprietary algorithms were sent to a third-party API without appropriate data handling agreements, the liability sits with your organization — not the developer.

According to HiddenLayer’s 2026 AI Threat Landscape Report, 76% of organizations now cite shadow AI as a definite or probable problem, a 15-point jump from the prior year. For engineering organizations, the stakes are uniquely high because the shadow doesn’t just create risk — it becomes part of the product.

The Adoption Cohort Blindspot

Aggregate metrics hide critical patterns. When engineering leaders report that “our team has 70% AI adoption,” they’re averaging over a distribution that looks nothing like a uniform curve.

In practice, adoption breaks into cohorts. Power users — developers with more than 70% of their pull requests AI-assisted — are producing dramatically different work than casual users at 20 to 40 percent. New adopters who started using AI tools within the past two weeks have different needs than idle users who tried a tool once and stopped. Each cohort requires different support, different training, and different expectations.
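One way to sketch this segmentation in code, using the 70% power-user threshold and two-week new-adopter window described above. The record fields are hypothetical, and lumping every active developer below the power-user threshold into "casual" is a simplifying assumption for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DeveloperUsage:
    name: str
    total_prs: int                 # PRs merged in the reporting window
    ai_assisted_prs: int           # of those, how many were AI-assisted
    first_ai_use: Optional[date]   # None if the developer never used a tool

def classify(dev, today, new_adopter_days=14, power_threshold=0.70):
    """Assign a developer to one adoption cohort."""
    if dev.first_ai_use is None:
        return "idle"
    if (today - dev.first_ai_use).days <= new_adopter_days:
        return "new_adopter"
    if dev.ai_assisted_prs == 0 or dev.total_prs == 0:
        return "idle"  # tried a tool earlier, no AI-assisted work in this window
    share = dev.ai_assisted_prs / dev.total_prs
    # Assumption: everyone active but below the power threshold counts as casual.
    return "power_user" if share > power_threshold else "casual"

today = date(2026, 3, 26)
team = [
    DeveloperUsage("a", 20, 16, date(2025, 10, 1)),
    DeveloperUsage("b", 20, 6, date(2025, 11, 1)),
    DeveloperUsage("c", 5, 1, date(2026, 3, 20)),
    DeveloperUsage("d", 10, 0, date(2025, 9, 1)),
    DeveloperUsage("e", 8, 0, None),
]
for dev in team:
    print(dev.name, classify(dev, today))
```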

Without cohort-level visibility, you can’t identify which developers are getting genuine value, which ones need enablement, and which expensive licenses are sitting unused. You also can’t detect the productivity paradox documented in METR’s 2025 randomized trial of experienced open-source developers: participants predicted a 24% speedup from AI tools, were measured at a 19% slowdown, and still reported a 20% perceived improvement afterward. The gap between perception and measurement is real, and only cohort-level data can surface it.

What the Competitors Miss

Engineering analytics platforms like Jellyfish have built impressive capabilities for measuring developer productivity. They can track DORA metrics, analyze PR throughput, and benchmark teams against each other. But they were built before AI coding became the default mode of software development, and their architecture reflects that.

Most engineering analytics tools work from metadata — commit timestamps, PR merge events, Jira ticket transitions. They can tell you that a developer merged 12 PRs this week. They can’t tell you which of those PRs were AI-assisted, what tool was used, how much it cost, or whether the AI-generated portions introduced quality issues. Without code-level detection that identifies AI co-author trailers, bot PR authors, and tool-specific markers, the attribution problem remains unsolvable.
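To make the detection idea concrete, here is a small sketch that scans a commit message for `Co-authored-by` trailers and checks whether the PR author looks like a bot account. The marker strings, tool labels, and bot name are illustrative assumptions; real tools may leave different markers or none at all, so treat this as a starting point rather than a complete detector.

```python
import re

# Illustrative markers only: substring to look for -> label to report.
AI_MARKERS = {"claude": "claude_code", "copilot": "copilot", "cursor": "cursor"}

# Matches git trailers of the form "Co-authored-by: Name <email>".
TRAILER_RE = re.compile(
    r"^co-authored-by:\s*(.+?)\s*<[^>]*>\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def detect_ai_tools(commit_message, pr_author=""):
    """Return the set of AI tool labels suggested by trailers or a bot author."""
    candidates = TRAILER_RE.findall(commit_message)
    if pr_author.endswith("[bot]"):
        candidates.append(pr_author)
    tools = set()
    for text in candidates:
        lowered = text.lower()
        for marker, tool in AI_MARKERS.items():
            if marker in lowered:
                tools.add(tool)
    return tools

ai_commit = "Fix retry logic\n\nCo-authored-by: Claude <noreply@example.com>\n"
human_commit = "Fix typo\n\nCo-authored-by: Jane Doe <jane@example.com>\n"

print(detect_ai_tools(ai_commit))
print(detect_ai_tools(human_commit))
print(detect_ai_tools("Bump deps", pr_author="copilot-agent[bot]"))
```

Substring matching on names is deliberately crude and will misfire on a human co-author whose name happens to contain a marker. A production detector would need tool-specific signals and a way to handle developers who strip trailers before committing.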

Then there’s the governance dimension. Your CISO needs to know which AI tools are processing your source code and whether they comply with your data handling policies. Your CFO needs to know the total cost across all AI coding providers, not just the ones IT provisioned. Your compliance team needs an audit trail showing what code was AI-generated and by which model. Productivity analytics tools don’t cover any of this.

The measurement gap isn’t just about better dashboards. It’s about connecting AI ROI measurement with governance, cost control, and security in a single view — the same way organizations learned to manage cloud infrastructure by combining performance monitoring with cost optimization and compliance controls.

Building the Framework

If you’re spending six or seven figures on AI coding tools and can’t answer basic questions about their impact, here’s where to start.

First, establish a baseline. Before you can measure improvement, you need to know where you stand. What percentage of your pull requests are AI-assisted? What’s your current cycle time for AI-assisted versus non-AI code? What are you spending per developer, per provider, per month? Most engineering organizations can’t answer these questions today.

Second, segment by cohort. Stop reporting a single adoption number. Break your engineering organization into power users, casual users, new adopters, and idle license holders. Each cohort tells a different story, and each requires a different response.

Third, connect quality to origin. Track incident rates, security findings, and change failure rates by whether the code was AI-assisted or not. This is the data your board actually needs — not how many lines the AI generated, but whether those lines made your product better or worse.
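A minimal sketch of that comparison, assuming each deployment is already tagged with its code origin and whether it led to an incident. The records and origin labels below are hypothetical.

```python
from collections import Counter

# Hypothetical deployment records: (origin, caused_incident)
deployments = (
    [("ai_assisted", True)] * 3 + [("ai_assisted", False)] * 17
    + [("human", True)] * 2 + [("human", False)] * 18
)

def change_failure_rate_by_origin(records):
    """Share of deployments that caused an incident, grouped by code origin."""
    totals, failures = Counter(), Counter()
    for origin, failed in records:
        totals[origin] += 1
        failures[origin] += int(failed)
    return {origin: failures[origin] / totals[origin] for origin in totals}

for origin, rate in change_failure_rate_by_origin(deployments).items():
    print(f"{origin}: {rate:.0%} change failure rate")
```

With samples this small the difference between groups is not meaningful; the point is the shape of the report, which keeps origin as a first-class dimension on every quality metric.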

Fourth, unify cost visibility. Aggregate spending across Copilot, Cursor, Claude Code, and every other tool your developers are using — including the ones they’re paying for themselves. The enterprise AI revenue gap starts with cost sprawl that nobody can see.

The organizations that will win the AI coding race aren’t the ones that adopt the most tools. They’re the ones that measure the right things, govern the risks, and make data-driven decisions about where to invest. Your AI coding tools are generating code. The question is whether they’re generating value.

Want to see how your engineering AI investment is actually performing? Talk to an expert to see Coding IQ in action — vendor-neutral analytics across every AI coding tool your team uses.

March 19, 2026
AI ROI EP. 4: ACT — From Approved Pilot to Enterprise-Wide Impact

A VP of Operations at a $4 billion manufacturer had the data. Three AI pilots had cleared the DECIDE gate with strong cost-to-value ratios. The CFO had approved scaling budgets. The board was expecting results by Q3. Six months later, all three initiatives were still running at pilot scale. One team couldn’t get IT to provision enterprise licenses. Another was waiting for “the right moment” to roll out to the full department. The third had scaled technically but hadn’t changed a single workflow — so the AI was running at production capacity with pilot-level impact.

Everyone was acting on AI. Nobody was acting systematically. And the gap between “approved for scaling” and “delivering enterprise-wide value” was growing wider every quarter.

This is the ACT problem — the fourth and final step in the SEE, MEASURE, DECIDE, ACT framework. You’ve mapped your AI ecosystem (SEE). You’ve connected activity to business outcomes (MEASURE). You’ve run structured pilots that produce scaling decisions (DECIDE). Now comes the hardest part: turning those decisions into enterprise-wide results that show up on the P&L.

The data says most organizations fail here. PwC’s 2025 Global CEO Survey found that nearly half of CEOs see no meaningful return from their generative AI investments. Not low returns: none. Meanwhile, Gartner forecast that worldwide generative AI spending would reach $644 billion in 2025 and continue accelerating from there. The money is flowing. The returns aren’t. And the difference between the enterprises that scale AI successfully and those that don’t isn’t better technology. It’s better execution frameworks for going from “this pilot works” to “this is how we operate.”

Why Scaling Is Harder Than Piloting

The pilot-to-production gap is where most AI investments die. S&P Global found that enterprises scrapped 46% of AI pilots before reaching production in 2025, and Bain reported that only 27% of companies successfully moved generative AI from testing to real-world implementation. But even among those that do scale, a separate challenge emerges: scaling the technology without scaling the impact.

This happens because organizations treat scaling as a deployment problem — more licenses, more compute, more users. But deployment without transformation just gives you a bigger pilot. The AI is running at scale. The workflows haven’t changed. The organizational structures haven’t adapted. And the business outcomes remain stubbornly similar to what you saw with 50 users, even though you now have 5,000.

Deloitte’s 2026 State of AI survey captured this precisely: while 74% of organizations want AI to drive revenue growth, only about one in five have redesigned workflows around AI capabilities. McKinsey’s data reinforces the point — AI high performers are 2.8 times more likely to redesign workflows than other organizations. Dropping an AI tool into an existing process and hoping for different outcomes isn’t a scaling strategy. It’s wishful thinking at enterprise cost.

The ACT step addresses this with three frameworks that take organizations from “approved pilot” to “operating at scale”: the CFO Conversation, the Cloning Playbook, and the Operating Rhythm.

Framework 1: The CFO Conversation

Every scaling decision eventually becomes a budget conversation. And budget conversations require a language that most AI teams don’t speak fluently: operational economics.

The CFO doesn’t want to hear that the AI agent “saves time.” She wants to know four things, in this order:

What’s the operational cost structure? Total cost of ownership at scale: licensing, compute, integration, support, training, and the ongoing cost of maintaining the system. Not the pilot cost extrapolated — the actual production cost model, including volume discounts, infrastructure scaling curves, and the hidden costs that only appear at scale (data quality maintenance, model drift monitoring, edge case handling).

What’s the counterfactual? What would the organization spend doing this work without AI? This isn’t a theoretical exercise. It’s a concrete comparison: headcount cost, error rates, cycle time, and customer impact in the current state versus the AI-augmented state. The counterfactual is what makes AI ROI defensible. Without it, every efficiency claim is an assertion. With it, it’s arithmetic.

What’s the scaling math? If the pilot showed a 3:1 return with 50 users, what does the model look like with 5,000? Scaling math isn’t linear. Some costs decrease at scale (per-unit licensing), while others increase (integration complexity, change management, support volume). The CFO wants to see the curve, not just the current point. And she wants to see sensitivity analysis — what happens to the return if adoption is 60% instead of 90%, or if the efficiency gain is 25% instead of the 40% the pilot showed.
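The sensitivity analysis can be sketched with a toy model. Everything in it is an assumption made for illustration: value scales with users, adoption, and efficiency gain, while cost is a per-user license plus a fixed integration and support cost. The dollar figures are invented so that the base case lands on a 3:1 return at 5,000 users.

```python
def return_ratio(users, adoption, efficiency_gain,
                 labor_cost_per_user=10_000.0,  # assumed annual cost of the affected work
                 license_per_user=1_000.0,      # assumed annual license cost
                 fixed_cost=1_000_000.0):       # assumed integration, support, change management
    """Return the value-to-cost ratio under the toy model described above."""
    value = users * adoption * efficiency_gain * labor_cost_per_user
    cost = users * license_per_user + fixed_cost
    return value / cost

# Sweep the two inputs the CFO is most likely to challenge.
for adoption in (0.90, 0.60):
    for gain in (0.40, 0.25):
        ratio = return_ratio(5_000, adoption, gain)
        print(f"adoption={adoption:.0%} gain={gain:.0%} return={ratio:.2f}:1")
```

Under these assumptions, dropping adoption to 60% and the efficiency gain to 25% takes the return from 3:1 to 1.25:1. A real model would replace the linear terms with the organization's own licensing tiers and support curves, but the sweep structure stays the same.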

What are the 90-day gates? Enterprise CFOs don’t write blank checks. They fund in stages, with checkpoints tied to measurable outcomes. A 90-day gate structure might look like: month one, deploy to the first full department and validate that pilot-level performance holds at 10x scale; month two, measure the workflow redesign impact and compare against the counterfactual; month three, present the production economics to the executive committee with a recommendation for the next stage of expansion. Each gate has a defined KPI, a target, and a decision: continue, adjust, or stop.

The enterprises that get CFO buy-in for scaling don’t present dashboards. They present business cases with operational economics, counterfactuals, scaling curves, and stage gates. Building this financial frame before asking for scaling budget is the single most effective way to accelerate AI investment.

Framework 2: The Cloning Playbook

Once the first AI initiative scales successfully, the question becomes: how do you replicate that success across the organization? This is where most enterprises lose momentum. Each new AI project starts from scratch — new vendors, new integrations, new measurement frameworks, new governance reviews. The result is that scaling the second initiative takes almost as long as scaling the first.

The Cloning Playbook treats your first successful AI deployment as a template. It identifies the five elements that made it work — what we call the success DNA — and systematically replicates them in adjacent use cases.

The business case structure. Not just “we saved money” but the specific format: counterfactual baseline, measured outcome, cost-to-value ratio, risk profile. When the first deployment proved value using this structure, don’t reinvent the wheel for deployment two. Use the same template. The CFO already trusts it.

The measurement infrastructure. The hardest part of proving AI ROI is building the instrumentation that connects AI activity to business outcomes. If you built that infrastructure for customer service AI, most of it translates to sales AI or operations AI with minor modifications. The data pipelines, the KPI frameworks, the reporting cadences — these are organizational assets, not project artifacts.

The governance framework. Your governance approach — data classification, security review, compliance validation, risk assessment — was designed and tested during the first deployment. Applying the same framework to deployment two eliminates months of security and legal review. The governance team already knows what “good” looks like.

The change management pattern. How did you train users? How did you redesign workflows? How did you handle resistance? What worked and what didn’t? The human side of AI deployment is where most organizations lose the most time. Cloning the change management playbook that worked — right down to the communication cadence and the training format — compresses rollout timelines dramatically.

The executive sponsorship model. Who championed the first deployment? What organizational authority did they need? How did they maintain momentum through obstacles? The sponsorship structure that works for one AI initiative typically works for others, because the organizational dynamics are the same: competing priorities, resource constraints, and stakeholder skepticism that only yields to demonstrated results.

The math is compelling. Organizations that clone their success DNA from the first deployment to the second see a 70-80% reduction in time-to-value compared to starting from scratch. The first initiative might take nine months to prove ROI. The second takes two to three months, because the infrastructure, governance, measurement, and organizational muscle are already built. By the third and fourth, you’re operating with a repeatable scaling engine.

The key is identifying adjacent workflows — use cases that share enough similarity with your proven deployment that the success DNA transfers cleanly. If your customer service AI succeeded, the adjacent workflows might be internal helpdesk, partner support, or onboarding. If your sales AI proved value, adjacent workflows might be account management, renewals, or lead qualification. Start with the 70-80% that transfers directly and customize only the 20-30% that’s unique to the new context.

Framework 3: The Operating Rhythm

Scaling AI isn’t a project. It’s an operating discipline. The enterprises that sustain AI value over time build measurement and governance into their regular business cadence rather than treating it as a separate workstream.

The Operating Rhythm runs on three cycles:

Monthly: Performance Review. Every AI initiative that has passed the DECIDE gate gets reviewed monthly against its defined business KPIs. Not technical metrics — business outcomes. Revenue influenced, costs avoided, risk events prevented, cycle time reduced. This is the same review cadence your organization already uses for other operational metrics. AI just gets added to the agenda. The monthly review catches performance degradation early, identifies optimization opportunities, and keeps executive attention on AI value rather than AI activity. If an initiative’s KPIs are declining, the monthly review triggers investigation before the quarterly review.

Quarterly: Portfolio Assessment. Every quarter, the AI portfolio gets assessed as a whole. Which initiatives are exceeding their ROI targets? Which are underperforming? Where should the next investment go? This is where the portfolio view that CFOs want becomes actionable. The quarterly assessment looks across all AI investments and asks: given what we now know about performance, risk, and cost, is our portfolio allocation optimal? Should we shift resources from an underperforming initiative to one showing stronger returns? Should we expand a successful deployment to new business units or geographies?

Annual: Strategic Reset. Once a year, step back from operational metrics and assess the AI strategy against the business strategy. Are the use cases you’re scaling still aligned with where the business is heading? Has the competitive landscape changed in ways that require new AI capabilities? Are there emerging technologies — new model architectures, new vendor offerings, new integration patterns — that create opportunities your current portfolio doesn’t capture? The annual reset prevents the common trap of optimizing last year’s AI strategy while the business has moved on to new priorities.

The Operating Rhythm does something that ad hoc AI management cannot: it creates organizational accountability. When AI performance is reviewed monthly alongside other business metrics, it signals that AI is a business function, not an experiment. When portfolio allocation is assessed quarterly, it prevents the resource fragmentation that kills scaling momentum. And when strategy is reset annually, it keeps AI investment aligned with business direction.

The Convergence of Measurement and Governance

Here’s what becomes clear at the ACT stage: measurement and governance aren’t separate disciplines. They’re two faces of the same capability.

The enterprises with the strongest AI ROI data are also the ones with the most rigorous governance frameworks. Not because governance is a compliance exercise, but because governance forces the discipline that measurement requires. Defining what AI is allowed to do means defining what it should be doing. Instrumenting how AI performs for compliance also instruments how it performs for ROI. Maintaining audit trails for regulators also maintains the data trails that prove business value.

This convergence is Olakai’s thesis: that unified visibility across measurement and governance enables enterprises to scale AI with confidence rather than scaling AI and hoping for the best. When you can see every AI system, measure its business impact, govern its risk profile, and control its costs from a single platform, the ACT step becomes dramatically simpler. You’re not stitching together data from five different tools to answer a board question. You’re looking at one dashboard that shows value, risk, and cost together.

The SEE, MEASURE, DECIDE, ACT playbook isn’t just a methodology. It’s an operating system for enterprise AI. And the ACT step is where that operating system proves its worth — not in a pilot, not in a board presentation, but in sustained, measurable business outcomes that compound quarter over quarter.

Start Acting With Data

Most of the 74% of enterprises that want AI revenue growth can’t prove it, and they share a common failure mode: they act without the infrastructure to know whether their actions are working. They scale without counterfactuals. They expand without cloning success patterns. They operate without cadences that catch problems before they become write-offs.

The 20% who prove AI ROI do something different. They build the CFO conversation before they ask for scaling budget. They clone their success DNA rather than reinventing each deployment. And they embed AI measurement into their monthly, quarterly, and annual operating rhythms so that AI value isn’t a one-time proof point — it’s a continuous, visible, defensible track record.

That’s the ACT framework. And it’s the final step that turns AI from an investment line item into a measurable operating advantage.

Ready to scale your AI investments with confidence? Talk to an expert and we’ll show you how Olakai’s measurement and governance platform turns the SEE, MEASURE, DECIDE, ACT playbook into an operating system for enterprise AI.

March 5, 2026
What Is AI Analytics? The Definitive Enterprise Guide

Gartner predicts that more than 40% of agentic AI projects will be canceled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls. BCG’s 2025 AI Radar survey of 1,803 C-suite executives found that only 25% of companies report realizing significant value from their AI investments. Thomson Reuters reported in 2026 that just 18% of organizations formally track AI ROI.

These are not isolated findings. They describe a structural gap in how enterprises manage AI: the gap between deploying AI and actually measuring whether it works. AI analytics is the discipline that closes that gap.

The measurement gap: most enterprises invest in AI but cannot prove it works.

What Is AI Analytics?

AI analytics is the practice of measuring the usage, performance, cost, and business impact of artificial intelligence tools across an enterprise. It answers the questions that every CIO, CFO, and board member is now asking: What AI are we using? How much is it costing us? And what are we getting back?

Traditional business intelligence measures the outputs of human processes. AI analytics measures the outputs of AI-augmented and AI-automated processes. This includes everything from how often employees use a chatbot like ChatGPT or Copilot, to the success rate and cost-per-execution of autonomous agents running multi-step workflows in production.

The distinction matters because AI adoption has outpaced AI measurement by years. Most enterprises now have dozens of AI tools in active use, each with its own vendor dashboard or no analytics at all. AI analytics provides a unified, vendor-neutral view across all of them.

Why AI Analytics Matters Now

The urgency is driven by three converging forces.

The ROI reckoning. Deloitte’s State of AI 2026 survey of 3,235 business and IT leaders found that 74% of organizations want AI to grow revenue, but only 20% have actually seen it happen. PwC’s 2026 Global CEO Survey found that 56% of CEOs report no revenue increase from AI. Boards are no longer willing to fund AI programs on faith. They want numbers. AI analytics provides those numbers.

The agentic AI wave. Deloitte projects that agentic AI usage will surge from 23% to 74% of enterprises within two years. Unlike chatbots that wait for human prompts, agentic AI takes autonomous actions: executing workflows, calling APIs, making decisions. An ungoverned chatbot gives a bad answer. An ungoverned agent executes a bad decision at scale. Measuring agent performance is not optional. It is the difference between a controlled deployment and an operational risk.

The shadow AI problem. Employees are adopting AI tools faster than IT can track them. Shadow AI creates blind spots in security, compliance, and cost management. AI analytics starts with visibility: discovering what AI is actually being used, by whom, and for what purpose.

The Four Pillars of AI Analytics

A complete AI analytics practice spans four areas. Each one addresses a different question that enterprise leaders need answered.

The four pillars of a complete AI analytics practice.

1. Usage and Adoption Analytics

This is the foundation: understanding what AI tools are in use across the organization and how deeply they are being adopted. Usage analytics answers questions like: How many employees actively use ChatGPT? Which teams have adopted Copilot? What percentage of licensed AI tools are actually being used?

Without usage data, enterprises operate blind. They cannot optimize license spend because they do not know which tools are underutilized. They cannot identify shadow AI because they do not have a baseline of sanctioned usage to compare against. According to Deloitte, workforce access to sanctioned AI tools expanded from under 40% to roughly 60% of employees in a single year. That growth rate makes continuous usage tracking essential.
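As a small example of what usage data makes possible, license utilization can be computed from a last-active date per seat. The 30-day activity window, the seat names, and the dates are assumptions for illustration.

```python
from datetime import date, timedelta

def license_utilization(last_active_by_seat, today, window_days=30):
    """Fraction of paid seats with activity inside the window.

    last_active_by_seat maps a seat identifier to its last activity date,
    or None if the seat has never been used.
    """
    if not last_active_by_seat:
        return 0.0
    cutoff = today - timedelta(days=window_days)
    active = sum(
        1 for last_seen in last_active_by_seat.values()
        if last_seen is not None and last_seen >= cutoff
    )
    return active / len(last_active_by_seat)

seats = {
    "seat-1": date(2026, 3, 1),   # active this month
    "seat-2": date(2026, 2, 20),  # active inside the window
    "seat-3": date(2025, 12, 5),  # stale
    "seat-4": None,               # never used
}
utilization = license_utilization(seats, today=date(2026, 3, 5))
print(f"License utilization: {utilization:.0%}")
```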

2. Performance and Quality Analytics

Beyond knowing that AI is being used, enterprises need to know whether it is performing well. Performance analytics measures the quality and reliability of AI outputs across tools and use cases.

For assistive AI (chatbots and copilots), this includes response accuracy, user satisfaction, and task completion rates. For agentic AI, it includes execution success rates, failure analysis, and decision quality. A custom agent that processes insurance claims might have a 94% success rate, but the 6% failure rate could represent millions in incorrectly handled claims. Performance analytics surfaces these patterns before they become problems.

3. Cost and ROI Analytics

This is where AI analytics becomes strategic. Cost analytics tracks the total cost of AI operations: API calls, compute, licensing, and human oversight time. ROI analytics ties those costs to business outcomes: revenue influenced, time saved, cost avoided, error reduction.

BCG found that 60% of enterprises do not track financial KPIs for their AI programs. This means the majority of organizations cannot answer the most basic question their CFO will ask: Is our AI investment paying off? AI ROI measurement is the capability that separates enterprises scaling AI from those stuck in pilot purgatory.

The math is straightforward but requires instrumentation. If a customer service AI handles 10,000 tickets per month at $0.12 per interaction and replaces a process that previously cost $8.50 per ticket with human agents, the monthly savings are $83,800. Without AI analytics, that number is an estimate. With it, that number is auditable and provable to a board.
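The arithmetic behind that figure, written out with the numbers from the example above. `Decimal` is used so the currency values are not subject to floating-point rounding; the function name is just for illustration.

```python
from decimal import Decimal

tickets_per_month = 10_000
ai_cost_per_ticket = Decimal("0.12")
human_cost_per_ticket = Decimal("8.50")

def monthly_savings(tickets, human_cost, ai_cost):
    """Savings = volume x (cost of the old process - cost of the AI process)."""
    return tickets * (human_cost - ai_cost)

savings = monthly_savings(tickets_per_month, human_cost_per_ticket, ai_cost_per_ticket)
print(f"Monthly savings: ${savings:,.2f}")  # Monthly savings: $83,800.00
```

The calculation itself is trivial. What instrumentation adds is confidence in the three inputs: the actual ticket volume handled by AI, the actual per-interaction cost, and a defensible baseline for the human process.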

4. Risk and Governance Analytics

The fourth pillar connects analytics to governance. Risk analytics monitors AI usage for policy violations, data exposure, bias indicators, and compliance gaps. It answers questions like: Are employees sharing sensitive data with AI tools? Are autonomous agents operating within defined guardrails? Are AI outputs meeting regulatory requirements?

This pillar is increasingly non-negotiable. The EU AI Act mandates risk-based oversight. The NIST AI Risk Management Framework provides voluntary guidance that is rapidly becoming the de facto standard in the United States. Companies in regulated industries such as financial services, healthcare, and government cannot scale AI without demonstrating continuous risk monitoring.

AI Analytics vs. Traditional Observability

Engineering teams are familiar with observability tools like Datadog, New Relic, and Splunk. These tools monitor infrastructure: server uptime, latency, error rates, and throughput. They are necessary but insufficient for AI programs.

AI analytics differs from traditional observability in three fundamental ways.

It measures business outcomes, not just technical metrics. Datadog can tell you that an API call to GPT-4 took 1.2 seconds. AI analytics tells you that the same call saved a sales rep 14 minutes of research and contributed to a deal worth $240,000. The audience is the CIO and CFO, not only the engineering team.

It spans tools and vendors. Each AI vendor provides metrics for its own tool. Microsoft shows Copilot usage. OpenAI shows ChatGPT usage. Salesforce shows Einstein usage. But no vendor will ever show you the cross-vendor picture, because that is not in their interest. AI analytics provides vendor-neutral visibility across the entire AI ecosystem.

It connects usage to governance. Traditional observability does not care whether an employee pasted customer PII into a chatbot. AI analytics does. The integration of usage data, risk signals, and governance policy into a single platform is what makes AI analytics a strategic capability rather than just another dashboard.

What to Measure: Key AI Analytics Metrics

The specific metrics that matter depend on the type of AI being measured and the audience consuming the data. Here is a framework organized by stakeholder.

For the CIO and Board

- AI ROI by business unit: Revenue influenced, cost saved, and time recovered, broken down by department or function
- Adoption rate: Percentage of employees actively using AI tools, tracked over time
- AI maturity score: A composite metric reflecting how effectively the organization uses AI across adoption, measurement, and governance
- Risk posture: Number and severity of policy violations, shadow AI instances, and compliance gaps

For the CFO

- Total cost of AI: All-in spend across licensing, API usage, compute, and personnel
- Cost per AI interaction: What each chatbot conversation, agent execution, or copilot suggestion costs
- License utilization: Percentage of paid AI licenses that are actively used. Low utilization signals wasted spend.
- ROI by AI initiative: For each major AI program, what is the measurable return relative to the investment?

For the CISO

- Shadow AI inventory: Unauthorized AI tools in use, how many users, what data they access
- Data exposure incidents: Instances of sensitive data shared with AI tools
- Policy compliance rate: Percentage of AI interactions that comply with content and data policies
- Agent guardrail adherence: For autonomous agents, how often do they operate within defined boundaries?

For Engineering and AI Teams

- Agent success rate: Percentage of agent executions that complete successfully
- Latency and throughput: Response times and processing capacity
- Error classification: Types and frequency of AI failures, broken down by cause
- Model comparison: Performance and cost differences across AI models and vendors for the same task

How to Build an AI Analytics Practice

Organizations typically progress through four stages when building an AI analytics capability. Understanding where you are today helps determine the right next step.

Figure: Building AI analytics capability, from visibility to governance at scale.

Stage 1: Visibility

The first step is simply knowing what AI is in use. Most enterprises are surprised by the results of an AI visibility audit. Shadow AI is nearly universal: employees are using AI tools that IT has not sanctioned, often with company data. Stage 1 focuses on discovery and inventory: building a complete picture of the AI tools, users, and data flows across the organization.

Stage 2: Measurement

Once you have visibility, you can start measuring. This means defining the metrics that matter for each AI initiative and instrumenting systems to capture them. The key shift at this stage is moving from vanity metrics (number of prompts, number of users) to value metrics (time saved, revenue influenced, cost avoided). Olakai’s SEE, MEASURE, DECIDE, ACT framework provides a structured approach to this transition.

Stage 3: Optimization

With measurement in place, enterprises can make data-driven decisions about their AI programs. Which tools deliver the highest ROI? Which pilots should scale to production? Which agents should be retired? Structured pilot programs with clear success criteria replace the ad hoc experimentation that traps most organizations in pilot purgatory. Optimization also includes cost management: identifying redundant tools, right-sizing API usage, and negotiating vendor contracts with actual usage data.

Stage 4: Governance at Scale

The final stage integrates analytics with governance. As AI programs grow from a handful of pilots to hundreds of production deployments, the analytics framework must support policy enforcement, compliance reporting, and risk management at scale. This is where organizations move from reactive oversight (responding to incidents) to proactive governance (preventing them). Analytics provides the continuous monitoring that makes proactive governance possible.

The Vendor-Neutral Imperative

One of the most common mistakes enterprises make is relying on AI vendors to provide their own analytics. Microsoft offers Copilot usage dashboards. OpenAI offers a usage portal for ChatGPT Enterprise. Salesforce shows Einstein adoption metrics. Each provides useful data about its own tool. None will ever provide the cross-vendor picture.

This is not a criticism of those vendors. It is a structural limitation. Microsoft has no incentive to show you that a competitor’s tool outperforms Copilot for a given use case. OpenAI has no incentive to help you discover that your team stopped using ChatGPT and switched to Claude. The only way to get an honest, complete picture of AI performance across your organization is through a vendor-neutral analytics platform that sits above individual tools.

Olakai was built specifically for this purpose. The platform provides unified visibility across chatbots, copilots, agents, and AI-enabled SaaS, with custom KPIs tied to business outcomes rather than vendor-specific metrics.

Frequently Asked Questions

What is the difference between AI analytics and AI observability?

AI observability focuses on the technical performance of AI systems: latency, error rates, model accuracy, and infrastructure health. AI analytics extends beyond technical metrics to include business outcomes, ROI measurement, cost analysis, and governance. Observability tells you whether the system is running. Analytics tells you whether it is delivering value.

How do you measure AI ROI?

AI ROI is measured by comparing the total cost of an AI initiative (licensing, compute, API calls, implementation, and human oversight) against the measurable business value it creates (time saved, revenue influenced, cost avoided, error reduction). The key is instrumenting AI systems to capture both sides of this equation continuously, not just during quarterly reviews. Olakai’s AI ROI measurement capability automates this process across all AI tools.
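
As a rough illustration of the comparison described above, ROI can be expressed as net value divided by total cost. The function name and the category keys below are assumptions made for the example. Real programs will have their own cost and value categories.

```python
def ai_roi(costs: dict[str, float], value: dict[str, float]) -> float:
    """Return ROI as a fraction: (total value - total cost) / total cost."""
    total_cost = sum(costs.values())
    total_value = sum(value.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (total_value - total_cost) / total_cost


# Hypothetical annual figures for one initiative.
costs = {
    "licensing": 40_000,
    "compute": 10_000,
    "api_calls": 5_000,
    "implementation": 20_000,
    "human_oversight": 25_000,
}
value = {
    "time_saved": 90_000,
    "revenue_influenced": 40_000,
    "cost_avoided": 20_000,
}

roi = ai_roi(costs, value)  # 0.5, i.e. a 50% return on 100,000 of cost
```

Keeping both sides as itemized dictionaries makes it easier to see which cost or value category is driving a change when the figure is recomputed each period.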

What is shadow AI and why does it matter for analytics?

Shadow AI refers to AI tools used by employees without IT approval or oversight. It matters for analytics because you cannot measure what you cannot see. If 30% of your AI usage is happening in unsanctioned tools, your analytics are incomplete, your cost estimates are wrong, and your security posture has blind spots. Shadow AI detection is typically the first step in building an AI analytics practice.

Do you need a dedicated platform for AI analytics?

For organizations with one or two AI tools, vendor-provided dashboards may suffice. For enterprises using multiple AI tools across multiple teams, vendor dashboards create fragmented, siloed views. A dedicated AI analytics platform provides the unified, vendor-neutral perspective needed to make strategic decisions about the AI program as a whole, not just individual tools in isolation.

What industries benefit most from AI analytics?

Every industry deploying AI at scale benefits from analytics, but the urgency is highest in regulated industries. Financial services, healthcare, and government face regulatory requirements that demand continuous monitoring and audit-ready evidence. Technology companies benefit from the ROI optimization angle: understanding which AI investments deliver the highest return.

Key Takeaways
- AI analytics is the practice of measuring AI usage, performance, cost, and business impact across an enterprise
- Only 25% of companies report significant value from AI (BCG), and only 18% formally track AI ROI (Thomson Reuters). The measurement gap is the primary barrier to scaling AI programs.
- The four pillars are usage analytics, performance analytics, cost and ROI analytics, and risk and governance analytics
- AI analytics differs from traditional observability by measuring business outcomes, spanning vendors, and integrating governance
- Vendor-neutral analytics is essential because no AI vendor will provide an honest cross-vendor picture
- Building an AI analytics practice follows four stages: visibility, measurement, optimization, and governance at scale

Talk to an expert to see how Olakai provides vendor-neutral AI analytics across your entire AI ecosystem.

March 2, 2026
The 30-Day AI Pilot That Actually Proves Value

Seventeen active AI pilots. $2.3 million in annual spend. Zero measurable business outcomes. That was the state of AI at a mid-market professional services firm when their CFO finally asked the question everyone had been avoiding: “Which of these should we actually scale?”

Nobody could answer. Not because the pilots weren’t working — several were. But none had been designed to produce the data needed to make a scaling decision. They were experiments without exit criteria, running indefinitely on the premise that “we’ll figure out ROI later.” Later never came.

This is pilot purgatory, and MIT’s 2025 State of AI research found that 95% of enterprise AI pilots deliver zero measurable financial return. Not low returns. Zero. That result comes against roughly $30-40 billion of enterprise investment in generative AI, spent on initiatives running without the measurement infrastructure to prove they’re worth continuing.

The Pilot Purgatory Problem

The data on AI pilot failure is stark. S&P Global Market Intelligence found that the average enterprise scrapped 46% of AI pilots before they ever reached production in 2025. Bain’s executive survey reported that only 27% of companies successfully moved generative AI from testing to real-world implementation. And McKinsey’s State of AI report found that nearly two-thirds of organizations remain stuck in pilot phase, unable to scale projects across the enterprise despite significant adoption.

The financial toll is substantial. Industry analysis estimates that pilot purgatory costs the average enterprise $15-25 million annually in wasted development resources, infrastructure spending, and opportunity costs. Individual pilot failures run $500,000 to $2 million each. And the cost grows every month a pilot runs without producing decision-quality data, because the organization continues investing without the information needed to decide whether that investment is justified.

The root cause isn’t technical. Most AI pilots work from a technical standpoint — the models perform, the integrations function, the users adopt the tools. The root cause is that pilots are designed to test technology, not prove business value. They answer “can this AI tool do the thing?” when the question the organization needs answered is “should we invest more in this AI tool?”

Why 30 Days Is the Right Timeframe

Enterprise best practice points to a 30-to-45-day window as the optimal pilot duration. Short enough to maintain executive attention and organizational momentum. Long enough to generate statistically meaningful data on business outcomes.

Shorter pilots (under three weeks) don’t capture enough data to distinguish signal from noise, especially for use cases where business outcomes lag behind AI activity — like lead qualification, where the revenue impact shows up when leads close, not when they’re scored. Longer pilots (three to four months) generate more data but introduce a different risk: losing stakeholder attention. By month three, the executive sponsor has moved on, the team working on the pilot has been pulled to other priorities, and the pilot drifts into that twilight zone where it’s too expensive to kill and too poorly measured to champion.

The 30-day pilot isn’t about speed for its own sake. It’s about creating a forcing function — a defined moment where the organization must decide: scale, fix, or kill. That decision point is what separates pilots that generate value from pilots that generate costs.

Pre-Pilot: Setting Up for a Decision

The 30-day clock doesn’t start when the AI tool gets deployed. It starts when the measurement infrastructure is in place. Before the pilot begins, four things must be defined:

The business outcome KPI. Not “accuracy” or “adoption” — the business outcome that this AI initiative should change. Revenue influenced, costs reduced, time recovered, errors prevented. This is the metric that will appear in the scaling decision. If you can’t name it before the pilot starts, you’re not ready for the pilot. Our AI ROI framework provides a methodology for identifying the right success KPI by use case.

The baseline. What is the current performance on that KPI without AI? If the AI agent is supposed to reduce customer support resolution time, what’s the current average? If it’s supposed to improve lead conversion, what’s the current conversion rate? Without a baseline, there is no counterfactual, and without a counterfactual, there’s no way to attribute improvement to AI versus other factors.

The success threshold. How much improvement constitutes a “scale” decision? What range triggers a “fix” decision? What level triggers a “kill” decision? These thresholds must be agreed upon before the data comes in. Post-hoc threshold setting is subject to confirmation bias — teams will unconsciously set the bar wherever the data lands.

The decision authority. Who makes the scale/fix/kill call on day 30? If this isn’t defined upfront, the pilot’s data will be debated indefinitely by stakeholders with competing interests. The decision authority needs to be a single individual (typically the executive sponsor) with the organizational power to allocate or reallocate budget based on the results.
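
The four pre-pilot definitions can be captured as a small record that refuses to exist until every field is filled in. This is a minimal sketch. The `PilotPlan` name, its fields, and the convention of expressing thresholds as relative improvement over baseline are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotPlan:
    """Pre-pilot commitments, frozen so they cannot be edited after data arrives."""
    kpi_name: str            # the business outcome KPI
    baseline: float          # current KPI value without AI
    scale_threshold: float   # relative improvement that triggers "scale", e.g. 0.20
    kill_threshold: float    # relative improvement below which the pilot is killed
    decision_owner: str      # the single person who makes the day-30 call

    def __post_init__(self) -> None:
        if not self.kpi_name.strip():
            raise ValueError("name the business outcome KPI before starting")
        if self.baseline <= 0:
            raise ValueError("a positive baseline is required")
        if self.kill_threshold >= self.scale_threshold:
            raise ValueError("kill threshold must sit below scale threshold")
        if not self.decision_owner.strip():
            raise ValueError("a single decision owner must be named")


plan = PilotPlan(
    kpi_name="avg support resolution time (minutes)",
    baseline=40.0,
    scale_threshold=0.20,
    kill_threshold=0.05,
    decision_owner="VP Customer Support",
)
```

Freezing the record mirrors the point about post-hoc threshold setting: once the pilot starts, the thresholds are not meant to move.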

During the Pilot: What to Measure

Once the pilot is running, measurement operates on two tracks.

The outcome track measures the business KPI you defined pre-pilot. This is the number that matters for the scaling decision. Track it weekly so you can see trend direction, but don’t make decisions based on week-one data. Enterprise AI use cases need at least two to three weeks for patterns to stabilize, especially in workflows with downstream dependencies like sales pipeline or compliance review.

The diagnostic track measures operational and technical metrics that help you understand why the outcome KPI is moving (or not). If resolution time is dropping, the diagnostic track tells you whether that’s because the AI is providing better answers, because agents are spending less time searching for information, or because the easiest tickets are being routed to AI first. If the outcome KPI isn’t improving, the diagnostic track tells you where to look: data quality issues, workflow integration problems, user adoption gaps, or a fundamental mismatch between the AI capability and the business need.
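
For the outcome track, a simple way to follow the trend is to express each week's KPI as a relative improvement over the baseline and to judge the pilot only on the weeks after patterns have settled. The sketch below assumes weekly KPI readings and skips the first two weeks by default. Both the function names and that default are illustrative choices.

```python
def weekly_improvement(
    baseline: float,
    weekly_values: list[float],
    lower_is_better: bool = True,
) -> list[float]:
    """Relative improvement over baseline for each week (positive = better)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    sign = 1 if lower_is_better else -1
    return [sign * (baseline - v) / baseline for v in weekly_values]


def stabilized_improvement(improvements: list[float], skip_weeks: int = 2) -> float:
    """Average improvement after ignoring the early, noisy weeks."""
    settled = improvements[skip_weeks:]
    if not settled:
        raise ValueError("not enough weeks of data to judge the pilot")
    return sum(settled) / len(settled)


# Hypothetical pilot: resolution time starts at 40 minutes.
trend = weekly_improvement(40.0, [38.0, 34.0, 30.0, 30.0])
settled = stabilized_improvement(trend)  # average of weeks 3 and 4
```

The weekly list shows direction, while the settled figure is the one to compare against the thresholds agreed before the pilot began.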

McKinsey’s research is clear on the value of this approach: among organizations that define and track AI-specific KPIs, nearly two-thirds meet or exceed their targets. The measurement itself doesn’t cause success; the discipline of defining what matters and instrumenting it creates organizational clarity that makes success more likely.

Day 30: The Decision Point

This is where most enterprises fail — not because they lack data, but because they lack a framework for using it. The day-30 decision uses four inputs:

Outcome KPI performance vs. threshold. Did the AI initiative hit the success threshold you defined pre-pilot? If yes, the data supports scaling. If it’s in the “fix” range, the diagnostic data tells you what to change. If it’s below the “kill” threshold, the data supports sunsetting the initiative and reallocating resources. The threshold was set before the data arrived, so this isn’t a subjective judgment. It’s a data-driven decision.

Cost-to-value ratio. What was the total cost of the pilot (tooling, infrastructure, team time, opportunity cost) versus the total value generated? Even at pilot scale, this ratio signals whether scaling will be financially viable. If the cost-to-value ratio is favorable at pilot scale, it typically improves at production scale due to economies of scale.

Governance and risk profile. Can the AI initiative operate within your organization’s risk tolerance at production scale? Data security concerns, compliance requirements, and governance gaps that are manageable at pilot scale can become critical at production scale. If the governance profile isn’t ready for scaling, the decision might be “fix governance first, then scale.”

Operational readiness. Does the organization have the operational capacity to absorb the change at scale? User training, workflow integration, support infrastructure, and change management all need to be assessed. A pilot that works with 50 engaged early adopters may perform differently when deployed to 5,000 users with varying levels of enthusiasm and technical proficiency.
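
The four inputs above can be combined into a single, pre-agreed rule. The sketch below is one possible encoding, not a prescribed method. In particular, treating an unfavorable cost-to-value ratio or a readiness gap as "fix" rather than "kill" is an assumption made for the example.

```python
from enum import Enum


class Decision(Enum):
    SCALE = "scale"
    FIX = "fix"
    KILL = "kill"


def day_30_decision(
    improvement: float,       # measured relative KPI improvement
    scale_threshold: float,   # agreed before the pilot
    kill_threshold: float,    # agreed before the pilot
    pilot_cost: float,
    pilot_value: float,
    governance_ready: bool,
    operations_ready: bool,
) -> tuple[Decision, str]:
    """Return the decision and a one-line reason."""
    if improvement < kill_threshold:
        return Decision.KILL, "outcome KPI below the kill threshold"
    if improvement < scale_threshold:
        return Decision.FIX, "outcome KPI in the fix range; check diagnostics"
    if pilot_value < pilot_cost:
        return Decision.FIX, "cost exceeds value at pilot scale"
    if not governance_ready:
        return Decision.FIX, "fix governance first, then scale"
    if not operations_ready:
        return Decision.FIX, "operational readiness gaps to close before scaling"
    return Decision.SCALE, "thresholds met and readiness confirmed"
```

For example, `day_30_decision(0.25, 0.20, 0.05, 100_000, 180_000, True, True)` returns the scale decision, while the same numbers with `governance_ready=False` return a fix decision that names governance as the blocker.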

What Successful Enterprises Do Differently

The enterprises that escape pilot purgatory share three characteristics. First, they secure executive sponsorship with decision authority, not just endorsement. Organizations with a top-level executive mandate scale AI three times faster and achieve significantly higher revenue impact compared to those stuck at pilot stage.

Second, they instrument measurement from day one, not after the pilot shows promising results. This means defining KPIs, establishing baselines, and deploying tracking before the AI tool goes live — not retrofitting measurement after the fact. Retrofitting measurement costs three to four times more than building it in from the start and produces lower-quality data because the baseline period is missing.

Third, they redesign workflows rather than just deploying tools. McKinsey found that AI high performers are 2.8 times more likely to redesign workflows (55% versus 20%) compared to other organizations. Dropping an AI tool into an existing workflow and measuring whether the workflow speeds up is the lowest-value form of AI measurement. Redesigning the workflow around AI capabilities and measuring the redesigned outcome is where the step-change improvements come from.

Breaking Free

Pilot purgatory isn’t a technology problem. It’s a measurement problem. The AI works. The organization just can’t prove it — because it never built the measurement infrastructure to generate decision-quality data in a defined timeframe.

The 30-day structured pilot is the DECIDE step in the SEE, MEASURE, DECIDE, ACT playbook. (This is the third of four companion deep-dives — see also SEE, MEASURE, and ACT.) It takes the visibility data from SEE and the business metrics from MEASURE and converts them into a concrete decision: scale, fix, or kill. No more indefinite experiments. No more “let’s give it another quarter.” No more pilot purgatory.

The enterprises moving from AI experimentation to business impact are the ones that commit to structured measurement before the pilot starts and structured decisions when the data comes in. The framework isn’t complicated. The discipline is what’s hard. And the cost of avoiding it — $15-25 million per year in wasted pilot investment — far exceeds the cost of getting it right.

Ready to run an AI pilot that actually produces a decision? Talk to an expert and we’ll show you how Olakai instruments AI measurement from day one — so your 30-day pilot generates the data your board needs to say yes.

February 26, 2026
The Enterprise Leader’s Toolkit for Navigating Agentic AI

Last quarter, a CIO at a mid-market financial services firm told me something that stuck: “I have 14 browser tabs open right now—vendor whitepapers, analyst reports, a McKinsey deck from 2024, three Medium posts about agent architectures. None of them agree on anything, and none of them tell me what to actually do on Monday morning.”

He’s not alone. According to McKinsey’s 2025 State of AI survey, 62% of organizations are experimenting with AI agents—but in any given business function, no more than 10% have actually scaled them. The gap between “we’re exploring agentic AI” and “we’re getting value from agentic AI” has become the defining challenge for enterprise leaders this year.

The Practical Resource Gap

The information problem isn’t a lack of content—it’s a lack of useful content. Vendor guides are biased toward their own platforms. Academic research is fascinating but rarely translates to a Monday morning action plan. And the consulting firms that produce genuinely practical frameworks charge $50,000 or more for the privilege of reading them.

Meanwhile, Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. Their analysts note that most agentic AI propositions today “lack significant value or return on investment, as current models don’t have the maturity and agency to autonomously achieve complex business goals.” When more than 40% of projects are headed for cancellation, the difference between success and failure often comes down to whether leaders had the right planning tools before they started.

Enterprise leaders need something in between a sales pitch and an academic paper—practical, vendor-neutral resources that help them evaluate, plan, and govern agentic AI with clear eyes. That’s exactly what we built.

Introducing Future of Agentic

Future of Agentic is a free, comprehensive research site designed for enterprise leaders navigating agentic AI. No gating, no lead forms, no vendor spin. It’s the resource we wished existed when we started building Olakai—and the one we kept hearing customers ask for. Here’s what’s inside.

A KPI Library Built for Business Leaders, Not Data Scientists

One of the most common questions we hear is deceptively simple: “How do I know if my AI agent is actually working?” The interactive KPI library provides 18 metrics across agentic, chatbot, and AI application categories—each with definitions, calculation methods, benchmarks, and guidance on when to use them. These aren’t abstract metrics. They’re the specific measurements that separate organizations scaling AI successfully from those stuck in pilot purgatory. Think agent task completion rate, autonomous resolution percentage, and cost per automated decision—KPIs that connect directly to business outcomes your CFO will understand.
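
To make the three example KPIs concrete, here is a minimal sketch that derives them from a list of agent run records. The definitions used here are assumptions for illustration and may differ from those in the KPI library, as is the `AgentRun` record itself.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One agent execution (hypothetical log record)."""
    completed: bool     # the task finished successfully
    needed_human: bool  # a person had to step in at some point
    cost: float         # model, API, and compute cost for this run


def agent_kpis(runs: list[AgentRun]) -> dict[str, float]:
    """Compute three business-facing KPIs from raw run records."""
    if not runs:
        raise ValueError("no runs to measure")
    total = len(runs)
    completed = sum(1 for r in runs if r.completed)
    autonomous = sum(1 for r in runs if r.completed and not r.needed_human)
    total_cost = sum(r.cost for r in runs)
    return {
        "task_completion_rate": completed / total,
        "autonomous_resolution_pct": 100.0 * autonomous / total,
        # Assumed definition: all spend divided by decisions made without help.
        "cost_per_automated_decision": (
            total_cost / autonomous if autonomous else float("inf")
        ),
    }
```

Dividing total spend by autonomous completions only, rather than by all runs, means failed and escalated runs raise the cost figure, which keeps the metric honest about wasted executions.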

ROI Calculators That Go Beyond Napkin Math

Every enterprise leader considering agentic AI needs to answer two financial questions: What will this actually cost, and what happens when agents stop delivering value? The Agent Economics section includes two interactive calculators. The Agent TCO vs. FTE calculator models the real total cost of ownership—infrastructure, maintenance, monitoring, and iteration—against human equivalents over time. The Zombie Agent Cost calculator tackles a problem most vendors don’t want to discuss: the ongoing expense of agents that are deployed but no longer delivering meaningful results. Both tools produce shareable outputs, so you can bring data-backed projections to budget conversations instead of guesswork.
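
The arithmetic behind both comparisons is straightforward to sketch. The code below is not the site's calculator. It is a simplified model with assumed inputs (a one-time build cost, flat monthly run costs, and a fully loaded annual FTE cost) that shows the shape of the calculation.

```python
from typing import Optional


def agent_tco(months: int, build_cost: float, monthly_run_cost: float) -> float:
    """Cumulative agent cost: one-time build plus ongoing run costs."""
    return build_cost + months * monthly_run_cost


def fte_cost(months: int, annual_loaded_cost: float, fte_equivalent: float = 1.0) -> float:
    """Cumulative cost of the human capacity the agent is compared against."""
    return months * annual_loaded_cost / 12 * fte_equivalent


def breakeven_month(
    build_cost: float,
    monthly_run_cost: float,
    annual_loaded_cost: float,
    fte_equivalent: float = 1.0,
    horizon: int = 60,
) -> Optional[int]:
    """First month where cumulative agent cost is at or below FTE cost."""
    for m in range(1, horizon + 1):
        if agent_tco(m, build_cost, monthly_run_cost) <= fte_cost(
            m, annual_loaded_cost, fte_equivalent
        ):
            return m
    return None  # never breaks even within the horizon


def zombie_cost(monthly_run_cost: float, months_idle: int) -> float:
    """Spend on an agent that stays deployed after it stops delivering value."""
    return monthly_run_cost * months_idle
```

With a build cost of 60,000, run costs of 5,000 per month (infrastructure, maintenance, and monitoring combined), and one FTE at 120,000 per year, `breakeven_month(60_000, 5_000, 120_000)` returns 12. The same agent left running for six months after it stops delivering value would cost `zombie_cost(5_000, 6)`, or 30,000.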

Hundreds of Enterprise Use Cases, Sorted by What Matters

The use case library catalogs hundreds of enterprise applications of agentic AI, each with architecture context and complexity ratings. What makes this different from a typical “top 10 use cases” listicle is the filtering: sort by department, by implementation complexity, or by business function to find the applications that match your organization’s maturity and priorities. Whether you’re a head of customer success exploring automated escalation workflows or a CISO evaluating security operations agents, the library narrows the field to what’s relevant.

Governance Frameworks for the Enterprise, Not the Lab

The Deloitte State of AI 2026 report found that only 21% of organizations have mature AI governance models in place—even as 38% are actively piloting AI agents. That governance gap is a ticking clock. The governance section on Future of Agentic provides risk assessment frameworks, compliance checklists, and decision-making guides built for enterprise reality. These aren’t theoretical policy templates — they complement our own CISO governance checklist and are structured around the actual decisions leaders face: What level of autonomy should this agent have? What happens when it fails? Who’s accountable? How do we audit it?

An AI Readiness Quiz (30 Seconds to Your Roadmap)

Sometimes the most valuable tool is the simplest. The AI readiness assessment takes about 30 seconds, asks targeted questions about your organization’s current AI maturity, and produces a customized roadmap with recommended next steps. It’s not a lead-gen funnel—it runs entirely in the browser and gives you immediate, actionable output. We’ve seen leaders use it to align executive teams on where they actually stand versus where they think they stand, which often turns out to be a more productive conversation than any strategy offsite.

The Enterprise AI Unlocked Podcast

Research and frameworks are essential, but there’s no substitute for hearing how other leaders are navigating these challenges in practice. Enterprise AI Unlocked features in-depth conversations with enterprise leaders and practitioners—from Fortune 500 AI playbooks to the real economics of voice AI deployments. Six episodes are live, with new conversations publishing regularly. Each episode is enriched with chapters and participant context so you can jump directly to the topics that matter most to you.

Who This Is For

We built Future of Agentic for the people making decisions about AI in their organizations: CIOs evaluating agent architectures, CISOs building governance frameworks, CFOs modeling AI agent ROI, and Heads of AI or Data leading implementation. But it’s equally valuable for the product managers, directors, and team leads who need to build informed business cases and present them upward. Everything on the site is free and ungated—because we believe better-informed leaders make better decisions, regardless of whether they ever become Olakai customers.

Where Olakai Fits

Future of Agentic is the research and planning phase—understanding what’s possible, modeling the economics, and building a governance framework before you deploy. Olakai is the execution and measurement phase—tracking ROI, governing risk, controlling costs, and securing AI usage once agents are live in production. The two are complementary by design: plan with Future of Agentic, then measure and govern with Olakai.

Start Exploring

If your team is navigating agentic AI decisions right now—or preparing to—explore Future of Agentic. Start with the KPI library if you need measurement frameworks, the use case library if you’re evaluating where agents fit, or the readiness quiz if you want a quick pulse on organizational maturity. And when you’re ready to move from planning to production, schedule a demo of Olakai to see how measurement and governance work in practice.

February 25, 2026