Google’s Mueller Calls Markdown-For-Bots ‘A Stupid Idea’ via @sejournal, @MattGSouthern

Some developers have been experimenting with bot-specific Markdown delivery as a way to reduce token usage for AI crawlers.

Google Search Advocate John Mueller pushed back on the idea of serving raw Markdown files to LLM crawlers, raising technical concerns on Reddit and calling the concept “a stupid idea” on Bluesky.

What’s Happening

A developer posted on r/TechSEO, describing plans to use Next.js middleware to detect AI user agents such as GPTBot and ClaudeBot. When those bots hit a page, the middleware intercepts the request and serves a raw Markdown file instead of the full React/HTML payload.
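The detection step the developer described can be sketched as a simple user-agent check. This is a hypothetical illustration, not the poster's actual code: the bot token list is an example, and the rewrite path shown in the comment is an assumption about how it would slot into Next.js middleware.

```typescript
// Hypothetical sketch of the bot-detection step described in the Reddit post.
// Kept as a pure function here so the logic is clear; in real Next.js
// middleware it would gate a rewrite to a pre-rendered Markdown file.
const AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot"]; // example tokens only

function isAICrawler(userAgent: string): boolean {
  return AI_CRAWLER_TOKENS.some((token) => userAgent.includes(token));
}

// e.g. in middleware.ts (illustrative, not verified against the poster's setup):
//   if (isAICrawler(request.headers.get("user-agent") ?? "")) {
//     return NextResponse.rewrite(new URL(`/md${pathname}.md`, request.url));
//   }
```

Note that user-agent strings are trivially spoofable, which is one more practical wrinkle in serving bots a different payload than browsers.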

The developer claimed early benchmarks showed a 95% reduction in token usage per page, which they argued should increase the site’s ingestion capacity for retrieval-augmented generation (RAG) bots.

Mueller responded with a series of questions.

“Are you sure they can even recognize MD on a website as anything other than a text file? Can they parse & follow the links? What will happen to your site’s internal linking, header, footer, sidebar, navigation? It’s one thing to give it a MD file manually, it seems very different to serve it a text file when they’re looking for a HTML page.”

On Bluesky, Mueller was more direct. Responding to technical SEO consultant Jono Alderson, who argued that flattening pages into Markdown strips out meaning and structure, Mueller wrote:

“Converting pages to markdown is such a stupid idea. Did you know LLMs can read images? WHY NOT TURN YOUR WHOLE SITE INTO AN IMAGE?”

Alderson framed Markdown-fetching as a convenience play rather than a lasting strategy.

Other voices in the Reddit thread echoed the concerns. One commenter questioned whether the effort could limit crawling rather than enhance it. They noted that there’s no evidence that LLMs are trained to favor documents that are less resource-intensive to parse.

The original poster defended the theory, arguing LLMs are better at parsing Markdown than HTML because they’re heavily trained on code repositories. That claim is untested.

Why This Matters

Mueller has been consistent on this. In a previous exchange, he responded to a question from Lily Ray about creating separate Markdown or JSON pages for LLMs. His position then was the same. He said to focus on clean HTML and structured data rather than building bot-only content copies.

That response followed SE Ranking’s analysis of 300,000 domains, which found no connection between having an llms.txt file and how often a domain gets cited in LLM answers. Additionally, Mueller has compared llms.txt to the keywords meta tag, a format major platforms haven’t documented as something they use for ranking or citations.

So far, public platform documentation hasn’t shown that bot-only formats, such as Markdown versions of pages, improve ranking or citations. Mueller raised the same objections across multiple discussions, and SE Ranking’s data found nothing to suggest otherwise.

Looking Ahead

Until an AI platform publishes a spec requesting Markdown versions of web pages, the best practice remains as it is. Keep HTML clean, reduce unnecessary JavaScript that blocks content parsing, and use structured data where platforms have documented schemas.

WordPress Announces AI Agent Skill For Speeding Up Development via @sejournal, @martinibuster

WordPress announced wp-playground, a new AI agent skill designed to be used with the Playground CLI so AI agents can run WordPress for testing and check their work as they write code.

Playground CLI

Playground is a WordPress sandbox that enables users to run a full WordPress site without setting it all up on a traditional server. It is used for testing plugins, creating and adjusting themes, and experimenting safely without affecting a live site.

The new AI agent skill is for use with Playground CLI, which runs locally and requires knowledge of terminal commands, Node.js, and npm to manage local WordPress environments.

The wp-playground skill starts WordPress automatically and determines where generated code should exist inside the installation. The skill then mounts the code into the correct directory, which allows the agent to move directly from generated code to a running WordPress site without manual setup.

Once WordPress is running, the agent can test behavior and verify results using common tools. In testing, agents interacted with WordPress through tools like curl and Playwright, checked outcomes, applied fixes, and then re-tested using the same environment. This process creates a repeatable loop where the agent can confirm whether a change works before making further changes.

The skill also includes helper scripts that manage startup and shutdown. These scripts reduce the time it takes for WordPress to become ready for testing from about a minute to only a few seconds. The Playground CLI can also log into WP-Admin automatically, which removes another manual step during testing.

The creator of the AI agent skill, Brandon Payton, is quoted explaining how it works:

“AI agents work better when they have a clear feedback loop. That’s why I made the wp-playground skill. It gives agents an easy way to test WordPress code and makes building and experimenting with WordPress a lot more accessible.”

The WordPress AI agent skill release also introduces a new GitHub repository dedicated to hosting WordPress agent skills. Planned ideas include persistent Playground sites tied to a project directory, running commands against existing Playground instances, and Blueprint generation.

Featured Image by Shutterstock/Here

AI Recommendations Change With Nearly Every Query: Sparktoro via @sejournal, @MattGSouthern

AI tools produce different brand recommendation lists nearly every time they answer the same question, according to a new report from SparkToro.

The data showed a less than 1-in-100 chance that ChatGPT or Google would return the same list of brands twice in response to the same prompt.

Rand Fishkin, SparkToro co-founder, conducted the research with Patrick O’Donnell from Gumshoe.ai, an AI tracking startup. The team ran 2,961 prompts across ChatGPT, Claude, and Google Search AI Overviews (with AI Mode used when Overviews didn’t appear) using hundreds of volunteers over November and December.

What The Data Found

The authors tested 12 prompts requesting brand recommendations across categories, including chef’s knives, headphones, cancer care hospitals, digital marketing consultants, and science fiction novels.

Each prompt was run 60-100 times per platform. Nearly every response was unique in three ways: the list of brands presented, the order of recommendations, and the number of items returned.

Fishkin summarized the core finding:

“If you ask an AI tool for brand/product recommendations a hundred times nearly every response will be unique.”

Claude showed slightly higher consistency in producing the same list twice, but was less likely to produce the same ordering. None of the platforms came close to the authors’ definition of reliable repeatability.

The Prompt Variability Problem

The authors also examined how real users write prompts. When 142 participants were asked to write their own prompts about headphones for a traveling family member, almost no two prompts looked similar.

The semantic similarity score across those human-written prompts was 0.081. Fishkin compared the relationship to:

“Kung Pao Chicken and Peanut Butter.”

The prompts shared a core intent but little else.
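Semantic similarity scores like the 0.081 reported here are typically cosine similarities between text embedding vectors. As an illustrative sketch (the embedding step is omitted and the vectors are made up; this is not SparkToro's methodology code):

```typescript
// Cosine similarity between two embedding vectors: 1.0 for identical
// direction, near 0 for unrelated vectors. A score of 0.081 across
// human-written prompts indicates almost no textual overlap.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  const norm = (v: number[]) => Math.sqrt(dot(v, v));
  return dot(a, b) / (norm(a) * norm(b));
}
```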

Despite the prompt diversity, the AI tools returned brands from a relatively consistent consideration set. Bose, Sony, Sennheiser, and Apple appeared in 55-77% of the 994 responses to those varied headphone prompts.

What This Means For AI Visibility Tracking

The findings question the value of “AI ranking position” as a metric. Fishkin wrote: “any tool that gives a ‘ranking position in AI’ is full of baloney.”

However, the data suggests that how often a brand appears across many runs of similar prompts is more consistent. In tight categories like cloud computing providers, top brands appeared in most responses. In broader categories like science fiction novels, the results were more scattered.

This aligns with other reports we’ve covered. In December, Ahrefs published data showing that Google’s AI Mode and AI Overviews cite different sources 87% of the time for the same query. That report focused on a different question: the same platform but with different features. This SparkToro data examines the same platform and prompt, but with different runs.

The pattern across these studies points in the same direction. AI recommendations appear to vary at every level, whether you’re comparing across platforms, across features within a platform, or across repeated queries to the same feature.

Methodology Notes

The research was conducted in partnership with Gumshoe.ai, which sells AI tracking tools. Fishkin disclosed this and noted that his starting hypothesis was that AI tracking would prove “pointless.”

The team published the full methodology and raw data on a public mini-site. Survey respondents used their normal AI tool settings without standardization, which the authors said was intentional to capture real-world variation.

The report is not peer-reviewed academic research. Fishkin acknowledged methodological limitations and called for larger-scale follow-up work.

Looking Ahead

The authors left open questions about how many prompt runs are needed to obtain reliable visibility data and whether API calls yield the same variation as manual prompts.

When assessing AI tracking tools, the findings suggest you should ask providers to demonstrate their methodology. Fishkin wrote:

“Before you spend a dime tracking AI visibility, make sure your provider answers the questions we’ve surfaced here and shows their math.”


Featured Image: NOMONARTS/Shutterstock

Chrome Updated With 3 AI Features Including Nano Banana via @sejournal, @martinibuster

Gemini in Chrome has just been refreshed with three new features that integrate more Gemini capabilities within Chrome for Windows, macOS, and Chromebook Plus. The update adds an AI side panel, agentic AI Auto Browse, and Nano Banana editing of whatever image is in the browser window.

AI Side Panel For Multitasking

Chrome adds a new side panel that users can slide open to start a Gemini session without jumping across browser tabs. The feature is described as a way to save time by making it easier to multitask.

Google explains:

“Our testers have been using it for all sorts of things: comparing options across too-many-tabs, summarizing product reviews across different sites, and helping find time for events in even the most chaotic of calendars.”

Opt-In Requirement For AI Chat

Before enabling the side panel AI chat feature, a user must first consent to sending their URLs and browser data back to Google.

Screenshot Of Opt-In Form

Nano Banana In Chrome

Using the AI side panel, users can tell it to update and change an image in the browser window without any copying, downloading, or uploading. Nano Banana changes it right there in the open browser window.

Chrome Autobrowse (Agentic AI)

This feature is for subscribers of Google’s AI Pro and Ultra tiers. Autobrowse enables an agentic AI to take action on behalf of the user. It’s described as being able to research hotels and flights, compare costs across a given range of dates, obtain quotes for work, and check whether bills are paid.

Autobrowse is multimodal, which means it can identify items in a photo, then find where they can be purchased and add them to a cart, including applying any relevant discount codes. If given permission, the AI agent can also access passwords and log in to online stores and services.

Adds More Features To Existing Ones

Google announced on January 12, 2026 that Chrome’s AI was upgraded with app connections, able to connect to Calendar, Gmail, Google Shopping, Google Flights, Maps, and YouTube. This is part of Google’s Personal Intelligence initiative, which it said is Google’s first step toward a more personalized AI assistant.

Personalization And User Intent Extraction For AI Chat And Agents

On a related note, Google recently published a research paper that shows how an on-device and in-browser AI can extract a user’s intent so as to provide better personalized and proactive responses, pointing to how on-device AI may be used in the near future. Read Google’s New User Intent Extraction Method.

Featured Image by Shutterstock/f11photo

Google May Let Sites Opt Out Of AI Search Features via @sejournal, @MattGSouthern

Google says it’s exploring updates that could let websites opt out of AI-powered search features specifically.

The announcement, made in a blog post, came the same day the UK’s Competition and Markets Authority opened a consultation on potential new requirements for Google Search, including controls for websites to manage their content in Search AI features.

Ron Eden, Principal, Product Management at Google, wrote:

“Building on this framework, and working with the web ecosystem, we’re now exploring updates to our controls to let sites specifically opt out of Search generative AI features.”

Google provided no timeline, technical specifications, or firm commitment. The post frames this as exploration, not a product roadmap.

What’s New

Google currently offers several controls for how content appears in Search, but none cleanly separate AI features from traditional results.

Google-Extended lets publishers block their content from training Gemini and Vertex AI models. But Google’s documentation states Google-Extended doesn’t impact inclusion in Google Search and isn’t a ranking signal. It controls AI training, not AI Overviews appearance.

The nosnippet and max-snippet directives do apply to AI Overviews and AI Mode. But they also affect traditional snippets in regular search results. Publishers wanting to limit AI feature exposure currently lose snippet visibility everywhere.
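As a sketch, the two documented controls look like this in a page's head (the max-snippet value is an arbitrary example; a page would normally use one or the other):

```html
<!-- Removes snippets everywhere: AI Overviews, AI Mode, and regular results -->
<meta name="robots" content="nosnippet">

<!-- Alternatively, cap snippet length in characters -->
<meta name="robots" content="max-snippet:150">
```

Google-Extended, by contrast, is a robots.txt user agent token rather than a meta tag, and it governs AI training rather than how pages appear in Search.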

Google’s post acknowledges this gap exists. Eden wrote:

“Any new controls need to avoid breaking Search in a way that leads to a fragmented or confusing experience for people.”

Why This Matters

I wrote in SEJ’s SEO Trends 2026 ebook that people would have more influence on the direction of search than platforms do. Google’s post suggests that dynamic is playing out.

Publishers and regulators have spent the past year pushing back on AI Overviews. The UK’s Independent Publishers Alliance, Foxglove, and Movement for an Open Web filed a complaint with the CMA last July, asking for the ability to opt out of AI summaries without being removed from search entirely. The US Department of Justice and South African Competition Commission have proposed similar measures.

The BuzzStream study we covered earlier this month found 79% of top news publishers block at least one AI training bot, and 71% block retrieval bots that affect AI citations. Publishers are already voting with their robots.txt files.
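Blocking a single documented crawler site-wide looks like this in robots.txt (GPTBot is OpenAI's documented training crawler; other platforms publish their own user agent tokens):

```
# robots.txt — blocks OpenAI's training crawler from the entire site
User-agent: GPTBot
Disallow: /
```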

Google’s post suggests it’s responding to pressure from the ecosystem by exploring controls it previously didn’t offer.

Looking Ahead

Google’s language is cautious. “Exploring” and “working with the web ecosystem” are not product commitments.

The CMA consultation will gather input on potential requirements. Regulatory processes move slowly, but they do produce outcomes. The EU’s Digital Markets Act investigations have already pushed Google to make changes in Europe.

For now, publishers wanting to limit AI feature exposure can use nosnippet or max-snippet directives, but note that these affect traditional snippets as well. Google’s robots meta tag documentation covers the current options.

If Google follows through on specific opt-out controls, the technical implementation will matter. Whether it’s a new robots directive, a Search Console setting, or something else will determine how practical it is for publishers to use.


Featured Image: ANDRANIK HAKOBYAN/Shutterstock

New Yahoo Scout AI Search Delivers The Classic Search Flavor People Miss via @sejournal, @martinibuster

Yahoo has announced Yahoo Scout, a new AI-powered answer engine now available in beta to users in the United States, providing a clean Classic Search experience with the power of personalized AI. The launch also includes the Yahoo Scout Intelligence Platform, which brings AI features across Yahoo’s core products, including Mail, News, Finance, and Sports.

Screenshot Of Yahoo Scout

Yahoo’s Existing Products and User Reach

Yahoo’s announcement states that it operates some of the most popular websites and services in the United States, reaching what it says is 90% of US internet users (based on Comscore data) through its email, news, finance, and sports properties. The company says that Yahoo Scout builds on the foundation of decades of search behavior and user interaction data.

How Yahoo Scout Generates Answers

Yahoo has partnered with Anthropic to use the Claude model as the primary AI system behind Yahoo Scout. Yahoo’s announcement said it selected Claude for speed, clarity, judgment, and safety, which it described as essential qualities for a consumer-facing answer engine. Yahoo also continues its partnership with Microsoft by using Microsoft Bing’s grounding API, which connects AI-generated answers to information from across the open web. Yahoo said this approach ensures that answers are informed by authoritative sources rather than unsupported text generation.

In practice, Scout combines traditional web search with generative AI, so answers are grounded in retrieved sources rather than generated from the model alone.

According to Yahoo:

“It’s informed by 500 million user profiles, a knowledge graph spanning more than 1 billion entities, and 18 trillion consumer events that occur annually across Yahoo, which allow Yahoo Scout to provide effective and personalized answers and suggested actions.”

Yahoo’s announcement says that this data, its use of Claude, and its reliance on Bing for grounding work together to provide answers that are personalized and helpful for researching and making decisions in the “moments that matter” to people.

They explain:

“Yahoo Scout continues Yahoo’s focus on the moments that matter to people’s daily lives, such as understanding upcoming weather patterns before a vacation, getting details about an important game, tracking stock price movements after earnings, comparing products before buying, or fact-checking a news story.”

Where Yahoo Scout Appears Inside Yahoo Products

The Yahoo Scout Intelligence Platform embeds these AI capabilities directly into Yahoo’s existing services.

For example:

  • In Yahoo Mail, Scout supports AI-generated message summaries.
  • In Yahoo Sports, it produces game breakdowns.
  • In Yahoo News, it surfaces key takeaways.
  • In Yahoo Finance, Scout adds interactive tools for analysis that allow readers to explore market news and stock performance context through AI-powered questions.

According to Eric Feng, Senior Vice President and General Manager of Yahoo Research Group:

“Yahoo’s deep knowledge base, 30 years in the making, allows us to deliver guidance that our users can trust and easily understand, and will become even more personalized over the coming months. Yahoo Scout now powers a new generation of intelligence experiences across Yahoo, seamlessly integrated into the products people use every day.”

What Yahoo Says Comes Next

Yahoo said Scout will continue to develop over the coming months. Planned updates include deeper personalization, expanded capabilities within specific verticals, and new formats for search advertising designed to work in generative AI search. The company did not provide a timeline for when the beta period will end or when additional features will move beyond testing.

Yahoo explained:

“Yahoo Scout will continue to evolve in the months ahead, expanding to power new products across Yahoo. In particular, the new answer engine will become more personalized, will add new capabilities focused on deeper experiences within key verticals, and will introduce new, improved opportunities for search advertisers to effectively cross the chasm to generative AI search advertising.”

Yahoo’s Search Experience

Something that’s notable about Yahoo’s AI answer engine experience is how clean and straightforward it is. It’s like a throwback to classic search but with the sophistication of AI answers.

For example, I asked it where I could buy an esoteric version of a Levi’s trucker jacket in a specific color (Midnight Harvest), and it presented a clean summary of where to get it, plus a table of retailers ordered by lowest price.

Screenshot Of Yahoo Scout

Notice that there are no product images? It’s just giving me the prices. I don’t know if that’s because they don’t have a product feed, but I already know what the jacket looks like in the color I specified, so images aren’t really necessary. This is what I mean when I say that Yahoo Scout offers that Classic Search flavor without the busy, overly fussy search experience that Google has been providing lately.

With Yahoo Scout, the company is applying AI systems to tasks its users perform when they search for, read, or compare information online. Rather than positioning AI as a replacement for search or content platforms, Yahoo is using it as a tool that organizes, summarizes, and explains information in a clean and easy to read format.

Yahoo Scout is easy to like because it delivers the clean and uncluttered search experience that many people miss.

Check out Yahoo Scout at scout.yahoo.com

The Yahoo Scout app is available for Android and Apple devices.

Google AI Overviews Now Powered By Gemini 3 via @sejournal, @MattGSouthern

Google is making Gemini 3 the default model for AI Overviews in markets where the feature is available and adding a direct path into AI Mode conversations.

The updates, shared in a Google blog post, bring Gemini 3’s reasoning capabilities to AI Overviews. Google says the feature now reaches over one billion users.

What’s New

Gemini 3 For AI Overviews

The Gemini 3 upgrade brings the same reasoning capabilities to AI Overviews that previously powered AI Mode.

Robby Stein, VP of Product for Google Search, wrote:

“We’re rolling out Gemini 3 as the default model for AI Overviews globally, so even more people will be able to access best-in-class AI responses, directly in the results page for questions where it’s helpful.”

Gemini 3 launched in November, and Google shipped it to AI Mode on release day. This expands Gemini 3 from AI Mode into AI Overviews as the default.

AI Overview To AI Mode Transition

You can now ask a follow-up question right from an AI Overview and continue into AI Mode. The context from the original response carries into the conversation, so you don’t start over.

Stein described the thinking behind the change:

“People come to Search for an incredibly wide range of questions – sometimes to find information quickly, like a sports score or the weather, where a simple result is all you need. But for complex questions or tasks where you need to explore a topic deeply, you should be able to seamlessly tap into a powerful conversational AI experience.”

He called the result “one fluid experience with prominent links to continue exploring.”

An earlier test of this flow ran globally on mobile back in December.

In testing, Google found people prefer this kind of natural flow into conversation. The company also found that keeping AI Overview context in follow-ups makes Search more helpful.

Why This Matters

The pattern has held since AI Overviews launched. Each update makes it easier to stay within AI-powered responses.

When Gemini 3 arrived in AI Mode, it brought deeper query fan-out and dynamic response layouts. AI Overviews running on the same model could produce different citation patterns.

That makes today’s update an important one to monitor. Model changes can affect which pages get cited and how responses are structured.

Looking Ahead

Google says the updates are rolling out starting today, though availability may vary by market.

Google previously indicated plans to add automatic model selection that routes complex questions to Gemini 3 while using faster models for simpler tasks. Whether that affects AI Overviews beyond today’s default model change isn’t specified.


Featured Image: Darshika Maduranga/Shutterstock

Sam Altman Says OpenAI “Screwed Up” GPT-5.2 Writing Quality via @sejournal, @MattGSouthern

Sam Altman said OpenAI “screwed up” GPT-5.2’s writing quality during a developer town hall Monday evening.

When asked about user feedback that GPT-5.2 produces writing that’s “unwieldy” and “hard to read” compared to GPT-4.5, Altman was blunt.

He said:

“I think we just screwed that up. We will make future versions of GPT 5.x hopefully much better at writing than 4.5 was.”

Altman explained that OpenAI made a deliberate choice to focus GPT-5.2’s development on technical capabilities:

“We did decide, and I think for good reason, to put most of our effort in 5.2 into making it super good at intelligence, reasoning, coding, engineering, that kind of thing. And we have limited bandwidth here, and sometimes we focus on one thing and neglect another.”

How OpenAI Positioned Each Model

The contrast between GPT-4.5 and GPT-5.2 shows where OpenAI focused its resources.

When OpenAI introduced GPT-4.5 in February 2025, the company emphasized natural interaction and writing. OpenAI said interacting with GPT-4.5 “feels more natural” and called it “useful for tasks like improving writing.”

GPT-5.2’s announcement took a different direction. OpenAI positioned it as the most capable model series yet for professional knowledge work, with improvements in creating spreadsheets, building presentations, writing code, and handling complex, multi-step projects.

The release post spotlights spreadsheets, presentations, tool use, and coding. Writing appears more briefly, with technical writing noted as an improvement for GPT-5.2 Instant. But Altman’s comments suggest the overall writing experience still fell short for users comparing it to GPT-4.5.

Why This Matters

We’ve covered the iterative changes to ChatGPT since GPT-5 launched in August, including updates to warmth and tone and the GPT-5.1 instruction-following improvements. OpenAI regularly adjusts model behavior based on user feedback, and regressions in one area while improving another aren’t new.

What’s unusual is hearing Altman acknowledge a tradeoff this directly. For anyone using ChatGPT output in client-facing work, drafts, or polished writing, this explains why outputs may have changed. Model upgrades don’t guarantee improvement across every capability.

If you rely on ChatGPT for writing, treat model updates like any other dependency change. Re-test your prompts when defaults change, and keep a fallback if output quality matters for your workflow.

Looking Ahead

Altman said he believes “the future is mostly going to be about very good general purpose models” and that even coding-focused models should “write well, too.”

No timeline was given for when GPT-5.x writing improvements will ship. OpenAI typically iterates on model behavior through point releases, so changes could arrive gradually rather than in a single update.

Hear Altman’s full statement in the video below:


Featured Image: FotoField/Shutterstock

Why Google Gemini Has No Ads Yet: ‘Trust In Your Assistant’ via @sejournal, @MattGSouthern

Google DeepMind CEO Demis Hassabis said Google doesn’t have any current plans to introduce advertising into its Gemini AI assistant, citing unresolved questions about user trust.

Speaking at the World Economic Forum in Davos, Hassabis said AI assistants represent a different product than search. He believes Gemini should be built for users first.

“In the realm of assistants, if you think of the chatbot as an assistant that’s meant to be helpful and ideally in my mind, as they become more powerful, the kind of technology that works for you as the individual,” Hassabis said in an interview with Axios. “That’s what I’d like to see with these systems.”

He said no one in the industry has figured out how advertising fits into that model.

“There is a question about how does ads fit into that model, where you want to have trust in your assistant,” Hassabis said. “I think no one’s really got a full answer to that yet.”

When asked directly about Google’s plans, Hassabis said: “We don’t have any current plans to do it ourselves.”

What Hassabis Said About OpenAI

The comments came days after OpenAI said it plans to begin testing ads in ChatGPT in the coming weeks for logged-in adults in the U.S. on free and Go tiers.

Hassabis said he was “a little bit surprised they’ve moved so early into that.”

He acknowledged advertising has funded much of the consumer internet and can be useful to users when done well. But he warned that poor execution in AI assistants could damage user relationships.

“I think it can be done right, but it can also be done in a way that’s not good,” Hassabis said. “In the end, what we want to do is be the most useful we can be to our users.”

Search Is Different

Hassabis drew a line between AI assistants and search when discussing advertising.

When asked whether his comments applied to Google Search, where the company already shows ads in AI Overviews, he said the two products work differently.

“But there it’s completely different use case because you’ve already just like how it’s always worked with search, you’ve already, you know, we know what your intent is basically and so we can be helpful there,” Hassabis said. “That’s a very different construct.”

Google began rolling out ads in AI Overviews in October 2024 and has continued expanding them since. The company claims AI Overviews generate ad revenue equal to traditional search results.

Why This Matters

This is the second time in two months that a Google executive has said Gemini ads aren’t currently planned.

In December, Google Ads VP Dan Taylor disputed an Adweek report claiming the company had told advertisers to expect Gemini ads in 2026. Taylor called that report “inaccurate” and said Google has “no current plans” to monetize the Gemini app.

Hassabis’s comments reinforce that position but go further by explaining the reasoning. His “technology that works for you” framing suggests Google sees a tension between advertising and the assistant relationship it wants Gemini to build.

Looking Ahead

Google is comfortable expanding ads where user intent is explicit, like search queries triggering AI Overviews. The company is holding back where intent is less defined and the relationship is more personal.

How long Google maintains its current position depends in part on how users respond to advertising in rival assistants.


Featured Image: Screenshot from: youtube.com/@axios, January 2026. 

Why CFOs Are Cutting AI Budgets (And The 3 Metrics That Save Them) via @sejournal, @purnavirji

Every AI vendor pitch follows the same script: “Our tool saves your team 40% of their time on X task.”

The demo looks impressive. The return on investment (ROI) calculator backs it up, showing millions in labor cost savings. You get budget approval. You deploy.

Six months later, your CFO asks: “Where’s the 40% productivity gain in our revenue?”

You realize the saved time went to email and meetings, not strategic work that moves the business forward.

This is the AI measurement crisis playing out in enterprises right now.

According to Fortune’s December 2025 report, 61% of CEOs report increasing pressure to show returns on AI investments. Yet most organizations are measuring the wrong things.

There’s a problem with how we’ve been tracking AI’s value.

Why ‘Time Saved’ Is A Vanity Metric

Time saved sounds compelling in a business case. It’s concrete, measurable, and easy to calculate.

But time saved doesn’t equal value created.

Anthropic’s November 2025 research analyzing 100,000 real AI conversations found that AI reduces task completion time by approximately 80%. Sounds transformative, right?

What that stat doesn’t capture is the Jevons Paradox of AI.

In economics, the Jevons Paradox occurs when technological progress increases the efficiency with which a resource is used, but the rate of consumption of that resource rises rather than falls.

In the corporate world, this is the Reallocation Fallacy. Just because AI completes a task faster doesn’t mean your team is producing more value. It means they’re producing the same output in less time, but then filling that saved time with lower-value work. Think more meetings, longer email threads, and administrative drift.
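The arithmetic behind the fallacy can be sketched with deliberately hypothetical numbers (none of these figures come from the research cited in this article):

```python
# Illustrative arithmetic for the Reallocation Fallacy.
# All numbers are hypothetical, chosen only to show why
# "time saved" and "value created" can diverge.

hours_per_task_before = 5.0
hours_per_task_after = 1.0           # an ~80% reduction in task time
tasks_per_week = 8
value_per_task = 500.0               # business value of one completed task

hours_saved = (hours_per_task_before - hours_per_task_after) * tasks_per_week

# The naive business case: price the saved hours at a loaded labor rate.
labor_rate = 75.0
claimed_savings = hours_saved * labor_rate

# The reality check: if the freed hours drift into meetings and email
# (value ~ 0) instead of more or better tasks, output value is flat.
value_before = tasks_per_week * value_per_task
value_after = tasks_per_week * value_per_task    # same tasks, same value
realized_gain = value_after - value_before

print(f"Hours saved per week: {hours_saved:.0f}")
print(f"Claimed savings: ${claimed_savings:,.0f}")
print(f"Realized value gain: ${realized_gain:,.0f}")
```

The "claimed savings" line is what the vendor's ROI calculator shows; the "realized gain" line is what the CFO sees six months later.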

Google Cloud’s 2025 ROI of AI report, surveying 3,466 business leaders, found that 74% report seeing ROI within the first year.

But dig into what they’re measuring, and it’s primarily efficiency gains, not outcome improvements.

CFOs understand this intuitively. That’s why “time saved” metrics don’t convince finance teams to increase AI budgets.

What does convince them is measuring what AI enables you to do that you couldn’t do before.

The Three Types Of AI Value Nobody’s Measuring

Recent research from Anthropic, OpenAI, and Google reveals a pattern: The organizations seeing real AI ROI are measuring expansion.

Three types of value actually matter:

Type 1: Quality Lift

AI doesn’t just make work faster; it makes good work better.

A marketing team using AI for email campaigns can produce emails more quickly. They also gain time to A/B test multiple subject lines, personalize content by segment, and analyze results to improve the next campaign.

The metric isn’t “time saved writing emails.” The metric is “15% higher email conversion rate.”

OpenAI’s State of Enterprise AI report, based on 9,000 workers across almost 100 enterprises, found that 85% of marketing and product users report faster campaign execution. But the real value shows up in campaign performance, not campaign speed.

How to measure quality lift:

  • Conversion rate improvements (not just task completion speed).
  • Customer satisfaction scores (not just response time).
  • Error reduction rates (not just throughput).
  • Revenue per campaign (not just campaigns launched).

One B2B SaaS company I talked to deployed AI for content creation.

  • Their old metric was “blog posts published per month.”
  • Their new metric became “organic traffic from AI-assisted content vs. human-only content.”

The AI-assisted content drove 23% more organic traffic because the team had time to optimize for search intent, not just word count.

That’s quality lift.

Type 2: Scope Expansion (The Shadow IT Advantage)

This is the metric most organizations completely miss.

Anthropic’s research on how their own engineers use Claude found that 27% of AI-assisted work wouldn’t have been done otherwise.

More than a quarter of the value AI creates isn’t from doing existing work faster; it’s from doing work that was previously impossible within time and budget constraints.

What does scope expansion look like? It often looks like positive Shadow IT.

The “papercuts” phenomenon: Small bugs that never got prioritized finally get fixed. Technical debt gets addressed. Internal tools that were “someday” projects actually get built because a non-engineer could scaffold them with AI.

The capability unlock: Marketing teams doing data analysis they couldn’t do before. Sales teams creating custom materials for each prospect instead of using generic decks. Customer success teams proactively reaching out instead of waiting for problems.

Google Cloud’s data shows 70% of leaders report productivity gains, with 39% seeing ROI specifically from AI enabling work that wasn’t part of the original scope.

How to measure scope expansion:

  • Track projects completed that weren’t in the original roadmap.
  • Track the ratio of backlog features cleared by non-engineers.
  • Measure customer requests fulfilled that would otherwise have been declined due to resource constraints.
  • Document internal tools built that were previously “someday” projects.

One enterprise software company used this metric to justify its AI investment. It tracked:

  • 47 customer feature requests implemented that would have been declined.
  • 12 internal process improvements that had been on the backlog for over a year.
  • 8 competitive vulnerabilities addressed that were previously “known issues.”

None of that shows up in “time saved” calculations. But it showed up clearly in customer retention rates and competitive win rates.

Type 3: Capability Unlock (The Full-Stack Employee)

We used to hire for deep specialization. AI is ushering in the era of the “Generalist-Specialist.”

Anthropic’s internal research found that security teams are building data visualizations. Alignment researchers are shipping frontend code. Engineers are creating marketing materials.

AI lowers the barrier to entry for hard skills.

A marketing manager doesn’t need to know SQL to query a database anymore; she just needs to know what question to ask the AI. The gain goes beyond speed or time saved: it removes the dependency bottleneck.

When a marketer can run their own analysis without waiting three weeks for the Data Science team, the velocity of the entire organization accelerates. The marketing generalist is now a front-end developer, a data analyst, and a copywriter all at once.

OpenAI’s enterprise data shows 75% of users report being able to complete new tasks they previously couldn’t perform. Coding-related messages increased 36% for workers outside of technical functions.

How to measure capability unlock:

  • Skills accessed (not skills owned).
  • Cross-functional work completed without handoffs.
  • Speed to execute on ideas that would have required hiring or outsourcing.
  • Projects launched without expanding headcount.

A marketing leader at a mid-market B2B company told me her team can now handle routine reporting and standard analyses with AI support, work that previously required weeks on the analytics team’s queue.

Their campaign optimization cycle accelerated 4x, leading to 31% higher campaign performance.

The “time saved” metric would say: “AI saves two hours per analysis.”

The capability unlock metric says: “We can now run 4x more tests per quarter, and our analytics team tackles deeper strategic work.”

Building A Finance-Friendly AI ROI Framework

CFOs care about three questions:

  • Is this increasing revenue? (Not just reducing cost.)
  • Is this creating competitive advantage? (Not just matching competitors.)
  • Is this sustainable? (Not just a short-term productivity bump.)

How to build an AI measurement framework that actually answers those questions:

Step 1: Baseline Your “Before AI” State

Don’t skip this step, or else it will be impossible to prove AI impact later. Before deploying AI, document current throughput, quality metrics, and scope limitations.
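A baseline snapshot can be as simple as a dated record of a few metrics. In this minimal sketch, the metric names and values are hypothetical placeholders; record whatever your team actually tracks today:

```python
# Minimal "before AI" baseline snapshot.
# Metric names and values here are hypothetical placeholders.
from dataclasses import dataclass, asdict
from datetime import date
import json

@dataclass
class Baseline:
    captured_on: str
    posts_per_month: int       # throughput
    conversion_rate: float     # quality
    declined_requests: int     # scope limitation: work you had to turn away

baseline = Baseline(
    captured_on=date.today().isoformat(),
    posts_per_month=12,
    conversion_rate=0.021,
    declined_requests=9,
)

# Persist the snapshot somewhere durable so post-deployment
# numbers have something to be compared against.
snapshot = json.dumps(asdict(baseline), indent=2)
print(snapshot)
```

The point is not the tooling; it's that throughput, quality, and scope limitations each get a dated number before the first AI workflow ships.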

Step 2: Define Leading Vs. Lagging Indicators

You need to track both efficiency and expansion, but you need to frame them correctly to Finance.

  • Leading Indicator (Efficiency): Time saved on existing tasks. This predicts potential capacity.
  • Lagging Indicator (Expansion): New work enabled and revenue impact. This proves the value was realized.

Step 3: Track AI Impact On Revenue, Not Just Cost

Connect AI metrics directly to business outcomes:

  • If AI helps customer success teams → Track retention rate changes.
  • If AI helps sales teams → Track win rate and deal velocity changes.
  • If AI helps marketing teams → Track pipeline contribution and conversion rate changes.
  • If AI helps product teams → Track feature adoption and customer satisfaction changes.
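A toy sketch of what this reporting looks like, again with hypothetical numbers: the leading indicator reports capacity created, while the lagging indicators report before/after business outcomes:

```python
# Sketch of reporting AI impact as lagging revenue indicators
# rather than stopping at time saved. All numbers are hypothetical.

def pct_change(before: float, after: float) -> float:
    """Percentage change from a before value to an after value."""
    return (after - before) / before * 100

# Leading indicator: capacity created (predicts value, doesn't prove it).
hours_saved_per_month = 120

# Lagging indicators: outcomes before vs. after AI-assisted workflows.
outcomes = {
    "retention_rate": (0.88, 0.91),        # customer success
    "win_rate": (0.22, 0.26),              # sales
    "pipeline_conversion": (0.031, 0.038), # marketing
}

print(f"Leading: {hours_saved_per_month} hours/month of new capacity")
for metric, (before, after) in outcomes.items():
    print(f"Lagging: {metric} {pct_change(before, after):+.1f}%")
```

The leading number justifies the pilot; only the lagging numbers justify the budget increase.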

Step 4: Measure The “Frontier” Gap

OpenAI’s enterprise research revealed a widening gap between “frontier” workers and median workers. Frontier firms send 2x more messages per seat.

Apply the same lens internally: identify which teams are extracting real value and which are just experimenting.

Step 5: Build The Measurement Infrastructure First

PwC’s 2026 AI predictions warn that measuring iterations instead of outcomes falls short when AI handles complex workflows.

As PwC notes: “If an outcome that once took five days and two iterations now takes fifteen iterations but only two days, you’re ahead.”

The infrastructure you need before you deploy AI involves baseline metrics, clear attribution models, and executive sponsorship to act on insights.

The Measurement Paradox

The organizations best positioned to measure AI ROI are those that already had good measurement infrastructure.

According to Kyndryl’s 2025 Readiness Report, most firms aren’t positioned to prove AI ROI because they lack the foundational data discipline.

Sound familiar? This connects directly to the data hygiene challenge I’ve written about previously. You can’t measure AI’s impact if your data is messy, conflicting, or siloed.

The Bottom Line

The AI productivity revolution is well underway. According to Anthropic’s research, current-generation AI could increase U.S. labor productivity growth by 1.8% annually over the next decade, roughly doubling recent rates.

But capturing that value requires measuring the right things.

Forget asking: “How much time does this save?”

Instead, focus on:

  • “What quality improvements are we seeing in output?”
  • “What work is now possible that wasn’t before?”
  • “What capabilities can we access without expanding headcount?”

These are the metrics that convince CFOs to increase AI budgets. These are the metrics that reveal whether AI is actually transforming your business or just making you busy faster.

Time saved is a vanity metric. Expansion enabled is the real ROI.

Measure accordingly.


Featured Image: SvetaZi/Shutterstock