Your Brand Is Being Cited By AI. Here’s How To Measure It via @sejournal, @DuaneForrester

Search has never stood still. Every few years, a new layer gets added to how people find and evaluate information. Generative AI systems like ChatGPT, Copilot Search, and Perplexity haven’t replaced Google or Bing. They’ve added a new surface where discovery happens earlier, and where your visibility may never show up in analytics.

Call it Generative Engine Optimization, call it AI visibility work, or just call it the next evolution of SEO. Whatever the label, the work is already happening. SEO practitioners are already tracking citations, analyzing which content gets pulled into AI responses, and adapting strategies as these platforms evolve weekly.

This work doesn’t replace SEO; rather, it builds on top of it. Think of it as the “answer layer” above the traditional search layer. You still need structured content, clean markup, and good backlinks, among the other usual aspects of SEO. That’s the foundation assistants learn from. The difference is that assistants now re-present that information to users directly inside conversations, sidebars, and app interfaces.

If your work stops at traditional rankings, you’ll miss the visibility forming in this new layer. Tracking when and how assistants mention, cite, and act on your content is how you start measuring that visibility.

Your brand can appear in multiple generative answers without you knowing. These citations don’t show up in any analytics tool until someone actually clicks.

Image Credit: Duane Forrester

Perplexity explains that every answer it gives includes numbered citations linking to the original sources. OpenAI’s ChatGPT Search rollout confirms that answers now include links to relevant sites and supporting sources. Microsoft’s Copilot Search does the same, pulling from multiple sources and citing them inside a summarized response. And Google’s own documentation for AI overviews makes it clear that eligible content can be surfaced inside generative results.

Each of these systems now has its own idea of what a “citation” looks like. None of them report it back to you in analytics.

That’s the gap. Your brand can appear in multiple generative answers without you knowing. These are the modern zero-click impressions that don’t register in Search Console. If we want to understand brand visibility today, we need to measure mentions, impressions, and actions inside these systems.

But there’s yet another layer of complexity here: content licensing deals. OpenAI has struck partnerships with publishers including the Associated Press, Axel Springer, and others, which may influence citation preferences in ways we can’t directly observe. Understanding the competitive landscape, not just what you’re doing, but who else is being cited and why, becomes essential strategic intelligence in this environment.

In traditional SEO, impressions and clicks tell you how often you appeared and how often someone acted. Inside assistants, we get a similar dynamic, but without official reporting.

  • Mentions are when your domain, name, or brand is referenced in a generative answer.
  • Impressions are when that mention appears in front of a user, even if they don’t click.
  • Actions are when someone clicks, expands, or copies the reference to your content.

These are not replacements for your SEO metrics. They’re early indicators that your content is trusted enough to power assistant answers.

If you read last week’s piece, where I discussed how 2026 is going to be an inflection year for SEOs, you’ll remember the adoption curve. During 2026, assistants are projected to reach around 1 billion daily active users, embedding themselves into phones, browsers, and productivity tools. But that doesn’t mean they’re replacing search. It means discovery is happening before the click. Measuring assistant mentions is about seeing those first interactions before the analytics data ever arrives.

Let’s be clear. Traditional search is still the main driver of traffic. Google handles over 3.5 billion searches per day. In May 2025, Perplexity processed 780 million queries in a full month. That’s roughly what Google handles in about five hours.

The data is unambiguous. AI assistants are a small, fast-growing complement, not a replacement (yet).

But if your content already shows up in Google, it’s also being indexed and processed by the systems that train and quote inside these assistants. That means your optimization work already supports both surfaces. You’re not starting over. You’re expanding what you measure.

Search engines rank pages. Assistants retrieve chunks.

Ranking is an output-aligned process. The system already knows what it’s trying to show and chooses the best available page to match that intent. Retrieval, on the other hand, is pre-answer-aligned. The system is still assembling the information that will become the answer, and that difference can change everything.

When you optimize for ranking, you’re trying to win a slot among visible competitors. When you optimize for retrieval, you’re trying to be included in the model’s working set before the answer even exists. You’re not fighting for position as much as you’re fighting for participation.

That’s why clarity, attribution, and structure matter so much more in this environment. Assistants pull only what they can quote cleanly, verify confidently, and synthesize quickly.

When an assistant cites your site, it’s doing so because your content met three conditions:

  1. It answered the question directly, without filler.
  2. It was machine-readable and easy to quote or summarize.
  3. It carried provenance signals the model trusted: clear authorship, timestamps, and linked references.

Those aren’t new ideas. They’re the same best practices SEOs have worked with for years, just tested earlier in the decision chain. You used to optimize for the visible result. Now you’re optimizing for the material that builds the result.

One critical reality to understand: citation behavior is highly volatile. Content cited today for a specific query may not appear tomorrow for that same query. Assistant responses can shift based on model updates, competing content entering the index, or weighting adjustments happening behind the scenes. This instability means you’re tracking trends and patterns, not guarantees (not that rankings were ever guaranteed, but they are typically more stable). Set expectations accordingly.

Not all content has equal citation potential, and understanding this helps you allocate resources wisely. Assistants excel at informational queries (“how does X work?” or “what are the benefits of Y?”). They’re less relevant for transactional queries like “buy shoes online” or navigational queries like “Facebook login.”

If your content serves primarily transactional or branded navigational intent, assistant visibility may matter less than traditional search rankings. Focus your measurement efforts where assistant behavior actually impacts your audience and where you can realistically influence outcomes.

The simplest way to start is manual testing.

Run prompts that align with your brand or product, such as:

  • “What is the best guide on [topic]?”
  • “Who explains [concept] most clearly?”
  • “Which companies provide tools for [task]?”

Use the same query across ChatGPT Search, Perplexity, and Copilot Search. Document when your brand or URL appears in their citations or answers.

Log the results. Record the assistant used, the prompt, the date, and the citation link if available. Take screenshots. You’re not building a scientific study here; you’re building a visibility baseline.

Once you’ve got a handful of examples, start running the same queries weekly or monthly to track change over time.

You can even automate part of this. Some platforms now offer API access for programmatic querying, though costs and rate limits apply. Tools like n8n or Zapier can capture assistant outputs and push them to a Google Sheet. Each row becomes a record of when and where you were cited. (To be fair, it’s more complicated than 2 short sentences make it sound, but it’s doable by most folks, if they’re willing to learn some new things.)

This is how you create your first “AI citation baseline” report, even if you stay fully manual in your approach.

But don’t stop at tracking yourself. Competitive citation analysis is equally important. Who else appears for your key queries? What content formats do they use? What structural patterns do their cited pages share? Are they using specific schema markup or content organization that assistants favor? This intelligence reveals what assistants currently value and where gaps exist in the coverage landscape.

We don’t have official impression data yet, but we can infer visibility.

  • Look at the types of queries where you appear in assistants. Are they broad, informational, or niche?
  • Use Google Trends to gauge search interest for those same queries. The higher the volume, the more likely users are seeing AI answers for them.
  • Track assistant responses for consistency. If you appear across multiple assistants for similar prompts, you can reasonably assume high impression potential.

Impressions here don’t mean analytics views. They mean assistant-level exposure: your content seen in an answer window, even if the user never visits your site.

Actions are the most difficult layer to observe, but not because assistant ecosystems hide all referrer data. The tracking reality is more nuanced than that.

Most AI assistants (Perplexity, Copilot, Gemini, and ChatGPT for paid users) do send referrer data, which appears in Google Analytics 4 as perplexity.ai / referral or chatgpt.com / referral. You can see these sources in your standard GA4 Traffic Acquisition reports.

The real challenges are:

Free-tier users don’t send referrers. Free ChatGPT traffic arrives as “Direct” in your analytics, making it impossible to distinguish from bookmark visits, typed URLs, or other referrer-less traffic sources.

No query visibility. Even when you see the referrer source, you don’t know what question the user asked the AI that led them to your site. Traditional search gives you some query data through Search Console. AI assistants don’t provide this.

Volume is still small but growing. AI referral traffic typically represents 0.5% to 3% of total website traffic as of 2025, making patterns harder to spot in the noise of your overall analytics.

Here’s how to improve tracking and build a clearer picture of AI-driven actions:

  1. Set up dedicated AI traffic tracking in GA4. Create a custom exploration or channel group using regex filters to isolate all AI referral sources in one view. Use a pattern like the excellent example in this Orbit Media article to capture traffic from major platforms ( ^https://(www.meta.ai|www.perplexity.ai|chat.openai.com|claude.ai|gemini.google.com|chatgpt.com|copilot.microsoft.com)(/.*)?$ ). This separates AI referrals from generic referral traffic and makes trends visible. A quick way to test the pattern is sketched after this list.
  2. Add identifiable UTM parameters when you control link placement: in content you share to AI platforms, in citations you can influence, or in public-facing URLs. Even platforms that send referrer data can benefit from UTM tagging for additional attribution clarity.
  3. Monitor “Direct” traffic patterns. Unexplained spikes in direct traffic, especially to specific landing pages that assistants commonly cite, may indicate free-tier AI users clicking through without referrer data.
  4. Track which landing pages receive AI traffic. In your AI traffic exploration, add “Landing page + query string” as a dimension to see which specific pages assistants are citing. This reveals what content AI systems find valuable enough to reference.
  5. Watch for copy-paste patterns in social media, forums, or support tickets that match your content language exactly. That’s a proxy for text copied from an assistant summary and shared elsewhere.
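
To sanity-check the regex from step 1 before wiring it into a GA4 channel group, you can run it against a few sample referrer URLs. Here is a minimal Python sketch; the pattern is copied from the step above (with dots escaped), and the sample URLs are made up.

```python
import re

# Orbit Media-style pattern from step 1, matching full referrer URLs from major AI platforms.
AI_REFERRER = re.compile(
    r"^https://(www\.meta\.ai|www\.perplexity\.ai|chat\.openai\.com|claude\.ai|"
    r"gemini\.google\.com|chatgpt\.com|copilot\.microsoft\.com)(/.*)?$"
)

samples = [
    "https://www.perplexity.ai/search/some-query",   # should match
    "https://chatgpt.com/",                          # should match
    "https://www.google.com/search?q=example",       # should not match
]

for url in samples:
    print(url, "->", bool(AI_REFERRER.match(url)))
```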

Each of these tactics helps you build a more complete picture of AI-driven actions, even without perfect attribution. The key is recognizing that some AI traffic is visible (paid tiers, most platforms), some is hidden (free ChatGPT), and your job is to capture as much signal as possible from both.

Machine-Validated Authority (MVA) isn’t directly visible to us; it’s an internal trust signal AI systems use to decide which sources to quote. What we can measure are the breadcrumbs that correlate with it:

  • Frequency of citation
  • Presence across multiple assistants
  • Stability of the citation source (consistent URLs, canonical versions, structured markup)

When you see repeat citations or multi-assistant consistency, you’re seeing a proxy for MVA. That consistency is what tells you the systems are beginning to recognize your content as reliable.
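
If you keep a simple log of observations like the one described earlier (assistant, prompt, whether you were cited), these proxies fall out of a few lines of Python. A minimal sketch with made-up log entries:

```python
from collections import Counter, defaultdict

# Made-up log entries in the shape described earlier: assistant, prompt, and whether you were cited.
log = [
    {"assistant": "Perplexity", "prompt": "best guide on [topic]", "cited": True},
    {"assistant": "ChatGPT Search", "prompt": "best guide on [topic]", "cited": True},
    {"assistant": "Copilot", "prompt": "best guide on [topic]", "cited": False},
    {"assistant": "Perplexity", "prompt": "tools for [task]", "cited": True},
]

# Proxy 1: how often each assistant cites you.
citation_frequency = Counter(row["assistant"] for row in log if row["cited"])

# Proxy 2: prompts where you appear across multiple assistants.
assistants_per_prompt = defaultdict(set)
for row in log:
    if row["cited"]:
        assistants_per_prompt[row["prompt"]].add(row["assistant"])

print("Citation frequency by assistant:", dict(citation_frequency))
print("Prompts cited by 2+ assistants:",
      [p for p, a in assistants_per_prompt.items() if len(a) >= 2])
```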

Perplexity reports almost 10 billion queries a year across its user base. That’s meaningful visibility potential even if it’s small compared to search.

Microsoft’s Copilot Search is embedded in Windows, Edge, and Microsoft 365. That means millions of daily users see summarized, cited answers without leaving their workflow.

Google’s rollout of AI Overviews adds yet another surface where your content can appear, even when no one clicks through. Their own documentation describes how structured data helps make content eligible for inclusion.

Each of these reinforces a simple truth: SEO still matters, but it now extends beyond your own site.

Start small. A basic spreadsheet is enough.

Columns:

  • Date.
  • Assistant (ChatGPT Search, Perplexity, Copilot).
  • Prompt used.
  • Citation found (yes/no).
  • URL cited.
  • Competitor citations observed.
  • Notes on phrasing or ranking position.

Add screenshots and links to the full answers for evidence. Over time, you’ll start to see which content themes or formats surface most often.

If you want to automate, set up a workflow in n8n that runs a controlled set of prompts weekly and logs outputs to your sheet. Even partial automation will save time and let you focus on interpretation, not collection. Use this sheet and its data to augment what you can track in sources like GA4.
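
If you go that route, the logging step itself is small. Here is a minimal Python sketch that appends one observation per run to a local CSV mirroring the spreadsheet columns above. It assumes you have already captured the answer text and cited URLs by whatever means (manual paste, an API call, or an n8n step); the domain, file name, and example values are placeholders.

```python
import csv
from datetime import date

DOMAIN = "example.com"          # your site (placeholder)
LOG_FILE = "ai_citation_log.csv"

def log_result(assistant, prompt, answer_text, cited_urls, notes=""):
    """Append one observation row, mirroring the tracking spreadsheet columns."""
    our_urls = [u for u in cited_urls if DOMAIN in u]
    row = {
        "date": date.today().isoformat(),
        "assistant": assistant,
        "prompt": prompt,
        "citation_found": "yes" if (our_urls or DOMAIN in answer_text) else "no",
        "url_cited": "; ".join(our_urls),
        "competitor_citations": "; ".join(u for u in cited_urls if DOMAIN not in u),
        "notes": notes,
    }
    with open(LOG_FILE, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if f.tell() == 0:       # write the header once, on first run
            writer.writeheader()
        writer.writerow(row)

# Example usage with a manually pasted answer (values hypothetical):
log_result(
    assistant="Perplexity",
    prompt="What is the best guide on [topic]?",
    answer_text="...pasted answer text...",
    cited_urls=["https://example.com/guide", "https://competitor.com/post"],
)
```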

Before investing heavily in assistant monitoring, consider resource allocation carefully. If assistants represent less than 1% of your traffic and you’re a small team, extensive tracking may be premature optimization. Focus on high-value queries where assistant visibility could materially impact brand perception or capture early-stage research traffic that traditional search might miss.

Manual quarterly audits may suffice until the channel grows to meaningful scale. This is about building baseline understanding now so you’re prepared when adoption accelerates, not about obsessive daily tracking of negligible traffic sources.

Executives prefer dashboards to debates about visibility layers, so show them real-world examples. Put screenshots of your brand cited inside ChatGPT or Copilot next to your Search Console data. Explain that this is not a new algorithm update but a new front end for existing content. It’s up to you to help them understand this critical difference.

Frame it as additive reach. You’re showing leadership that the company’s expertise is now visible in new interfaces before clicks happen. That reframing keeps support for SEO strong and positions you as the one tracking the next wave.

It’s worth noting that citation practices exist within a shifting legal landscape. Publishers and content creators have raised concerns about copyright and fair use as AI systems train on and reproduce web content. Some platforms have responded with licensing agreements, while legal challenges continue to work through courts.

This environment may influence how aggressively platforms cite sources, which sources they prioritize, and how they balance attribution with user experience. The frameworks we build today should remain flexible as these dynamics evolve and as the industry establishes clearer norms around content usage and attribution.

AI assistant visibility is not yet a major traffic source. It’s a small but growing signal of trust.

By measuring mentions and citations now, you build an early-warning system. You’ll see when your content starts appearing in assistants long before any of your analytics tools do. This means that when 2026 arrives and assistants become a daily habit, you won’t be reacting to the curve. You’ll already have data on how your brand performs inside these new systems.

If you extend the concept of “data” here to a more meta level, you could say it’s already telling us that the growth is starting, that it’s explosive, and that it’s about to have an impact on consumer behavior. So now is the moment to take that knowledge, focus it on your day-to-day work, and start planning for how those changes will affect that work.

Traditional SEO remains your base layer. Generative visibility sits above it. Machine-Validated Authority lives inside the systems. Watching mentions, impressions, and actions is how we start making what’s in the shadows measurable.

We used to measure rankings because that’s what we could see. Today, we can measure retrieval for the same reason. This is just the next evolution of evidence-based SEO. Ultimately, you can’t fix what you can’t see. We cannot see how trust is assigned inside the system, but we can see the outputs of each system.

The assistants aren’t replacing search (yet). They’re simply showing you how visibility behaves when the click disappears. If you can measure where you appear in those layers now, you’ll know when the slope starts to change and you’ll already be ahead of it.



Featured Image: Anton Vierietin/Shutterstock


This post was originally published on Duane Forrester Decodes.

Google’s John Mueller Flags SEO Issues In Vibe Coded Website via @sejournal, @MattGSouthern

Google Search Advocate John Mueller provided detailed technical SEO feedback to a developer on Reddit who vibe-coded a website in two days and launched it on Product Hunt.

The developer posted in r/vibecoding that they built a Bento Grid Generator for personal use, published it on Product Hunt, and received over 90 upvotes within two hours.

Mueller responded with specific technical issues affecting the site’s search visibility.

Mueller wrote:

“I love seeing vibe-coded sites, it’s cool to see new folks make useful & self-contained things for the web, I hope it works for you.

This is just a handful of the things I noticed here. I’ve seen similar things across many vibe-coded sites, so perhaps this is useful for others too.”

Mueller’s Technical Feedback

Mueller identified multiple issues with the site.

The homepage stores key content in an llms.txt JavaScript file. Mueller noted that Google doesn’t use this file, and he’s not aware of other search engines using it either.

Mueller wrote:

“Generally speaking, your homepage should have everything that people and bots need to understand what your site is about, what the value of your service / app / site is.”

He recommended adding a popup-welcome-div in HTML that includes the information to make it immediately available to bots.

For meta tags, Mueller said the site only needs title and description tags. The keywords, author, and robots meta tags provide no SEO benefit.

The site includes hreflang tags despite having just one language version. Mueller said these aren’t necessary for single-language sites.

Mueller flagged the JSON-LD structured data as ineffective, noting:

“Check out Google’s ‘Structured data markup that Google Search supports’ for the types supported by Google. I don’t think anyone else supports your structured data.”

He called the hidden h1 and h2 tags “cheap & useless.” Mueller suggested using a visible, dismissable banner in the HTML instead.

The robots.txt file contains unnecessary directives. Mueller recommended skipping the sitemap if it’s just one page.

Mueller suggested adding the domain to Search Console and making it easier for visitors to understand what the app or site does.

Setting Expectations

Mueller closed his feedback with realistic expectations about the impact of technical SEO fixes.

He said:

“Will you automatically get tons of traffic from just doing these things? No, definitely not. However, it makes it easier for search engines to understand your site, so that they could be sending you traffic from search.”

He noted that implementing these changes now sets you up for success later.

Mueller added:

“Doing these things sets you up well, so that you can focus more on the content & functionality, without needing to rework everything later on.”

The Vibe Coding Trade-Off

This exchange highlights a tension between vibe coding and search visibility.

The developer built a functional product that generated immediate user engagement. The site works, looks polished, and achieved success on Product Hunt within hours.

None of the flagged issues affects user experience. But every implementation choice Mueller criticized shares the same characteristic: it works for visitors while providing nothing to search engines.

Sites built for rapid launch can achieve product success without search visibility. But the technical debt adds up.

The fixes aren’t too challenging, but they require addressing issues that seemed fine when the goal was to ship fast rather than rank well.


Featured Image: Panchenko Vladimir/Shutterstock

How Structured Data Shapes AI Snippets And Extends Your Visibility Quota via @sejournal, @cyberandy

When conversational AIs like ChatGPT, Perplexity, or Google AI Mode generate snippets or answer summaries, they’re not writing from scratch; they’re picking, compressing, and reassembling what webpages offer. If your content isn’t SEO-friendly and indexable, it won’t make it into generative search at all. Search, as we know it, is now a function of artificial intelligence.

But what if your page doesn’t “offer” itself in a machine-readable form? That’s where structured data comes in, not just as an SEO gig, but as a scaffold for AI to reliably pick the “right facts.” There has been some confusion in our community, and in this article, I will:

  1. walk through controlled experiments on 97 webpages showing how structured data improves snippet consistency and contextual relevance,
  2. map those results into our semantic framework.

Many have asked me in recent months if LLMs use structured data, and I’ve been repeating over and over that an LLM doesn’t use structured data as it has no direct access to the world wide web. An LLM uses tools to search the web and fetch webpages. Its tools – in most cases – greatly benefit from indexing structured data.

Image by author, October 2025

In our early results, structured data increases snippet consistency and improves contextual relevance in GPT-5. It also hints at extending the effective wordlim envelope – this is a hidden GPT-5 directive that decides how many words your content gets in a response. Imagine it as a quota on your AI visibility that gets expanded when content is richer and better-typed. You can read more about this concept, which I first outlined on LinkedIn.

Why This Matters Now

  • Wordlim constraints: AI stacks operate with strict token/character budgets. Ambiguity wastes budget; typed facts conserve it.
  • Disambiguation & grounding: Schema.org reduces the model’s search space (“this is a Recipe/Product/Article”), making selection safer.
  • Knowledge graphs (KG): Schema often feeds KGs that AI systems consult when sourcing facts. This is the bridge from web pages to agent reasoning.

My personal thesis is that we want to treat structured data as the instruction layer for AI. It doesn’t “rank for you,” it stabilizes what AI can say about you.

Experiment Design (97 URLs)

While the sample size was small, I wanted to see how ChatGPT’s retrieval layer actually works when used from its own interface, not through the API. To do this, I asked GPT-5 to search and open a batch of URLs from different types of websites and return the raw responses.

You can prompt GPT-5 (or any AI system) to show the verbatim output of its internal tools using a simple meta-prompt. After collecting both the search and fetch responses for each URL, I ran an Agent WordLift workflow [disclaimer, our AI SEO Agent] to analyze every page, checking whether it included structured data and, if so, identifying the specific schema types detected.

These two steps produced a dataset of 97 URLs, annotated with key fields:

  • has_sd → True/False flag for structured data presence.
  • schema_classes → the detected type (e.g., Recipe, Product, Article).
  • search_raw → the “search-style” snippet, representing what the AI search tool showed.
  • open_raw → a fetcher summary, or structural skim of the page by GPT-5.

Using an “LLM-as-a-Judge” approach powered by Gemini 2.5 Pro, I then analyzed the dataset to extract three main metrics:

  • Consistency: distribution of search_raw snippet lengths (box plot).
  • Contextual relevance: keyword and field coverage in open_raw by page type (Recipe, E-comm, Article).
  • Quality score: a conservative 0–1 index combining keyword presence, basic NER cues (for e-commerce), and schema echoes in the search output (one way to compose such a score is sketched after this list).
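
The judging itself was done by the LLM, but the components are simple enough to approximate deterministically. Here is a rough Python sketch of a conservative 0–1 score built from those three ingredients; the weights and the specific checks are my own illustrative assumptions, not the study’s rubric.

```python
import re

def quality_score(search_snippet, expected_keywords, schema_fields, brand_terms):
    """Conservative 0-1 score: keyword coverage + brand/NER cues + schema echoes."""
    text = search_snippet.lower()

    # 1. Keyword/field coverage (e.g., "ingredients", "price", "author").
    kw_hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    kw_cov = kw_hits / len(expected_keywords) if expected_keywords else 0.0

    # 2. Basic entity cues (e.g., exact brand or product names, for e-commerce pages).
    ner_cov = 1.0 if any(b.lower() in text for b in brand_terms) else 0.0

    # 3. Schema echoes: JSON-LD field names surfacing in the snippet (e.g., "aggregateRating").
    schema_hits = sum(1 for f in schema_fields if re.search(re.escape(f), search_snippet, re.I))
    schema_cov = schema_hits / len(schema_fields) if schema_fields else 0.0

    # Assumed weighting: coverage matters most, entity cues and schema echoes add smaller boosts.
    return round(0.6 * kw_cov + 0.2 * ner_cov + 0.2 * schema_cov, 2)

# Example: a product snippet judged against expected fields (all values hypothetical).
print(quality_score(
    search_snippet="Acme Trail Shoe - 4.6 aggregateRating, in stock, $129 from Acme",
    expected_keywords=["price", "in stock"],
    schema_fields=["aggregateRating", "offer", "brand"],
    brand_terms=["Acme"],
))
```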

The Hidden Quota: Unpacking “wordlim”

While running these tests, I noticed another subtle pattern, one that might explain why structured data leads to more consistent and complete snippets. Inside GPT-5’s retrieval pipeline, there’s an internal directive informally known as wordlim: a dynamic quota determining how much text from a single webpage can make it into a generated answer.

At first glance, it acts like a word limit, but it’s adaptive. The richer and better-typed a page’s content, the more room it earns in the model’s synthesis window.

From my ongoing observations:

  • Unstructured content (e.g., a standard blog post) tends to get about ~200 words.
  • Structured content (e.g., product markup, feeds) extends to ~500 words.
  • Dense, authoritative sources (APIs, research papers) can reach 1,000+ words.

This isn’t arbitrary. The limit helps AI systems:

  1. Encourage synthesis across sources rather than copy-pasting.
  2. Avoid copyright issues.
  3. Keep answers concise and readable.

Yet it also introduces a new SEO frontier: your structured data effectively raises your visibility quota. If your data isn’t structured, you’re capped at the minimum; if it is, you grant AI more trust and more space to feature your brand.

While the dataset isn’t yet large enough to be statistically significant across every vertical, the early patterns are already clear – and actionable.

Figure 1 – How Structured Data Affects AI Snippet Generation (Image by author, October 2025)

Results

Figure 2 – Distribution of Search Snippet Lengths (Image by author, October 2025)

1) Consistency: Snippets Are More Predictable With Schema

In the box plot of search snippet lengths (with vs. without structured data):

  • Medians are similar → schema doesn’t make snippets longer/shorter on average.
  • Spread (IQR and whiskers) is tighter when has_sd = True → less erratic output, more predictable summaries.

Interpretation: Structured data doesn’t inflate length; it reduces uncertainty. Models default to typed, safe facts instead of guessing from arbitrary HTML.

2) Contextual Relevance: Schema Guides Extraction

  • Recipes: With Recipe schema, fetch summaries are far likelier to include ingredients and steps. Clear, measurable lift.
  • Ecommerce: The search tool often echoes JSON‑LD fields (e.g., aggregateRating, offer, brand) – evidence that schema is read and surfaced. Fetch summaries skew to exact product names over generic terms like “price,” but the identity anchoring is stronger with schema.
  • Articles: Small but present gains (author/date/headline more likely to appear).

3) Quality Score (All Pages)

Averaging the 0–1 score across all pages:

  • No schema → ~0.00
  • With schema → positive uplift, driven mostly by recipes and some articles.

Even where means look similar, variance collapses with schema. In an AI world constrained by wordlim and retrieval overhead, low variance is a competitive advantage.

Beyond Consistency: Richer Data Extends The Wordlim Envelope (Early Signal)

While the dataset isn’t yet large enough for significance tests, we observed this emerging pattern:
Pages with richer, multi‑entity structured data tend to yield slightly longer, denser snippets before truncation.

Hypothesis: Typed, interlinked facts (e.g., Product + Offer + Brand + AggregateRating, or Article + author + datePublished) help models prioritize and compress higher‑value information – effectively extending the usable token budget for that page.
Pages without schema more often get prematurely truncated, likely due to uncertainty about relevance.

Next step: We’ll measure the relationship between semantic richness (count of distinct Schema.org entities/attributes) and effective snippet length. If confirmed, structured data not only stabilizes snippets – it increases informational throughput under constant word limits.
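
Counting that richness is straightforward once you have a page’s JSON-LD extracted. Here is a minimal Python sketch (a recursive walk over the parsed markup; my own approach for illustration, not the study’s code):

```python
import json

def schema_richness(jsonld_str):
    """Count distinct schema.org types and total non-@ attributes in a JSON-LD block."""
    types, attributes = set(), 0

    def walk(node):
        nonlocal attributes
        if isinstance(node, dict):
            if "@type" in node:
                t = node["@type"]
                types.update(t if isinstance(t, list) else [t])
            attributes += sum(1 for k in node if not k.startswith("@"))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(jsonld_str))
    return {"distinct_types": len(types), "attributes": attributes}

# Example with a tiny hypothetical Product block:
print(schema_richness('{"@type": "Product", "name": "Acme Trail Shoe", '
                      '"offers": {"@type": "Offer", "price": "129.00"}}'))
```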

From Schema To Strategy: The Playbook

We structure sites as:

  1. Entity Graph (Schema/GS1/Articles/ …): products, offers, categories, compatibility, locations, policies;
  2. Lexical Graph: chunked copy (care instructions, size guides, FAQs) linked back to entities.

Why it works: The entity layer gives AI a safe scaffold; the lexical layer provides reusable, quotable evidence. Together they drive precision under the wordlim constraints.

Here’s how we’re translating these findings into a repeatable SEO playbook for brands working under AI discovery constraints.

  1. Ship JSON‑LD for core templates
    • Recipes → Recipe (ingredients, instructions, yields, times).
    • Products → Product + Offer (brand, GTIN/SKU, price, availability, ratings); see the sketch after this list.
    • Articles → Article/NewsArticle (headline, author, datePublished).
  2. Unify entity + lexical
    Keep specs, FAQs, and policy text chunked and entity‑linked.
  3. Harden snippet surface
    Facts must be consistent across visible HTML and JSON‑LD; keep critical facts above the fold and stable.
  4. Instrument
    Track variance, not just averages. Benchmark keyword/field coverage inside machine summaries by template.
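
To make item 1 concrete for the product case, here is a minimal sketch of a Product + Offer block, generated with Python so it can be templated from catalog data. All field values are hypothetical placeholders; check Google’s structured data documentation for the full list of supported and required properties.

```python
import json

def product_jsonld(product):
    """Build a minimal schema.org Product + Offer JSON-LD block from product data."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "brand": {"@type": "Brand", "name": product["brand"]},
        "gtin13": product["gtin"],
        "sku": product["sku"],
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": product["rating"],
            "reviewCount": product["review_count"],
        },
        "offers": {
            "@type": "Offer",
            "price": product["price"],
            "priceCurrency": product["currency"],
            "availability": "https://schema.org/InStock",
        },
    }
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"

# Hypothetical product record; in practice this would come from your catalog or PIM.
print(product_jsonld({
    "name": "Acme Trail Shoe",
    "brand": "Acme",
    "gtin": "0123456789012",
    "sku": "ACME-TS-42",
    "rating": 4.6,
    "review_count": 128,
    "price": "129.00",
    "currency": "USD",
}))
```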

Conclusion

Structured data doesn’t change the average size of AI snippets; it changes their certainty. It stabilizes summaries and shapes what they include. In GPT-5, especially under aggressive wordlim conditions, that reliability translates into higher‑quality answers, fewer hallucinations, and greater brand visibility in AI-generated results.

For SEOs and product teams, the takeaway is clear: treat structured data as core infrastructure. If your templates still lack solid HTML semantics, don’t jump straight to JSON-LD: fix the foundations first. Start by cleaning up your markup, then layer structured data on top to build semantic accuracy and long-term discoverability. In AI search, semantics is the new surface area.



Featured Image: TierneyMJ/Shutterstock

Are LLM Visibility Trackers Worth It?

TL;DR

  1. When it comes to LLM visibility, not all brands are created equal. For some, it matters far more than others.
  2. LLMs give different answers to the same question. Trackers combat this by simulating prompts repeatedly to get an average visibility/citation score.
  3. While simulating the same prompts isn’t perfect, secondary benefits like sentiment analysis tackle issues that are not SEO-specific. Which right now is a good thing.
  4. Unless a visibility tracker offers enough scale at a reasonable price, I would be wary. But if the traffic converts well and you need to know more, get tracking.
(Image Credit: Harry Clarkson-Bennett)

A small caveat to start. This really depends on how your business makes money and whether LLMs are a fundamental part of your audience journey. You need to understand how people use LLMs and what it means for your business.

Brands that sell physical products have a different journey from publishers that sell opinion or SaaS companies that rely more deeply on comparison queries than anyone else.

Or a coding company destroyed by one snidey Reddit moderator with a bone to pick…

For example, Ahrefs made public some of its conversion rate data from LLMs. 12.1% of their signups came from LLMs from just 0.5% of their total traffic. Which is huge.

AI search visitors convert 23x better than traditional organic search visitors for Ahrefs. (Image Credit: Harry Clarkson-Bennett)

But for us, LLM traffic converts significantly worse. It is a fraction of a fraction.

Honestly, I think LLM visibility trackers at this scale are a bit here today and gone tomorrow. If you can afford one, great. If not, don’t sweat it. Take it all with a pinch of salt. AI search is just a part of most journeys, and tracking the same prompts day in, day out has obvious flaws.

They’re just aggregating what someone said about you on Reddit while they were taking a shit back in 2016.

What Do They Do?

Trackers like Profound and Brand Radar are designed to show you how your brand is framed and recommended in AI answers. Over time, you can measure your own and your competitors’ visibility across the platforms.

Image Credit: Harry Clarkson-Bennett

But LLM visibility is smoke and mirrors.

Ask a question, get an answer. Ask the same question, to the same machine, from the same computer, and get a different answer. A different answer with different citations and businesses.

It has to be like this, or else we’d never use the boring ones.

To combat the inherent variance determined by their temperature setting, LLM trackers simulate prompts repeatedly throughout the day. In doing so, you get an average visibility and citation score alongside some other genuinely useful add-ons like your sentiment score and some competitor benchmarking.

“Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.”

OpenAI Documentation

Simulate a prompt 100 times. If your content was used in 70 of the responses and you were cited seven times, you would have a 70% visibility score and a 7% citation score.
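
The arithmetic behind those two scores is simple. A minimal Python sketch, treating “used” as your domain being mentioned or cited in a run; the runs below are made up, and in practice there would be on the order of 100 per prompt.

```python
DOMAIN = "example.com"  # your domain (placeholder)

# One entry per simulated run of the same prompt: the answer text and the cited URLs.
runs = [
    {"answer": "...example.com explains this well...", "citations": ["https://example.com/guide"]},
    {"answer": "...a general overview...", "citations": ["https://competitor.com/post"]},
]

mentioned = sum(1 for r in runs
                if DOMAIN in r["answer"] or any(DOMAIN in c for c in r["citations"]))
cited = sum(1 for r in runs if any(DOMAIN in c for c in r["citations"]))

print(f"Visibility score: {mentioned / len(runs):.0%}")
print(f"Citation score:   {cited / len(runs):.0%}")
```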

Trust me, that’s much better than it sounds… These engines do not want to send you traffic.

In Brian Balfour’s excellent words, they have identified the moat and the gates are open. They will soon shut. As they shut, monetization will be hard and fast. The likelihood of any referral traffic, unless it’s monetized, is low.

Like every tech company ever.

If you aren’t flush with cash, I’d say most businesses just do not need to invest in them right now. They’re a nice-to-have rather than a necessity for most of us.

How Do They Work?

As far as I can tell, there are two primary models.

  1. Pay for a tool that tracks specific synthetic prompts that you add yourself.
  2. Purchase an enterprise-like tool that tracks more of the market at scale.

Some tools, like Profound, offer both. The cheaper model (the price point is not for most businesses) lets you track synthetic prompts under topics and/or tags. The enterprise model gives you a significantly larger scale.

Tools like Ahrefs Brand Radar, by contrast, provide a broader view of the entire market. As the prompts are all synthetic, there are some fairly large holes. But I prefer broad visibility.

I have not used it yet, but I believe Similarweb have launched their own LLM visibility tracker, which includes real user prompts from Clickstream data.

This makes for a far more useful version of these tools IMO and goes some way to answering the synthetic elephant in the room. And it helps you understand the role LLMs play in the user journey. Which is far more valuable.

The Problem

Does doing good SEO improve your chances of improving your LLM visibility?

Certainly looks like it…

GPT-5 no longer needs to train on more information. It is as well-versed as its overlords now want to pay for. It’s bored of ravaging the internet’s detritus and reaches out to a search index using RAG to verify a response. A response it does not quite have the appropriate level of confidence to give on its own.

But I’m sure we will need to modify it somewhat if your primary goal is to increase LLM visibility, with increased expenditure on TOFU and digital PR campaigns being a notable point.

Image Credit: Harry Clarkson-Bennett

Right now, LLMs have an obvious spam problem. One I don’t expect they’ll be willing to invest in solving anytime soon. The AI bubble and gross valuation of these companies will dictate how they drive revenue. And quickly.

It sure as hell won’t be sorting out their spam problem. When you have a $300 billion contract to pay and revenues of $12 billion, you need some more money. Quickly.

So anyone who pays for best page link inclusions or adds hidden and footer text to their websites will benefit in the short term. But most of us should still build things for actual, breathing, snoring people.

With the new iterations of LLMs calling search instead of formulating an answer for prompts based on learned ‘knowledge’, it becomes even harder to create an ‘LLM optimization strategy.’

As a news site, I know that most prompts we would vaguely show up in would trigger the web index. So I just don’t quite see the value. It’s very SEO-led.

If you don’t believe me, Will Reynolds is an inarguably better source of information (Image Credit: Harry Clarkson-Bennett)

How You Can Add Value With Sentiment Analysis

I found almost zero value to be had from tracking prompts in LLMs at a purely answer level. So, let’s forget all that for a second and use them for something else. Let’s start with some sentiment analysis.

These trackers give us access to:

  • A wider online sentiment score.
  • Review sources LLMs called upon (at a prompt level).
  • Sentiment scores by topics.
  • Prompts and links to on and off-site information sources.

You can identify where some of these issues start. Which, to be fair, is basically Trustpilot and Reddit.

I won’t go through everything, but a couple of quick examples:

  1. LLMs may be referencing some not-so-recently defunct podcasts and newsletters as “reasons to subscribe.”
  2. Your cancellation process may be cited as the most serious issue for most customers.

Unless you have explicitly stated that these podcasts and newsletters have finished, it’s all fair game. You need to tighten up your product marketing and communications strategy.

For people first. Then for LLMs.

These are not SEO-specific projects. We’re moving into an era where SEO-only projects will be difficult to get pushed through. A fantastic way of getting buy-in is to highlight projects with benefits outside of search.

Highlighting serious business issues – poor reviews, inaccurate, out-of-date information et al. – can help get C-suite attention and support for some key brand reputation projects.

Profound’s sentiment analysis tab (Image Credit: Harry Clarkson-Bennett)
Here it is broken down by topic. You can see individual prompts and responses to each topic (Image Credit: Harry Clarkson-Bennett)

To me, this has nothing to do with LLMs. Or what our audience might ask an ill-informed answer engine. They are just the vessel.

It is about solving problems. Problems that drive real value to your business. In your case, this could be about increasing the LTV of a customer. Increasing their retention rate, reducing churn, and increasing the chance of a conversion by providing an improved experience.

If you’ve worked in SEO for long enough, someone will have floated the idea of improving your online sentiment and reviews past you.

“But will this improve our SEO?”

Said Jeff, a beleaguered business owner.

Who knows, Jeff. It really depends on what is holding you back compared to your competition. And like it or not, search is not very investible right now.

But that doesn’t matter in this instance. This isn’t a search-first project. It’s an audience-first project. It encompasses everyone. From customer service to SEO and editorial. It’s just the right thing to do for the business.

A quick hark back to the Google Leak shows you just how many review and sentiment-focused metrics may affect how you rank.

There are nine alone that mention review or sentiment in the title (Image Credit: Harry Clarkson-Bennett)

For a long time, search has been about brands and trust. Branded search volume, outperforming expected CTR (a Bayesian type predictive model), direct traffic, and general user engagement and satisfaction.

This isn’t because Google knows better than people. It’s because they have stored how we feel about pages and brands in relation to queries and used that as a feedback loop. Google trusts brands because we do.

Most of us have never had to worry about reviews and sentiment. But this is a great time to fix any issues you may have under the guise of AEO, GEO, SEO, or whatever you want to call it.

Lars Lofgren’s article titled How a Competitor Crippled a $23.5M Bootcamp By Becoming a Reddit Moderator is an incredible look at how Codesmith was nobbled by negative PR. Negative PR started and maintained by one Reddit Mod. One.

So keeping tabs on your reputation and identifying potentially serious issues is never a bad thing.

Could I Just Build My Own?

Yep. For starters, you’d need an estimation of monthly LLM API costs based on the number of monthly tokens required. Let’s use Profound’s lower-end pricing tier as an estimate and our old friend Gemini to figure out some estimated costs.

  • 200 prompts × 10 runs × 12 days (approx.) = 24,000 monthly runs, spread across 3 models.
  • 24,000 runs × 1,000 tokens/query (conservative est.) = 24,000,000 tokens (a quick cost sketch follows this list).
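
The token-to-dollar step is easy to sketch. The per-million-token prices below are placeholders to swap for each provider’s current rates, not real figures, and splitting the 24 million tokens evenly across the three models is a simplification; the run and token totals follow the estimate above.

```python
PROMPTS, RUNS_PER_DAY, DAYS, MODELS = 200, 10, 12, 3
TOKENS_PER_QUERY = 1_000                       # conservative estimate from above

total_runs = PROMPTS * RUNS_PER_DAY * DAYS     # 24,000 runs/month, spread across models
total_tokens = total_runs * TOKENS_PER_QUERY   # 24,000,000 tokens/month
tokens_per_model = total_tokens // MODELS      # ~8,000,000 each, assuming an even split

# Hypothetical $ per 1M tokens for each model you track; swap in real pricing-page rates.
price_per_million = {"model_a": 0.50, "model_b": 1.00, "model_c": 1.25}

monthly_cost = {m: tokens_per_model / 1_000_000 * p for m, p in price_per_million.items()}
for model, cost in monthly_cost.items():
    print(f"{model}: ~${cost:.2f}/month")
print(f"Total: ~${sum(monthly_cost.values()):.2f}/month")
```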

Based on this, here’s a (hopefully) accurate cost estimate per model from our robot pal.

Image Credit: Harry Clarkson-Bennett

Right then. You now need some back-end functionality, data storage, and some front-end visualization. I’ll tot up as we go.

$21 per month

Back-End

  • A Scheduler/Runner like Render VPS to execute 800 API calls per day.
  • A data orchestrator. Essentially, some Python code to parse raw JSON and extract relevant citation and visibility data.

$10 per month

Data Storage

  • A database, like Supabase (which you can integrate directly through Lovable), to store raw responses and structured metrics.
  • Data storage (which should be included as part of your database).

$15 per month

Front-End Visualization

  • A web dashboard to create interactive, shareable dashboards. I unironically love Lovable. It’s easy to connect directly to databases. I have also used Streamlit previously. Lovable looks far sleeker but has its own challenges.
  • You may also need a visualization library to help generate time series charts and graphs. Some dashboards have this built in.

$50 per month

$96 all in. I think the likelihood is it’s closer to $50 than $100. No scrimping. At the higher end of budgets for tools I use (Lovable) and some estimates from Gemini, we’re talking about a tool that will cost under $100 a month to run and function very well.

This isn’t a complicated project or setup. It is, IMO, an excellent project to learn the vibe coding ropes. Which I will say is not all sunshine and rainbows.

So, Should I Buy One?

If you can afford it, I would get one. For at least a month or two. Review your online sentiment. See what people really say about you online. Identify some low lift wins around product marketing and review/reputation management, and review how your competitors fare.

This might be the most important part of LLM visibility. Set up a tracking dashboard via Google Analytics (or whatever dreadful analytics provider you use) and see a) how much traffic you get and b) whether it’s valuable.

The more valuable it is, the more value there will be in tracking your LLM visibility.

You could also make one. The joy of making one is a) you can learn a new skill and b) you can make other things for the same cost.

Frustrating, yes. Fun? Absolutely.



This post was originally published on Leadership In SEO.


Featured Image: Viktoriia_M/Shutterstock

Google Answers What To Do For AEO/GEO via @sejournal, @martinibuster

Google’s VP of Product, Robby Stein, recently answered the question of what people should think about in terms of AEO/GEO. He provided a multi-part answer that began with how Google’s AI creates answers and ended with guidance on what creators should consider.

Foundations Of Google AI Search

The question asked was about AEO/GEO, which the podcast host characterized as the evolution of SEO. Robby Stein’s answer suggested thinking about the context of AI answers.

This is the question that was asked:

“What’s your take on this whole rise of AEO, GEO, which is kind of this evolution of SEO?

I’m guessing your answer is going to be just create awesome stuff and don’t worry about it, but you know, there’s a whole skill of getting to show up in these answers. Thoughts on what people should be thinking about here?”

Stein began his answer describing the foundations of how Google’s AI search works:

“Sure. I mean, I can give you a little bit of under the hood, like how this stuff works, because I do think that helps people understand what to do.

When our AI constructs a response, it’s actually trying to, it does something called query fan-out, where the model uses Google search as a tool to do other querying.

So maybe you’re asking about specific shoes. It’ll add and append all of these other queries, like maybe dozens of queries, and start searching basically in the background. And it’ll make requests to our data kind of backend. So if it needs real-time information, it’ll go do that.

And so at the end of the day, actually something’s searching. It’s not a person, but there’s searches happening.”

Robby Stein shows that Google’s AI still relies on conventional search engine retrieval; it’s just scaled and automated. The system performs dozens of background searches and evaluates the same quality signals that guide ordinary search rankings.

That means that “answer engine optimization” is basically the same as SEO because the underlying indexing, ranking and quality factors inherent to traditional SEO principles still apply to queries that the AI itself issues as part of the query fan-out process.

For SEOs, the insight is that visibility in AI answers depends less on gaming a new algorithm and more on producing content that satisfies intent so thoroughly that Google’s automated searches treat it as the best possible answer. As you’ll see later in this article, originality also plays a role.

Role Of Traditional Search Signals

An interesting part of this discussion is centered on the kinds of quality signals that Google describes in its Quality Raters Guidelines. Stein talks about originality of the content, for example.

Here’s what he said:

“And then each search is paired with content. So if for a given search, your webpage is designed to be extremely helpful.

And then you can look up Google’s human rater guidelines and read… what makes great information? This is something Google has studied more than anyone.

And it’s like:

  • Do you satisfy the user intent of what they’re trying to get?
  • Do you have sources?
  • Do you cite your information?
  • Is it original or is it repeating things that have been repeated 500 times?

And there’s these best practices that I think still do largely apply because it’s going to ultimately come down to an AI is doing research and finding information.

And a lot of the core signals, is this a good piece of information for the question, they’re still valid. They’re still extremely valid and extremely useful. And that will produce a response where you’re more likely to show up in those experiences now.”

Although Stein is describing AI Search results, his answer shows that Google’s AI Search still values the same underlying quality factors found in traditional search. Originality, source citations, and satisfying intent remain the foundation of what makes information “good” in Google’s view. AI has changed the interface of search and encouraged more complex queries, but the ranking factors continue to be the same recognizable signals related to expertise and authoritativeness.

More On How Google’s AI Search Works

The podcast host, Lenny, followed up with another question about how Google’s AI Search might follow a different approach from a strictly chatbot approach.

He asked:

“It’s interesting your point about how it goes in searches. When you use it, it’s like searching a thousand pages or something like that. Is that a just a different core mechanic to how other popular chatbots work because the others don’t go search a bunch of websites as you’re asking.”

Stein answered with more details about how AI search works, going beyond query fan-out, identifying factors it uses to surface what they feel to be the best answers. For example, he mentions parametric memory. Parametric memory is the knowledge that an AI has as part of its training. It’s essentially the knowledge stored within the model and not fetched from external sources.

Stein explained:

“Yeah, this is something that we’ve done uniquely for our AI. It obviously has the ability to use parametric memory and thinking and reasoning and all the things a model does.

But one of the things that makes it unique for designing it specifically for informational tasks, like we want it to be the best at informational needs. That’s what Google’s all about.

  • And so how does it find information?
  • How does it know if information is right?
  • How does it check its work?

These are all things that we built into the model. And so there is a unique access to Google. Obviously, it’s part of Google search.

So it’s Google search signals, everything from spam, like what’s content that could be spam and we don’t want to probably use in a response, all the way to, this is the most authoritative, helpful piece of information.

We’re going link to it and we’re going to explain, hey, according to this website, check out that information and you’re going to probably go see that yourself.

So that’s how we’ve thought about designing this.”

Stein’s explanation makes it clear that Google’s AI Search is not designed to mimic the conversational style of general chatbots but to reinforce the company’s core goal of delivering trustworthy information that’s authoritative and helpful.

Google’s AI Search does this by relying on signals from Google Search, such as spam detection and helpfulness, grounding its AI-generated answers in the same evaluation and ranking framework inherent in regular search ranking.

This approach positions AI Search less as a standalone version of search and more as an extension of Google’s information-retrieval infrastructure, where reasoning and ranking work together to surface factually accurate answers.

Advice For Creators

Stein at one point acknowledges that creators want to know what to do for AI Search. He essentially gives the advice to think about the questions people are asking. In the old days that meant thinking about what keywords searchers are using. He explains that’s no longer the case because people are using long conversational queries now.

He explained:

“I think the only thing I would give advice to would be, think about what people are using AI for.

I mentioned this as an expansionary moment, …that people are asking a lot more questions now, particularly around things like advice or how to, or more complex needs versus maybe more simple things.

And so if I were a creator, I would be thinking, what kind of content is someone using AI for? And then how could my content be the best for that given set of needs now?
And I think that’s a really tangible way of thinking about it.”

Stein’s advice doesn’t add anything new but it does reframe the basics of SEO for the AI Search era. Instead of optimizing for isolated keywords, creators should consider anticipating the fuller intent and informational journey inherent in conversational questions. That means structuring content to directly satisfy complex informational needs, especially “how to” or advice-driven queries that users increasingly pose to AI systems rather than traditional keyword search.

Takeaways

  • AI Search Is Still Built on Traditional SEO Signals
    Google’s AI Search relies on the same core ranking principles as traditional search—intent satisfaction, originality, and citation of sources.
  • How Query Fan-Out Works
    AI Search issues dozens of background searches per query, using Google Search as a tool to fetch real-time data and evaluate quality signals.
  • Integration of Parametric Memory and Search Signals
    The model blends stored knowledge (parametric memory) with live Google Search data, combining reasoning with ranking systems to ensure factual accuracy.
  • Google’s AI Search Is Like An Extension of Traditional Search
    AI Search isn’t a chatbot; it’s a search-based reasoning system that reinforces Google’s informational trust model rather than replacing it.
  • Guidance for Creators in the AI Search Era
    Optimizing for AI means understanding user intent behind long, conversational queries—focusing on advice- and how-to-style content that directly satisfies complex informational needs.

Google’s AI Search builds on the same foundations that have long defined traditional search, using retrieval, ranking, and quality signals to surface information that demonstrates originality and trustworthiness. By combining live search signals with the model’s own stored knowledge, Google has created a system that explains information and cites the websites that provided it. For creators, this means that success now depends on producing content that fully addresses the complex, conversational questions people bring to AI systems.

Watch the podcast segment starting at about the 15:30 minute mark:

Featured Image by Shutterstock/PST Vector

Google Adds AI Previews To Discover, Sports Feed Coming via @sejournal, @MattGSouthern

Google rolled out AI trending previews in Discover and will add a “What’s new” sports feed to U.S. mobile search in coming weeks.

  • AI trending previews in Discover are live in the U.S., South Korea, and India.
  • A sports “What’s new” button will begin rolling out in the U.S. in the coming weeks.
  • Both experiences show brief previews with links to publisher/creator content.

Timeline Of ChatGPT Updates & Key Events via @sejournal, @theshelleywalsh

At the end of 2022, OpenAI launched ChatGPT and opened up an easy-to-access interface to large language models (LLMs) for the first time. The uptake was stratospheric.

Since the explosive launch, ChatGPT hasn’t shown signs of slowing down in developing new features or maintaining worldwide user interest. As of September 2025, ChatGPT now has a reported 700 million weekly active users and hundreds of plugins.

The following is a timeline of all key events since the launch up to October 2025.

History Of ChatGPT: A Timeline Of Developments

June 16, 2016 – OpenAI published research on generative models, trained by collecting a vast amount of data in a specific domain, such as images, sentences, or sounds, and then teaching the model to generate similar data. (OpenAI)

Sept. 19, 2019 – OpenAI published research on fine-tuning the GPT-2 language model with human preferences and feedback. (OpenAI)

Jan. 27, 2022 – OpenAI published research on InstructGPT models, siblings of ChatGPT, that show improved instruction-following ability, reduced fabrication of facts, and decreased toxic output. (OpenAI)

Nov. 30, 2022 – OpenAI introduced ChatGPT using GPT-3.5 as a part of a free research preview. (OpenAI)

Screenshot from ChatGPT, December 2022

Feb. 1, 2023 – OpenAI announced ChatGPT Plus, a premium subscription option for ChatGPT users offering less downtime and access to new features.

Screenshot from ChatGPT, February 2023

Feb. 2, 2023 – ChatGPT reached 100 million users faster than TikTok, which made the milestone in nine months, and Instagram, which made it in two and a half years. (Reuters)

Feb. 7, 2023 – Microsoft announced ChatGPT-powered features were coming to Bing.

Feb. 22, 2023 – Microsoft released AI-powered Bing chat for preview on mobile.

March 1, 2023 – OpenAI introduced the ChatGPT API for developers to integrate ChatGPT functionality in their applications. Early adopters included Snapchat’s My AI, Quizlet Q-Chat, Instacart, and Shop by Shopify.

March 14, 2023 – OpenAI released GPT-4 in ChatGPT and Bing, promising better reliability, creativity, and problem-solving skills.

Screenshot from ChatGPT, March 2023

March 14, 2023 – Anthropic launched Claude, its ChatGPT alternative.

March 20, 2023 – A major ChatGPT outage affected all users for several hours.

March 21, 2023 – Google launched Bard, its ChatGPT alternative. (Rebranded to Gemini in February 2024.)

March 23, 2023 – OpenAI began rolling out ChatGPT plugin support, including Browsing and Code Interpreter.

March 31, 2023 – Italy banned ChatGPT, citing the collection of personal data and the lack of age verification for a system that can produce harmful content.

April 25, 2023 – OpenAI added new ChatGPT data controls that allow users to choose which conversations OpenAI includes in training data for future GPT models.

April 28, 2023 – The Italian Garante released a statement that OpenAI met its demands and that the ChatGPT service could resume in Italy.

April 29, 2023 – OpenAI released ChatGPT plugins, GPT-3.5 with browsing, and GPT-4 with browsing in alpha.

Screenshot from ChatGPT, April 2023

May 12, 2023 – ChatGPT Plus users could now access over 200 ChatGPT plugins. (OpenAI)

Screenshot from ChatGPT, May 2023

May 16, 2023 – OpenAI CEO Sam Altman appeared at a Senate subcommittee hearing on the Oversight of AI, where he discussed the need for AI regulation that doesn’t slow innovation.

May 18, 2023 – OpenAI launched the ChatGPT iOS app, allowing users to access GPT-3.5 for free. ChatGPT Plus users could switch between GPT-3.5 and GPT-4.

Screenshot from ChatGPT, May 2023

May 23, 2023 – Microsoft announced that Bing would power ChatGPT web browsing.

Screenshot from ChatGPT, May 2023

May 24, 2023 – Pew Research Center released data from a ChatGPT usage survey showing that only 59% of American adults had heard of ChatGPT, and only 14% had tried it.

May 25, 2023 – OpenAI, Inc. launched a program to award ten $100,000 grants to researchers to develop a democratic system for determining AI rules. (OpenAI)

July 3, 2023 – ChatGPT recorded its first decline in traffic since launch. (Similarweb)

July 20, 2023 – OpenAI introduced custom instructions for ChatGPT, allowing users to personalize their interaction experience. (OpenAI)

Aug. 28, 2023 – OpenAI launched ChatGPT Enterprise, calling it “the most powerful version of ChatGPT yet.” Benefits included enterprise-level security and unlimited usage of GPT-4. (OpenAI)

Nov. 6, 2023 – OpenAI announced the arrival of custom GPTs, which let users build their own custom versions of ChatGPT with specific skills and knowledge. (OpenAI)

Jan. 10, 2024 – With the launch of the GPT Store, ChatGPT users could discover and use other people’s custom GPTs. On the same day, OpenAI also introduced ChatGPT Team, a collaborative plan for workplaces. (OpenAI)

Jan. 25, 2024 – OpenAI released new embedding models: text-embedding-3-small and a larger, more powerful text-embedding-3-large. (OpenAI)

Feb. 8, 2024 – Google’s Bard rebranded to Gemini. (Google – Gemini release notes)

April 9, 2024 – OpenAI announced that it would discontinue ChatGPT plugins in favor of custom GPTs. (OpenAI Community Forum)

May 13, 2024 – A big day for OpenAI: the company introduced the GPT-4o model, offering enhanced intelligence and bringing additional features to free users. (OpenAI)

July 25, 2024 – OpenAI launched SearchGPT, an AI-powered search prototype designed to answer user queries with direct answers. Update: Elements from this prototype were rolled into ChatGPT and made available to all regions on Feb. 5, 2025. (OpenAI)

Aug. 29, 2024 – ChatGPT reached 200 million weekly active users. (Reuters)

Sept. 12, 2024 – OpenAI unveiled the o1 model, which it claimed “can reason like a human.”

Oct. 31, 2024 – OpenAI announced ChatGPT Search. It became available to logged-in users starting Dec. 16, 2024, and on Feb. 5, 2025, it rolled out to all ChatGPT users wherever ChatGPT is available. (OpenAI)

Screenshot from ChatGPT, September 2025

Jan. 31, 2025 – OpenAI released o3-mini, a smaller reasoning model and the first in the o3 family. (OpenAI)

April 16, 2025 – OpenAI introduced o3 and o4-mini, fast and cost-efficient reasoning models with strong AIME performance. (OpenAI)

June 10, 2025 – OpenAI made o3-pro available to Pro users in both ChatGPT and the API. (OpenAI)

Aug. 4, 2025 – ChatGPT approached 700 million weekly active users.

Screenshot from an X (Twitter) post by Nick Turley, VP and head of the ChatGPT app, September 2025

Sept. 15, 2025 – A new OpenAI study revealed that ChatGPT had reached 700 million weekly active users and detailed how people use it. (OpenAI)

Last update: October 01, 2025




Featured image: Tada Images/Shutterstock

2026: When AI Assistants Become The First Layer via @sejournal, @DuaneForrester

What I’m about to say will feel uncomfortable to a lot of SEOs, and maybe even some CEOs. I’m not writing this to be sensational, and I know some of my peers will still look sideways at me for it. That’s fine. I’m sharing what the data suggests to me, and I want you to look at the same numbers and decide for yourself.

Too many people in our industry have slipped into the habit of quoting whatever guidance comes out of a search engine or AI vendor as if it were gospel. That’s like a soda company telling you, “Our drink is refreshing, you should drink more.” Maybe it really is refreshing. Maybe it just drives their margins. Either way, you’re letting the seller define what’s “best.”

SEO used to be a discipline that verified everything. We tested. We dug as deep as we could. We demanded evidence. Lately, I see less of that. This article is a call-back to that mindset. The changes coming in 2026 are not hype. They’re visible in the adoption curves, and those curves don’t care if we believe them or not. These curves aren’t about what I say, what you say, or what 40 other “SEO experts” say. These curves are about consumers, habits, and our combined future.

ChatGPT is reaching mass adoption in 4 years. Google took 9. Tech adoption is accelerating.

The Shocking Ramp: Google Vs. ChatGPT

Confession: I nearly called this section things like “Ramp-ocalypse 2026” or “The Adoption Curve That Will Melt Your Rank-Tracking Dashboard.” I had a whole list of ridiculous options that would have looked at home on a crypto shill blog. I finally dialed it back to the calmer “The Shocking Ramp: Google Vs. ChatGPT” because that, at least, sounds like something an adult would publish. But you get the idea: The curve really is that dramatic, but I just refuse to dress it up like a doomsday tabloid headline.

Image Credit: Duane Forrester

And before we really get into the details, let’s be clear that this is not comparing totals of daily active users today. This is a look at time-to-mass-adoption. Google achieved that a long time ago, whereas ChatGPT is going to do that, it seems, in 2026. This is about the vector. The ramp, and the speed. It’s about how consumer behavior is changing, and is about to be changed. That’s what the chart represents. Of course, when we reference ChatGPT-Class Assistants, we’re including Gemini here, so Google is front and center as these changes happen.

And Google’s pivot into this space isn’t accidental. If you believe Google was reacting to OpenAI’s appearance and sudden growth, guess again. Both companies have essentially been neck and neck in a thoroughbred horse race to be the leading next-gen information-parsing layer for humanity since day one. ChatGPT may have grabbed the headlines when it launched, but Google very quickly became its equal, and the gap at the top that these companies are chasing is vanishing quickly. Consumers soon won’t be able to say which is “the best” in any meaningful way.

What’s most important here is that as consumers adopt, behavior changes. I cannot recommend enough that folks read Charles Duhigg’s “The Power of Habit” (non-aff link). I first read it over a decade ago, and it still brings home the impact that a single habit-forming moment has on a product’s success and growth. That is what the chart above is speaking to: New habits are about to be formed by consumers globally.

Let’s rewind to the search revolution most of us built our careers on.

  • Google launched in 1998.
  • By late 1999, it was handling about 3.5 million searches per day (Market.us, September 1999 data).
  • By 2001, Google crossed roughly 100 million searches a day (The Guardian, 2001).
  • It didn’t pass 50% U.S. market share until 2007, about nine years after launch (Los Angeles Times, August 2007).

Now compare that to the modern AI assistant curve:

  • ChatGPT launched in November 2022.
  • It reached 100 million monthly active users in just two months (UBS analysis via Reuters, February 2023).
  • According to OpenAI’s usage study published Sept. 15, 2025, in the NBER working-paper series, by July 2025, ChatGPT had ~700 million users sending ~18 billion messages per week, or about 10% of the world’s adults.
  • Barclays Research projects ChatGPT-class assistants will reach ~1 billion daily active users by 2026 (Barclays note, December 2024).

In other words: Google took ~9 years to reach its mass-adoption threshold. ChatGPT is on pace to do it in ~4.

That slope is a wake-up call.

Four converging forces explain why 2026 is the inflection year:

  1. Consumer scale: Barclays’ projection of 1 billion daily active users by 2026 means assistants are no longer a novelty; they’re a mainstream habit (Barclays).
  2. Enterprise distribution: Gartner forecasts that about 40% of enterprise applications will ship with task-doing AI agents by 2026. Assistants will appear inside the software your customers already use at work (Gartner Hype Cycle report cited by CIO&Leader, August 2025).
  3. Infrastructure rails: Citi projects ≈ $490 billion in AI-related capital spending in 2026, building the GPUs and data-center footprint that drop latency and per-interaction cost (Citi Research note summarized by Reuters, September 2025).
  4. Capability step-change: Sam Altman has described 2026 as a “turning-point year” when models start “figuring out novel insights” and, by 2027, become reliable task-doing agents (Sam Altman blog, June 2025). And yes, this is the soda salesman telling us what’s right here, but still, you get the point, I hope.

This isn’t a calendar-day switch-flip. It’s the slope of a curve that gets steep enough that, by late 2026, most consumers will encounter an assistant every day, often without realizing it.

What Mass Adoption Feels Like For Consumers

If the projections hold, the assistant experience by late 2026 will feel less like opening a separate chatbot app and more like ambient computing:

  • Everywhere-by-default: built into your phone’s OS, browser sidebars, TVs, cars, banking, and retail apps.
  • From Q&A to “do-for-me”: booking travel, filling forms, disputing charges, summarizing calls, even running small projects end-to-end.
  • Cheaper and faster: thanks to the $490 billion infrastructure build-out, response times drop and the habit loop tightens.

Consumers won’t think of themselves as “using an AI chatbot.” They’ll just be getting things done, and that subtle shift is where the search industry’s challenge begins. And when 1 billion daily users prefer assistants for [specific high-value queries your audience cares about], that’s not just a UX shift; it’s a revenue channel migration that will impact your work.

The SEO & Visibility Reckoning

Mass adoption of assistants doesn’t kill search; it moves it upstream.

When the first answer or action happens inside an assistant, our old SERP tactics start to lose leverage. Three shifts matter most:

1. Zero-Click Surfaces Intensify

Assistants answer in the chat window, the sidebar, the voice interface. Fewer users click through to the page that supplied the answer.

2. Chunk Retrievability Outranks Page Rank

Assistants lift the clearest, most verifiable chunks, not necessarily the highest-ranked page. OpenAI’s usage paper shows that three-quarters of consumer interactions already focus on practical guidance, information, and writing help (NBER working paper, September 2025). That means assistants favor well-structured task-led sections over generic blog posts. Instead of optimizing “Best Project Management Software 2026” as a 3,000-word listicle, for example, you need “How to set up automated task dependencies” as a 200-word chunk with a code sample and schema markup.
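
To make that concrete, here is a minimal sketch of the schema-markup half of such a chunk: HowTo structured data for a short, task-led section. This is an illustration, not something from OpenAI’s paper or any particular site; the URL, author name, anchors, and step text are placeholders.

import json

# Minimal sketch: HowTo structured data for a short, task-led chunk.
# Every value below is a placeholder for illustration only.
howto_chunk = {
    "@context": "https://schema.org",
    "@type": "HowTo",
    "name": "How to set up automated task dependencies",
    "datePublished": "2026-01-15",
    "author": {"@type": "Person", "name": "Jane Example"},
    "step": [
        {
            "@type": "HowToStep",
            "name": "Open the automation settings",
            "text": "In your project, go to Settings > Automations.",
            "url": "https://example.com/docs/task-dependencies#step-1",
        },
        {
            "@type": "HowToStep",
            "name": "Create a dependency rule",
            "text": "Add a rule: when Task A closes, mark Task B as ready to start.",
            "url": "https://example.com/docs/task-dependencies#step-2",
        },
    ],
}

# Emit the JSON-LD you would embed in a <script type="application/ld+json"> tag.
print(json.dumps(howto_chunk, indent=2))

Each step carries its own stable anchor, which is what lets an assistant quote a single step cleanly and point back to it.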

3. Machine-Validated Authority Wins

Systems prefer sources they can quote, timestamp, and verify: schema-rich pages, canonical PDFs/HTML with stable anchors, authorship credentials, inline citations.

The consumer adoption numbers grab headlines, but the enterprise shift may hit harder and faster.

When Gartner forecasts that 40% of workplace applications will ship with embedded agents by 2026, that’s not about adding a chatbot to your product; it’s about your buyer’s daily tools becoming information gatekeepers.

Picture this: A procurement manager asks their Salesforce agent, “What’s the best solution for automated compliance reporting?” The agent surfaces an answer by pulling from its training data, your competitor’s well-structured API documentation, and a case study PDF it can easily parse. Your marketing site with its video hero sections and gated whitepapers never enters the equation.

This isn’t hypothetical. Microsoft 365 Copilot, Salesforce Einstein, SAP Joule: these aren’t research tools. They’re decision environments. If your product docs, integration guides, and technical specifications aren’t structured for machine retrieval, you’re invisible at the moment of consideration.

The enterprise buying journey is moving upstream to the data layer before buyers ever land on your domain. Your visibility strategy needs to meet them there.

A 2026-Ready Approach For SEOs And Brands

Preparing for this shift isn’t about chasing a new algorithm update. It’s about becoming assistant-ready:

  1. Restructure content into assistant-grade chunks: 150-300-word sections with a clear claim > supporting evidence > inline citation, plus stable anchors so the assistant can quote cleanly.
  2. Tighten provenance and trust signals: rich schema (FAQ, HowTo, TechArticle, Product), canonical HTML + PDF versions, explicit authorship and last-updated stamps.
  3. Mirror canonical chunks in your help center, product manuals, and developer docs to meet the assistants where they crawl.
  4. Expose APIs, sample data, and working examples so agents can act on your info, not just read it.
  5. Track attribution inside assistants to watch for brand or domain citations across ChatGPT, Gemini, Perplexity, etc., then double down on the content that’s already surfacing (see the sketch after this list).
  6. Get used to new tools that can help you surface new metrics and monitor areas your existing tools aren’t focused on (SERPRecon, Rankbee, Profound, Waikay, ZipTie.dev, etc.).
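
For item 5 above, here is a minimal sketch of what hand-rolled citation tracking can look like once you have logged assistant answers for a prompt set. Nothing in it calls a real assistant API; the prompts, URLs, and tracked domains are hypothetical, and the point is simply to tally where your domains already surface.

from collections import Counter
from urllib.parse import urlparse

# Hypothetical tracked domains and a hand-collected log of cited URLs
# per prompt. Swap in your own exports from whatever tool you use.
tracked_domains = {"yourbrand.com", "docs.yourbrand.com"}

answer_log = [
    ("best compliance reporting tools",
     ["https://competitor.io/guide", "https://docs.yourbrand.com/api"]),
    ("how to automate expense reports",
     ["https://www.yourbrand.com/blog/automation"]),
    ("ramp vs brex for startups",
     ["https://www.nerdwallet.com/reviews"]),
]

citations = Counter()
for prompt, urls in answer_log:
    for url in urls:
        domain = urlparse(url).netloc.removeprefix("www.")
        if domain in tracked_domains:
            citations[(prompt, domain)] += 1

for (prompt, domain), count in citations.most_common():
    print(f"{domain} cited {count}x for: {prompt}")

The tools in item 6 automate this at scale, but even a script over a spreadsheet of saved answers shows which prompts already cite you and which don’t.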

Back To Verification

The mass-adoption moment in 2026 won’t erase SEO, but it will change what it means to be discoverable.

We can keep taking guidance at face value from the platforms that profit when we follow it, or we can go back to questioning why advice is given, testing what the machines actually retrieve, and verifying before we trust. We used to have to learn these things for ourselves, and we seem to have slipped into easy-button mode over the last 20 years.

Search is moving upstream to the data layer. If you want to stay visible when assistants become the first touch-point, start adapting now, because this time the curve isn’t giving you nine years to catch up.



This post was originally published on Duane Forrester Decodes.


Featured Image: Roman Samborskyi/Shutterstock

Microsoft Explains How To Optimize Content For AI Search Visibility via @sejournal, @MattGSouthern

Microsoft has shared guidance on structuring content to increase its likelihood of being selected for AI-generated answers across Bing-powered surfaces.

Much of the advice reiterates established SEO and UX practices such as clear titles and headings, structured layout, and appropriate schema.

The new emphasis is on how content is selected for answers. Microsoft stresses there is “no secret sauce” that guarantees selection, but says structure, clarity, and “snippability” improve eligibility.

As Microsoft puts it:

“In traditional search, visibility meant appearing in a ranked list of links. In AI search, ranking still happens, but it’s less about ordering entire pages and more about which pieces of content earn a place in the final answer.”

Key Differences In AI Search

AI assistants break pages down into manageable parts, assess each part for authority and relevance, and then craft responses by blending information from multiple sources.

Microsoft says fundamentals such as crawlability, metadata, internal links, and backlinks still matter, but they are the starting point. Selection increasingly depends on how well-structured and clear each section is.

Best Practices Microsoft Recommends

To help improve the chances of AI selecting your content, Microsoft recommends these best practices:

  • Align the title, meta description, and H1 to clearly communicate the page purpose.
  • Use descriptive H2/H3 headings that each cover one idea per section.
  • Write self-contained Q&A blocks and concise paragraphs that can be quoted on their own.
  • Use short lists, steps, and comparison tables when they improve clarity (without overusing them).
  • Add JSON-LD schema that matches the page type (a minimal example follows below).
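
As a hedged illustration of that last bullet (my example, not Microsoft’s), here is FAQPage JSON-LD for a page built around self-contained Q&A blocks; the question and answer text are placeholders.

import json

# Minimal sketch: FAQPage structured data matching a Q&A-style page.
# The question and answer text are illustrative placeholders.
faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Does structured data guarantee inclusion in AI answers?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "No. There is no guaranteed selection, but clear structure and matching schema improve eligibility.",
            },
        },
    ],
}

print(json.dumps(faq_schema, indent=2))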

What To Avoid

Microsoft recommends avoiding these practices to improve the chances of your content appearing in AI search results:

  • Writing long walls of text that blur ideas together.
  • Hiding key content in tabs, accordions, or other elements that may not render.
  • Relying on PDFs for core information.
  • Putting important information only in images without alt text or HTML alternatives.
  • Making vague claims without providing specific details.
  • Overusing decorative symbols or long punctuation strings; keep punctuation simple.

Why This Matters

The key takeaway is that structure helps selection. When your titles, headings, and schema are aligned, Copilot and other Bing-powered tools can extract a complete idea from your page.

This connects traditional SEO principles to how AI assistants generate responses. For marketers, it’s more of an operational checklist than a new strategy.

Looking Ahead

Microsoft acknowledges there’s no guaranteed way to ensure inclusion in AI responses, but suggests that these practices can make content more accessible for its AI systems.


Featured Image: gguy/Shutterstock

What Our AI Mode User Behavior Study Reveals About The Future Of Search via @sejournal, @Kevin_Indig

Our new usability study of 37 participants across seven specific search tasks clearly shows that people:

  1. Read AI Mode
  2. Rarely click out, and
  3. Only leave when they are ready to transact.

From what we know, there isn’t another independent usability study that has explored AI Mode to this depth.

In May, I published an extensive two-part study of AI Overviews (AIOs) with Amanda, Eric Van Buskirk, and his team. Eric and I also collaborated on Propellic’s travel industry AI Mode study.

We worked together again to bring you this week’s Growth Memo: a study that provides crucial insights and validation into the behaviors of people as they interact with Google’s AI Mode.

Since neither Google nor OpenAI (or anyone else) provides user data for their AI (Search) products, we’re filling a crucial gap.

We captured screen recordings and think-aloud sessions via remote study. The 250 unique tasks collected provide a robust data set for our analysis. (The complete methodology is provided at the end of this memo, including details about the seven search tasks.)

And you might be surprised by some of the findings. We were.

This is a longer post, so grab a drink and settle in.

Image Credit: Kevin Indig

Executive Summary

Our new usability study of Google’s AI Mode reveals how profoundly this feature changes user behavior.

  • AI Mode holds attention and keeps users inside. In roughly three-quarters of the total user sessions, users never left the AI Mode pane, and 88% of users’ first interactions were with the AI-generated text. Engagement was high: The median time by task type was roughly 52-77 seconds.
  • Clicks are rare and mostly transactional. The median number of external clicks per task was zero. Yep. You read that right. Ze-ro. And 77.6% of sessions had zero external visits.
  • People skim but still make decisions in AI Mode. Over half of the tasks were classified as “skimmed quickly,” where users glance at the AI‑generated summary, form an opinion, and move on.
  • AI Mode delivers “site types” that match intent. It’s not just about meeting search query or prompt intents; AI Mode is citing sources that fit specific site categories (like marketplaces vs review sites vs brands).
  • Visibility, not traffic, is the emerging currency. Participants made their brand judgments directly from AI Mode outputs.

TL;DR? These are the core findings from this study:

  • AI Mode is sticky.
  • Clicks are reserved for transactions.
  • AI Mode matches site type with intent.
  • Product previews act like mini product detail pages (aka PDPs).

But before we dig in, a quick shout-out here to the team behind this study.

Together with Eric Van Buskirk’s team at Clickstream Solutions, I conducted the first broad usability study of Google’s AI Mode that uncovers not only crucial insights into how people interact with the hybrid search/AI chat engine, but also what kinds of branded sites AI Mode surfaces and when.

I want to highlight that Eric Van Buskirk was the research director. While we collaborated closely on shaping the research questions, areas of focus, and methodology, Eric managed the team, oversaw the study execution, and delivered the findings. Afterward, we worked side by side to interpret the data.

Click data is a great first pass for analysis on what’s happening in AI Mode, but with this usability study specifically, we essentially looked “over the shoulder” of real-life users as they completed tasks, which resulted in a robust collection of data to pull insights from.

Our testing platform was UXtweak.


Google’s own Sundar Pichai has been crystal clear: AI Mode isn’t a toy; it’s a proving ground for what the core search experience will look like in the future.

On the Lex Fridman podcast, Pichai said (bolding mine):

“Our current plan is AI Mode is going to be there as a separate tab for people who really want to experience that… But as features work, we’ll keep migrating it to the main page…” [1]

Google has argued these new AI-focused features are designed to point users to the web, but in practice, our data shows that users stick around and make decisions without clicking out. In theory, this could not only impact click-outs to organic results and citations, but also reduce external clicks to ads.

In August, I explored the reality behind Google’s own product cannibalization with AI Mode and AIOs:

Right now, according to Similarweb data, usage of the AI Mode tab on Google.com in the US has slightly dipped and now sits at just over 1%.

Google AIOs are now seen by more than 1.5 billion searchers every month, and they sit front and center. But engagement is falling. Users are spending less time on Google and clicking fewer pages.

But as Google rolls AI Mode out more broadly, it brings the biggest shift to Search (the biggest customer acquisition channel there is) ever.

Traditional SEO is highly effective in the new AI world, but if AI Mode really becomes the default, there is a chance we need to rethink our arsenal of tactics.

Preparing for the future of search means treating AI Mode as the destination (not the doorway), and figuring out how to show up there in ways that actually matter to real user behavior.

With this study, I set out to discover and validate actual user behaviors within the AI Mode experience across a variety of tasks with differing search intents.

1. AI Mode Is Sticky

Image Credit: Kevin Indig

Key Stats

People read first and usually stay inside the AI Mode experience. Here’s what we found:

  • The majority of sessions had zero external visits, meaning they didn’t leave AI Mode at all.
  • ~88% of users’ first interaction* within the feature was with the AI Mode text.
  • Typical user engagement within AI Mode is roughly 50 to 80 seconds per task.

These three stats define the AI Mode search surface: It holds attention and resolves many tasks without sending traffic.

*Here’s what I mean by “interaction:”

  • An “interaction” within the user tasks = the participant meaningfully engaged with AI Mode after it loaded.
  • What counts as an interaction: Reading or scrolling the AI Mode body for more than a quick glance, including scanning a result block like the Shopping Pack or Right Pane, opening a merchant card, clicking an inline link, link icon, or image pack.
  • What doesn’t count as an interaction: Brief eye flicks, cursor passes, or hesitation before engaging.

Users are in AI Mode to read – not necessarily to browse or search – with ~88% of sessions interacting with the output’s text first and spending one minute or more within the AI Mode experience.

Plus, it’s interesting to see that users spend more than double the time in AI Mode compared to AIOs.

The overall engagement is much stronger.

Image Credit: Kevin Indig

Why It Matters

Treat the AI Mode panel like the primary reading surface, not a teaser for blue links.

AI Mode is a contained experience where sending clicks to websites is a low priority and giving users the best answer is the highest one.

As a result, it completely changes the value chain for content creators, companies, and publishers.

Insight

Why do other sources and/or AI Mode research analyses say that users don’t return to the AI Mode feature very often?

My theory here is that, because AI Mode is a separate search experience (at least, for now), it’s not as visible as AIOs.

As AI Mode adoption increases with Google bringing Gemini (and AI Mode) into the browser, I expect our study findings to scale.

2. Clicks Are Reserved For Transactions

While clicks are scarce, purchase intent is not.

Participants in the study only clicked out when the task demanded it (e.g., “put an item in your shopping cart”) or if they browsed around a bit.

However, the browsing clicks were so few that we can safely assume AI Mode only leads to click-outs when users want to purchase.

Even prompts with a comparison and informational intent tend to keep users inside the feature.

  • Shopping prompts like [canvas bag] and [tidy desk cables] drive the highest AI Mode exit share.
  • Comparison prompts like [Oura vs Apple Watch] show the lowest exit share of the tasks.

When participants were encouraged to take action (“put an item in your shopping cart” or “find a product”), the majority of clicks went to shopping features like Shopping Packs or Merchant Cards.

Image Credit: Kevin Indig

18% of exits were caused by users exiting AI Mode and going directly to another site, making it much harder to reverse engineer what drove these visits in the first place.

Study transcripts confirm that participants often said out loud that they’d “go to the seller’s page” or “find the product on Amazon/eBay” for product searches.

Even when comparing products, whether software or physical goods, users barely click out.

Image Credit: Kevin Indig

In plain terms, AI Mode absorbs virtually all top-of-funnel (TOFU) and middle-of-funnel (MOFU) clicks. Users discover products and form opinions about them in AI Mode.

Key Stats

  • Out of 250 valid tasks, the median number of external clicks was zero!
  • The prompt task of [canvas bag] had 44 external clicks, and [tidy desk cables] had 31 clicks, accounting for two-thirds of all external clicks in this study.
  • Comparison tasks like [Oura Ring vs Apple Watch] or [Ramp vs Brex] had very few clicks (≤6 total across all tasks).

Here’s what’s interesting…

In the AI Overviews usability study, we found desktop users clicked out ~10.6% of the time, compared to practically 0% in AI Mode.

However, AIOs have organic search results and SERP Features below them. (People click out less in AIOs, but they click on organic results and SERP features more often.)

Zero-Clicks

  • AI Overviews: 93%*
  • AI Mode: ~100%

*Keep in mind that participants of the AIO usability study clicked on regular organic search results. The 93% relates to zero clicks within the AI Overview.

On desktop, AI Mode produces roughly double the in-panel clickouts compared to the AIO panel. On AIO SERPs, total clickouts can still happen via organic results below the panel, so the page-level rate will sit between the AIO-panel figure and the classic baseline.

An important note here from Eric Van Buskirk, the director of this study: When comparing the AI Mode and AI Overview studies, we’re not exactly comparing apples to apples. In this study, participants were given tasks that would prompt them to leave AI Mode in two of the seven questions, and that accounts for the majority of outbound clicks (which were fewer than three external clicks). On the other hand, for the AIO study, the most transactional question was “Find a portable charger for phones under $15. Search as you typically would.” They were not told to “put it in a shopping cart.” However, the insights gathered regarding user behavior from this AI Mode study – and the pattern that users don’t feel the need to click out of AI Mode to make additional decisions – still stand as solid findings.

The bigger picture here is that AIOs are like a fact sheet that steers users to sites eventually, but AI Mode is a closed experience that rarely has users clicking out.

What makes AI Mode (and ChatGPT, by the way) tricky is when users abandon the experience and go directly to websites. It messes with attribution models and our ability to understand what influences conversions.

3. AI Mode Matches Site Type With Intent

In the study, we assessed what types of sites AI Mode surfaces for our seven tasks.

The types are:

  • Brands: Sellers/vendors.
  • Marketplaces: amazon.com, ebay.com, walmart.com, homedepot.com, bestbuy.com, target.com, rei.com.
  • Review sites: nerdwallet.com, pcmag.com, zdnet.com, nymag.com, usatoday.com, businessinsider.com.
  • Publishers: nytimes.com, nbcnews.com, youtube.com, thespruce.com.
  • Platform: Google.

Image Credit: Kevin Indig

Shopping prompts route to product pages:

  • Canvas Bag: 93% of exits go to Brand + Marketplace.
  • Tidy desk cables: 68% go to Brand + Marketplace, with a visible Publisher slice.

Comparisons route to reviews:

  • Ramp vs Brex: 83% Review.
  • Oura vs Apple Watch: split 50% Brand and 50% Marketplace.

When the user has to perform a reputation check, the results split between brands and publishers:

  • Liquid Death: 56% Brand, 44% Publisher.

Google itself shows up on shopping tasks:

  • Store lookups to business.google.com appear on Canvas Bag (7%) and Tidy desk cables (11%).

Check out the top-clicked domains by task:

  • Canvas Bag: llbean.com, ebay.com, rticoutdoors.com, business.google.com.
  • Tidy desk cables: walmart.com, amazon.com, homedepot.com.
  • Subscription language apps vs free: pcmag.com, nytimes.com, usatoday.com.
  • Bottled Water (Liquid Death): reddit.com, liquiddeath.com, youtube.com.
  • Ramp vs Brex: nerdwallet.com, kruzeconsulting.com, airwallex.com.
  • Oura Ring 3 vs Apple Watch 9: ouraring.com, zdnet.com.
  • VR arcade or smart home: sandboxvr.com, business.google.com, yodobashi.com.

Companies need to understand the playing field. While classic SEO allowed basically any site to be visible for any user intent, AI Mode has strict rules:

  • Brands beat marketplaces when users know what product they want.
  • Marketplaces are preferred when options are broad or generic.
  • Review sites appear for comparisons.
  • Opinions highlight Reddit and publishers.
  • Google itself is most visible for local intent, and sometimes shopping.

As SEOs, we need to consider how Google classifies our site based on its page templates, reputation, and user engagement. But most importantly, we need to monitor prompts in AI Mode and look at the site mix to understand where we can play.

Sites can’t and won’t be visible for all types of queries in a topic anymore. You’ll need to filter your strategy by the intent that aligns with your site type, because AI Mode only shows certain kinds of sites (like review sites or brands) for specific types of intent.

4. Product Previews Act Like Mini PDPs

Product previews show up in about 25% of the AI Mode sessions, get ~9 seconds of attention, and people usually open only one.

Then? 45% stop there. Many opens are quick spec checks, not a clickout.

Image Credit: Kevin Indig

You can easily see how some of the product recommendations in AI Mode, and the on-site experiences they lead to, end up frustrating users.

The post-click experience is critical: classic best practices like reviews have a big impact on making the most out of the few clicks we still get.

See this example:

“It looks like it has a lot of positive reviews. That’s one thing I would look at if I was going to buy this bag. So this would be the one I would choose.”

In shopping tasks, we found that brand sites take the majority of exits.

In comparison tasks, we discovered that review sites dominate. For reputation checks (like a prompt for [Liquid Death]), exits to brands and publishers were split.

  • For transactional intent prompts: Brands absorb most exits when the task is to buy one item now. [Canvas Bag] shows a strong tilt to brand PDPs.
  • For reputation intent prompts: Brand sites appear alongside publishers. A prompt for [Liquid Death] splits between liquiddeath.com and Reddit/YouTube/Eater.
  • For comparison prompts: Brands take a back seat. [Ramp vs Brex] exits go mostly to review sites like NerdWallet and Kruze.

Given users can now check out directly in ChatGPT and AI Mode, shopping-related tasks might send even fewer clicks out.[23]

Therefore, AI Mode becomes a completely closed experience where even shopping intent is fulfilled right in the app.

Clicks are scarce. Influence is plentiful.

The data gives us a reality check: If users continue to adopt the new way of Googling, AI Mode will reshape search behavior in ways SEOs can’t afford to ignore.

  • Strategy shifts from “get the click” to “earn the citation.”
  • Comparisons are for trust, not traffic. They reduce exits because users feel informed inside the panel.
  • Merchants should optimize for decisive exits. Give prices, availability, and proof above the fold to convert the few exits you do get.

You’ll need to earn citations that answer the task, then win the few, high-intent exits that remain.

But our study doesn’t end here.

Today’s results reveal core insights into how people interact with AI Mode. We’ll unpack more to consider with Part 2 dropping next week.

But for those who love to dig into details, the methodology of the study is included below.

Methodology

Study Design And Objective

We conducted a mixed-methods usability study to quantify how Google’s new AI Mode changes searcher behavior. Each participant completed seven live Google search prompts via the AI Mode feature. This design allows us to observe both the mechanics of interaction (scrolls, clicks, dwell, trust) and the qualitative reasoning participants voiced while completing tasks.

The tasks:

  1. What do people say about Liquid Death, the beverage company? Do their drinks appeal to you?
  2. Imagine you’re going to buy a sleep tracker and the only two available are the Oura Ring 3 or the Apple Watch 9. Which would you choose, and why?
  3. You’re getting insights about the perks of a Ramp credit card vs. a Brex Card for small businesses. Which one seems better? What would make a business switch from another card: fee detail, eligibility fine print, or rewards?
  4. In the “Ask anything” box in AI Mode, enter “Help me purchase a waterproof canvas bag.” Select one that best fits your needs and you would buy (for example, a camera bag, tote bag, duffel bag, etc.).
    • Proceed to the seller’s page. Click to add to the shopping cart and complete this task without going further.
  5. Compare subscription language apps to free language apps. Would you pay, and in what situation? Which product would you choose?
  6. Suppose you are visiting a friend in a large city and want to go to either: 1. A virtual reality arcade OR 2. A smart home showroom. What’s the name of the city you’re visiting?
  7. 1. Suppose you work at a small desk and your cables are a mess. 2. In the “Ask anything” box in AI Mode, enter: “The device cables are cluttering up my desk space. What can I buy today to help?” 3. Then choose the one product you think would be the best solution. Put it in the shopping cart on the external website and end this task.

Thirty-seven English-speaking U.S. adults were recruited via Prolific between Aug. 20 and Sept. 1, 2025 (including participants in a small group who did pilot studies).*

Eligibility required a ≥ 95% Prolific approval rate, a Chromium-based browser, and a functioning microphone. Participants visited AI Mode and performed tasks remotely via their desktop computer; invalid sessions were excluded for technical failure or non-compliance. The final dataset contains over 250 valid task records across 37 participants.

*Pilot studies are conducted first in remote usability testing to identify and fix technical issues – like screen-sharing, task setup, or recording problems – before the main study begins. They help refine task wording, timing, and instructions to ensure participants interpret them correctly. Most importantly, pilot sessions confirm that the data collected will actually answer the research questions and that the methodology works smoothly in a real-world remote setting.

Sessions ran in UXtweak’s Remote unmoderated mode. Participants read a task prompt, clicked to Google.com/aimode, prompted AI Mode, and spoke their thoughts aloud while interacting with AI Mode. They were given the following directions: “Think aloud and briefly explain what draws your attention as you review the information. Speak aloud and hover your mouse to indicate where you find the information you are looking for.” Each participant completed seven task types designed to cover diverse intent categories, including comparison, transactional, and informational scenarios.

UXtweak recorded full-screen video, cursor paths, scroll events, and audio. Sessions averaged 20-25 minutes. Incentives were competitive. Raw recordings, transcripts, and event logs were exported for coding and analysis.

Three trained coders reviewed each video in parallel. A row was logged for UI elements that held attention for ~5 seconds or longer. Variables captured included:

  • Structural: Fields describing the setup, metadata, or structure of the study rather than user behavior. These include data like participant ID, task ID, device, query, the order of UI elements clicked or visited during the task, the type of site clicked (e.g., social, community, brand, platform), the domain name of the external site visited, and more.
  • Feature: Fields describing UI elements or interface components that appeared or were available to the participant. Examples include the UI element type, such as shopping carousels, merchant cards, the right panel, link icons, map embeds, the local pack, the GMB card, and merchant packs.
  • Engagement: Fields that capture active user interaction, attention, or time investment. Includes reading and attention, chat and question behavior, along with click and interaction behavior.
  • Outcome: Fields representing user results, annotator evaluations, or interpretations of behavior. Examples include annotator comments, effort rating, and where the info was found.

Coders also marked qualitative themes (e.g., “speed,” “skepticism,” “trust in citations”) to support RAG-based retrieval. The research director spot-checked ~10% of videos to validate consistency.

Annotations were exported to Python/pandas 2.2. Placeholder codes (‘999=Not Applicable’, ‘998=Not Observable’) were removed, and categorical variables (e.g., appearances, clicks, sentiment) were normalized. Dwell times and other time metrics were trimmed for extreme outliers. After cleaning, ~250 valid task-level rows remained.
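
As a rough illustration of those cleaning steps (the file path and column names below are assumptions, not the study’s actual schema), the pandas pass might look something like this:

import pandas as pd

# Hypothetical export of the coders' annotations.
df = pd.read_csv("annotations_export.csv")

# Drop the placeholder codes used by the coders.
df = df.replace({999: pd.NA, 998: pd.NA})

# Normalize categorical fields such as clicks and sentiment.
for col in ["clicked_external", "sentiment"]:
    df[col] = df[col].astype("string").str.strip().str.lower().astype("category")

# Trim extreme dwell-time outliers (here: outside the 1st-99th percentile).
low, high = df["dwell_seconds"].quantile([0.01, 0.99])
df = df[df["dwell_seconds"].between(low, high)]

print(len(df), "valid task-level rows after cleaning")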

Our retrieval-augmented generation (RAG) pipeline enabled three stages of analysis:

  • Data readiness (ingestion): We flattened every participant’s seven tasks into individual rows, cleaned coded values, and standardized time, click, and other metrics. Transcripts were retained so that structured data (such as dwell time) could be associated with what users actually said. Goal: create a clean, unified dataset that connects behavior with reasoning.
  • Relevance filtering (retrieval): We used structured fields and annotations to isolate patterns, such as users who left AI Mode, clicked a merchant card, or showed hesitation. We then searched the transcripts for themes such as trust, convenience, or frustration. Goal: combine behavior and sentiment to reveal real user intent.
  • Interpretation (quant + qual synthesis): For each group, we calculated descriptive stats (dwell, clicks, trust) and paired them with transcript evidence. That’s how we surfaced insights like: “external-site tasks showed higher satisfaction but more CTA confusion.” Goal: link what people did with what they felt inside AI Mode.

This pipeline allowed us to query the dataset hyperspecifically – e.g., “all participants who scrolled >50% in AI Mode but expressed distrust” – and link quantitative outcomes with qualitative reasoning.

In plain terms: We can pull up just the right group of participants or moments, like “all the people who didn’t trust AIO” or “everyone who scrolled more than 50%.”
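
Expressed against that same hypothetical schema, a slice like “scrolled more than 50% but expressed distrust” is just a boolean filter over the cleaned rows:

import pandas as pd

# Hypothetical column names; reuses the cleaned export from the sketch above.
df = pd.read_csv("annotations_export.csv")

distrust_terms = "don't trust|not sure i believe|skeptical"
segment = df[
    (df["scroll_depth_pct"] > 50)
    & df["transcript"].astype("string").str.lower().str.contains(distrust_terms, na=False)
]

print(segment[["participant_id", "task_id", "dwell_seconds"]].head())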

We summarized user behavior using descriptive and inferential statistics across 250 valid task records. Each metric included the count, mean, median, standard deviation, standard error, and 95% confidence interval. Categorical outcomes, such as whether participants left AI Mode or clicked a merchant card, were reported as proportions.

Analyses covered more than 50 structured and behavioral fields, from device type and dwell time to UI interactions and sentiment. Confidence measures were derived from a JSON-based analysis of user sentiment across all participants’ transcripts.

Each task was annotated by a trained coder and spot-checked for consistency across annotators. Coder-level distributions were compared to confirm stable labeling patterns and internal consistency.

Thirty-seven participants completed seven tasks each, resulting in approximately 250 valid tasks. At that scale, proportions around 50% carry a margin of error of about six percentage points, giving the dataset enough precision to detect meaningful directional differences.
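
That margin-of-error figure lines up with the standard 95% calculation for a proportion near 50% at roughly 250 observations:

import math

# 95% margin of error for a proportion near 50% with ~250 observations.
n, p, z = 250, 0.5, 1.96
margin = z * math.sqrt(p * (1 - p) / n)
print(f"±{margin:.1%}")  # about ±6.2 percentage points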

The sample size is smaller than our AI Overviews study (37 vs. 69 participants), and the study is meant to reflect U.S.-based users (all participants were living in the U.S.). All queries took place within AI Mode, meaning we did not directly compare AI vs. non-AI conditions. Think-aloud protocols may inflate dwell times slightly. RAG-driven coding is only as strong as its annotation inputs, though heavy spot-checks confirmed reliability.

Participants gave informed consent. Recordings were encrypted and anonymized; no personally identifying data were retained. The study conforms to Prolific’s ethics policy and UXtweak TOS.


Featured Image: Paulo Bobita/Search Engine Journal