For years, SEOs have operated on a simple assumption: The more ground your content covers, the more likely it is to surface in AI-generated answers. In fact, every “best practice” in classic SEO content pushes you toward more: more subtopics, more sections, more words. Build the “ultimate guide.”
An analysis of 815,000 query-page pairs across 16,851 queries and 353,799 pages says otherwise:
Fan-out coverage is nearly irrelevant to citation rates.
Two signals actually predict whether ChatGPT cites your page.
Six concrete changes to your existing content library help.
1. The Study
AirOps ran 16,851 queries through ChatGPT three times each through the UI, capturing every fan-out sub-query, every URL searched, every citation made, and every page scraped. Oshen Davidson built the pipeline. I analyzed the data.
Each query generates an average of two fan-out queries. ChatGPT retrieves roughly 10 URLs per sub-search, reads through them, then selects which ones to cite. We scored how well each page’s H2-H4 subheadings matched those fan-out queries using cosine similarity on bge-base-en-v1.5 embeddings. That score is what we call fan-out coverage: the share of subtopics a page addresses at a 0.80 similarity threshold. (The 0.80 cutoff decides whether a subheading counts as a match to a fan-out query. Think of it as a relevance bar.)
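If you want to approximate that scoring on your own pages, here is a minimal sketch, assuming the public BAAI/bge-base-en-v1.5 checkpoint loaded through sentence-transformers. The headings and queries below are made up for illustration; this shows the method, not the study’s actual pipeline.

```python
from sentence_transformers import SentenceTransformer

# The embedding model named in the study, loaded from the Hugging Face hub.
model = SentenceTransformer("BAAI/bge-base-en-v1.5")

def fan_out_coverage(subheadings, fan_out_queries, threshold=0.80):
    """Share of fan-out queries matched by at least one H2-H4 subheading."""
    # normalize_embeddings=True makes the dot product equal cosine similarity.
    h = model.encode(subheadings, normalize_embeddings=True)
    q = model.encode(fan_out_queries, normalize_embeddings=True)
    sims = q @ h.T  # one row per fan-out query, one column per subheading
    return float((sims.max(axis=1) >= threshold).mean())

# Hypothetical headings and queries, not from the dataset:
headings = ["How much does CRM software cost?", "Best CRM tools for small teams"]
queries = ["crm pricing comparison", "crm email integrations"]
print(fan_out_coverage(headings, queries))
```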
The question: Do pages with higher fan-out coverage get cited more?
You’ll find even more information in the co-written AirOps report.
2. Density Barely Moves The Needle
Across 815,484 rows, the relationship between fan-out coverage and citation is weak.
Covering 100% of subtopics adds 4.6 percentage points over covering none. That gap shrinks further when you control for query match (how well the page’s best heading matches the original query). Among pages with strong query match (>= 0.80 cosine similarity):
Image Credit: Kevin Indig
Moderate coverage (26-50%) outperforms exhaustive coverage. Pages that cover everything score lower than pages that cover a quarter of the subtopics. The “ultimate guide” strategy produces worse results than a focused article that covers two to three related angles well.
3. What Actually Predicts Citation
These two signals dominate: retrieval rank and query match.
1. Retrieval rank is the strongest predictor by a wide margin. A page at position 0 in ChatGPT’s web search results (the first URL returned by its search tool) has a 58% citation rate. By position 10, that drops to 14%. We ran each prompt three times consecutively for this analysis, and pages cited in all three runs have a median retrieval rank of 2.5. Pages never cited: median rank 13.
Image Credit: Kevin Indig
2. Query match (cosine similarity between the query and the page’s best heading) is the strongest content signal. Pages with a 0.90+ heading match have a 41% citation rate compared to the 30% rate for pages below 0.50. Even among top-ranked pages (position 0-2), higher query match adds 19 percentage points.
Fan-out coverage, word count, heading count, domain authority: all secondary. Some are flat. Some are inversely correlated.
4. The Wikipedia Exception
One site type breaks the pattern. Wikipedia has the worst retrieval rank in the dataset (median 24) and the lowest query match score (0.576). It still achieves the highest citation rate: 59%.
Wikipedia pages average 4,383 words, 31 lists, and 6.6 tables. They are encyclopedic in the literal sense. ChatGPT cites Wikipedia from deep in the search results where every other site type gets ignored.
This is density working as a signal, but at a scale no publisher can replicate. Wikipedia’s content is exhaustive, richly structured, and cross-linked across millions of topics. A 3,000-word corporate blog post with 15 subheadings is not the same thing.
5. The Bimodal Reality
58% of pages retrieved by ChatGPT in this dataset are never cited. 25% are always cited when they appear. Only 17% fall in between.
The always-cited and never-cited groups look nearly identical on most content metrics: similar word counts (~2,200), similar heading counts (~20), similar readability scores (~12 FK grade), similar domain authority (~54). The on-page signals we can measure do not separate winners from losers.
What separates them is retrieval rank. Always-cited pages rank near the top when they surface. Never-cited pages rank in the bottom half. The retrieval system, whatever signals it uses internally, is the gatekeeper. Everything else is a tiebreaker.
6. What This Means For Your Content
Conventional SEO content writing wisdom says cover more subtopics, add more sections, build density. The data says the conventional approach produces “mixed” pages, the 17% in the middle that get cited sometimes and ignored other times.
Mixed pages have the highest word counts, the most headings, and the highest domain authority in the dataset. They are the “ultimate guides.” They are also the least reliable performers in ChatGPT.
The pages that win consistently are focused. They:
Match the query directly in their headings,
Tend to be shorter (the citation sweet spot is 500-2,000 words), and
Have enough structure (7-20 subheadings) to organize the content without diluting it.
Build the page that is the best answer to one question. Not the page that adequately answers 20.
Google’s John Mueller answered a question on Reddit about why Google picks one web page over another when multiple pages have duplicate content, also explaining why Google sometimes appears to pick the wrong URL as the canonical.
Canonical URLs
The word canonical was previously mostly used in the religious sense to describe what writings or beliefs were recognized to be authoritative. In the SEO community, the word is used to refer to which URL is the true web page when multiple web pages share the same or similar content.
Google enables site owners and SEOs to provide a hint of which URL is the canonical with the use of an HTML attribute called rel=canonical. SEOs often refer to rel=canonical as an HTML element, but it’s not: rel=canonical is an attribute of the link element, as in <link rel="canonical" href="https://example.com/page/">. An HTML element is a building block for a web page. An attribute is markup that modifies the element.
Why Google Picks One URL Over Another
A person on Reddit asked Mueller to provide a deeper dive on the reasons why Google picks one URL over another.
“Hey John, can I please ask you to go a little deeper on this? Let’s say I want to understand why Google thinks two pages are duplicate and it chooses one over the other and the reason is not really in plain sight. What can one do to better understand why a page is chosen over another if they cover different topics? Like, IDK, red panda and “regular” panda 🐼. TY!!”
Mueller answered with about nine different reasons why Google chooses one page over another, including the technical reasons why Google appears to get it wrong but in reality it’s sometimes due to something that the site owner or SEO overlooked.
Here are the nine reasons he cited for canonical choices:
Exact duplicate content: The pages are fully identical, leaving no meaningful signal to distinguish one URL from another.
Substantial duplication in main content: A large portion of the primary content overlaps across pages, such as the same article appearing in multiple places.
Too little unique main content relative to template content: The page’s unique content is minimal, so repeated elements like navigation, menus, or layout dominate and make pages appear effectively the same.
URL parameter patterns inferred as duplicates: When multiple parameterized URLs are known to return the same content, Google may generalize that pattern and treat similar parameter variations as duplicates.
Mobile version used for comparison: Google may evaluate the mobile version instead of the desktop version, which can lead to duplication assessments that differ from what is manually checked.
Googlebot-visible version used for evaluation: Canonical decisions are based on what Googlebot actually receives, not necessarily what users see.
Serving Googlebot alternate or non-content pages: If Googlebot is shown bot challenges, pseudo-error pages, or other generic responses, those may match previously seen content and be treated as duplicates.
Failure to render JavaScript content: When Google cannot render the page, it may rely on the base HTML shell, which can be identical across pages and trigger duplication.
Ambiguity or misclassification in the system: In some cases, a URL may be treated as duplicate simply because it appears “misplaced” or due to limitations in how the system interprets similarity.
Here’s Mueller’s complete answer:
“There is no tool that tells you why something was considered duplicate – over the years people often get a feel for it, but it’s not always obvious. Matt’s video “How does Google handle duplicate content?” is a good starter, even now.
Some of the reasons why things are considered duplicate are (these have all been mentioned in various places – duplicate content about duplicate content if you will :-)): exact duplicate (everything is duplicate), partial match (a large part is duplicate, for example, when you have the same post on two blogs; sometimes there’s also just not a lot of content to go on, for example if you have a giant menu and a tiny blog post), or – this is harder – when the URL looks like it would be duplicate based on the duplicates found elsewhere on the site (for example, if /page?tmp=1234 and /page?tmp=3458 are the same, probably /page?tmp=9339 is too — this can be tricky & end up wrong with multiple parameters, is /page?tmp=1234&city=detroit the same too? how about /page?tmp=2123&city=chicago ?).
Two reasons I’ve seen people get thrown off are: we use the mobile version (people generally check on desktop), and we use the version Googlebot sees (and if you show Googlebot a bot-challenge or some other pseudo-error-page, chances are we’ve seen that before and might consider it a duplicate). Also, we use the rendered version – but this means we need to be able to render your page if it’s using a JS framework for the content (if we can’t render it, we might take the bootstrap HTML page and, chances are it’ll be duplicate).
It happens that these systems aren’t perfect in picking duplicate content, sometimes it’s also just that the alternative URL feels obviously misplaced. Sometimes that settles down over time (as our systems recognize that things are really different), sometimes it doesn’t.
If it’s similar content then users can still find their way to it, so it’s generally not that terrible. It’s pretty rare that we end up escalating a wrong duplicate – over the years the teams have done a fantastic job with these systems; most of the weird ones are unproblematic, often it’s just some weird error page that’s hard to spot.”
Takeaway
Mueller offered a deep dive into the reasons why Google chooses canonicals. He described the process of choosing canonicals as a fuzzy sorting system built from overlapping signals, with Google comparing content, URL patterns, rendered output, and crawler-visible versions, while borderline classifications (“weird ones”) are given a pass because they don’t pose a problem.
The system outlined in the patent starts with evaluation. Google analyzes a query, the user’s context, and a set of candidate landing pages — likely the pages it would have ranked otherwise.
The system grades pages on several points. Low grades might result from missing product details, thin content, weak navigation, or poor engagement signals. The system could then generate new versions of those pages tailored to individual users.
Two searchers who enter identical queries for running shoes, for example, might see different landing pages: one shows product comparisons, while the other provides a direct path to purchase.
The AI-generated pages are not static. The patent describes feedback loops that measure user behavior, such as clicks, time on page, and conversions. Those signals go back into the system, refining future versions.
The result is a dynamic experience. Google could generate many pages and send each searcher to a unique, customized version. Shopping-related queries could conceivably land on a page with purchase options.
A likely path for dynamic pages is through AI Overviews, which already summarize information. A next step could expand those summaries into interactive experiences and, perhaps, new web pages.
Google increasingly provides on-page answers to search queries, separating businesses from would-be customers.
Trend
The patent — US12536233B1, issued by the U.S. Patent and Trademark Office on January 27, 2026 — has drawn significant attention.
For example, Greg Zakowicz, an ecommerce and marketing consultant, described the concept as “a new layer in the economics of search.”
That idea of a new layer points to the growing tension between website owners and the various platforms that index and ingest their pages.
Yet there has long been something of a give-and-take between search and content. Each party — platform and page owner — needed the other. But over the years, an evolving search industry has separated would-be customers from businesses.
Discovery. Early on, Google returned blue links that sent users to websites for answers and transactions.
Answers. Google introduced its Knowledge Graph in 2012 and began surfacing facts directly from its own entity database.
Evaluation. Rich results used structured data to display reviews, product details, and recipes, helping searchers with decisions.
Extraction. In 2014, Google rolled out featured snippets that extracted answers from websites, providing information without a click.
Interaction. Vertical search experiences, such as Shopping, Flights, and Hotels, introduced full interfaces for comparison and decision-making.
Synthesis. More recently, AI Overviews ingest content from external pages into a single response, guiding decisions in a more conversational format.
Experience. The patent described here suggests a next step wherein AI-generated pages get the clicks.
Each new layer changes the “economics of search,” as Zakowicz puts it.
Ecommerce Impact
Patents do not guarantee outcomes. Google may never introduce intermediary landing pages. But the concept aligns with a natural progression in search.
To a degree, each new layer lessens the influence of website owners, including ecommerce merchants, over layout, messaging, and product presentation. The experience becomes algorithmically assembled.
That shift places a premium on relationships that merchants control.
Owned audiences, such as email and SMS subscribers, are direct connections that search interfaces or AI layers do not mediate.
A shopper who arrives via a newsletter or a marketing message has chosen the brand, not an algorithmically assembled page. As more discovery happens within platforms, those direct channels become a form of insulation.
Conversely, data becomes important for search visibility. If systems as described in the patent rely on structured inputs, then product feeds, Schema.org markup, and clean attribute data may determine how and whether items appear in generated experiences. In effect, the merchant’s role shifts from designing pages to supplying quality inputs. The opportunity to garner clicks remains.
Thus the combined challenges of generating direct traffic and encouraging search discovery have familiar solutions: (i) own the customer relationship whenever possible, and (ii) optimize content so bots, programs, and algorithms can read it.
Google added a new section to its spam policies designating “back button hijacking” as an explicit violation under the malicious practices category. Enforcement begins on June 15, giving websites two months to make changes.
Google published a blog post explaining the policy. It also updated the spam policies documentation to list back-button hijacking alongside malware and unwanted software as a malicious practice.
What Is Back Button Hijacking
Back button hijacking occurs when a site interferes with browser navigation and prevents users from returning to the previous page. Google’s blog post describes several ways this can happen.
Users might be sent to pages they never visited. They might see unsolicited recommendations or ads. Or they might be unable to navigate back at all.
Google wrote in the blog post:
“When a user clicks the ‘back’ button in the browser, they have a clear expectation: they want to return to the previous page. Back button hijacking breaks this fundamental expectation.”
Why Google Is Acting Now
Google said it’s seen an increase in this behavior across the web. The blog post noted that Google has previously warned against inserting deceptive pages into browser history, referencing a 2013 post on the topic, and said the behavior “has always been against” Google Search Essentials.
Google wrote:
“People report feeling manipulated and eventually less willing to visit unfamiliar sites.”
What Enforcement Looks Like
Sites involved in back button hijacking risk manual spam penalties or automated demotions, both of which can lower their visibility in Google Search results.
Google is giving a two-month grace period before enforcement starts on June 15. This follows a similar pattern to the March 2024 spam policy expansion, which also gave sites two months to comply with the new site reputation abuse policy.
Third-Party Code As A Source
Google’s blog post acknowledges that some back-button hijacking may not originate from the site owner’s code.
Google wrote:
“Some instances of back button hijacking may originate from the site’s included libraries or advertising platform.”
Google’s wording indicates sites can be affected even if issues come from third-party libraries or ad platforms, placing responsibility on websites to review what runs on their pages.
How This Fits Into Google’s Spam Policy Framework
The addition falls under Google’s category of malicious practices. That section discusses behaviors causing a gap between user expectations and experiences, including malware distribution and unwanted software installation. Google expanded the existing spam policy category instead of creating a new one.
The March 2026 spam update completed its rollout less than three weeks ago. That update enforced existing policies without adding new ones. Today’s announcement adds new policy language ahead of the June 15 enforcement date.
Why This Matters
Sites using advertising scripts, content recommendation widgets, or third-party engagement tools should audit those integrations before June 15. Any script that manipulates browser history or prevents normal back-button navigation is now a potential spam violation.
The two-month window is the compliance period. After June 15, Google can take manual or automated action.
Sites that receive a manual action can submit a reconsideration request through Search Console after fixing the issue.
Looking Ahead
Google hasn’t indicated whether enforcement will come through a dedicated spam update or through ongoing SpamBrain and manual review.
Imagine you’re a news publisher. Your journalism is good, you write original stories, and your website is relatively popular within your editorial niche.
Revenue is earned primarily via advertising. Google search is your biggest source of visitors.
Management demands growth, and elevates traffic to the throne of all key performance indicators. Engagement, loyalty, subscriptions – these are now secondary objectives. Getting the click, that is the driving purpose.
You look at your channels to determine where growth is most likely to come from. Search seems the most viable channel. So, you make SEO a key focus area.
As part of your SEO efforts, you come across specific tactics that cause your stories to generate more clicks. These tactics are very effective. Applying them to your stories results in significantly more traffic than before.
You’ve caught the scent. The chase for clicks is on.
These tactics demand that your stories focus on clicks above all. Within the context of these SEO-first tactics, every story is a traffic opportunity.
At first, you manage to apply these tactics within the framework of your existing journalism. Your stories are still good and unique, and you apply SEO as best you can to ensure each gets the best chance of generating traffic. It works, and your traffic grows.
But the pressures of management demand more. More growth. More revenue. More ad impressions. More traffic.
The newsroom submits. Stories are commissioned only if they have sufficient traffic potential. Journalists learn to just write stories that generate clicks. Headlines are crafted to maximize click-through rates, not to inform readers. You write multiple stories about the exact same news, each with a slightly different angle. Articles bury the lede.
Everything is subject to the chase.
Your scope expands. You don’t just write stories within your established specialism – you branch out. Different topics. New sections. Product reviews and recommendations. Listicles.
Everything is fair game, as long as it generates clicks.
And it works. Oh boy, does it work.
Image Credit: Barry Adams
The flywheel gathers momentum. You learn exactly what people click on, how to craft the perfect headline, select the ideal image, find the precise angle that will make people stop scrolling and tap on your article.
Traffic keeps growing.
But, somehow, you don’t feel entirely at ease. Because you know that, when you look at your content objectively, something has been lost. Your site used to be about journalism, about informing readers, improving knowledge and awareness, and enabling policies and decisions. It used to be good.
Now, none of that really matters anymore. Your site is about clicks. Everything else is secondary.
But management is happy. Revenue is up. Profits surge. So it’s alright, isn’t it?
Isn’t it?
Image Credit: Barry Adams
Google rolls out a core algorithm update. You lose 20% of your search traffic overnight. It’s a shot across the bow. A warning. But you ignore it. You focus on the chase even more. Tighter content focus. More variations of the same stories. Better SEO.
Traffic stabilizes. No more growth, but you’re chugging along nicely. You maybe change a few things, try to get back onto a growth curve. Nothing works, but you’re not losing either. Things look stable. You can live with this.
Then the next Google core update hits. You lose 50% of your current search traffic. It’s code red in the newsroom. All hands on deck.
How do we recover? How do we get this traffic back? It’s our traffic, Google owes us!
You do what you’ve gotten very good at. You SEO the hell out of your site. Everything is optimized and maximized. Your technical SEO goes from “that will do” to a state of such perfection it could make a web nerd cry. Your content output becomes even more focused on areas with the biggest traffic potential.
In the chase for revenue, you try alternative monetization. Affiliate content. Gambling promos. Advertorials. More listicles. More product recommendations. More of everything.
Then the next update arrives. You lose again.
And the next one.
And the next one.
You lose, almost every single time.
Image Credit: Barry Adams
It worked. Until it didn’t.
And now your site is on Google’s shitlist. Your relentless focus on growth at the expense of quality has accumulated so many negative signals that Google will not allow you to return to your previous heights.
You know none of what you try will work. Those traffic graphs won’t go back up. Every Google core update causes a new surge of existential dread: How much will we lose this time?
And yet, you still chase. You’ve long since lost the scent. But the chase still rules. Because you know that, to stop the chase, something needs to change. Something big and profound. And making that change will be painful. Extremely painful.
But do you have a choice?
Hindsight
I wish this scenario was unique, a singular publisher making the mistake of focusing on traffic at the expense of quality. But it’s a tragically common theme, played out in digital newsrooms hundreds of times over the last 10 years.
In every instance, at some point, the seductive appeal of traffic began to outweigh the journalistic principles of the organization. Compromises were made so growth could be achieved.
And because these compromises had the intended result – at first – there was nothing to deter the publisher from traveling further down this path.
Well, nothing besides Google shouting at every opportunity that you should focus on quality, not clicks.
Besides every SEO professional that has ever dealt with a bad algorithm update saying you should focus on quality, not clicks.
Besides your best journalists abandoning ship in favor of a quality-focused outlet or their own Substack.
Besides your own loyal readers abandoning your site because you stopped focusing on quality and went after clicks.
The writing has been on the wall, in huge capital letters, for the better part of a decade. Arguably, since 2018, when Google began rolling out algorithm updates to penalize low-effort content. If you’d been paying attention, none of this would have been a surprise.
Hey, maybe you did see it coming. But you weren’t able to make the required changes, because the clicks were still there. You were never going to deliberately abandon growth for some vague promise of sustainable traffic and audience loyalty.
If only you’d known that, once the Google hammer came down, the damage would be permanent. Maybe you wouldn’t have started the chase in the first place.
If only you’d known.
Recovery
When a site is so heavily affected by consecutive Google core updates, is there any hope of recovery? Can a website climb its way back to those vaulted traffic heights?
We need to be realistic and accept that those halcyon days of near-limitless traffic growth are not coming back. The ecosystem has changed. Growth is harder to achieve, and online news is working under a lower ceiling than ever before.
But recovery is possible, to an extent. You will never achieve the same traffic peaks as in your prime days, but you can claw back a significant chunk. Providing you are willing to do what it takes.
The recipe is simple, on paper: Everything you do should be in service of the reader.
Every story needs to be crafted to deliver maximum value for your readers. Every design element on your site needs to be optimized for the best user experience. Every headline must be informative first and foremost. Every article must deliver on its headline’s promise in spades. Every piece of content should serve to inform, educate, and delight your audience.
In short, your entire output should revolve around audience loyalty.
Not growth. Not traffic.
Loyalty.
Build a news platform so good that your readers don’t ever think about going anywhere else.
Of course, you still need traffic, but this must be a secondary concern. Start with your audience, and then apply layers on top of your stories to aid their traffic potential.
Your output should be focused on original journalism – not rehashing the same stories that others are reporting. If all you do is take someone else’s story and write different angles on it, you’re not doing journalism.
Provide breaking news, expert commentary, detailed analysis, and a deep focus on your editorial specialties.
And accept that your audience isn’t a singular entity, but consumes news on multiple platforms and in multiple formats. Video, podcasts, newsletters, social media, you name it. Fire on all channels, as best you can.
Sounds simple. But very few publishers I’ve spoken with have the internal fortitude for such drastic cultural changes in their online newsroom. Most of the publishers I consult with that were affected by core updates just want a list of quick wins: some easy fixes they can implement to get their traffic back.
They want busy-work. They’re not interested in meaningful change. Because meaningful change is hard, and painful.
But also absolutely necessary.
Google’s Sundar Pichai recently said that the future of Search is agentic, but what does that really mean? A recent tweet from Google’s search product lead shows what the new kind of task-based search looks like. It’s increasingly apparent that the internet is transitioning to a model where every person has their own agent running tasks on their behalf, experiencing an increasingly personal internet.
Search Is Becoming Task-Oriented
The internet, with search as its gateway, follows a model where websites are indexed, ranked, and served to users who enter essentially the same queries and retrieve virtually the same sets of web pages. AI is starting to break that model because users are shifting toward researching topics, where a link to a website does not provide the direct answers they are gradually becoming conditioned to expect. The internet was built to serve websites that users could visit and read, and to connect people with each other via social media.
What’s changing is that now people can use that same search box to do things, exactly as Pichai described. For example, Google recently announced the worldwide rollout of the ability to describe the needs for a restaurant reservation, and AI agents go out and fetch the information, including booking information.
“Date nights and big group dinners just got a lot easier.
We’re thrilled to expand agentic restaurant booking in Search globally, including the UK and India!
Tell AI Mode your group size, time, and vibe—it scans multiple platforms simultaneously to find real-time, bookable spots.
No more app-switching. No more hassle. Just great food.”
That’s not search; that’s task completion. What was not stated is that restaurants will need to be able to interact with these agents to provide information like available reservation slots and menu choices that evening, and at some point those websites will need to be able to book a reservation with the AI agent. This is not something coming in the near future; it’s here right now.
Pichai said:

“I feel like in search, with every shift, you’re able to do more with it.
…If I fast forward, a lot of what are just information seeking queries will be agentic search. You will be completing tasks, you have many threads running.”
When asked if search will still be around in ten years, Pichai answered:
“Search would be an agent manager, right, in which you’re doing a lot of things.
…And I can see search doing versions of those things, and you’re getting a bunch of stuff done.”
Everyone Has Their Own Personal Internet
Cloudflare recently published an article that says the internet was the first way for humans to interact with online content, and that cloud infrastructure was the second adaptation that emerged to serve the needs of mobile devices. The next adaptation is wild and has implications for SEO because it introduces a hyper-personalized version of the web that impacts local SEO, shopping, and information retrieval.
AI agents are currently forced to use an internet infrastructure that’s built to serve humans. That’s the part that Cloudflare says is changing. But the more profound insight is that the old way, where millions of people asked the same question and got the same indexed answer, is going away. What’s replacing it is a hyper-personal experience of the web, where every person can run their own agent.
“Unlike every application that came before them, agents are one-to-one. Each agent is a unique instance. Serving one user, running one task. Where a traditional application follows the same execution path regardless of who’s using it, an agent requires its own execution environment: one where the LLM dictates the code path, calls tools dynamically, adjusts its approach, and persists until the task is done.
Think of it as the difference between a restaurant and a personal chef. A restaurant has a menu — a fixed set of options — and a kitchen optimized to churn them out at volume. That’s most applications today. An agent is more like a personal chef who asks: what do you want to eat? They might need entirely different ingredients, utensils, or techniques each time. You can’t run a personal-chef service out of the same kitchen setup you’d use for a restaurant.”
Cloudflare’s angle is that they are providing the infrastructure to support the needs of billions of agents representing billions of humans. But that is not the part that concerns SEO. The part that concerns digital marketing is that the moment when search transforms into an “agent manager” is here, right now.
WordPress 7.0
Content management systems are rapidly adapting to this change. It’s difficult to overstate the importance of the soon-to-be-released WordPress 7.0, as it is jam-packed with capabilities for connecting to AI systems, enabling the internet’s transition from a human-centered web to an increasingly agent-centered web.
The current internet is built for human interaction. Agents are operating within that structure, but that’s going to change very fast. The search marketing community needs to wrap its collective mind around this change and understand how content management systems fit into that picture.
What Sources Do The Agents Trust?
Search marketing professional Mike Stewart recently posted on Facebook about this change, reflecting on what it means to him.
“I let Claude take over my computer. Not metaphorically — it moved my mouse, opened apps, and completed tasks on its own. That’s when something clicked… This isn’t just AI assisting anymore. This is AI operating on your behalf.
Google’s CEO is already talking about “agentic search” — where AI doesn’t just return results, it manages the process. So the real questions become: 👉 Who controls the journey? 👉 What sources does the agent trust? 👉 Where does your business show up in that decision layer? Because you don’t get “agentic search” without the ecosystem feeding it — websites, content, businesses.
That part isn’t going away. But it is being abstracted.”
Task-Based Agentic Search
The part we need to wrap our heads around is that humans are still making the decision to click the “make the reservation” button, and at some point, at least at the B2B layer, making purchases will increasingly become automated.
I still have my doubts about the complete automation of shopping. It feels unnatural, but it’s easy to see that the day may rapidly be approaching when, instead of writing a shopping list, a person will just tell an AI agent to talk to the local grocery store AI agent to identify which one has the items in stock at the best price, dump it into a shopping cart, and show it to the human, who then approves it.
The big takeaway is that the web may be transitioning to the “everyone has a personal chef” model, and that’s a potentially scary level of personalization. How does an SEO optimize for that? I think that’s where WordPress 7.0 comes in, as well as any other content management systems that are agentic-web ready.
Ask ChatGPT or Claude to recommend a product in your market. If your brand does not appear, you have a problem that no amount of keyword optimization will fix.
Most SEO professionals, when faced with this, immediately think about content. More pages, more keywords, better on-page signals. But the reason your brand is absent from an AI recommendation may have nothing to do with pages or keywords. It has to do with something called relational knowledge, and a 2019 research paper that most marketers have never heard of.
The Paper Most Marketers Missed
In September 2019, Fabio Petroni and colleagues at Facebook AI Research and University College London published “Language Models as Knowledge Bases?” at EMNLP, one of the top conferences in natural language processing.
Their question was straightforward: Does a pretrained language model like BERT actually store factual knowledge in its weights? Not linguistic patterns or grammar rules, but facts about the world. Things like “Dante was born in Florence” or “iPod Touch is produced by Apple.”
To test this, they built a probe called LAMA (LAnguage Model Analysis). They took known facts, thousands of them drawn from Wikidata, ConceptNet, and SQuAD, and converted each one into a fill-in-the-blank statement. “Dante was born in ___.” Then they asked BERT to predict the missing word.
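To make the probe concrete, here is a minimal LAMA-style cloze query against BERT, using the Hugging Face transformers fill-mask pipeline. The fact mirrors the paper’s example; the code is an illustration, not the authors’ evaluation harness.

```python
from transformers import pipeline

# BERT predicts the token hidden behind [MASK]; no fine-tuning involved.
fill = pipeline("fill-mask", model="bert-base-uncased")

for result in fill("Dante was born in [MASK].", top_k=3):
    print(f"{result['token_str']}: {result['score']:.3f}")
# The top prediction comes purely from associations absorbed during
# pretraining; "florence" should rank at or near the top.
```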
BERT, without any fine-tuning, recalled factual knowledge at a level competitive with a purpose-built knowledge base. That knowledge base had been constructed using a supervised relation extraction system with an oracle-based entity linker, meaning it had direct access to the sentences containing the answers. A language model that had simply read a lot of text performed nearly as well.
The model was not searching for answers. It had absorbed associations between entities and concepts during training, and those associations were retrievable. BERT had built an internal map of how things in the world relate to each other.
After this, the research community started taking seriously the idea that language models work as knowledge stores, not merely as pattern-matching engines.
What “Relational Knowledge” Means
Petroni tested what he and others called relational knowledge: facts expressed as a triple of subject, relation, and object. For example: (Dante, [born-in], Florence). (Kenya, [diplomatic-relations-with], Uganda). (iPod Touch, [produced-by], Apple).
What makes this interesting for brand visibility (and AIO) is that Petroni’s team discovered that the model’s ability to recall a fact depends heavily on the structural type of the relationship. They identified three types, and the accuracy differences between them were large.
1-To-1 Relations: One Subject, One Object
These are unambiguous facts. “The capital of Japan is ___.” There is one answer: Tokyo. Every time the model encountered Japan and capital in the training data, the same object appeared. The association built up cleanly over repeated exposure.
BERT got these right 74.5% of the time, which is high for a model that was never explicitly trained to answer factual questions.
N-To-1 Relations: Many Subjects, One Object
Here, many different subjects share the same object. “The official language of Mauritius is ___.” The answer is English, but English is also the answer for dozens of other countries. The model has seen the pattern (country → official language → English) many times, so it knows the shape of the answer well. But it sometimes defaults to the most statistically common object rather than the correct one for that specific subject.
Accuracy dropped to around 34%. The model knows the category but gets confused within it.
N-To-M Relations: Many Subjects, Many Objects
This is where things get messy. “Patrick Oboya plays in position ___.” A single footballer might play midfielder, forward, or winger depending on context. And many different footballers share each of those positions. The mapping is loose in both directions.
BERT’s accuracy here was only about 24%. The model typically predicts something of the correct type (it will say a position, not a city), but it cannot commit to a specific answer because the training data contains too many competing signals.
I find this super useful because it maps directly onto what happens when an AI tries to recommend a brand. Brands (without monopolies) operate in a “many-to-many” relationship. So “Recommend a [Brand] with a [feature]” is one of the hardest things for AI to “predict” with consistency. I will come back to that…
What Has Happened Since 2019
Petroni’s paper established that language models store relational knowledge. The obvious next question was: where, exactly?
In 2022, Damai Dai and colleagues at Microsoft Research published “Knowledge Neurons in Pretrained Transformers” at ACL. They introduced a method to locate specific neurons in BERT’s feed-forward layers that are responsible for expressing specific facts. When they activated these “knowledge neurons,” the model’s probability of producing the correct fact increased by an average of 31%. When they suppressed them, it dropped by 29%.
OMG! This is not a metaphor. Factual associations are encoded in identifiable neurons within the model. You can find them, and you can change them.
Later that year, Kevin Meng and colleagues at MIT published “Locating and Editing Factual Associations in GPT” at NeurIPS. This took the same ideas and applied them to GPT-style models, which is the architecture behind ChatGPT, Claude, and the AI assistants that buyers actually use when they ask for recommendations. Meng’s team found they could pinpoint the specific components inside GPT that activate when the model recalls a fact about a subject.
More importantly, they could change those facts. They could edit what the model “believes” about an entity without retraining the whole system.
That finding matters for SEOs. If the associations inside these models were fixed and permanent, there would be nothing to optimize for. But they are not fixed. They are shaped by what the model absorbed during training, and they shift when the model is retrained on new data. The web content, the technical documentation, the community discussions, the analyst reports that exist when the next training run happens will determine which brands the model associates with which topics.
So, the progress from 2019 to 2022 looks like this. Petroni showed that models store relational knowledge. Dai showed where it is stored. Meng showed it can be changed. That last point is the one that should matter most to anyone trying to influence how AI recommends brands.
What This Means For Brands In AI Search
Let me translate Petroni’s three relation types into brand positioning scenarios.
The 1-To-1 Brand: Tight Association
Think of Stripe and online payments. The association is specific and consistently reinforced across the web. Developer documentation, fintech discussions, startup advice columns, integration guides: They all connect Stripe to the same concept. When someone asks an AI, “What is the best payment processing platform for developers?” the model retrieves Stripe with high confidence, because the relational link is unambiguous.
This is Petroni’s 1-to-1 dynamic. Strong signal, no competing noise.
The N-To-1 Brand: Lost In The Category
Now consider being one of 15 cybersecurity vendors associated with “endpoint protection.” The model knows the category well. It has seen thousands of discussions about endpoint protection. But when asked to recommend a specific vendor, it defaults to whichever brand has the strongest association signal. Usually, that is the one most discussed in authoritative contexts: analyst reports, technical forums, standards documentation.
If your brand is present in the conversation but not differentiated, you are in an N-to-1 situation. The model might mention you occasionally, but it will tend to retrieve the brand with the strongest association instead.
The N-To-M Brand: Everywhere And Nowhere
This is the hardest position. A large enterprise software company operating across cloud infrastructure, consulting, databases, and hardware has associations with many topics, but each of those topics is also associated with many competitors. The associations are loose in both directions.
The result is what Petroni observed with N-to-M relations: The model produces something of the correct type but cannot commit to a specific answer. The brand appears occasionally in AI recommendations but never reliably for any specific query.
I see this pattern frequently when working with enterprise brands. They have invested heavily in content across many topics, but have not built the kind of concentrated, reinforced associations that the model needs to retrieve them with confidence for any single one.
Measuring The Gap
If you accept the premise, and the research supports it, that AI recommendations are driven by relational associations stored in the model’s weights, then the practical question is: Can you measure where your brand sits in that landscape?
AI Share of Voice is the metric most teams start with. It tells you how often your brand appears in AI-generated responses. That is useful, but it is a score without a diagnosis. Knowing your Share of Voice is 8% does not tell you why it is 8%, or which specific topics are keeping you out of the recommendations where you should appear.
Two brands can have identical Share of Voice scores for completely different structural reasons. One might be broadly associated with many topics but weakly on each. Another might be deeply associated with two topics but invisible everywhere else. These are different problems requiring different strategies.
This is the gap that a metric called AI Topical Presence, developed by Waikay, is designed to address. Rather than measuring whether you appear, it measures what the AI associates you with, and what it does not. [Disclosure: I am the CEO of Waikay]
Topical Presence is as important as Share of Voice (Image from author, March 2026)
The metric captures three dimensions. Depth measures how strongly the AI connects your brand to relevant topics, weighted by importance. Breadth measures how many of the core commercial topics in your market the AI associates with your brand. Concentration measures how evenly those associations are distributed, using a Herfindahl-Hirschman Index borrowed from competition economics.
A brand with high depth but low breadth is known well for a few things but invisible for many others. A brand with wide coverage but high concentration is fragile: One model update could change its visibility significantly. The component breakdown tells you which problem you have and which lever to pull.
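The exact formulas behind the metric are Waikay’s own. As a rough sketch of how the three dimensions could be computed, assuming per-topic association scores between 0 and 1 and topic importance weights (everything here is hypothetical except the standard HHI definition):

```python
def topical_presence(scores, weights):
    """Rough sketch of depth/breadth/concentration. The real metric
    is Waikay's; only the HHI formula below is the standard one."""
    topics = list(scores)
    total_w = sum(weights[t] for t in topics)
    # Depth: importance-weighted average association strength.
    depth = sum(scores[t] * weights[t] for t in topics) / total_w
    # Breadth: share of core topics with any meaningful association
    # (the 0.2 floor is an arbitrary illustrative choice).
    breadth = sum(1 for t in topics if scores[t] > 0.2) / len(topics)
    # Concentration: Herfindahl-Hirschman Index over each topic's share
    # of the brand's total association mass (1.0 = all in one topic).
    mass = sum(scores.values())
    hhi = sum((s / mass) ** 2 for s in scores.values()) if mass else 1.0
    return depth, breadth, hhi

# Hypothetical brand: strong on two topics, weak elsewhere.
scores = {"semantic seo": 0.9, "internal linking": 0.8,
          "schema markup": 0.1, "content briefs": 0.05}
weights = {topic: 1.0 for topic in scores}
print(topical_presence(scores, weights))
```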
In the chart above, we start to see how different brands are really competing with each other in a way we have not been able to see before. For example, Inlinks is competing much more closely with a product called Neuronwriter than previously understood. Neuronwriter has less share of voice (I probably helped them by writing this article… oops!), but they have a better topical presence around the prompt, “What are the best semantic SEO tools?” So all things being equal, a bit of marketing is all they need to take Inlinks. This, of course, assumes that Inlinks stands still. It won’t. By contrast, the threat of Ahrefs is ever-present, but by being a full-service offering, they have to spread their “share of voice” across all of their product offerings. So while their topical presence is high, the brand is not the natural choice for an LLM to choose for this prompt.
This connects back to Petroni’s framework. If your brand is in a 1-to-1 position for some topics but absent from others, topical presence shows you where the gaps are. If you are in an N-to-1 or N-to-M situation, it helps you identify which associations need strengthening and which topics competitors have already built dominant positions on.
From Ranking Pages To Building Associations
For 25 years, SEO has been about ranking pages. PageRank itself was a page-level algorithm; the clue was always in the name (IYKYK … No need to correct me…). Even as Google moved towards entities and knowledge graphs, the practical work of SEO remained rooted in keywords, links, and on-page optimization.
AI visibility requires something different. The models that generate brand recommendations are retrieving associations built during training, formed from patterns of co-occurrence across many contexts. A brand that publishes 500 blog posts about “zero trust” will not build the same association strength as a brand that appears in NIST documentation, peer discussions, analyst reports, and technical integrations.
This is fantastic news for brands that do good work in their markets. Content volume alone does not create strong relational associations. The model’s training process works as a quality filter: It learns from patterns across the entire corpus, not from any single page. A brand with real expertise, discussed across many contexts by many voices, will build stronger associations than a brand that simply publishes more.
The question to ask is not “Do we have a page about this topic?” It is: “If someone read everything the AI has absorbed about this topic, would our brand come across as a credible participant in the conversation?”
That is a harder question. But the research that began with Petroni’s fill-in-the-blank tests in 2019 has given us enough understanding of the mechanism to measure it. And what you can measure, you can improve.
Every major AI platform can now browse websites autonomously. Chrome’s auto browse scrolls and clicks. ChatGPT Atlas fills forms and completes purchases. Perplexity Comet researches across tabs. But none of these agents sees your website the way a human does.
This is Part 4 in a five-part series on optimizing websites for the agentic web. Part 1 covered the evolution from SEO to AAIO. Part 2 explained how to get your content cited in AI responses. Part 3 mapped the protocols forming the infrastructure layer. This article gets technical: how AI agents actually perceive your website, and what to build for them.
The core insight is one that keeps coming up in my research: The most impactful thing you can do for AI agent compatibility is the same work web accessibility advocates have been pushing for decades. The accessibility tree, originally built for screen readers, is becoming the primary interface between AI agents and your website.
According to the 2025 Imperva Bad Bot Report (Imperva is a cybersecurity company), automated traffic surpassed human traffic for the first time in 2024, constituting 51% of all web interactions. Not all of that is agentic browsing, but the direction is clear: the non-human audience for your website is already larger than the human one, and it’s growing. Throughout this article, we draw exclusively from official documentation, peer-reviewed research, and announcements from the companies building this infrastructure.
Three Ways Agents See Your Website
When a human visits your website, they see colors, layout, images, and typography. When an AI agent visits, it sees something entirely different. Understanding what agents actually perceive is the foundation for building websites that work for them.
The major AI platforms use three distinct approaches, and the differences have direct implications for how you should structure your website.
Vision: Reading Screenshots
Anthropic’s Computer Use takes the most literal approach. Claude captures screenshots of the browser, analyzes the visual content, and decides what to click or type based on what it “sees.” It’s a continuous feedback loop: screenshot, reason, act, screenshot. The agent operates at the pixel level, identifying buttons by their visual appearance and reading text from the rendered image.
Google’s Project Mariner follows a similar pattern with what Google describes as an “observe-plan-act” loop: observe captures visual elements and underlying code structures, plan formulates action sequences, and act simulates user interactions. Mariner achieved an 83.5% success rate on the WebVoyager benchmark.
The vision approach works, but it’s computationally expensive, sensitive to layout changes, and limited by what’s visually rendered on screen.
Accessibility Tree: Reading Structure
ChatGPT Atlas uses ARIA tags, the same labels and roles that support screen readers, to interpret page structure and interactive elements.
Atlas is built on Chromium, but rather than analyzing rendered pixels, it queries the accessibility tree for elements with specific roles (“button”, “link”) and accessible names. This is the same data structure that screen readers like VoiceOver and NVDA use to help people with visual disabilities navigate the web.
Microsoft’s Playwright MCP, the official MCP server for browser automation, takes the same approach. It provides accessibility snapshots rather than screenshots, giving AI models a structured representation of the page. Microsoft deliberately chose accessibility data over visual rendering for their browser automation standard.
Hybrid: Both At Once
In practice, the most capable agents combine approaches. OpenAI’s Computer-Using Agent (CUA), which powers both Operator and Atlas, layers screenshot analysis with DOM processing and accessibility tree parsing. It prioritizes ARIA labels and roles, falling back to text content and structural selectors when accessibility data isn’t available.
Perplexity’s research confirms the same pattern. Their BrowseSafe paper, which details the safety infrastructure behind Comet’s browser agent, describes using “hybrid context management combining accessibility tree snapshots with selective vision.”
| Platform | Primary Approach | Details |
| --- | --- | --- |
| Anthropic Computer Use | Vision (screenshots) | Screenshot, reason, act feedback loop |
| Google Project Mariner | Vision + code structure | Observe-plan-act with visual and structural data |
| OpenAI Atlas | Accessibility tree | Explicitly uses ARIA tags and roles |
| OpenAI CUA | Hybrid | Screenshots + DOM + accessibility tree |
| Microsoft Playwright MCP | Accessibility tree | Accessibility snapshots, no screenshots |
| Perplexity Comet | Hybrid | Accessibility tree + selective vision |
The pattern is clear. Even platforms that started with vision-first approaches are incorporating accessibility data. And the platforms optimizing for reliability and efficiency (Atlas, Playwright MCP) lead with the accessibility tree.
Your website’s accessibility tree isn’t a compliance artifact. It’s increasingly the primary interface agents use to understand and interact with your website.
Last year, before the European Accessibility Act took effect, I half-joked that it would be ironic if the thing that finally got people to care about accessibility was AI agents, not the people accessibility was designed for. That’s no longer a joke.
The Accessibility Tree Is Your Agent Interface
The accessibility tree is a simplified representation of your page’s DOM that browsers generate for assistive technologies. Where the full DOM contains every div, span, style, and script, the accessibility tree strips away the noise and exposes only what matters: interactive elements, their roles, their names, and their states.
This is why it works so well for agents. A typical page’s DOM might contain thousands of nodes. The accessibility tree reduces that to the elements a user (or agent) can actually interact with: buttons, links, form fields, headings, landmarks. For AI models that process web pages within a limited context window, that reduction is significant.
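You can inspect this reduction yourself. Recent Playwright releases (1.49 and later) expose an ARIA snapshot of the accessibility tree directly; a minimal sketch in Python, with a placeholder URL:

```python
from playwright.sync_api import sync_playwright

# Prints the ARIA snapshot Playwright derives from the accessibility
# tree: roles, accessible names, and states, with layout noise stripped.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/")  # placeholder: use your own page
    print(page.locator("body").aria_snapshot())
    browser.close()
```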
OpenAI’s own guidance for site owners makes this explicit:

“Follow WAI-ARIA best practices by adding descriptive roles, labels, and states to interactive elements like buttons, menus, and forms. This helps ChatGPT recognize what each element does and interact with your site more accurately.”

And:

“Making your website more accessible helps ChatGPT Agent in Atlas understand it better.”
Research data backs this up. The most rigorous data on this comes from a UC Berkeley and University of Michigan study published for CHI 2026, the premier academic conference on human-computer interaction. The researchers tested Claude Sonnet 4.5 on 60 real-world web tasks under different accessibility conditions, collecting 40.4 hours of interaction data across 158,325 events. The results were striking:
| Condition | Task Success Rate | Avg. Completion Time |
| --- | --- | --- |
| Standard (default) | 78.33% | 324.87 seconds |
| Keyboard-only | 41.67% | 650.91 seconds |
| Magnified viewport | 28.33% | 1,072.20 seconds |
Under standard conditions, the agent succeeded nearly 80% of the time. Restrict it to keyboard-only interaction (simulating how screen reader users navigate) and success drops to 42%, taking twice as long. Restrict the viewport (simulating magnification tools), and success drops to 28%, taking over three times as long.
The paper identifies three categories of gaps:
Perception gaps: agents can’t reliably access screen reader announcements or ARIA state changes that would tell them what happened after an action.
Cognitive gaps: agents struggle to track task state across multiple steps.
Action gaps: agents underutilize keyboard shortcuts and fail at interactions like drag-and-drop.
The implication is direct. Websites that present a rich, well-labeled accessibility tree give agents the information they need to succeed. Websites that rely on visual cues, hover states, or complex JavaScript interactions without accessible alternatives create the conditions for agent failure.
Perplexity’s search API architecture paper from September 2025 reinforces this from the content side. Their indexing system prioritizes content that is “high quality in both substance and form, with information captured in a manner that preserves the original content structure and layout.” Websites “heavy on well-structured data in list or table form” benefit from “more formulaic parsing and extraction rules.” Structure isn’t just helpful. It’s what makes reliable parsing possible.
Semantic HTML: The Agent Foundation
The accessibility tree is built from your HTML. Use semantic elements, and the browser generates a useful accessibility tree automatically. Skip them, and the tree is sparse or misleading.
This isn’t new advice. Web standards advocates have been screaming “use semantic HTML” for two decades. Not everyone listened. What’s new is that the audience has expanded. It used to be about screen readers and a relatively small percentage of users. Now it’s about every AI agent that visits your website.
Use native elements. A `<button>` element automatically appears in the accessibility tree with the role “button” and its text content as the accessible name. A `<div>` with a click handler does not. The agent doesn’t know it’s clickable.
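A minimal sketch of the difference (markup illustrative):

```html
<!-- Exposed in the accessibility tree: role "button", name "Search flights" -->
<button type="button">Search flights</button>

<!-- Missing from the tree as an interactive element: no role, no accessible name, not keyboard-focusable -->
<div class="btn" onclick="searchFlights()">Search flights</div>
```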
Label your forms. Every input needs an associated label. Agents read labels to understand what data a field expects.
The autocomplete attribute deserves attention. It tells agents (and browsers) exactly what type of data a field expects, using standardized values like name, email, tel, street-address, and organization. When an agent fills a form on someone’s behalf, autocomplete attributes make the difference between confident field mapping and guessing.
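For example, a hypothetical sketch (field names illustrative; the autocomplete values themselves are standard HTML):

```html
<!-- Each label is tied to its input via for/id, so agents know what the field expects -->
<label for="email">Work email</label>
<input id="email" name="email" type="email" autocomplete="email">

<label for="company">Company</label>
<input id="company" name="company" type="text" autocomplete="organization">
```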
Establish heading hierarchy. Use h1 through h6 in logical order. Agents use headings to understand page structure and locate specific content sections. Skip levels (jumping from h1 to h4) create confusion about content relationships.
Use landmark regions. HTML5 landmark elements (`<header>`, `<nav>`, `<main>`, `<aside>`, `<footer>`) tell agents where they are on the page. A `<nav>` element is unambiguously navigation. A `<div class="navigation">` requires interpretation. Clarity for the win, always.
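A skeleton of a page built on landmarks (illustrative):

```html
<header>
  <nav aria-label="Main">…</nav>
</header>
<main>
  <h1>Flight Search</h1>
  <form>…</form>
</main>
<aside aria-label="Current deals">…</aside>
<footer>…</footer>
```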
Microsoft’s Playwright test agents, introduced in October 2025, generate test code that uses accessible selectors by default. When the AI generates a Playwright test, it writes:
```js
const todoInput = page.getByRole('textbox', { name: 'What needs to be done?' });
```
Not CSS selectors. Not XPath. Accessible roles and names. Microsoft built its AI testing tools to find elements the same way screen readers do, because it’s more reliable.
The final slide of my Conversion Hotel keynote about optimizing websites for AI agents. (Image Credit: Slobodan Manic)
ARIA: Useful, Not Magic
OpenAI recommends ARIA (Accessible Rich Internet Applications), the W3C standard for making dynamic web content accessible. But ARIA is a supplement, not a substitute. Like protein shakes: useful on top of a real diet, counterproductive as a replacement for actual food.
“If you can use a native HTML element or attribute with the semantics and behavior you require already built in, instead of re-purposing an element and adding an ARIA role, state or property to make it accessible, then do so.”
The fact that the W3C had to make “don’t use ARIA” the first rule of ARIA tells you everything about how often it gets misused.
Adrian Roselli, a recognized web accessibility expert, raised an important concern in his October 2025 analysis of OpenAI’s guidance. He argues that recommending ARIA without sufficient context risks encouraging misuse. Websites that use ARIA are generally less accessible according to WebAIM’s annual survey of the top million websites, because ARIA is often applied incorrectly as a band-aid over poor HTML structure. Roselli warns that OpenAI’s guidance could incentivize practices like keyword-stuffing in aria-label attributes, the same kind of gaming that plagued meta keywords in early SEO.
The right approach is layered:
Start with semantic HTML. Use `<button>`, `<a>`, `<select>`, `<nav>`, and other native elements. These work correctly by default.
Add ARIA when native HTML isn’t enough. Custom components that don’t have HTML equivalents (tab panels, tree views, disclosure widgets) need ARIA roles and states to be understandable.
Use ARIA states for dynamic content. When JavaScript changes the page, ARIA attributes communicate what happened:
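```html
<!-- Illustrative markup: the button's state is exposed to agents, not just shown visually -->
<button aria-expanded="false" aria-controls="filters">Show filters</button>
<div id="filters" hidden>…</div>

<!-- After JavaScript opens the panel, it flips the state -->
<button aria-expanded="true" aria-controls="filters">Show filters</button>
<div id="filters">…</div>
```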
Keep aria-label descriptive and honest. Use it to provide context that isn’t visible on screen, like distinguishing between multiple “Delete” buttons on the same page. Don’t stuff it with keywords.
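E.g., a hypothetical case with two otherwise identical buttons:

```html
<!-- Both buttons read as "Delete" without the label; aria-label disambiguates them -->
<button aria-label="Delete draft: Q3 report">Delete</button>
<button aria-label="Delete draft: Launch plan">Delete</button>
```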
The principle is the same one that applies to good SEO: build for the user first, optimize for the system second. Semantic HTML is building for the user. ARIA is fine-tuning for edge cases where HTML falls short.
The Rendering Question
Browser-based agents like Chrome auto browse, ChatGPT Atlas, and Perplexity Comet run on Chromium. They execute JavaScript. They can render your single-page application.
But not everything that visits your website is a full browser agent.
AI crawlers (PerplexityBot, OAI-SearchBot, ClaudeBot) index your content for retrieval and citation. Many of these crawlers do not execute client-side JavaScript. If your page is a blank `<div>` until React hydrates, these crawlers see an empty page. Your content is invisible to the AI search ecosystem.
Part 2 of this series covered the citation side: AI systems select fragments from indexed content. If your content isn’t in the initial HTML, it’s not in the index. If it’s not in the index, it doesn’t get cited. Server-side rendering isn’t just a performance optimization.
It’s a visibility requirement.
Even for full browser agents, JavaScript-heavy websites create friction. Dynamic content that loads after interactions, infinite scroll that never signals completion, and forms that reconstruct themselves after each input all create opportunities for agents to lose track of state. The A11y-CUA research attributed part of agent failure to “cognitive gaps”: agents losing track of what’s happening during complex multi-step interactions. Simpler, more predictable rendering reduces these failures.
Microsoft’s guidance from Part 2 applies here directly: “Don’t hide important answers in tabs or expandable menus: AI systems may not render hidden content, so key details can be skipped.” If information matters, put it in the visible HTML. Don’t require interaction to reveal it.
Practical rendering priorities:
Server-side render or pre-render content pages. If an AI crawler can’t see it, it doesn’t exist in the AI ecosystem.
Avoid blank-shell SPAs for content pages. Frameworks like Next.js (which powers this website), Nuxt, and Astro make SSR straightforward.
Don’t hide critical information behind interactions. Prices, specifications, availability, and key details should be in the initial HTML, not behind accordions or tabs.
Use standard links for navigation. Client-side routing that doesn’t update the URL or uses onClick handlers instead of real links breaks agent navigation.
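The last point in practice (markup illustrative):

```html
<!-- A real link: agents and crawlers can discover and follow it -->
<a href="/pricing">Pricing</a>

<!-- Fragile: no href, no role, and the URL never changes -->
<span class="nav-item" onclick="showPricing()">Pricing</span>
```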
Testing Your Agent Interface
You wouldn’t ship a website without testing it in a browser. Testing how agents perceive your website is becoming equally important.
Screen reader testing is the best proxy. If VoiceOver (macOS), NVDA (Windows), or TalkBack (Android) can navigate your website successfully, identifying buttons, reading form labels, and following the content structure, agents can likely do the same. Both audiences rely on the same accessibility tree. This isn’t a perfect proxy (agents have capabilities screen readers don’t, and vice versa), but it catches the majority of issues.
Microsoft’s Playwright MCP provides direct accessibility snapshots. If you want to see exactly what an AI agent sees, Playwright MCP generates structured accessibility snapshots of any page. These snapshots strip away visual presentation and show you the roles, names, and states that agents work with. Published as @playwright/mcp on npm, it’s the most direct way to view your website through an agent’s eyes.
The output looks something like this (simplified):
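```yaml
# Illustrative snapshot of a hypothetical flight-search page
- banner:
  - navigation "Main":
    - link "Flights"
    - link "Hotels"
- main:
  - heading "Flight Search" [level=1]
  - textbox "From"
  - textbox "To"
  - button "Search flights"
```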
If your critical interactive elements don’t appear in the snapshot, or appear without useful names, agents will struggle with your website.
Browserbase’s Stagehand (v3, released October 2025, and humbly self-described as “the best browser automation framework”) provides another angle. It parses both DOM and accessibility trees, and its self-healing execution adapts to DOM changes in real time. It’s useful for testing whether agents can complete specific workflows on your website, like filling a form or completing a checkout.
The Lynx browser is a low-tech option worth trying. It’s a text-only browser that strips away all visual rendering, showing you roughly what a non-visual agent parses. A trick I picked up from Jes Scholz on the podcast.
A practical testing workflow:
Run VoiceOver or NVDA through your website’s key user flows. Can you complete the core tasks without vision?
Generate Playwright MCP accessibility snapshots of critical pages. Are interactive elements labeled and identifiable?
View your page source. Is the primary content in the HTML, or does it require JavaScript to render?
Load your page in Lynx or disable CSS and check if the content order and hierarchy still make sense. Agents don’t see your layout.
A Checklist For Your Development Team
If you’re sharing this article with your developers (and you should), here’s the prioritized implementation list. Ordered by impact and effort, starting with the changes that affect the most agent interactions for the least work.
High impact, low effort:
Use native HTML elements. `<button>` for actions, `<a>` for links, `<select>` for dropdowns. Replace `<div onclick>` patterns wherever they exist.
Label every form input. Associate `<label>` elements with inputs using the `for` attribute. Add `autocomplete` attributes with standard values.
Server-side render content pages. Ensure primary content is in the initial HTML response.
High impact, moderate effort:
Implement landmark regions. Wrap content in `<header>`, `<nav>`, `<main>`, and `<footer>` elements. Add `aria-label` when multiple landmarks of the same type exist on the same page.
Fix heading hierarchy. Ensure a single h1, with h2 through h6 in logical order without skipping levels.
Move critical content out of hidden containers. Prices, specifications, and key details should not require clicks or interactions to reveal.
Moderate impact, low effort:
Add ARIA states to dynamic components. Use aria-expanded, aria-controls, and aria-hidden for menus, accordions, and toggles.
Use descriptive link text. “Read the full report” instead of “Click here.” Agents use link text to understand where links lead.
Test with a screen reader. Make it part of your QA process, not a one-time audit.
Key Takeaways
AI agents perceive websites through three approaches: vision, DOM parsing, and the accessibility tree. The industry is converging on the accessibility tree as the most reliable method. OpenAI Atlas, Microsoft Playwright MCP, and Perplexity’s Comet all rely on accessibility data.
Web accessibility is no longer just about compliance. The accessibility tree is the literal interface AI agents use to understand your website. The UC Berkeley/University of Michigan study shows agent success rates drop significantly when accessibility features are constrained.
Semantic HTML is the foundation. Native elements like `<button>`, `<a>`, `<nav>`, and `<main>` automatically create a useful accessibility tree. No framework required. No ARIA needed for the basics.
ARIA is a supplement, not a substitute. Use it for dynamic states and custom components. But start with semantic HTML and add ARIA only where native elements fall short. Misused ARIA makes websites less accessible, not more.
Server-side rendering is an agent visibility requirement. AI crawlers that don’t execute JavaScript can’t see content in blank-shell SPAs. If your content isn’t in the initial HTML, it doesn’t exist in the AI ecosystem.
Screen reader testing is the best proxy for agent compatibility. If VoiceOver or NVDA can navigate your website, agents probably can too. For direct inspection, Playwright MCP accessibility snapshots show exactly what agents see.
The first three parts of this series covered why the shift matters, how to get cited, and what protocols are being built. This article covered the implementation layer. The encouraging news is that these aren’t separate workstreams. Accessible, well-structured websites perform better for humans, rank better in search, get cited more often by AI, and work better for agents. It’s the same work serving four audiences.
And the work builds on itself. The semantic HTML and structured data covered here are exactly what WebMCP builds on for its declarative form approach. The accessibility tree your website exposes today becomes the foundation for the structured tool interfaces of tomorrow.
Up next in Part 5: the commerce layer. How Stripe, Shopify, and OpenAI are building the infrastructure for AI agents to complete purchases, and what it means for your checkout flow.
As SEJ’s Roger Montti reported, Pichai described a version of search where users have “many threads running” and are completing tasks rather than browsing results.
But the interview covered more than that one quote. Throughout the conversation, Pichai laid out a timeline, identified the barriers slowing adoption, described how he already uses an internal agent tool, and confirmed infrastructure constraints that limit how quickly this vision can ship.
Here’s what the rest of the interview reveals for search professionals.
How Pichai’s Language Has Escalated
The “agent manager” line didn’t come out of nowhere. Pichai’s language about search’s future has gotten more specific over the past 18 months.
In December 2024, he told an interviewer that search would “change profoundly in 2025” and that Google would be able to “tackle more complex questions than ever before.”
By October 2025, during Google’s Q3 earnings call, he was calling it an “expansionary moment for Search” and reporting that AI Mode queries had doubled quarter over quarter.
In February 2026, he reported Search revenue hit $63 billion in Q4 2025 with growth accelerating from 10% in Q1 to 17% in Q4, attributing the increase to AI features.
Now, in April, he’s putting a label on it. Not “search will change” or “search is expanding,” but “search as an agent manager” where users complete tasks.
Each time the language has moved from abstract to concrete, from prediction to description.
The 2027 Inflection Point
Collison asked Pichai when a fully agentic business process, like automated financial forecasting with no human in the loop, might happen at Google. Pichai pointed to next year.
“I definitely expect in some of these areas 2027 to be an important inflection point for certain things.”
He added that non-engineering workflows would see changes “pretty profoundly” in 2027, noting that some groups inside Google are already working this way.
“There are some groups within Google who are shifting more profoundly, and so for me a big task is how do you diffuse that to more and more groups, particularly in 2026.”
He also acknowledged that younger, AI-native companies have an advantage in adopting these workflows, while larger organizations like Google face retraining and change management challenges.
The Intelligence Overhang
One of the most useful parts of the interview wasn’t from Pichai. It was Collison’s description of what he called the “intelligence overhang,” the gap between what AI can do today and how much organizations are actually using it.
Collison identified four barriers that slow adoption even when the models are capable. The first is prompting skill. Getting good results from AI takes practice, and most people inside organizations haven’t built that skill yet.
The second is company-specific context. Even a skilled prompter needs to know which internal tools, datasets, and conventions to reference. The third is data access. An agent can’t answer “what’s the status of this deal?” if it can’t reach the CRM or if permissions block it. The fourth is role definition. Job descriptions, team structures, and approval workflows were designed for a world without AI coworkers.
Pichai agreed with this assessment and said Google faces the same challenges internally.
“Identity access controls are like real hard problems and so we are working through those things, but those are the key things which are limiting diffusion to us too.”
He described how Google’s internal agent tool, which he referred to as Antigravity, is already changing how he works as CEO. He said he queries it to get quick reads on product launches.
“Hey, we launched this thing, like what did people think about this? Tell me like the worst five things people are talking about, the best five things people are talking about, and I type that.”
That’s a concrete example of the agent manager concept in action today inside Google. Pichai is using search as a task-completion tool, not a link-returning tool. The gap between that internal experience and what’s available to external users is part of what Google is working to close.
For SEO teams and agencies, the intelligence overhang is worth thinking about on two levels. There’s the overhang in your own organization, where AI tools could be doing more than they currently are. And there’s the overhang on Google’s side, where the models are already capable of agent-style search but the product hasn’t fully shipped it yet.
What’s Gating The Timeline
Pichai confirmed that Google’s 2026 capital expenditure will land between $175 billion and $185 billion, correcting a $150 billion figure that Collison cited. That’s roughly six times the $30 billion range Google was spending before the current AI buildout.
When asked about bottlenecks, Pichai identified four constraints in order.
Wafer production capacity is the most basic limit. Memory supply is “definitely one of the most critical constraints now.” Permitting and regulatory timelines for building new data centers are a growing concern. And critical supply chain components beyond memory add additional pressure.
“There is no way that the leading memory companies are going to dramatically improve their capacity. So you have those constraints in the short term, but they get more relaxed as you go out.”
He said these constraints would also drive efficiency gains, predicting that Google would make its AI systems “30x more efficient” even as it scales spending.
He also noted that he personally dedicates an hour each week to reviewing compute allocation at a granular level across teams and projects within Google.
What This Means For Search Professionals
Pichai’s description of search as an agent manager changes the question that SEO professionals need to ask about their work.
In a results-based search model, the goal is to rank. In an agent-based model, the goal is to be useful to a system that’s completing a task. Those are different problems.
Consider what agent-completed search looks like in practice. You tell search to find a plumber, check reviews, confirm availability for Saturday morning, and book an appointment. The agent doesn’t return ten blue links. It pulls from structured business data, review platforms, and booking systems to complete the job. The businesses that are chosen are those whose information is accurate, structured, and accessible to the agent. The ones with outdated hours, no booking integration, or thin review profiles don’t get surfaced.
The same pattern applies to ecommerce. A shopper says, “find me running shoes under $150 that work for flat feet and can arrive by Friday.” An agent that can complete that task needs product data, inventory availability, shipping estimates, and compatibility information. Sites that provide that data in structured, machine-readable formats become part of the agent’s toolkit. Sites that bury it inside JavaScript-rendered pages or behind login walls get skipped.
If an agent can synthesize an answer from five sources without sending the user to any of them, what’s the value of being one of those five sources? That depends entirely on whether the agent cites you, links to you, or treats your content as raw material without attribution.
This aligns with the changes we see in AI Mode. Google reported during its Q4 2025 earnings call that AI Mode queries are three times longer than traditional searches and frequently prompt follow-up questions.
The 2027 timeline matters too. If non-engineering enterprise workflows start becoming agentic next year, the businesses providing the information and services that those agents draw from will need to be structured for machine consumption, not just human browsing. Structured data, clean APIs, and accurate business information become infrastructure, not nice-to-haves.
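As a sketch, here is what “structured for machine consumption” can look like for the running-shoe example above, using schema.org Product markup (product name and values are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Stability Runner 2",
  "description": "Running shoe with arch support suited to flat feet.",
  "offers": {
    "@type": "Offer",
    "price": "129.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>
```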
The Measurement Gap
Pichai’s insistence that AI search is non-zero-sum deserves more scrutiny than it usually gets. His argument is that overall search usage is expanding, so everyone can win.

But total query growth and individual site traffic are different metrics. Google can be right that more people are searching more often while individual publishers and businesses see less referral traffic from those searches. Both things can be true at the same time.
Google hasn’t shared outbound click data from AI Mode. Until Google provides that data, Pichai’s “expansionary” claim is an assertion, not a verifiable fact. Search professionals should track their own referral traffic trends independently rather than relying on Google’s characterization of the overall market.
Looking Ahead
Pichai’s language in this interview goes further than what Google has said publicly before. Previous statements described AI search as an evolution. This one puts a clearer label on Google’s direction for Search. Search as an agent manager is a product vision.
The timeline he laid out, with 2027 as the inflection point for non-engineering agentic workflows, gives you a window. How Google monetizes agent-completed tasks, whether agents cite sources or simply use them, and what visibility even means in an agent-manager model are all open questions that will need answers before 2027 arrives.
Google I/O 2026 is scheduled for May 19-20 and will likely provide more details on how these capabilities will ship.
Here are the five most interesting things I learned.
1. Search Will Still Exist In The Future, But Much Of It Will Be Agentic
Sundar was asked if agents would replace Search. He said:
“If I fast forward, a lot of what are just information-seeking queries will be agentic in Search. You’ll be completing tasks. You’ll have many threads running.”
He also said that Search will change so that we’ll think of it as an agent manager.
“It keeps evolving. Search will be an agent manager in which you’re doing a lot of things. I think, in some ways, you know, I use Antigravity today, and you know, you have a bunch of agents doing stuff, and I can see search doing versions of those things, and you’re getting a bunch of stuff done.”
He said that people do deep research in AI Mode, and it will soon be the norm to do long-running tasks. He also said that the form factor of devices will change.
2. Google Uses Antigravity Internally
Boy, do I love Google’s IDE and agent manager, Antigravity. I have built so many things with it, including my own RSS feed reader, a screenshot and annotation tool, workflows to publish things I write in a Google Doc to my WordPress site, and a bunch of tools to do agentic things with Google Search Console and Google Analytics 4 data. While I think Claude Cowork and Claude Code are incredible, I truly do prefer using Antigravity.
It turns out that Google makes good use of Antigravity internally. Except they don’t call it Antigravity. They call it “Jet Ski.”
Sundar said that Google DeepMind and Google’s software engineers use it:
“I can see groups, and in particular I would say GDM and some of the SWE groups really change their workflows. They are using, we call this for some strange reason, we have a different name internally than externally of the same product, but it’s Jet Ski internally which is Antigravity. You’re living on it, you’re living in an agent manager world. You have workflows, and you’re working in this new way.”
He also uses it himself.
“I would query in Antigravity, in our internal version of Antigravity. “Hey, we launched this thing. What did people think about this? Tell me the worst five things people are talking about?” and I type that. Now that brings it back. Has my life gotten easier? Yes. In the past I would have to spend a lot more time trying to get a sense for it. Now an AI agent is helping me in that journey.”
Also, just last week, the Google Search team started using Antigravity.
“Just last week we rolled it [Antigravity] out to the Search team. We’re constantly pushing that. In a large organization, I think change management is a hard aspect of this technology diffusing, which may be easy for a small company. You can quickly switch over.”
If you want to learn how to use Antigravity, I’ve created a full guide teaching you how it works, and how I use it to not only code, but create full agentic workflows that I actually use in my day-to-day work. It’s available in the paid part of my community, The Search Bar. And next Thursday, the Search Bar Pro crew is having an event where we’re going to split into two teams, Team Claude Code and Team Antigravity, and see who can build the better SEO tool.
I know it’s a bit of a pain to try to use something new in your workflows. But I firmly believe that those who learn how to use Antigravity today will have a big advantage as AI improves and things really start to take off.
3. Robotics Is Growing Fast
Sundar admitted that Google was previously too early to robotics; AI has become the missing ingredient for ideas conceived 10 to 15 years ago. The Gemini Robotics models have reached state-of-the-art status for spatial reasoning, and Google has partnered again with Boston Dynamics, Agile, and a few other companies.
Most interesting to me was the discussion on Wing for drone delivery.
“I think we are scaling up Wing where in some reasonable time period, 40 million Americans will have access to a Wing delivery service. I’m not talking years out or something like that.”
When asked if Google was going to do more to build hardware, Sundar said having first-party hardware for robotics and AI would be important.
“I think we’d keep a very open mind. My lesson from Waymo and on the AI side with TPUs, et cetera, I need to really push the curve well, particularly in areas where you have safety, regulatory, everything. You want the first hand experience of the product feedback cycle. I think having first party hardware will end up being very important.”
4. Agentic OpenClaw-Like Systems Are The Future
There’s a reason why OpenClaw (initially Clawdbot) went crazy viral a few weeks ago. I still haven’t set up an OpenClaw system because I don’t feel I know enough about security to make this system safe.
When Sundar was asked if something OpenClaw-like was coming from Google, he said he thought it was the future.
“I think you want to give users capability where you have persistent long-running tasks in a reliable, secure way. You have to think through things like identity, access, et cetera. But I think that’s the future. That’s the agentic future. And bringing that for consumers is a bit of an exciting frontier we are looking at. This is one of mine too.
I think effectively the consumer interfaces are going to have full coding models underneath, and the right harnesses and the right skills and the ability to persist and run somewhere securely, locally and in the cloud. All those primitives are coming together.
Today I feel like there’s 1% of the world, maybe not 1%, 0.1% of the world who’s living this future. They are building stuff for themselves, but bringing that to mass adoption. Yes. It is a very exciting frontier I think.”
5. AI And AI Agents Are Going To Improve Dramatically In 2027
Sundar was asked when he thought it would happen that agentic systems would be able to work fully with no human in the loop. He said twice that 2027 was likely to be a big year.
“I definitely expect in some of these areas ’27 to be an important inflection point for certain things. Even the people doing it, that is the workflow through which they would produce it. Maybe for a while you would check it in the conventional way, but you switch over, a crossover. But I expect ’27 to be a big year in which some of those shifts happen pretty profoundly.”
The interview finished with Sundar talking about what he was most excited about. He did mention that putting data centers in space was very exciting, but this last bit was super interesting.
“I literally spent time yesterday with someone who was explaining some improvement in post-training, which is one person talking through the improvement they are doing. Listening to it, I’m like, “Oh, it’s going to really show up as a nice jump.” That’s the constant power of this moment. All of that, I don’t want to be specific about the second one, but we’ll publish it one day I’m sure.”
It sounds to me like he is talking about agentic self-improvement.
We are currently learning how to have AI build and do things for us. I recall first learning to code with ChatGPT as a partner. It would give me code to paste into VS Code. Then I’d run it and paste the errors back into ChatGPT. We went back and forth until something actually worked. I felt like I was unnecessary in this process, just a copying-and-pasting robot. And sure enough, today’s systems like Antigravity, Claude Code, and ChatGPT Codex run the code, check the errors, and fix things up without much need for human involvement.
It makes sense to me that the next step in this process is to have AI systems learn to improve their usefulness without us having to prompt them specifically. I expect that when this happens, we will see even faster progression of AI capabilities and usefulness!
More Resources:
Read Marie’s newsletter, AI News You Can Use. Subscribe now.