SERP FAQ Removal & New Data Challenge Schema’s AI Search Value via @sejournal, @MattGSouthern

Schema markup had a rough week. Google ended FAQ rich results. Four days later, Ahrefs published a report, finding that adding JSON-LD didn’t produce a clear citation lift across Google AI Overviews, AI Mode, or ChatGPT.

These developments weaken two common pitches for schema markup: increased SERP visibility and potential AI citation gains. This article examines their implications and what the data indicates about schema’s future.

Google’s Visible Schema Rewards Have Been Narrowing For Years

Google has been pulling back visible Search rewards tied to specific structured data types since 2023. Google restricted FAQ rich results to authoritative government and health sites, and HowTo rich results were limited to desktop and later deprecated.

In 2025, Google announced the retirement of several structured data features, including Course Info, Claim Review, and Estimated Salary. Book Actions was initially included but later carved out after Google removed its deprecation banner. Google called the remaining retirements “not commonly used in Search” and no longer providing value to users.

In 2026, Practice Problem structured data was deprecated. John Mueller noted on Reddit that “markup types come and go, but a precious few you should hold on to.”

The pattern is that visible structured data rewards have disappeared after becoming familiar SEO tactics. The markup itself stays valid, but the rich result doesn’t. Google doesn’t always describe these removals as responses to overuse, but the pattern offers less reason to treat any single markup type as a durable strategy.

These recent updates differ because the evidence for one proposed replacement value also weakened. The “GEO” advisory space claims schema boosts AI citations, and Ahrefs data tested part of that.

What The Ahrefs Report Found

Ahrefs tracked 1,885 web pages that added JSON-LD schema. Each page was matched against control pages that never added schema. Citation changes were measured across Google AI Overviews, AI Mode, and ChatGPT.

The results were flat. Google AI Mode showed +2.4%, ChatGPT showed +2.2%, and Google AI Overviews showed -4.6%.

The first two were too small to tell apart from random variation. The AI Overviews decline was statistically significant, but Ahrefs said it can’t confidently attribute that to schema.

Every page in the dataset already had more than 100 AI Overview citations before any schema was added. These pages were already being crawled and cited.

Ahrefs acknowledged that for pages not yet visible to AI, schema might still help with crawling, parsing, or indexing. But their data can’t confirm that.

Gianluca Fiorelli, a strategic SEO consultant, called the study “one of the more honest pieces of research to come out of the AI Search space in 2026.” But he argued the scope was narrower than the headline suggested. He compared it to “testing whether adding a label to a bottle already on the supermarket shelf makes customers pick it up more often.”

Ahrefs also cited a searchVIU experiment that found five AI systems relied on visible HTML during direct page retrieval and did not use hidden JSON-LD, Microdata, or RDFa. That finding covers one stage of the pipeline. It does not rule out schema playing a role earlier in indexing or entity understanding.

Ryan Law, Ahrefs’ director of content marketing, summarized the finding on LinkedIn, saying:

“Does adding schema markup help your pages get cited in AI search? Probably not,” he wrote. He added that schema is “probably not some magic fix for improving your AI citations.”

The Practitioner Debate

Both updates land in the middle of an active argument about schema and GEO.

Roughly 168,000 pages use the phrase “FAQ schema is critical for GEO,” according to search results that Lily Ray, VP of SEO and AI Search at Amsive, flagged on LinkedIn. She called the trend familiar.

“Anything that can be spammed in SEO, will be spammed,” Ray wrote. She’d warned about this in a 2019 Moz article when FAQ schema first launched, and described Google’s FAQ removal as the same cycle repeating.

Ray hedged throughout her post, calling it “putting on my tin foil hat” and “just an idea.” But the pattern she described is the same one visible in the timeline above. A useful markup type gets scaled as a tactic, Google pulls the reward, and the industry moves on to the next one.

Joost de Valk, founder of Yoast, made the connection explicit in a blog post. “The GEO industry is replaying early SEO, just faster,” de Valk said. “And the FAQ schema deprecation is the first concrete proof point that the cycle is back on.”

He also filed a Schema.org proposal for a new FAQSection type to address what he sees as the structural problem, separating “this page has an FAQ section” from “this page IS an FAQ.”

The frustration was sharpest from practitioners who’d been watching the GEO playbook harden around schema as its most concrete recommendation. Mark Williams-Cook, director at Candour and founder of AlsoAsked, shared the Ahrefs report on LinkedIn.

“GEO bros are selling snake oil with schema to boost citations, but people like Gianluca Fiorelli are talking sense,” he posted.

Marie Haynes, founder of Marie Haynes Consulting, commented on Ray’s post with a different theory altogether.

“My theory is that Google needed our FAQs to train AI so they gave us incentive to add them (aka rich results.) And now they don’t need them anymore,” she wrote. The theory is unconfirmed by any primary source, but it shows how far the speculation has traveled.

Some practitioners pushed back on the gloomier readings. Google’s broader guidance still presents structured data as a way to make page information machine-readable, and at a 2025 Search Central Live event in Madrid, the Search Relations team told practitioners that supported structured data types are still worth using.

What The Data Can’t Answer Yet

Whether schema helps pages that aren’t yet being cited is a separate question that the data can’t answer, because every page already had more than 100 AI Overview citations before schema was added.

The test also pooled all schema types together. Article, FAQ, Product, HowTo, and Organization were all treated as one category. Type-specific effects haven’t been isolated, and they could look different.

The 30-day measurement window may miss slower effects, and on live websites, schema changes can overlap with other page changes, making it hard to separate what schema did from what changed around it. The report only examined schema in the page’s HTML, not schema injected via JavaScript, which AI crawlers treat differently.

Ahrefs measured Google AI Overviews, AI Mode, and ChatGPT. Whether Bing, Copilot, Perplexity, Claude, or other answer systems treat schema differently from the systems Ahrefs measured is an open question.

Google’s FAQ deprecation notice says the company will continue using FAQ structured data to “better understand” pages. What that produces in measurable terms is unclear. The same uncertainty applies to whether schema affects citations indirectly, through eligibility, entity understanding, or source selection, rather than during the direct retrieval that searchVIU tested.

Nobody has published data that isolates that path.

Why This Matters

The Ahrefs data gives no measured reason to add JSON-LD, expecting short-term AI citation gains for pages already visible in AI Overviews. The trickier question is what to do with schema strategies more broadly.

Product, Review, Event, Video, and some other structured data types still support active rich result features. Organization, Person, and Article markup can still help describe entities and content, even when the payoff is less visible.

A blanket “schema doesn’t work” reading overstates what the data showed, because the test pooled all types and measured only one outcome. What the data does challenge is a specific sales pitch.

“Add schema to boost AI citations” has been one of the more concrete recommendations in GEO guides. For example, Frase.io called schema markup “critically important for AI search, GEO, and AEO.”

Without data support for that claim, it’s harder to justify the investment. AI systems in searchVIU’s test relied on visible HTML during retrieval, not JSON-LD. That suggests content structure, clear headings, and direct answers in prose may matter more for AI citation than markup structure.

Looking Ahead

The question hanging over the SEO industry is where schema creates measurable value. Adding JSON-LD didn’t measurably increase AI citations for pages already visible in AI Overviews.

For those pages, schema looks more like plumbing that serves other systems than a lever that moves citation counts. That’s still real value, but it’s a different pitch.


Featured Image: BEST-BACKGROUNDS/Shutterstock

More Resources

The Tech SEO Audit for the AI Search Era: How to Maximize Your AI Visibility via @sejournal, @JetOctopus

This post was sponsored by JetOctopus. The opinions expressed in this article are the sponsor’s own.

How do I optimize my site for ChatGPT and Perplexity, not just Google?

How do I know if AI bots are actually crawling my site?

How should my technical SEO strategy change for AI Search?

A significant portion of your site’s search impressions in 2026 are generated by machines researching on behalf of humans.

Those machines don’t care about your keyword rankings. They care whether your:

  • HTML loads cleanly in under 200 milliseconds
  • Product detail page is reachable in fewer than four clicks
  • Content answers a specific, nine-word question that has never appeared in any keyword research tool in your career.

This isn’t speculation. It’s what our server log data across hundreds of enterprise websites is showing us, consistently, since mid-2025.

What’s Actually Happening On Your Site

My colleague, Stan, flagged a pattern in a Slack message: query lengths were growing at rates that didn’t correlate with human behavior.

A 161% growth rate in 10-word queries year-over-year is not driven by users who suddenly got more verbose. It’s driven by AI agents decomposing a single user prompt into dozens of parallel sub-queries, a process researchers now call “fan-out.”

Query Length Growth in 2025

Image created by JetOctopus, Aggregated GSC data across hundreds of enterprise properties, 2025

The gradient is the tell. Human search behavior doesn’t scale this cleanly by word count. Machines do. By October 2025, 7-plus-word queries reached nearly 1% of total query volume, roughly triple their historical share.

More revealing than the volume is the CTR. While impression counts for 10-word queries spiked 161%, click-through rate collapsed to 2.26%, down from 8–11% in 2023.

The AI reads your page, extracts the answer, synthesizes it for the user. Your site never gets the visit.

We call these “phantom impressions.” They’re real signals that your content is being evaluated inside AI reasoning chains. If you’re filtering them out of your reporting because they don’t drive traffic, you are flying blind.

The Three Bots Visiting Your Site & Their Impact On SERP Visibility

Not all AI crawlers are equal, and treating them as a single category is the first mistake most technical SEOs make.

Training bots crawl broadly and ignore click depth. A training visit means the AI knows your content exists, not that users will ever see it.

AI search bots drop off quickly beyond two or three clicks from the homepage and typically visit each page only once a month.

AI user bots are initiated when a real person asks a question in ChatGPT, Perplexity, or Claude, and the AI researches the answer on their behalf. These are the only visits that translate to actual AI visibility.

Bot Type What Triggers It Crawl Depth Impact on AI Visibility
Training bots Model education cycles Deep — ignores click distance None directly. Awareness only.
AI search bots New URL discovery & fresh content Shallow — ~1 visit/month beyond 2–3 clicks Critical gatekeeper. If it misses a page, user bots won’t find it either.
AI user bots Real user query in ChatGPT / Claude / Perplexity Selective — driven by speed and structure High. Closest proxy to an AI impression.

Your site can receive heavy crawling from training and search bots and still be completely absent from AI-generated answers. If you’re not segmenting AI bot traffic by type in your log analysis, you have no idea which third of the iceberg you’re measuring.

Which SEO Signals Do LLMs Respect?

Robots.txt is your primary lever.

Most major AI platforms (ChatGPT, Claude, Gemini) follow robots.txt directives. Perplexity is a partial exception: PerplexityBot respects robots.txt, but Perplexity-User, the user-triggered bot, does not. Cloudflare confirmed this in an investigation. Most sites haven’t audited their robots.txt with AI access in mind. Do it.

Sitemaps are broadly supported.

ChatGPT, Claude, and PerplexityBot all use XML sitemaps for URL discovery. Keep them accurate.

Signals Best Saved For SEO & Ranking Efforts

These signals below don’t appear to impact AI visibility, but are still key for ranking for queries that still trigger traditional SERPs.

Canonical tags and noindex directives do nothing for AI bots.

AI crawlers don’t build a search index, so they have no use for these meta-signals. Content hidden from Google using noindex is fully visible to ChatGPT’s crawler.

LLM.txt does nothing.

Our log data shows major AI bots don’t read this file. Don’t invest time here.

JavaScript rendering is a critical blind spot.

Most AI crawlers (ChatGPT, Claude, Perplexity) don’t render JavaScript. If your product pages load key content client-side, those agents read an empty shell. Server-side rendering is the only architecture that works universally. The exception is Google Gemini, which uses the same Web Rendering Service as Googlebot.

How To Make Sure ChatGPT, Perplexity & LLMs Can Reach Your Content

AI search bots visit deep pages roughly once a month and drop off sharply beyond three clicks from the homepage. The pages with the most specific, answerable information are often the hardest for agents to reach.

The fix: Elevate your most valuable deep pages through internal linking, ensuring they’re reachable within four clicks.

Pages crawled by training bots but never reached by user bots are your highest-priority targets. Pages AI user bots visit frequently are telling you what to scale: more content covering the same topic cluster and depth.

Optimize Content For Longer, Fan-Out Queries

95% of the queries driving AI citations have zero monthly search volume. They’re synthetic sub-queries generated by AI models. But they show up in GSC: impressions, no clicks, query lengths you’d never target voluntarily.

How To Find Fan Out Query Opportunities

To surface fan out queries that are worth chasing, connect your GSC API to JetOctopus (to bypass the 1,000-row UI limit) and filter for: query length greater than 7 words, impressions under 50, clicks at 0, over the last 3 months. That’s your Fan-Out Opportunity Matrix, the exact questions AI agents are asking about your content.

Prompt Types That Fan Out Most

Image created by JetOctopus, 2025

If your content isn’t structured to answer list and comparison queries, with explicit rankings, pros/cons, and side-by-side specs, you’re leaving the highest fan-out surface area unoptimized.

“Product review” intent queries surged from 239 in June 2025 to over 40,000 by September 2025. That 16,000% increase was AI agents systematically harvesting structured opinion data. If your product pages lack this depth, you’re invisible to that harvest.

The Technical Audit: Where to Start

Step 1: Identify AI User Bot Traffic In Logs

Pull raw server logs (Apache/Nginx) and export all lines containing these user agents: OAI-SearchBot and ChatGPT-User, PerplexityBot and Perplexity-User, Claude-SearchBot and Claude-User. Then manually group hits by user-agent patterns and endpoints in a spreadsheet. To distinguish training bots from user bots, you’ll need to maintain your own classification list — one that changes often and isn’t standardized.

In JetOctopus Log Analyzer, this segmentation is built in: filter by bot type (training, search, and user) in a few clicks and immediately see which pages AI user bots visit (your AI-visible content, ready to scale) versus pages training bots hit but user bots never reach (your highest-priority fix targets).

Step 2: Audit Technical Accessibility Of Deep Pages

Pick a sample of deep URLs and check HTML payload size, confirm key content isn’t injected via JavaScript by viewing raw HTML, simulate crawl depth by counting clicks from the homepage, and test load time in Chrome DevTools or Lighthouse. Also check whether important content sits behind accordions or “View More” elements — these require JavaScript execution that AI bots skip entirely. For large sites with thousands of deep pages, this sampling approach misses a lot. AI agents don’t click. If information only appears after user interaction, it doesn’t exist for these crawlers.

Step 3: Clean Up Your Robots.txt

Open your robots.txt and review all Disallow and Allow directives for every user-agent line by line. AI bots follow Disallow rules, so make sure you’re not accidentally blocking important URLs. Manually test key URLs to confirm they aren’t blocked. A 30-minute audit here can prevent you from blocking crawlers you want in, or exposing content you’d rather keep out.

Step 4: Map Your Phantom Impressions

Export data from GSC Performance reports filtered by impressions with zero clicks. Because of the 1,000-row UI limit, you’ll need to use the GSC API or export in chunks by date and query, then merge datasets in spreadsheets or BigQuery. Also factor in query frequency: long queries appearing daily are likely not fan-outs.

Connect your GSC API to JetOctopus to bypass the row limit and build your Fan-Out Opportunity Matrix automatically — the exact questions AI agents are asking about your content, ready to act on.

Step 5: Monitor The Changes

Set up a recurring export process — pull GSC data monthly and compare impressions over time, re-run log analysis scripts and diff bot activity, track Core Web Vitals separately in PageSpeed Insights or CrUX. You’ll end up stitching together multiple data sources with no unified alerting, making it hard to catch regressions early.

JetOctopus Alerts covers exactly this: unified notifications for changes in AI bot activity alongside Googlebot behavior, Core Web Vitals, on-page SEO issues, and SERP efficiency drops, so you catch regressions before they compound.

The New KPI: Technical Accessibility

SEO in 2026 is restructuring around one constraint: can an AI agent crawl, reach, and extract a fact from your 50,000th product page in under 200 milliseconds?

If the answer is no, your rankings, backlinks, and content quality become irrelevant for a growing share of search interactions. The machines are searching. The question is how quickly you can see what’s actually happening.

Start with your logs. Everything else follows from there.

Want to see exactly how AI bots are interacting with your site: which pages they reach, which they skip, and where your fan-out opportunities are hiding? Book a live walkthrough of the JetOctopus platform. We’ll pull your actual log data and show you what your GSC reports aren’t telling you.

Image Credits

Featured Image: Image by JetOctopus. Used with permission.

Schema Markup Didn’t Move AI Citations In Ahrefs Test via @sejournal, @MattGSouthern

Schema markup is far more common on pages cited by AI. But a new Ahrefs report found that adding it didn’t result in a clear increase in citations.

Ahrefs tracked 1,885 web pages that added JSON-LD schema. Each page was matched against control pages that never added schema, and citation changes were measured across Google AI Overviews, AI Mode, and ChatGPT.

No platform showed a meaningful citation increase after schema was added.

What Ahrefs Found

The report analyzed 6 million URLs and found that pages cited by AI were roughly three times more likely to include JSON-LD. This gap has been seen as evidence that schema improves AI visibility. However, Ahrefs tested whether this held true when isolated from other signals, since sites with schema tend to invest in better content and earn more links.

They ran a controlled comparison, matching each schema page with three control pages from different domains with similar citation levels that never added JSON-LD. Citation changes were measured 30 days before and after schema addition.

Using its Brand Radar tool and Agent A, Ahrefs conducted a matched difference-in-differences analysis to account for platform trends. Here’s what was found.

  • Google AI Overviews: −4.6% (a small but statistically notable decline relative to controls)
  • Google AI Mode: +2.4% (too small to distinguish from random variation)
  • ChatGPT: +2.2% (too small to distinguish from random variation)

Three more tests were run alongside the primary comparison, and all four found no clear positive or negative effect.

The AI Overview Decline

The −4.6% decline in the AI Overview section deserves context. Ahrefs reports both treated and control pages were already declining before schema was added. Treated pages declined slightly faster, but the difference is small, with about 12 fewer daily citations per page in a sample where most pages received hundreds.

The report notes that the decline could reflect a small negative effect from schema, or it could be coincidence. It doesn’t draw a conclusion either way.

What The Report Doesn’t Cover

Every page in the dataset had 100+ AI Overview citations before any schema was added. These pages were already in the consideration set, being crawled and surfaced.

The report admits this limitation. For pages not yet visible to AI, schema might still aid crawling, parsing, or indexing, but the data can’t confirm this.

The report also notes other limitations. Pages adding JSON-LD often change other elements, making it hard to separate schema effects from those changes. All schema types were pooled, so some might perform differently. The 30-day window might miss slower effects.

A searchVIU experiment cited in the report tested whether five AI systems used schema markup when fetching pages in real time. None did; they only extracted visible HTML, ignoring JSON-LD, Microdata, and RDFa. This was a direct-fetch test, not proof of schema’s role during training, indexing, or retrieval.

Why This Matters

Schema markup is frequently recommended for AI visibility. However, Ahrefs’ data complicates this. While schema supports rich results and knowledge graphs, adding JSON-LD doesn’t increase AI citations for pages already cited.

The data shows a correlation: pages with schema are cited more often by AI, but Ahrefs interprets this as a sign of overall site quality rather than schema’s direct impact.

Looking Ahead

The report can’t determine whether schema helps pages that aren’t yet cited, which is a different group of pages that need another study. If pages are visible to AI, JSON-LD probably won’t boost citations.


Featured Image: Roman Samborskyi/Shutterstock

Google Drops FAQ Rich Results From Search via @sejournal, @MattGSouthern
  • Google has removed FAQ rich results from Search after limiting them for most sites.
  • Search Console reporting ends in June, followed by API support in August.
  • FAQ schema can stay on your pages, but it no longer earns visible FAQ results in Google.

Google deprecated FAQ rich results, completing a removal that started years ago. FAQ rich results were already restricted for most sites.

How To Design URL Structures For AI Retrieval, Not Just Rankings

For years, URL structure was a technical SEO checkbox. Keep it short, use hyphens, include the keyword, done.

While that playbook still works, it’s increasingly incomplete. A growing share of the target audience now discovers content through AI assistants and large language models like ChatGPT, Perplexity, Claude, Google’s AI Overviews, and more.

These systems retrieve and synthesize information differently from traditional search crawlers, and if your URL architecture isn’t built with that in mind, you are increasing your chances of not being cited by LLMs.

In the new age of search, we need to extend those SEO fundamentals to also align with AI bots and how they crawl URLs.

Why AI Systems Read URLs Differently

Search engines have spent decades developing sophisticated crawling and indexing infrastructure. They follow redirects, resolve canonicals, parse JavaScript (sometimes…), and can infer context from a page when the URL is a string of random characters.

AI retrieval systems, particularly retrieval-augmented generation (RAG) pipelines and web-connected LLMs, often work differently.

There are three core parts to how RAG works:

  1. The input prompt is converted into a vector embedding
  2. Relevant passages are then retrieved from indexed URLs, documents and knowledge graphs in traditional search results like Google and Bing.
  3.  An LLM like ChatGPT or similar will then process this information and generate a refined response.

A developer-built RAG system will essentially use data sources from URLs to extract content – they will crawl the URL, convert the web content into searchable “chunks” and store them as numerical vectors for later retrieval.

This is now also evolving into a realm of URL context grounding, which is specific to Gemini. The aim for URL context grounding is to help Gemini (and presumably AI Overviews / AI Mode) to better understand and answer questions about content and data in individual URLs without performing traditional RAG processing.

The aim here is for the LLM to specifically pull direct information from multiple URLs, analyze multiple reports and combine information from several sources to generate more accurate summaries. This should, in theory, help to improve AI factual accuracy and reduce hallucinations.

Then there’s zero shot classification – a technique that enables models to categorize the purpose of a webpage without any task-specific training data.

Rather than relying on labeled examples, the model analyzes semantic cues such as URL structures (treated as plain text strings) and maps them to predefined categories using methods like cosine similarity or prompt-based reasoning.

This works by drawing on the model’s pre-trained language knowledge to infer a page’s likely function, while also detecting distinct patterns in the words and phrasing that signal what type of content the page contains.

This has been particularly useful in identifying phishing links and other malicious links based solely on their URL patterns but also indicates how LLMs could begin to leverage zero-shot classification to rely solely on URLs to infer semantic relevance.

A URL that communicates nothing forces LLM models to work harder and introduces ambiguity in how the content gets categorized.

More practically, when an AI system cites a source in a response, it often surfaces the URL alongside the excerpt. That URL becomes visible to real users, in the same way it does in a search result, and they’re going to make real decisions about whether or not to click.

A clean, descriptive path builds trust in a way that something like /p?id-4821 never will.

The Core Principle Of URLs As Semantic Signals

Think of your URL structure as a secondary content layer – one that communicates hierarchy, topic, and specificity independently to the page title or H1, or other metadata.

A URL like /resources/seo/url-structure-ai-retrieval/ tells a retrieval system several things at once: This lives under a resources hub, it’s within an SEO category, and it covers a specific subtopic at a granular level.

That’s a useful signal. It maps to how AI systems try to understand content provenance and relevance before surfacing it in a response.

This matters especially for:

  • Long-tail and question-based queries, where AI systems are looking for precise matches to specific information needs.
  • Topical authority, where your URL hierarchy can reinforce that your domain owns a subject area.
  • Citation quality, where a descriptive URL increases the likelihood an AI agent references your content over a competitor’s near-identical page.

Practical Architecture Principles

There are a number of practical architecture principles that you should consider for both traditional search as well as AI search.

Use A Logical, Shallow Hierarchy

Deep nesting (i.e., /blog/category/subcategory/year/month/post-title/) creates noise, and your content is multiple steps away from the homepage. A structure three levels deep is almost always sufficient, i.e., domain > category > specific page. There are some CMS setups, like Shopify, where you are forced into four or five, depending on your theme (i.e., domain/blog/name-of-blog/blog-post-title/), but as long as you’re adding meaningful context and not administrative clutter, your structure will be aligned with the principle.

Make Every Segment Human-Readable And Descriptive

Avoid abbreviations, internal jargon, or ID numbers in public-facing URLs. A URL like /ai-search-optimization communicates the topic directly, whereas a URL like /aso-v2 communicates nothing without prior knowledge.

Align URL Slugs With The Actual Search Intent, Not Just The Keyword

There’s a big difference between /email-marketing and /email-marketing-best-practices-b2b. The second one signals specificity. It’s more likely to surface when an AI system is generating a response to a precise question, because the URL itself narrows the relevance scope before the content is even parsed.

Be Consistent With Category Naming Across Your Site

If your content strategy uses /guides/ for long-form education content and /blog/ for shorter commentary, maintain that consistently. It’s likely that AI retrieval systems build a model of your site structure over time. Inconsistency blurs the signal about what type of content lives where.

Avoid Keyword Stuffing In URLs

This is old SEO advice, but it also applies here. A URL crammed with keywords looks spammy to human users who see it cited in an AI response, which undermines the trust benefit you’re trying to build. One primary keyword or phrase per segment is the right call.

What Does This Look Like In Practice

If two different marketers are writing about the same topic, the URL structure could be key for RAG systems to better understand the context of the page as part of content retrieval.

An example:

Marketer A publishes /blog/2024/03/email-tips-part-4.

Marketer B publishes /resources/email-marketing/b2b-deliverability-guide.

Marketer B’s URL structure properly communicates hierarchy (resources hub), category (email marketing), and a specific focus (B2B deliverability) before a single word of body copy is processed.

Users are also more likely to benefit from this URL being cited because they can make sense of it immediately.

It can be argued that this type of clarity and specificity could compound as your URL structure and site’s information architecture can dictate the entire topical structure of your site, also helping to communicate both expertise and relevance.

The Redirect & Consolidation Problem

This is more relevant to enterprise sites that have accumulated URL debt like redirects, duplicate paths, and inconsistent slugs due to historical content management system migrations.

This could create a specific problem for AI retrieval if there are redirect chains and duplicate paths, as crawlers may not consistently land on the canonical version of a page, and different retrieval systems handle redirect resolution differently.

A practical fix will be to prioritize your website’s URLs. Audit your highest traffic and highest value pages, and confirm that their canonical URLs are clean, accessible, and structured in line with your current taxonomy.

Then work backward.

You don’t need to restructure the entire site for the chance of being cited in AI responses, but especially for your highest value pages, you should ensure that you’re offering the best possible URL signals.

What You Should Avoid Changing

It’s important not to always chase the big and shiny, so don’t completely restructure your entire site’s URL architecture just for marginal AI retrieval gains.

URL restructuring carries real SEO risk and time to recover link equity if 301 redirects are put in place – and there have been many web migration horror stories that can attest to what can happen when they’re not implemented correctly.

The goal is to apply these principles to new content and flag structural problems in existing high-value pages where the case to remediate these issues is clear and lower risk.

If your current URL structure already follows clean, descriptive, hierarchical conventions (which is all a standard part of SEO best practice), then congratulations! You’ve been optimizing for AI retrieval without even knowing.

In Summary

URL structure has always been a relatively small signal, but as AI assistants become more of a meaningful discovery channel, URL structures have the potential to be cited in more places than just Google and Bing.

They can help you to appear in AI-generated answers, they can shape citation quality, and they can contribute to how retrieval systems will categorize your content before anything else.

Simply build URLs that tell the story of your content clearly, before the user clicks on it.

More Resources:


Featured Image: Vitya_M/Shutterstock

The Technical SEO Audit Needs A New Layer via @sejournal, @slobodanmanic

The standard technical SEO audit checks crawlability, indexability, website speed, mobile-friendliness, and structured data. That checklist was designed for one consumer: Googlebot.

This is how it’s always been.

In 2026, your website has, at least, a dozen additional non-human consumers. AI crawlers like GPTBot, ClaudeBot, and PerplexityBot train models and power AI search results. User-triggered agents like the newly announced Google-Agent, or its “siblings” Claude-User and ChatGPT-User, browse websites on behalf of specific humans in real time. A Q1 2026 analysis across Cloudflare’s network found that 30.6% of all web traffic now comes from now bots, with AI crawlers and agents making up a growing share. Your technical audit needs to account for all of them.

Here are the five layers to add to your existing technical SEO audit.

Layer 1: AI Crawler Access

Your robots.txt was probably written for Googlebot, Bingbot, and maybe a few scrapers. AI crawlers need their own robots.txt rules, and they need to be separate from Googlebot and Bingbot.

What To Check

Review your robots.txt for rules targeting AI-specific user agents: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, AppleBot-Extended, CCBot, and ChatGPT-User. If none of these appear, you’re running on defaults, and those defaults might not reflect what you actually want. Never accept the defaults unless you know they are exactly what you need.

The key is making a conscious decision per crawler rather than blanket allowing or blocking everything. Not all AI crawlers serve the same purpose. AI crawler traffic can be split into three categories: training crawlers that collect data for model training (89.4% of AI crawler traffic according to Cloudflare data), search crawlers that power AI search results (8%), and user-triggered agents like Google-Agent and ChatGPT-User that browse on behalf of a specific human in real time (2.2%). Each category warrants a different robots.txt decision.

Chart showing traffic volume by crawler purpose - Cloudflare Radar Q1 2026
Cloudflare Radar data showing traffic volume by crawl purpose (Q1 2026); Screenshot by author, April 2026

The crawl-to-referral ratios from Cloudflare’s Radar report can make this an informed decision for you. Anthropic’s ClaudeBot crawls 20.6 thousand pages for every single referral it returns. OpenAI’s ratio is 1,300:1. Meta sends no referrals. Blocking OpenAI’s OAI-SearchBot or PerplexityBot reduces your visibility in ChatGPT Search and Perplexity’s AI answers. Blocking training-focused crawlers like CCBot or Meta’s crawler prevents data extraction from a provider that sends zero traffic back. The crawl-to-referral ratios tell you who is taking without giving.

There is one crawler that requires special attention. Google added Google-Agent to its official list of user-triggered fetchers on March 20, 2026. Google-Agent identifies requests from AI systems running on Google infrastructure that browse websites on behalf of users. Unlike traditional crawlers, Google-Agent ignores robots.txt. Google’s position is that since a human initiated the request, the agent acts as a user proxy rather than an autonomous crawler. Blocking Google-Agent requires server-side authentication, not robots.txt rules. This is both interesting, and important for the future, even if it’s not within the scope of this article.

Official documentation for each crawler:

Layer 2: JavaScript Rendering

Googlebot renders JavaScript using headless Chromium. There is nothing new about that. What is new and different is that virtually every major AI crawler does not render JavaScript.

Crawler Renders JavaScript
GPTBot (OpenAI) No
ClaudeBot (Anthropic) No
PerplexityBot No
CCBot (Common Crawl) No
AppleBot Yes
Googlebot Yes

AppleBot (which uses a WebKit-based renderer) and Googlebot are the only major crawlers that render JavaScript. Four of the six major web crawlers (GPTBot, ClaudeBot, PerplexityBot, and CCBot) fetch static HTML only, making server-side rendering a requirement for AI search visibility, not an optimization. If your content lives in client-side JavaScript, it is invisible to the crawlers training OpenAI, Anthropic, and Perplexity’s models and powering their AI search products.

What To Check

Run curl -s [URL] on your critical pages and search the output for key content like product names, prices, or service descriptions. If that content isn’t in the curl response, GPTBot, ClaudeBot, and PerplexityBot can’t see it either. Alternatively, use View Source in your browser (not Inspect Element, which shows the rendered DOM after JavaScript execution) and check whether the important information is present in the raw HTML.

CURL fetch of No Hacks homepage
Curl fetch of No Hacks homepage (Image from author, April 2026)

Single-page applications (SPAs) built with React, Vue, or Angular are particularly at risk unless they use server-side rendering (SSR) or static site generation (SSG). A React SPA that renders product descriptions, pricing, or key claims entirely on the client side is sending AI crawlers a blank page with a link to the JavaScript bundle.

The fix isn’t complicated. Server-side rendering (SSR), static site generation (SSG), or pre-rendering solves this for every major framework. Next.js supports SSR and SSG natively for React, Nuxt provides the same for Vue, and Angular Universal handles server rendering for Angular applications. The audit just needs to flag which pages depend on client-side JavaScript for critical content.

Layer 3: Structured Data For AI

Structured data has been part of technical SEO audits for years, but the evaluation criteria need updating. The question is no longer just “does this page have schema markup?” It’s “does this markup help AI systems understand and cite this content?”

What To Check

  • JSON-LD implementation (preferred over Microdata and RDFa for AI parsing).
  • Schema types that go beyond the basics: Organization, Article, Product, FAQ, HowTo, Person.
  • Entity relationships: sameAs, author, publisher connections that link your content to known entities.
  • Completeness: are all relevant properties populated, or are you just checking a box using skeleton schemas with name and URL?

Why This Matters Now

Microsoft’s Bing principal product manager Fabrice Canel confirmed in March 2025 that schema markup helps LLMs understand content for Copilot. The Google Search team stated in April 2025 that structured data gives an advantage in search results.

No, you can’t win with schema alone. Yes, it can help.

The data density angle matters too. The GEO research paper by Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi (presented at ACM KDD 2024, first to publicly use the term “GEO”) found that adding statistics to content improved AI visibility by 41%. Yext’s analysis found that data-rich websites earn 4.3x more AI citations than directory-style listings. Structured data contributes to data density by giving AI systems machine-readable facts rather than requiring them to extract meaning from prose.

An important caveat: No peer-reviewed academic studies exist yet on schema’s impact on AI citation rates specifically. The industry data is promising and consistent, but treat these numbers as indicators rather than guarantees.

W3Techs reports that approximately 53% of the top 10 million websites use JSON-LD as of early 2026. If your website isn’t among them, you’re missing signals that both traditional and AI search systems use to understand your content.

Duane Forrester, who helped build Bing Webmaster Tools and co-launched Schema.org, argues that schema markup is only step one. As AI agents continue moving from simply interpreting pages to making decisions, brands will also need to publish operational truth (pricing, policies, constraints) in machine-verifiable formats with versioning and cryptographic signatures. Publishing machine-verifiable source packs is beyond the scope of a standard audit today, but auditing structured data completeness and accuracy is the foundation verified source packs build on.

Layer 4: Semantic HTML And The Accessibility Tree

The first three layers of the AI-readiness audit cover crawler access (robots.txt), JavaScript rendering, and structured data. The final two address how AI agents actually read your pages and what signals help them discover and evaluate your content.

Most SEOs evaluate HTML for search engine consumption. Agentic browsers like ChatGPT Atlas, Chrome with auto browse, and Perplexity Comet don’t parse pages the way Googlebot does. They read the accessibility tree instead.

The accessibility tree is a parallel representation of your page that browsers generate from your HTML. It strips away visual styling, layout, and decoration, keeping only the semantic structure: headings, links, buttons, form fields, labels, and the relationships between them. Screen readers like VoiceOver and NVDA have used the accessibility tree for decades to make websites usable for people with visual impairments. AI agents now use the same tree to understand and interact with web pages.

And the reason is simple: efficiency. Processing screenshots is both more expensive and slower than working with the accessibility tree.

Accessibility tree shown in Google Chrome
This is what an accessibility tree looks like in Google Chrome (Image from author, April 2026)

This matters because the accessibility tree exposes what your HTML actually communicates, not what your CSS (or JS) makes it look like. A

styled to look like a button doesn’t appear as a button in the accessibility tree. An image without alt text means nothing. A heading hierarchy that skips from H1 to H4 creates a broken structure that both screen readers and AI agents will struggle to navigate.

Microsoft’s Playwright MCP, the standard tool for connecting AI models to browser automation, uses accessibility snapshots rather than raw HTML or screenshots. Playwright MCP’s browser_snapshot function returns an accessibility tree representation because it’s more compact and semantically meaningful for LLMs. OpenAI’s documentation states that ChatGPT Atlas uses ARIA tags to interpret page structure when browsing websites.

Web accessibility and AI agent compatibility are now the same discipline. Proper heading hierarchy (H1-H6) creates meaningful sections that AI systems use for content extraction. Semantic elements like

,

,

, and

tell machines what role each content block plays. Form labels and descriptive button text make interactive elements understandable to agents that parse the accessibility tree instead of rendering visual design.

What To Check

  • Heading hierarchy: logical H1-H6 structure that machines can use to understand content relationships.
  • Semantic elements: nav, main, article, section, aside, header, footer, used appropriately.
  • Form inputs: every input has a label, every button has descriptive text.
  • Interactive elements: clickable things use or , not

    .

  • Accessibility tree: run a Playwright MCP snapshot or test with VoiceOver/NVDA to see what agents actually see.

Somehow, things are getting worse on this front. The WebAIM Million 2026 report found that the average web page now has 56.1 accessibility errors, up 10.1% from 2025.

ARIA (Accessible Rich Internet Applications) usage increased 27% in a single year. ARIA is a set of HTML attributes that add extra semantic information to elements, telling screen readers and AI agents things like “this div is actually a dialog” or “this list functions as a menu.” But what’s critical is this: pages with ARIA present had significantly more errors (59.1 on average) than pages without ARIA (42 on average). Adding ARIA without understanding it makes things worse, not better, because incorrect ARIA overrides the browser’s default accessibility tree interpretation with wrong information. Start with proper semantic HTML. Add ARIA only when native elements aren’t sufficient.

Technical SEOs do not need to become accessibility experts. But treating accessibility as someone else’s problem is no longer viable when the same tree that screen readers parse is now the primary interface between AI agents and your website.

Sidenote: The Markdown Shortcut Doesn’t Work

Serving raw markdown files to AI crawlers instead of HTML can result in a 95% reduction in token usage per page. However, Google Search Advocate John Mueller called this “a stupid idea” in February 2026 on Bluesky. Mueller’s argument was this: “Meaning lives in structure, hierarchy and context. Flatten it and you don’t make it machine-friendly, you make it meaningless.” LLMs were trained on normal HTML pages from the beginning and have no problems processing them. The answer isn’t to create a flat, simplified version for machines. It’s to make the HTML itself properly structured. Well-written semantic HTML already is the machine-readable format. Besides, that simplified version already exists in the accessibility tree, and it is what AI agents already use.

Layer 5: AI Discoverability Signals

The final layer covers signals that don’t fit neatly into traditional audit categories but directly affect how AI systems discover and evaluate your website.

llms.txt (dis-honourable mention). Listed first for one reason only, ask any LLM what you should do to make your website more visible to AI systems, and llms.txt will be at or near the top of the list. It’s their world, I guess. The llms.txt specification provides a simple markdown file that helps AI agents understand your website’s purpose, structure, and key content. No large-scale adoption data has been published yet, and its actual impact on AI citations is unproven. But LLMs consistently recommend it, which means AI-powered audit tools and consultants will flag its absence. It takes minutes to create and costs nothing to maintain.

OK, now that we’ve got that out of the way, let’s look at what might really matter.

AI crawler analytics. Are you monitoring AI bot traffic? Cloudflare’s AI Audit dashboard shows which AI crawlers visit, how often, and which pages they hit. If you’re not on Cloudflare, check server logs for Google-Agent, ChatGPT-User, and ClaudeBot user agent strings. Google publishes a user-triggered-agents.json file containing IP ranges that Google-Agent uses, so you can verify whether incoming requests are genuinely from Google rather than spoofed user agent strings.

Entity definition. Does your website clearly define what the business is, who runs it, and what it does? Not in marketing copy, but in structured, machine-parseable markup. Organization schema should include name, URL, logo, founding date, and sameAs links to verified profiles on LinkedIn, Crunchbase, and Wikipedia. Person schema for key people should connect them to the organization via author and employee properties. AI systems need to resolve your identity as a distinct entity before they can confidently recommend you over competitors with similar names or offerings. Don’t slap this on top of your website when your designer is done with their work. Start here; it will make your life easier.

Content position. Where you place information on the page directly affects whether AI systems cite it. Kevin Indig’s analysis of 98,000 ChatGPT citation rows across 1.2 million responses found that 44.2% of all AI citations come from the top 30% of a page. The bottom 10% earns only 2.4-4.4% of citations regardless of industry. Duane Forrester calls this “dog-bone thinking”: strong at the beginning and end, weak in the middle, a pattern Stanford researchers have confirmed as the “lost in the middle” phenomenon. Audit your key pages: are the most important claims and data points in the first 30%, or buried in the middle?

Content extractability. Pull any key claim from your page and read it in isolation. Does it still make sense without the surrounding paragraphs? AI retrieval systems, like ChatGPT, Perplexity, and Google AI Overviews, extract and cite individual passages and sentences that rely on “this,” “it,” or “the above” for meaning, become unusable when extracted from their original context. Ramon Eijkemans’ excellent utility-writing framework maps these principles to documented retrieval mechanisms: self-contained sentences, explicit entity relationships, and quotable anchor statements that AI systems can confidently cite without additional inference.

The Audit Checklist

Check Tool/Method What You’re Looking For
AI crawler robots.txt Manual review Conscious per-crawler decisions
JavaScript rendering curl, View Source, Lynx browser Critical content in static HTML
Structured data Schema validator, Rich Results Test Complete, connected JSON-LD
Semantic HTML axe DevTools, Lighthouse Proper elements, heading hierarchy
Accessibility tree Playwright MCP snapshot, screen reader What agents actually see
AI bot traffic Cloudflare, server logs Volume, pages hit, patterns

From Audit To Action

This audit identifies gaps. Fixing them requires a sequence, because some fixes depend on others. Optimizing content structure before establishing a machine-readable identity means agents can extract your information, but can’t confidently attribute it to your brand. I wrote Machine-First Architecture to provide that sequence: identity, structure, content, interaction, each pillar building on the previous one.

Why Technical SEO Audit Is Where This Belongs

None of this is technically SEO. Robots.txt rules for AI crawlers don’t affect Google rankings. Accessibility tree optimization doesn’t move keyword positions. Content position scoring has nothing to do with search indexing.

But most of it did grow out of technical SEO. Crawl management, structured data, semantic HTML, JavaScript rendering, server log analysis: these are skills technical SEOs already have. The audit methodology transfers directly. The consumer it serves is what changed.

The websites that get cited in AI responses, that work when Chrome auto browse visits them, that show up when someone asks ChatGPT for a recommendation, they won’t be the ones with the best content alone. They’ll be the ones whose technical foundation made that content accessible to machines. Technical SEOs are the people best equipped to build that foundation. The old audit template just needs a new section to reflect it.

More Resources:


Featured Image: Anton Vierietin/Shutterstock

Google May Expand Unsupported Robots.txt Rules List via @sejournal, @MattGSouthern

Google may expand the list of unsupported robots.txt rules in its documentation based on analysis of real-world robots.txt data collected through HTTP Archive.

Gary Illyes and Martin Splitt described the project on the latest episode of Search Off the Record. The work started after a community member submitted a pull request to Google’s robots.txt repository proposing two new tags be added to the unsupported list.

Illyes explained why the team broadened the scope beyond the two tags in the PR:

“We tried to not do things arbitrarily, but rather collect data.”

Rather than add only the two tags proposed, the team decided to look at the top 10 or 15 most-used unsupported rules. Illyes said the goal was “a decent starting point, a decent baseline” for documenting the most common unsupported tags in the wild.

How The Research Worked

The team used HTTP Archive to study what rules websites use in their robots.txt files. HTTP Archive runs monthly crawls across millions of URLs using WebPageTest and stores the results in Google BigQuery.

The first attempt hit a wall. The team “quickly figured out that no one is actually requesting robots.txt files” during the default crawl, meaning the HTTP Archive datasets don’t typically include robots.txt content.

After consulting with Barry Pollard and the HTTP Archive community, the team wrote a custom JavaScript parser that extracts robots.txt rules line by line. The custom metric was merged before the February crawl, and the resulting data is now available in the custom_metrics dataset in BigQuery.

What The Data Shows

The parser extracted every line that matched a field-colon-value pattern. Illyes described the resulting distribution:

“After allow and disallow and user agent, the drop is extremely drastic.”

Beyond those three fields, rule usage falls into a long tail of less common directives, plus junk data from broken files that return HTML instead of plain text.

Google currently supports four fields in robots.txt. Those fields are user-agent, allow, disallow, and sitemap. The documentation says other fields “aren’t supported” without listing which unsupported fields are most common in the wild.

Google has clarified that unsupported fields are ignored. The current project extends that work by identifying specific rules Google plans to document.

The top 10 to 15 most-used rules beyond the four supported fields are expected to be added to Google’s unsupported rules list. Illyes did not name specific rules that would be included.

Typo Tolerance May Expand

Illyes said the analysis also surfaced common misspellings of the disallow rule:

“I’m probably going to expand the typos that we accept.”

His phrasing implies the parser already accepts some misspellings. Illyes didn’t commit to a timeline or name specific typos.

Why This Matters

Search Console already surfaces some unrecognized robots.txt tags. If Google documents more unsupported directives, that could make its public documentation more closely reflect the unrecognized tags people already see surfaced in Search Console.

Looking Ahead

The planned update would affect Google’s public documentation and how disallow typos are handled. Anyone maintaining a robots.txt file with rules beyond user-agent, allow, disallow, and sitemap should audit for directives that have never worked for Google.

The HTTP Archive data is publicly queryable on BigQuery for anyone who wants to examine the distribution directly.


Featured Image: Screenshot from: YouTube.com/GoogleSearchCentral, April 2026. 

Google Lists Best Practices For Read More Deep Links via @sejournal, @MattGSouthern

Google updated its snippet documentation today with a new section on “Read more” deep links in Search results. The section outlines three best practices for increasing the likelihood that a page appears with these deep links.

What A Read More Deep Link Is

Google defines the feature as “a link within a snippet that leads users to a specific section on that page.”

The examples in the documentation show the link appearing inside the snippet area of a standard Search result.

Screenshot from: developers.google.com/search/docs/appearance/snippet, April 2026.

The Three Best Practices

Google lists three best practices that can increase the likelihood of these links appearing.

First, content must be immediately visible to a human on page load. Content hidden behind expandable sections or tabbed interfaces can reduce that likelihood, per Google’s guidance.

Second, avoid using JavaScript to control the user’s scroll position on page load. One example Google gives is forcing the user’s scroll to the top of the page.

Third, if the page uses history API calls or window.location.hash modifications on page load, keep the hash fragment in the URL. Removing it breaks deep linking behavior.

More Context

Read more deep links are one type of anchor URL that appears in Search Console performance reports. John Mueller previously addressed those hashtag URLs, confirming that they come from Google and link to page sections.

Before today’s addition, the documentation was last revised in 2024. That change clarified page content, not the meta description, as the primary source of search snippets.

Why This Matters

For websites, the new guidance outlines what can increase the likelihood that a Read more deep link will appear.

Pages using accordion UI patterns, tabbed content, or forced-scroll JavaScript may reduce that likelihood. Teams working with single-page applications should ensure that hash fragments remain in URLs during page loads.

Looking Ahead

This is a documentation clarification, not a new SERP feature. Read more deep links have appeared in Search for some time. What’s new is the written guidance on how to increase that likelihood.

Developers working on JavaScript-heavy sites should test how their pages handle scroll position and hash fragments on initial load. Today’s update provides clearer signals on what can reduce the likelihood of a “Read more” link appearing.


Featured Image: Blossom Stock Studio/Shutterstock

Machine-First Architecture: AI Agents Are Here And Your Website Isn’t Ready, Says NoHacks Podcast Host via @sejournal, @theshelleywalsh

AI agents are already here. Not as a concept, not as a demo, but shipping inside browsers used by billions of people. Every major tech company has launched either a browser with AI built in or an extension that acts on your behalf.

Anthropic’s Claude for Chrome can navigate websites, fill forms, and perform multi-step operations on your behalf. Google announced Gemini in Chrome with agentic browsing capabilities, including auto browse, which can act on webpages for you. OpenClaw, the open-source AI agent, connects large language models directly to browsers, messaging apps, and system tools to execute tasks autonomously.

For more understanding about optimizing for agents, I spoke to Slobodan Manic, who recently wrote a five-part series on optimizing websites for AI agents. His perspective sits at the intersection of technical web performance and where AI agent interaction is actually heading.

From Slobodan Manic’s testing, almost every website is structurally broken for this shift.

“It started with us going to AI and asking questions. And now AI is coming to us and meeting us where we are. From my testing, I noticed that websites are nowhere near being ready for this shift because structurally almost every website is broken.”

The Single Biggest Thing That’s Changed

I started by asking Slobodan what’s changed in the last six to nine months that means SEOs need to pay attention to AI agents right now.

“Every major tech company has launched either a browser that has AI in it that can do things for you or some kind of extension that gets into Chrome. Claude has a plugin for Chrome that can do things for you, not just analyze web pages, summarize web pages, but actually perform operations.”

When ChatGPT first launched in 2023, making AI widely accessible, in parallel with how we started typing basic queries in search engines 25 years ago, we asked AI questions. We are now becoming more sophisticated and fluid with our prompting as we realize that AI can do so much more than [write me an email to politely decline an invitation].

Agents represent an even bigger shift to a different dynamic, where AI can complete tasks on our behalf and run complex systems. [Check my emails and delete any that are spam, sort them into a priority group, and surface what needs my immediate attention and provide a qualified response to anything on a basic query, plus make appointments in my calendar for any meeting invites].

Understanding and taking advantage of the possibilities is something we are all trying to figure out right now. What we should be aware of is that most websites aren’t built or ready for this agentic world.

Websites Are Becoming Optional, Or Are They?

I have a theory that brand websites are becoming hubs, the central point that connects all of your content assets online. But Slobodan has gone further. He’s written about websites becoming optional for the end user, with pages built by machines for machines and the interaction happening through closed system interfaces. I asked him to expand on that vision and what kind of timeframe we’re realistically looking at.

“First I’ll say that this is not fully happening today. This is still near to mid future. This is not March 2026,” he clarified. But the signals are concrete.

“Google had a patent granted in January that will let them use AI to rewrite the landing page for you if your landing page is not good enough. And then we have all these other companies including Google that announced Gemini browsing for you inside Chrome. So we have an end-to-end AI system that does everything while humans just wait for results.”

He was careful not to overstate it. People still like to browse, read, and compare things. Websites aren’t disappearing.

“Just the same way as mobile traffic has not killed desktop traffic even if it’s taken a bigger share of traffic overall, higher percentage of overall traffic while the desktop traffic is staying flat in terms of absolute numbers, I think this is another lane that will open where things will be happening without a human being involved in every step.”

His timeline for this: “Within a year we can have this become a reality. Not majority, but if Google starts rewriting landing pages using AI, we will see this happening probably 2027, if not sooner.”

When Checkout Becomes A Protocol

Slobodan has written that checkout is becoming a protocol, not a page. If an AI agent can buy on your behalf without ever loading a brand’s website, I asked, “What does that mean for how brands build trust and differentiate when the customer never sees their site?”

“If you’re building trust in a checkout page, you’re doing it wrong. Let’s start there. That I firmly believe. This is not to do with AI. This was never the right place to build trust,” he responded.

Slobodan pointed to every Shopify checkout page that looks identical. “There’s no trust built there. It’s just a machine-readable page that looks the same for everyone, for every brand. You’re supposed to be doing your job before the user needs to pay you.”

This is where he referenced Jono Alderson, and the concept of upstream engineering. “Moving upstream and doing work there and not on the website is the only way to move forward for anyone whose job is optimizing websites. That’s SEO, that’s CRO, that’s content, that’s anyone doing any kind of website work.”

He best summarized by saying “Your website is a part of the equation. Your website is not the equation. And that’s the biggest structural shift that people need to make to survive moving forward.”

What SEOs And Brands Should Actually Do Now

I asked what SEOs and brands can practically start doing to transition over the next year. His answer reframed how we should think about the website itself.

“If your website was your storefront, and it was for decades, people come to you, people do business there. It needs to be a warehouse and a storefront moving forward or you’re not going to survive. Simple as that.”

“We had all those bookstores that were selling books in the ’90s and then Amazon shows up and then you need to be a warehouse. You need to exist in two planes at the same time for the near future at least. So focusing only on your website is the most wrong thing you can do moving forward.”

His main area of focus right now is what he calls machine-first architecture. The principle is to build for machines before you build for humans.

“You don’t build your website for humans until you’ve built it for machines. When you’re working on a product page, there’s no Figma, there’s no design, there’s no copy. You start with your schema. What is your schema supposed to say? What is the meaning of the page? You start with the meaning and then from that build into a web page as it’s built for humans.”

He compared it directly to the mobile-first shift. “That did not mean no desktop. That meant do the more difficult version of it first and then do the easy thing. Trust me, it’s a lot more complicated to add meaning and structure to a page that’s already been designed than to do it the other way.”

And it extends beyond the website. “If you’re saying something on your website, you better check all of your profiles everywhere online, what people are saying about you. It’s everything everywhere all at once. But this is what optimization has become and what it needs to be.”

I also put to him the argument that optimizing for LLMs is fundamentally different from SEO. His response was unequivocal.

“Hard disagree. The hardest possible disagree. If you were doing things the right way, working on the foundations and checking every box that has to be checked, it’s not different at all.”

Where he sees a difference is in the speed of consequences. “With AI in the mix, you just get exposed much faster and the consequences are much greater. There’s nothing different other than those two things.”

This echoed something I’ve felt strongly. The cycle is moving more quickly, but there’s so much similarity with what happened at the foundation of this industry 25 to 30 years ago, which I raised in my SEO Pioneers series. We’re feeling our way through in the same way. And Slobodan agreed.

“They figured this out once and maybe we should ask them how to figure it out again.”

Vibe Coding Is A Trap, Deep Work Is The Moat

For my last question, I put it to Slobodan that he’s said vibe coding is a trap and deep work is the only moat left. For the SEO practitioner feeling overwhelmed, what’s the one thing they should actually do this week?

“It’s really the foundations. I hate to give the boring answer, but it’s really fixing every single foundational thing that you have on your website or your website presence.”

He’s watched the industry chase one shiny tool after another. “There’s always a new shiny toy to work on while your website doesn’t work with JavaScript disabled. Just ignore all of that until you’ve fixed every single broken foundation you have on your website.”

On vibe coding specifically, he was precise: “I don’t like the term vibe coding. It just suggests that you have no idea what you’re doing and you’re happy about it. That’s the way that sounds to me. The concept of AI-assisted coding, it’s there. It’s great. It’s not going away.”

“But just focus on what you should be doing first before you use AI to do it faster.”

What resonated with me is how well this applies to writing, too. AI is brilliant at confidently producing a draft that, at first glance, looks great. But when you actually read it, you realize it’s just somebody confidently talking nonsense.

Slobodan nailed the core problem: “You need to know what good is and what good looks like. Because AI will always give you something. If you don’t know enough about that specific thing, it will always look good from the outside. And there’s a reason why everyone is okay with vibing everything except for their own profession, because they try it and they see that the results are just horrific.”

Build For Machines First, Everything Else Follows

The one thing to take away from this conversation is to build for machines first, then humans. Not because human user experience won’t matter, but because getting the machine layer right first makes the human layer better.

Your website is no longer the only version of your business that people, or agents, will encounter. The brands that treat it as part of a wider ecosystem rather than the whole ecosystem are the ones that will come through this transition in the strongest position.

Watch the full video interview with Slobodan Manic here, or on YouTube.

Thank you to Slobodan for sharing his insights and being my guest on IMHO.

More Resources:


This post was originally published on Shelley Edits.


Featured Image: Shelley Walsh/Search Engine Journal

ChatGPT Now Crawls 3.6x More Than Googlebot: What 24M Requests Reveal

This post was sponsored by Alli AI. The opinions expressed in this article are the sponsor’s own. 

Everyone assumes Googlebot is the dominant crawler hitting their website. That assumption is now wrong.

We analyzed 24,411,048 proxy requests across 78,000+ pages on 69 customer websites on Alli AI’s crawler enablement platform over a 55-day period (January to March 2026). OpenAI’s ChatGPT-User crawler made 3.6x more requests than Googlebot across our data sample. And that’s not even counting GPTBot, OpenAI’s separate training crawler.

A note on methodology: Crawler identification used user agent string matching, verified against published IP ranges. Request metrics are measured at the proxy/CDN layer. The dataset covers 69 websites across a variety of industries and sizes, predominantly WordPress-based. Full methodology is detailed at the end.

Finding 1: AI Crawlers Now Outpace Google 3.6x & ChatGPT Leads the Pack

Image created by Alli AI, April 2026.

When we ranked every identified crawler by request volume, the results were unambiguous:

Rank Crawler Requests Category
1 ChatGPT-User (OpenAI) 133,361 AI Search
2 Googlebot 37,426 Traditional Search
3 Amazonbot 35,728 AI / E-Commerce
4 Bingbot 18,280 Traditional Search
5 ClaudeBot (Anthropic) 13,918 AI Search
6 MetaBot 10,756 Social
7 GPTBot (OpenAI) 8,864 AI Training
8 Applebot 6,794 AI Search
9 Bytespider (ByteDance) 6,644 AI Training
10 PerplexityBot 5,731 AI Search

ChatGPT-User made more requests than Googlebot, Amazonbot, and Bingbot combined.

Image created by Alli AI, April 2026.

Grouped by purpose, AI-related crawlers (ChatGPT-User, GPTBot, ClaudeBot, Amazonbot, Applebot, Bytespider, PerplexityBot, CCBot) made 213,477 requests versus 59,353 for traditional search crawlers (Googlebot, Bingbot, YandexBot). AI crawlers are now making 3.6x more requests than traditional search crawlers across our network.

Finding 2: OpenAI Uses 2 Crawlers (And Most Sites Don’t Know the Difference)

Image created by Alli AI, April 2026.

OpenAI operates two distinct crawlers with very different purposes.

ChatGPT-User is the retrieval crawler. It fetches pages in real time when users ask ChatGPT questions that require up-to-date web information. This determines whether your content appears in ChatGPT’s answers.

GPTBot is the training crawler. It collects data to improve OpenAI’s models. Many sites block GPTBot via robots.txt but not ChatGPT-User, or vice versa, without understanding the distinct consequences of each.

Combined, OpenAI’s crawlers made 142,225 requests: 3.8x Googlebot’s volume.

The robots.txt directives are separate:

User-agent: GPTBot      # Training crawler — feeds OpenAI's models
User-agent: ChatGPT-User # Retrieval crawler — fetches pages for ChatGPT answers

Finding 3: AI Crawlers Are Faster & More Reliable, But Their Volume Adds Up

Image created by Alli AI, April 2026.

AI crawlers are significantly more efficient per request:

Crawler Avg Response Time 200 Success Rate
PerplexityBot 8ms 100%
ChatGPT-User 11ms 99.99%
GPTBot 12ms 99.9%
ClaudeBot 21ms 99.9%
Bingbot 42ms 98.4%
Googlebot 84ms 96.3%

Two likely reasons. First, AI retrieval crawlers are fetching specific pages in response to user queries, not exhaustively discovering site architecture. They know what they want, they grab it, and they leave. Second, while all crawlers on our infrastructure receive pre-rendered responses, Googlebot’s broader crawl pattern means it requests a wider range of URLs, including stale paths from sitemaps and its own legacy index, which adds latency from redirect chains and error handling that retrieval crawlers avoid entirely.

But there’s a catch: while each individual request is lightweight, the sheer volume means aggregate server load is substantial. ChatGPT-User at 11ms × 133,361 requests is still a real infrastructure cost, just distributed differently than Googlebot’s fewer, heavier requests.

Finding 4: Googlebot Sees a Different (Worse) Version of Your Site

Image created by Alli AI, April 2026.

Googlebot’s 96.3% success rate versus near-perfect rates for AI crawlers reveals an important structural difference.

Googlebot received 624 blocked responses (403) and 480 not found errors (404), accounting for 3% of its requests. Meanwhile, ChatGPT-User achieved 99.99% success. PerplexityBot hit a perfect 100%.

Image created by Alli AI, April 2026.

Why the gap? The most likely explanation is index age and crawl behavior, not site misconfiguration.

Googlebot maintains a massive legacy index built over years of continuous crawling. It routinely re-requests URLs it already knows about — including pages that have since been deleted (404s) or restructured (403s). This is normal behavior for a search engine maintaining an index of this scale, but it means a meaningful percentage of Googlebot’s requests are directed at URLs that no longer exist.

AI crawlers don’t carry that baggage. ChatGPT-User fetches specific pages in response to real-time user queries, targeting content that’s currently relevant and linked. That’s a structural advantage that produces near-perfect success rates.

Industry Reports Confirm AI Crawling Surged 15x in 2025

These findings align with broader industry trends. Cloudflare’s 2025 analysis reported ChatGPT-User requests surging 2,825% YoY, with AI “user action” crawling increasing more than 15x over the course of 2025. Akamai identified OpenAI as the single largest AI bot operator, accounting for 42.4% of all AI bot requests. Vercel’s analysis of nextjs.org confirmed that none of the major AI crawlers currently render JavaScript.

Our data shows this crossover may already be happening at the site level for properties that actively enable AI crawler access.

Your New SEO Strategy: How To Audit, Clean Up & Optimize For AI Crawlers

1. Audit your robots.txt for AI crawlers today

Most robots.txt files were written for a Googlebot-first world. At minimum, have explicit directives for ChatGPT-User, GPTBot, ClaudeBot, Amazonbot, PerplexityBot, Applebot, Bytespider, CCBot, and Google-Extended.

Our recommendation: Most businesses benefit from allowing both retrieval crawlers (ChatGPT-User, PerplexityBot, ClaudeBot) and training crawlers (GPTBot, CCBot, Bytespider), training data is what teaches these models about your brand, products, and expertise. Blocking training crawlers today means AI models learn less about you tomorrow, which reduces your chances of being cited in AI-generated answers down the line.

The exception: if you have content you specifically need to protect from model training (proprietary research, gated content), use granular Disallow rules for those paths rather than blanket blocks.

2. Clean up stale URLs in Google Search Console

Our data shows Googlebot hits a 3% error rate, mostly 403s and 404s, while AI crawlers achieve near-perfect success rates. That gap likely reflects Googlebot re-crawling legacy URLs that no longer exist. But those failed requests still consume the crawl budget.

Audit your GSC crawl stats for recurring 404s and 403s. Set up proper redirects for restructured URLs and submit updated sitemaps.

3. Treat AI crawler accessibility as a distinct SEO channel

Ranking in ChatGPT’s answers, Perplexity’s results, and Claude’s responses is emerging as a distinct visibility channel. If your content isn’t accessible to these crawlers, particularly if you’re running JavaScript-heavy frameworks, you’re invisible in AI search.

We’ve published a live dashboard showing how AI crawler traffic breaks down across a real site: which platforms are visiting, how often, and their share of total traffic; if you want to see what this looks like in practice.

4. Plan for volume, not just individual request weight

AI crawlers send light, fast requests, but they send many of them. ChatGPT-User alone accounted for more than 133,000 requests in 55 days. The aggregate server load from AI crawlers is now likely exceeding your Googlebot load. Make sure your hosting and CDN can handle it, the low per request response times in our data reflect the fact that Alli AI serves pre-rendered static HTML from the CDN edge, which is exactly the kind of architecture that absorbs this volume without taxing your origin server.

Methodology

This analysis is based on 24,411,048 HTTP proxy requests processed through Alli AI’s crawler enablement platform between January 14 and March 9, 2026, covering 69 customer websites.

Crawler identification used user agent string matching, verified against published IP ranges. For OpenAI crawlers specifically, every request was cross-referenced against OpenAI’s published CIDR ranges. This confirmed 100% of GPTBot requests and 99.76% of ChatGPT-User requests originated from OpenAI’s infrastructure. The remaining 0.24% (requests from spoofed user agents) were excluded.

Limitations: The dataset is scoped to Alli AI customers who have opted into crawler enablement. Crawlers that don’t self-identify via user agent are not captured. Response time measurements are at the proxy layer, not the origin server.

About Alli AI

Alli AI provides server-side rendering infrastructure for AI and search engine crawlers. This analysis was produced using data from our proxy infrastructure to help the SEO community better understand the evolving crawler landscape.

Want to see this data in action? See the breakdown firsthand by visiting our AI visibility dashboard.


Image Credits

Featured Image: Image by Alli AI. Used with permission.

In-Post Iamges: Images by Alli AI. Used with permission.