Mueller Explains Why Google Uses Markdown On Dev Docs via @sejournal, @MattGSouthern

Google’s John Mueller says markdown pages serve a specific purpose for developer documentation sites but won’t help most websites, even as search becomes more agentic.

Mueller laid out his reasoning in a Bluesky thread. He was responding to a question from Lily Ray about why Google publishes llms.txt files and markdown pages, even though they aren’t needed for search performance.

His response focused mainly on markdown versions of developer documentation, not llms.txt as a standalone file.

Mueller wrote:

“The short answer is that it’s not done for search. There’s more to websites than just SEO :-).”

Mueller’s Discovery Vs. Functionality Framework

His reasoning focuses on two different website goals.

He called the first “discovery,” or being found via a search engine, and the second “functionality,” which helps users complete tasks on the page.

Mueller acknowledged the term wasn’t precise. “There’s probably a more accurate term for this,” he wrote in the thread.

He compared the distinction to calls to action on traditional pages, stating:

“You don’t ‘do them’ for SEO (to be found), but if you’re responsible for the website overall, ensuring a high ‘discovery rate’ (SEO) together with a high conversion rate is useful to justify your work.”

Why Developer Docs Are Different

On developers.google.com, he noted, markdown versions make sense.

Mueller said:

“AI coding has gotten very popular, and these coding systems can be (I think) efficient and accurate with the code they produce if they can easily read / parse reference material, such as developer documentation.”

He added that markdown can help AI systems “understand the context of the documentation they’re looking at, as well as a simplified version of the reference page.”

Mueller called this a workaround rather than a long-term need, adding:

“OF COURSE they can read HTML just fine, so this is imo more of a temporary crutch, perhaps to save some tokens.”

Non-Developer Sites Should Skip It

For everyone else, Mueller was direct, stating:

“For non-developer sites, I don’t think this makes much sense, even with more agentic traffic in the future. Making a markdown version of a shoe’s specs is not going to get you more sales (competitors appreciate it tho).”

He went further in a follow-up post, pushing back on the idea that sites should prepare for a future where agents drive more traffic.

Mueller added:

“And (I know, nobody reads this far), if you think this is important to prepare for when agents are everywhere: your site (all sites) have much more important things to do for SEO than to prepare for a potential future situation that may or may not come. Prioritize needs before dreams.”

Why This Matters

Mueller’s comments show a more detailed position than his earlier statements on the topic.

In February, Mueller called the idea of serving markdown pages to bots “a stupid idea.” His Bluesky comments carve out an exception for developer documentation while holding the line for every other type of site.

The thread also arrived on the same day we reported that Google’s guidance on llms.txt now depends on which product you ask. Google’s generative AI optimization guide says to skip llms.txt, while Lighthouse 13.3 added an experimental audit that checks for the file as part of agentic browsing readiness.

Looking Ahead

Mueller’s distinction between discovery and on-page functionality can help you evaluate whether agentic optimization is worth your time. The test is whether building for agents right now produces measurable results for a specific site.

The “prioritize needs before dreams” line captures a broader tension in the industry right now. Vendors have been promoting llms.txt and markdown optimization as emerging practices, but neither Google’s search documentation nor independent data support investing in these for non-developer sites.


Featured Image: kirill_makarov/Shutterstock

Google’s llms.txt Guidance Depends On Which Product You Ask via @sejournal, @MattGSouthern

Google’s Search and Chrome documentation now point in different directions on llms.txt, depending on whether the goal is Search visibility or agentic browser readiness.

Google Search recently published a new optimization guide that lists llms.txt among the tactics you don’t need for generative AI features. The guide groups it with content chunking, AI-specific rewriting, and special schema.

Days earlier, Google’s Lighthouse tool shipped version 13.3, which added a new Agentic Browsing category. The update includes an llms.txt audit that checks whether a site provides the file and flags server errors when retrieving it.

The Lighthouse documentation describes llms.txt as a way to provide “a machine-readable summary of a website’s content, specifically designed for LLMs and AI agents.” It adds that without the file, “agents may spend more time crawling the site to understand its high-level structure and primary content.”

What Google Search Has Said

Google’s Search team has maintained for over a year that llms.txt is not a Google initiative or something Google plans to adopt.

John Mueller compared llms.txt to the keywords meta tag, noting no AI services used it and bots didn’t request the file. He called building separate Markdown pages for bots “a stupid idea.”

At Search Central Live Deep Dive Asia Pacific, Gary Illyes and Amir Taboul confirmed Google was not pursuing llms.txt.

Google’s optimization guide explicitly states llms.txt should be skipped, providing the most recent direct statement from the Search team.

What Chrome’s Lighthouse Now Does

Lighthouse 13.3 ships with the Agentic Browsing category by default, checking WebMCP integration, agent accessibility, layout stability, and llms.txt.

The llms.txt audit marks a site as “Not Applicable” only if the file returns a 404; server errors are flagged. The Lighthouse docs describe llms.txt as an “emerging convention” at llmstxt.org, advising site owners to create the file and place it in their root directory.
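The audit behavior described here can be sketched as a small status-to-outcome mapping. This is an illustration based only on the description above, not Lighthouse’s actual code: the `AuditOutcome` names and the treatment of 2xx responses as a pass are assumptions.

```python
from enum import Enum


class AuditOutcome(Enum):
    # Hypothetical labels; Lighthouse's own result names may differ.
    PASS = "pass"
    NOT_APPLICABLE = "not applicable"
    FLAGGED = "flagged"


def llms_txt_outcome(status_code: int) -> AuditOutcome:
    """Map the HTTP status of a request for /llms.txt to an audit outcome."""
    if status_code == 404:
        # A missing file is treated as "Not Applicable", not as a failure.
        return AuditOutcome.NOT_APPLICABLE
    if 200 <= status_code < 300:
        # Assumption: a successful response means the file is provided.
        return AuditOutcome.PASS
    # Assumption: any other status (for example 500 or 503) is flagged.
    return AuditOutcome.FLAGGED
```

Under these assumptions, a site with no llms.txt is not penalized, while a site whose server errors out on the request is.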

This category is separate from SEO audits and indicates that llms.txt helps browser-based agents understand site structure, not improve search rankings or AI citations.

Google Has Been Here Before

Google’s internal teams have sent mixed signals on llms.txt before.

In December, Lidia Infante spotted an llms.txt file on Google’s Search Central developer documentation. Mueller responded on Bluesky with “hmmn :-/” and didn’t clarify further.

Dave Smart noted that the file appeared on multiple Google developer properties, including developer.chrome.com and web.dev. The pattern suggested an internal CMS platform update that automatically deploys llms.txt files, not a Search team decision.

The Search Central file was removed within hours, but files on other Google properties remained.

Why This Matters

Google’s answer on llms.txt varies by use case.

For Google Search, llms.txt isn’t needed for AI Overviews, AI Mode, or other generative AI Search features.

For browser-based agents, Lighthouse considers llms.txt optional in an experimental machine interaction category.

Guidance is split between different Google developer sites, which can lead to conflicting instructions when comparing Lighthouse or its llms.txt documentation with Google’s Search docs.

Looking Ahead

Google hasn’t commented on the documentation gap between the two product teams.

For many sites, creating a basic llms.txt file is simple, but whether it is worth maintaining is questionable, given that Google Search states it’s unnecessary for AI Search visibility.


Featured Image: Stock-Asso/Shutterstock

Google Testing Web Bot Auth To Verify AI Agent Requests via @sejournal, @MattGSouthern

Google published documentation explaining its testing of Web Bot Auth, an experimental IETF protocol that can help websites cryptographically verify some automated requests from bots and AI agents.

The protocol adds another verification layer by letting agents sign HTTP requests with cryptographic keys. Websites can then verify those signatures against published public keys to confirm the request came from who it claims to be.

What’s New

Web Bot Auth uses HTTP Message Signatures (RFC 9421) to let automated clients sign outgoing requests. A bot holds a private key, publishes its public key at a known URL, and signs each request. The receiving website checks the signature against the public key to confirm identity.

Google says a subset of signed Google-Agent requests are authenticated as https://agent.bot.goog. Signed requests include a Signature-Agent HTTP header set to g="https://agent.bot.goog", and the corresponding signature can be verified using public keys published at that domain’s .well-known directory.
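A minimal sketch of the first step a site might take: spotting requests that claim to be signed Google-Agent traffic. It only inspects headers and does not verify the signature itself, which would require fetching the published public key and checking it per RFC 9421. The function names are hypothetical; the `Signature` and `Signature-Input` header names come from RFC 9421, and the identity URL comes from Google’s documentation as described above.

```python
import re

# Identity Google documents for signed Google-Agent requests.
GOOGLE_AGENT_IDENTITY = "https://agent.bot.goog"

# Signature and Signature-Input are defined by RFC 9421;
# Signature-Agent is the header described in the article.
REQUIRED_HEADERS = ("signature", "signature-input", "signature-agent")


def signature_agent_urls(header_value: str) -> list[str]:
    """Return every double-quoted https URL found in a Signature-Agent value."""
    return re.findall(r'"(https://[^"]+)"', header_value)


def is_signed_google_agent_candidate(headers: dict[str, str]) -> bool:
    """True when the request carries signature headers and claims Google's identity.

    This is a header check only. A request that passes still needs its
    signature verified before it can be trusted.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if not all(name in lowered for name in REQUIRED_HEADERS):
        return False
    return GOOGLE_AGENT_IDENTITY in signature_agent_urls(lowered["signature-agent"])
```

Requests that fail this check are not necessarily fake, since Google says only a subset of requests are signed. They would fall through to the existing verification methods.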

According to Google’s documentation, bot-detection services, CDNs, and WAFs already support the protocol. The IETF draft is authored by Thibault Meunier of Cloudflare and Sandor Major of Google. Cloudflare publishes a reference implementation on GitHub.

The IETF Web Bot Auth Working Group was chartered in early 2026 with milestones for standards-track specifications and a best current practice document.

What Google Is Not Doing Yet

Not all Google user agents are participating. The documentation says Google is testing with “some AI agents hosted on Google infrastructure” but does not name which ones beyond the Google-Agent user-triggered fetcher.

Even for participating agents, not every request is signed. The documentation recommends that sites continue relying on IP addresses, reverse DNS, and user-agent strings as the primary verification method while signed traffic rolls out gradually.
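The reverse DNS method mentioned here is usually done as a forward-confirmed lookup: resolve the IP to a hostname, check the hostname’s domain, then resolve the hostname back and confirm it returns the same IP. The sketch below assumes a hypothetical `TRUSTED_SUFFIXES` list, which should be checked against Google’s current documentation. The lookup functions are injectable so the logic can be tested without network access.

```python
import socket

# Assumption: illustrative hostname suffixes; confirm against Google's docs.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _reverse(ip: str) -> str:
    """Reverse DNS: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> list[str]:
    """Forward DNS: hostname to IPv4 addresses (gethostbyname_ex is IPv4-only)."""
    return socket.gethostbyname_ex(host)[2]


def verify_by_reverse_dns(ip: str, reverse=_reverse, forward=_forward) -> bool:
    """Forward-confirmed reverse DNS check for a claimed Google request."""
    try:
        host = reverse(ip).lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.endswith(TRUSTED_SUFFIXES):
        return False  # hostname is not under a trusted domain
    try:
        # The hostname must resolve back to the original IP.
        return ip in forward(host)
    except OSError:
        return False
```

The forward confirmation matters because anyone who controls the reverse zone for their own IP range can set a PTR record to a Google-looking hostname.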

The Internet-Draft could change as the working group develops the standard.

Why This Matters

Bot impersonation has been a persistent problem. Scrapers and bad actors can spoof user-agent strings to disguise their traffic as Googlebot or other legitimate crawlers, making it harder for site owners to tell real bot traffic from fake.

We covered this issue when Google’s Martin Splitt warned that “not everyone who claims to be Googlebot actually is Googlebot.” The available verification methods at the time were reverse DNS lookups and IP range checks. Web Bot Auth would add a layer that can’t be forged without the agent’s private key.

For sites already using a CDN or WAF that supports the protocol, verification may happen automatically. For everyone else, the experimental status means there is no urgency to act. The documentation recommends treating existing verification as the default and Web Bot Auth as supplementary.

Looking Ahead

Web Bot Auth is still moving through the standards process, and Google’s implementation remains experimental.

For now, the practical change is visibility. Websites may start seeing signed requests from some Google-Agent traffic, while existing verification methods remain the default.

The next question is whether more AI agents adopt signed requests, and whether hosting providers make verification automatic for websites that don’t want to manage keys.

Your Website Is A Source, Not A Megaphone via @sejournal, @slobodanmanic

There’s a lesson from the early days of social media that most brands eventually learned the hard way: Social media is not a megaphone.

You couldn’t just broadcast your press releases into the feed and expect people to care. The channel had rules. It rewarded conversation, not announcements. The companies that figured this out early thrived. The rest spent years shouting into a void, wondering why nobody was engaging.

We’re watching the same mistake happen again, just one layer deeper. This time it’s not about which platform you’re on. It’s about assuming your website is where the message lives.

Why Most Websites Break When AI Agents Read Them

Most websites are still built on a core assumption: Someone will arrive at your front door, navigate your carefully designed pages, and consume your message in the exact sequence and format you intended.

That assumption is breaking.

In 2026, your website is no longer the only interface to your content. An AI agent might summarize your service page for someone mid-conversation. A voice assistant might read your pricing aloud, stripped of all visual hierarchy. A research tool might pull three paragraphs from your blog, recontextualize them alongside a competitor’s, and present them in a comparison the user never asked you for. Someone might never visit your site and still make a decision based entirely on what your website says.

If your message only works when it’s wrapped in your layout, your fonts, your carefully choreographed scroll, you don’t have a message. You have a brochure. And brochures don’t travel well.

The shift that’s happening is subtle but fundamental: You need to design the message independently of the medium.

This doesn’t mean your website stops mattering. It means your website is now one of many surfaces where your message might land. And the message has to hold up in all of them. It has to make sense when it’s read in full, when it’s summarized in three sentences, when it’s pulled apart and reassembled by something you didn’t build and don’t control.

That changes how you write. It changes how you structure information. It changes what you think of as “the product” of your content work.

Here’s a simple test: If there’s a single “Lorem ipsum” anywhere in your website while it’s being built, the message came second. The design came first. That order no longer works.

A few things this means in practice:

Your core message needs to be extractable. If an agent grabs one paragraph from your website, does that paragraph carry weight on its own, or does it collapse without the paragraphs around it?

Your value proposition can’t hide behind design. Bold typography and hero animations don’t travel through an API. The words have to do the work.

Structure becomes a form of portability. Clear headings, logical hierarchy, well-defined claims. These aren’t just good for traditional SEO anymore. They’re how machines parse your intent and relay it accurately.

You need to think about your content the way a news agency thinks about a wire story. The story has to work no matter which publication picks it up, no matter how they crop it, no matter what headline they slap on it. The facts and the narrative have to be embedded in the text itself, not in the presentation layer.

Brand Control When AI Recontextualizes At Scale

There’s a natural resistance to this idea. “If I don’t control the experience, how do I control the brand?” But that’s the megaphone instinct talking. The desire to control exactly how every word lands, in exactly the right font, with exactly the right whitespace. That was always a bit of an illusion anyway. People skim. People read on phones in bad lighting. People copy-paste your pricing into a Slack thread with zero context.

The difference now is that the recontextualization is happening at scale, automatically, and often before a human even sees it.

So, the question isn’t how to prevent that. It’s how to make sure your message is strong enough to survive it.

Websites As Canonical Sources, Not Just Destinations

Your website still matters. But its job description has changed.

Your website is no longer just a destination. It’s a source. It’s the canonical, structured, well-maintained origin point from which your message gets picked up, interpreted, summarized, and carried elsewhere. The better that source material is, the better it travels.

Think of it this way: Your website used to be the store. Now, it’s also the warehouse. And the warehouse needs to be organized well enough that anyone (human or machine) can find what they need, understand what it means, and carry it somewhere else without losing the plot.

The companies that get this right will be the ones whose message shows up clearly, no matter where the conversation is happening. The ones that don’t will keep designing beautiful megaphones, and keep wondering why the room isn’t listening.


This post was originally published on No Hacks.


Featured Image: Pixel-Shot/Shutterstock

Google Tells Developers To Build For AI Agents, Not Just Humans via @sejournal, @MattGSouthern

Google’s web.dev site now includes guidance advising developers to treat AI agents as a distinct audience alongside human visitors.

Titled “Build agent-friendly websites,” it tells developers that “some human users are pivoting from manual navigation to delegating goal-oriented journeys to AI agents.”

Google frames this as a design problem, noting that websites built with complex hover states and shifting layouts are “functionally broken for agents.”

What The Guide Covers

Google describes three ways agents interpret websites:

  1. Screenshots let agents use vision models to identify elements visually.
  2. Raw HTML gives agents the DOM structure and hierarchy.
  3. And the accessibility tree provides what Google calls a “high-fidelity map” of interactive elements, stripped of visual noise.

Google’s recommendations for agent-friendly design include using semantic HTML elements like <button> and <a> over styled <div> elements, keeping layouts stable across pages, linking <label> tags to inputs with the for attribute, and setting cursor: pointer on clickable elements.
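Two of these recommendations can be checked mechanically. The sketch below is a rough lint, not a Google tool: it flags generic elements that carry an onclick handler and inputs that no label points to through the for attribute. The class and function names are hypothetical, and it ignores inputs wrapped inside a label element.

```python
from html.parser import HTMLParser


class AgentReadinessLint(HTMLParser):
    """Collects labels, inputs, and clickable generic elements from markup."""

    def __init__(self):
        super().__init__()
        self.label_targets = set()    # values of <label for="...">
        self.input_ids = []           # id of each visible <input>, or None
        self.clickable_generic = []   # <div>/<span> tags with onclick

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "label" and attr.get("for"):
            self.label_targets.add(attr["for"])
        elif tag == "input" and attr.get("type") != "hidden":
            self.input_ids.append(attr.get("id"))
        elif tag in ("div", "span") and "onclick" in attr:
            self.clickable_generic.append(tag)


def lint(html: str) -> dict:
    """Return generic clickable tags and inputs lacking a matching label."""
    parser = AgentReadinessLint()
    parser.feed(html)
    parser.close()
    unlabeled = [
        input_id for input_id in parser.input_ids
        if input_id is None or input_id not in parser.label_targets
    ]
    return {
        "clickable_generic": parser.clickable_generic,
        "unlabeled_inputs": unlabeled,
    }
```

A result with empty lists does not prove a page is agent-ready. It only means these two patterns were not found.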

Google wraps up with a statement that highlights the connection between agent optimization and current web standards:

“Everything we suggest to make a site ‘agent-ready’ also makes sites better for humans.”

WebMCP As A Forward Signal

At the bottom of the guide, Google links to WebMCP, a proposed web standard for helping websites interact with agents. Chrome’s team describes it as an early preview program and is accepting sign-ups for developers who want to experiment.

WebMCP would let websites register tools with defined input/output schemas that agents can discover and call as functions. Slobodan Manic covered WebMCP last week as part of the broader protocol stack forming around agent interaction.

Why This Matters

Semantic HTML, stable layouts, and proper accessibility markup have been web development defaults for years, and we’ve covered agent-optimization in depth.

What’s new is Google making this an official developer resource. Putting agent-friendliness on web.dev signals that Google is treating agent interaction as part of its developer guidance, alongside established areas like accessibility and performance.

For sites that already follow accessibility best practices, there’s little to change. For those that don’t, the business case for semantic HTML now extends beyond screen readers to AI agents that browse, compare, and transact on behalf of users.

Looking Ahead

The WebMCP early preview program is open for sign-ups. Chrome is listed for Google I/O on May 19–20, giving developers another place to watch for updates on browser-based agent interactions.


Featured Image: Summit Art Creations/Shutterstock

The Fully Non-Human Web: No One Builds The Page, No One Visits It via @sejournal, @slobodanmanic

In January 2026, Google was granted patent US12536233B1. Six engineers worked on it, and it describes a system that scores a landing page on conversion rate, bounce rate, and design quality. If the landing page falls below a threshold, the system generates an AI replacement personalized to the searcher. The advertiser never sees it. Never approves it. Might not even know it happened.

The debate around this patent has centered on scope: Is it limited to shopping ads, or does it signal something broader? That’s the wrong question.

The right question: What happens when you combine AI-generated pages with AI agents that browse, shop, and transact on behalf of humans?

For the first time, we have the infrastructure for a web where no human creates the page and no human visits it. Both sides can be non-human. That changes everything.

The Supply Side: AI-Generated Pages

The supply side of the web has always been human. Someone designs a page, writes copy, publishes it. Three developments are changing that.

Google’s patent US12536233B1 is the most direct: Score a landing page on conversion rate, bounce rate, and design quality, then replace underperforming pages with AI-generated versions. The replacement pages draw on the searcher’s full search history, previous queries, click behavior, location, and device data. Google builds personalized landing pages no advertiser can match, because no advertiser has access to cross-query behavioral data at that scale. Barry Schwartz covered the patent on Search Engine Land, describing a system where Google could automatically create custom landing pages, replacing organic results. Glenn Gabe called Google’s AI landing page patent potentially more controversial than AI Overviews. Roger Montti at Search Engine Journal argued the patent’s scope is limited to shopping and ads. Both camps agree: the technology to score and replace landing pages with AI exists and works.

NLWeb, Microsoft’s open project, takes a different approach. NLWeb turns any website into a natural language interface using existing Schema.org markup and RSS feeds. An AI agent querying an NLWeb-enabled site doesn’t load a page at all. The agent asks a structured question, NLWeb returns a structured answer. The rendered page becomes optional.

WebMCP goes further still. With WebMCP, a website registers tools with defined input/output schemas that AI agents discover and call as functions. A product search becomes a function call. A checkout becomes an API request. WebMCP eliminates the “page” concept entirely, dissolving the web page as a unit of content into a set of callable capabilities.
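To make the “callable capabilities” idea concrete, here is a conceptual model in Python. It is not the WebMCP API, which is a browser-side proposal; every name here (`Tool`, `ToolRegistry`, `search_products`, the sample catalog) is hypothetical. It only illustrates the pattern of registering a tool with a declared input schema that an agent can discover and then call.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict[str, type]   # field name -> expected Python type
    handler: Callable[..., Any]


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> list[dict[str, Any]]:
        """What an agent would read to discover the available tools."""
        return [
            {
                "name": t.name,
                "description": t.description,
                "input": {k: v.__name__ for k, v in t.input_schema.items()},
            }
            for t in self._tools.values()
        ]

    def call(self, name: str, **kwargs: Any) -> Any:
        """Validate arguments against the declared schema, then run the tool."""
        tool = self._tools[name]
        for field, expected in tool.input_schema.items():
            if not isinstance(kwargs.get(field), expected):
                raise TypeError(f"{field} must be {expected.__name__}")
        return tool.handler(**kwargs)


# Hypothetical sample data standing in for a real product catalog.
CATALOG = [
    {"sku": "RS-1", "name": "Trail running shoe", "price": 120.0},
    {"sku": "RS-2", "name": "Road running shoe", "price": 95.0},
]


def search_products(query: str, max_price: float) -> list[dict]:
    return [
        p for p in CATALOG
        if query.lower() in p["name"].lower() and p["price"] <= max_price
    ]


registry = ToolRegistry()
registry.register(Tool(
    name="search_products",
    description="Find products by name and price ceiling",
    input_schema={"query": str, "max_price": float},
    handler=search_products,
))
```

An agent in this model would read `registry.describe()` and then issue `registry.call("search_products", query="running", max_price=100.0)` without ever loading a product listing page.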

Each mechanism works differently, but the direction is the same: the page is becoming something generated, queried, or bypassed entirely. The human-designed, human-published web page is no longer the only way content reaches an audience.

The Demand Side: AI Agents As Visitors

The demand side shifted faster. In 2024, bots surpassed human traffic for the first time in a decade, accounting for 51% of all web activity. Cloudflare’s data shows AI “user action” crawling (agents actively doing things, not just indexing) grew 15x during 2025. Gartner predicts 40% of enterprise applications will feature task-specific AI agents by end of 2026, up from less than 5% in 2025. The scale is hard to overstate.

Agentic browsers are the most visible shift. Chrome’s auto browse turned 3 billion Chrome installations into potential AI agent launchpads. Google’s Gemini scrolls, clicks, fills forms, and completes multi-step tasks autonomously inside Chrome. Perplexity’s Comet browser conducts deep research across multiple sites simultaneously. Microsoft’s Edge Copilot Mode handles multi-step workflows from within the browser sidebar. The full agentic browser landscape now includes over a dozen consumer and developer tools, all browsing on behalf of humans.

Commerce agents have moved past browsing into buying. OpenAI launched Instant Checkout to let users purchase products directly inside ChatGPT, powered by Stripe’s Agentic Commerce Protocol (ACP). OpenAI killed the feature in March 2026 after near-zero purchase conversions and only a dozen merchant integrations out of over a million promised. The failure was execution, not concept: Alibaba’s Qwen app processed 120 million orders in six days in February 2026 because Alibaba owns the AI model, the marketplace, the payment rails (Alipay), and the logistics. OpenAI tried to replicate agentic commerce without owning the stack. Google and Shopify’s Universal Commerce Protocol (UCP) connects over 20 companies, including Walmart, Target, and Mastercard, in a framework designed for AI agents to handle commerce from product discovery through checkout. Shopify auto-opted over a million merchants into agentic shopping experiences with ChatGPT, Copilot, and Perplexity. The transaction happens in an AI conversation. No checkout page loads.

Agent-to-agent communication removes the human from both ends. Google’s Agent-to-Agent (A2A) protocol lets AI agents from different vendors discover each other’s capabilities and collaborate on tasks without human mediation. A travel planning agent negotiates directly with a booking agent. A procurement agent evaluates supplier agents across vendors. Over 150 organizations support A2A, including Salesforce, SAP, and PayPal, making agent-to-agent commerce and coordination a production reality.

When Both Sides Go Non-Human

Until now, one side of the web was always human. A person built the page, or a person visited it. Usually both.

Google’s patent closes the circuit.

Here’s what a complete non-human flow might look like. A user tells their AI assistant they need running shoes. The assistant queries product data through NLWeb or WebMCP, no page load needed. The assistant evaluates options by checking inventory across retailers via A2A. If the user needs to review a comparison, Google generates a landing page personalized to that specific user’s search history and preferences. The assistant completes checkout through ACP or UCP using Shared Payment Tokens. The user receives a confirmation.

The human’s role in that entire flow: stating intent and approving the purchase. Discovery, page generation, product evaluation, and transaction completion are all handled by AI systems. The human touches only the two endpoints of the chain.

Every piece of technology in that chain exists in production today. Chrome auto browse is live for 3 billion Chrome users. A2A has 150+ organizational supporters. ACP underpins Stripe’s agentic commerce infrastructure (ChatGPT’s Instant Checkout failed on execution, not protocol). UCP connects Shopify, Google, Walmart, and Target. Patent US12536233B1 is granted. No single company has assembled the full loop yet, but every component is operational.

Who’s Building The Non-Human Web

Here’s where it gets interesting. Map out who’s building what, and a pattern emerges:

Layer                | What                   | Who
Page generation      | AI landing pages       | Google
Content-as-API       | WebMCP, NLWeb          | Google, Microsoft
Agent infrastructure | MCP, A2A               | Anthropic, Google
Agent browsers       | Chrome, Comet, Copilot | Google, Perplexity, Microsoft
Agent commerce       | ACP, UCP               | Stripe + OpenAI, Shopify + Google
Edge delivery        | Markdown for Agents    | Cloudflare

Google appears in five of six layers: page generation (patent US12536233B1), content-as-API (WebMCP), agent infrastructure (A2A), agent browsers (Chrome auto browse), and commerce (UCP). Google is positioning itself to mediate the non-human web the same way Google mediates the human one through Search.

The Agentic AI Foundation (AAIF), formed under the Linux Foundation with Anthropic, OpenAI, Google, and Microsoft as platinum members, provides the governance layer. The AAIF functions as the W3C for the agentic web: the vendor-neutral body that decides which protocols become standards for agent interoperability.

What Website Owners Need To Know

This isn’t an optimization checklist. It’s three structural shifts in what your website is for.

Your Data Layer Is Your Website

Google’s patent generates landing pages from product feed data, making product feeds the most important asset an ecommerce business maintains. NLWeb queries Schema.org markup instead of rendering pages, making structured markup the front door to your content. WebMCP exposes site capabilities as function calls, making tool definitions the user interface agents interact with.

Structured data, product feeds, JSON-LD, and API surfaces have traditionally been treated as backend infrastructure. In the non-human web, these data layers become the primary way a business reaches customers. Product feed accuracy (specs, pricing, stock levels, images) matters more than homepage design when AI systems generate the page from that feed.
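As a rough illustration of treating the feed as the front door, the sketch below turns one product feed record into Schema.org Product markup as JSON-LD. The feed field names (`name`, `sku`, `image`, `price`, `currency`, `stock`) are assumptions about a hypothetical feed; the Schema.org types and properties are standard.

```python
import json


def product_json_ld(item: dict) -> str:
    """Build Schema.org Product JSON-LD from one feed record."""
    in_stock = item["stock"] > 0
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": item["name"],
        "sku": item["sku"],
        "image": item["image"],
        "offers": {
            "@type": "Offer",
            "price": f'{item["price"]:.2f}',
            "priceCurrency": item["currency"],
            # Availability is derived from the stock level in the feed.
            "availability": (
                "https://schema.org/InStock"
                if in_stock
                else "https://schema.org/OutOfStock"
            ),
        },
    }
    return json.dumps(data, indent=2)
```

Generating markup from the same record that drives pricing and inventory keeps the two from drifting apart, which is the accuracy problem the paragraph above describes.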

Trust Is The Moat

AI can generate a page. It cannot generate a reason to seek you out by name.

Direct traffic, email subscribers, community members, and brand reputation persist when the page itself becomes replaceable. An AI agent can build a product page, but no AI agent can build the trust that makes a consumer (or their agent) request a specific brand by name.

The brands that matter in the non-human web are the ones people tell their agents to find. “Get me a fleece jacket” is a commodity query. “Get me a fleece jacket from Patagonia” is a brand moat.

The Measurement Problem

How do you measure a page you didn’t build? How do you A/B test against something Google generates dynamically? How do you attribute a conversion that happened inside ChatGPT, initiated by an agent acting on behalf of a user who never saw your website?

Traditional web analytics (page views, sessions, bounce rate, time on site) assume two things: a human visitor and a page you control. On the non-human web, neither assumption holds. A Google-generated landing page isn’t yours. A ChatGPT checkout session doesn’t register in your analytics.

I don’t have a clean answer here, and neither does anyone else. Measurement is the genuinely unsolved problem of the non-human web. New metrics will need to track agent discoverability, agent conversion rate, and data feed quality. But as of March 2026, the measurement infrastructure hasn’t caught up to the technology it needs to measure.

Four Predictions For 2026-2027

Four things to watch over the next 12-18 months.

Google ships patent US12536233B1, or something like it. The technology for scoring and replacing landing pages exists. The business incentive exists. Google has a history of introducing features in ads first, then expanding (Google Shopping went from free to paid to essential). AI-generated landing pages will likely appear in shopping ads first, then broaden to other verticals. Landing page quality scores in Google Ads serve as the early warning system for which pages Google considers replaceable.

Agent traffic becomes measurable. Analytics platforms will need to distinguish human sessions from agent sessions. BrightEdge reports AI agents account for roughly 33% of organic search activity as of early 2026. WP Engine’s traffic data shows 1 AI bot visit for every 31 human visits by Q4 2025, up from 1 per 200 at the start of that year. Agent traffic ratios will accelerate further as Chrome auto browse rolls out globally beyond the US. New metrics around agent conversion rate and agent discoverability will emerge from necessity.
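A first approximation of that split can be made from server logs by matching user-agent strings. The sketch below assumes a short, incomplete token list and treats everything else as human. User-agent strings can be spoofed, so this is a rough estimate and not a verification method.

```python
from collections import Counter

# Assumption: an illustrative, incomplete list of bot and agent tokens.
AGENT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Agent", "Googlebot")


def classify(user_agent: str) -> str:
    """Label a request as 'agent' or 'human' from its user-agent string."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in AGENT_TOKENS):
        return "agent"
    return "human"


def agent_ratio(user_agents: list[str]) -> float:
    """Agent visits per human visit, the same shape as the '1 per 31' figure."""
    counts = Counter(classify(ua) for ua in user_agents)
    if counts["human"] == 0:
        return float("inf")
    return counts["agent"] / counts["human"]
```

A ratio of 1 agent visit per 31 human visits comes out as roughly 0.032, and 1 per 200 as 0.005.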

The protocol stack consolidates. MCP, A2A, NLWeb, and WebMCP form a coherent stack covering tool access, agent communication, content querying, and browser-level integration. Expect more interoperability between these protocols and fewer competing standards. The Agentic AI Foundation (AAIF) accelerates consolidation. Within 18 months, “does your site support MCP?” will be as standard a question as “is your site mobile-friendly?”

Brand differentiation gets harder and more important. When AI generates pages and agents do the shopping, the only defensible position is being the brand people (and their agents) seek out by name. Direct relationships, owned audiences, trust signals. Everything else is a commodity.

The Web Splits In Two

When Shopify auto-opted merchants into agentic shopping, I asked whether your website just became optional. The answer is more nuanced than optional or essential. It’s becoming something different.

The web isn’t dying. It’s splitting.

The transactional web (product listings, checkout flows, information retrieval, comparison shopping) is going non-human first. AI generates the landing pages. AI agents visit and transact on those pages. Humans approve decisions at the endpoints. Google’s patent lives in the transactional web, and the economics of conversion optimization push hardest toward automation in this layer.

The experiential web (brand storytelling, community, content that rewards sustained attention, design that creates emotional response) stays human. Not because AI can’t generate brand experiences, but because the value of those experiences comes from the human connection behind them. Nobody tells their agent to “go enjoy a brand experience on my behalf.”

Your website’s new job description: data source for the agents, trust anchor for the humans, brand home for both. The companies that treat their structured data, product feeds, and API surfaces with the same care they give their homepage design are the ones that show up in both worlds.

The non-human web isn’t replacing the human web. It’s growing alongside it. Your job is to show up in both.

This was originally published on No Hacks.



Cloudflare’s New Markdown for AI Bots: What You Need To Know via @sejournal, @MattGSouthern

Cloudflare launched a feature that converts HTML pages to markdown when AI systems request it. Sites on its network can now serve lighter content to bots without building separate pages.

The feature, called Markdown for Agents, works through HTTP content negotiation. An AI crawler sends a request with Accept: text/markdown in the header. Cloudflare intercepts it, fetches the original HTML from the origin server, converts it to markdown, and delivers the result.
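As a minimal sketch of the client side of that exchange, the snippet below builds (but does not send) a request that carries the Accept: text/markdown header. The URL is a placeholder and the helper name build_markdown_request is hypothetical; none of this is taken from Cloudflare's own code.

```python
from urllib.request import Request


def build_markdown_request(url: str) -> Request:
    """Build a GET request that asks for a markdown representation."""
    # The Accept header is the only signal. The URL is the same one a browser would use.
    return Request(url, headers={"Accept": "text/markdown"})


# Placeholder URL, for illustration only.
req = build_markdown_request("https://example.com/blog/post")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

A client that omits the header, or asks for text/html, would be expected to receive the normal HTML page from the same URL.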

The launch arrives days after Google’s John Mueller called the idea of serving markdown to AI bots “a stupid idea” and questioned whether bots can even parse markdown links properly.

What’s New

Cloudflare described the feature as treating AI agents as “first-class citizens” alongside human visitors. The company used its own blog post as an example. The HTML version consumed 16,180 tokens while the markdown conversion used 3,150 tokens.
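To put those two figures in proportion, here is a quick calculation using only the token counts quoted above. The variable names are mine.

```python
# Token counts Cloudflare reported for its own blog post.
html_tokens = 16_180
markdown_tokens = 3_150

# Absolute and relative savings from serving markdown instead of HTML.
saved = html_tokens - markdown_tokens
reduction_pct = round(saved / html_tokens * 100, 1)

print(f"{saved} tokens saved, a {reduction_pct}% reduction")
```

For that one page, the markdown version uses roughly a fifth of the tokens. Savings on other pages will depend on how much markup surrounds the content.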

“Feeding raw HTML to an AI is like paying by the word to read packaging instead of the letter inside,” the company wrote.

The conversion happens at Cloudflare’s edge network, not at the origin server. Websites enable it per zone through the dashboard, and it’s available in beta at no additional cost for Pro, Business, and Enterprise plan customers, plus SSL for SaaS customers.

Cloudflare noted that some AI coding tools already send the Accept: text/markdown header. The company named Claude Code and OpenCode as examples.

Each converted response includes an x-markdown-tokens header that estimates the token count of the markdown version. Developers can use this to manage context windows or plan chunking strategies.
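One way a developer might use that estimate is sketched below, assuming the header value is a plain integer. The function name fits_context and the budget figures are hypothetical.

```python
from typing import Mapping, Optional


def fits_context(headers: Mapping[str, str], budget: int) -> Optional[bool]:
    """Return True or False when a token estimate is present, None when it is not."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("x-markdown-tokens")
    if raw is None or not raw.strip().isdigit():
        return None  # No usable estimate, so the caller has to count tokens itself.
    return int(raw.strip()) <= budget


# Hypothetical response headers and token budgets.
print(fits_context({"X-Markdown-Tokens": "3150"}, budget=4000))
print(fits_context({"X-Markdown-Tokens": "3150"}, budget=2000))
```

When the function returns False, the caller could split the document into chunks before passing it to a model. When it returns None, the response probably was not converted.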

Content-Signal Defaults

Converted responses include a Content-Signal header set to ai-train=yes, search=yes, ai-input=yes by default, signaling the content can be used for AI training, search use, and AI input (including agentic use). Whether a given bot honors those signals depends on the bot operator. Cloudflare said the feature will offer custom Content-Signal policies in the future.
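For anyone auditing what their site sends, the default value can be read as a comma-separated list of key=value pairs. The parser below is a rough sketch built on that assumption. It is not an official Content Signals parser, and custom policies may use syntax it does not handle.

```python
from typing import Dict


def parse_content_signal(value: str) -> Dict[str, bool]:
    """Split a Content-Signal value into a {signal: allowed} mapping."""
    signals: Dict[str, bool] = {}
    for part in value.split(","):
        key, sep, setting = part.strip().partition("=")
        if sep:  # Skip fragments that are not key=value pairs.
            signals[key.strip()] = setting.strip().lower() == "yes"
    return signals


# The default value quoted in the article.
defaults = parse_content_signal("ai-train=yes, search=yes, ai-input=yes")
print(defaults)
```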

The Content Signals framework, which Cloudflare announced during Birthday Week 2025, lets site owners set preferences for how their content gets used. Enabling markdown conversion also applies a default usage signal, not just a format change.

How This Differs From What Mueller Criticized

Mueller was criticizing a different practice. Some site owners build separate markdown pages and serve them to AI user agents through middleware. Mueller raised concerns about cloaking and broken linking, and questioned whether bots could even parse markdown properly.

Cloudflare’s feature uses a different mechanism. Instead of detecting user agents and serving alternate pages, it relies on content negotiation. The same URL serves different representations based on what the client requests in the header.

Mueller’s comments addressed user-agent-based serving, not content negotiation. In a Reddit thread about Cloudflare’s feature, Mueller responded with the same position. He wrote, “Why make things even more complicated (parallel version just for bots) rather than spending a bit of time improving the site for everyone?”

Google defines cloaking as showing different content to users and search engines with the intent to manipulate rankings and mislead users. The cloaking concern may apply differently here. With user-agent sniffing, the server decides what to show based on who’s asking. With content negotiation, the client requests a format and the server responds. The content is the same information in a different format, not different content for different visitors.
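The difference between the two mechanisms is easier to see side by side. This is a simplified sketch, not how Cloudflare or any particular middleware implements it: the negotiation function ignores q-values, and ExampleAIBot is a made-up user agent.

```python
def negotiate(accept_header: str) -> str:
    """Content negotiation: the client's Accept header picks the format."""
    # Simplification: q-value weights are dropped, only the media types are compared.
    media_types = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    return "text/markdown" if "text/markdown" in media_types else "text/html"


def sniff(user_agent: str, ai_agents=("ExampleAIBot",)) -> str:
    """User-agent sniffing: the server guesses who is asking and decides for them."""
    is_ai = any(name.lower() in user_agent.lower() for name in ai_agents)
    return "text/markdown" if is_ai else "text/html"


print(negotiate("text/markdown"))                            # client asked for markdown
print(negotiate("text/html,application/xhtml+xml;q=0.9"))    # client asked for HTML
print(sniff("Mozilla/5.0 (compatible; ExampleAIBot/1.0)"))   # server decided on its own
```

In the first function the same client gets HTML or markdown depending on what it requests. In the second, the format is tied to the client's identity, which is the pattern Mueller's cloaking concern targets.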

The practical result is still similar from a crawler’s perspective. Googlebot requesting standard HTML would see a full webpage. An AI agent requesting markdown would see a stripped-down text version of the same page.

New Radar Tracking

Cloudflare also added content type tracking to Cloudflare Radar for AI bot traffic. The data shows the distribution of content types returned to AI agents and crawlers, broken down by MIME type.

You can filter by individual bot to see what content types specific crawlers receive. Cloudflare showed OAI-SearchBot as an example, displaying the volume of markdown responses served to OpenAI’s search crawler.

The data is available through Cloudflare’s public APIs and Data Explorer.

Why This Matters

If you already run your site through Cloudflare, you can enable markdown conversion with a single toggle instead of building separate markdown pages.

Enabling Markdown for Agents also sets the Content-Signal header to ai-train=yes, search=yes, ai-input=yes by default. Publishers who have been careful about AI access to their content should review those defaults before toggling the feature on.

Looking Ahead

Cloudflare said it plans to add custom Content-Signal policy options to Markdown for Agents in the future.

Mueller’s criticism focused on separate markdown pages, not on standard content negotiation. Google hasn’t addressed whether serving markdown through content negotiation falls under its cloaking guidelines.

The feature is opt-in and limited to paid Cloudflare plans. Review the Content-Signal defaults before enabling it.

Google Updates Googlebot File Size Limit Docs via @sejournal, @MattGSouthern

Google updated its Googlebot documentation to clarify information about file size limits.

The change involves moving information about default file size limits from the Googlebot page to Google’s broader crawler documentation. Google also updated the Googlebot page to be more specific about Googlebot’s own limits.

What’s New

Google’s documentation changelog describes the update as a two-part clarification.

The default file size limits that previously lived on the Googlebot page now appear in the crawler documentation. Google said the original location wasn’t the most logical place because the limits apply to all of Google’s crawlers and fetchers, not just Googlebot.

With the defaults now housed in the crawler documentation, Google updated the Googlebot page to describe Googlebot’s specific file size limits more precisely.

The crawling infrastructure docs list a 15 MB default for Google’s crawlers and fetchers, while the Googlebot page now lists 2 MB for supported file types and 64 MB for PDFs when crawling for Google Search.

The crawler overview describes a default limit across Google’s crawling infrastructure, while the Googlebot page describes Google Search–specific limits for Googlebot. Each resource referenced in the HTML, such as CSS and JavaScript, is fetched separately.

Why This Matters

This fits a pattern Google has been running since late 2025. In November, Google migrated its core crawling documentation to a standalone site, separating it from Search Central. The reasoning was that Google’s crawling infrastructure serves products beyond Search, including Shopping, News, Gemini, and AdSense.

In December, more documentation followed, including faceted navigation guidance and crawl budget optimization.

The latest update continues that reorganization. The 15 MB file size limit was first documented in 2022, when Google added it to the Googlebot help page. Mueller confirmed at the time that the limit wasn’t new. It had been in effect for years. Google was just putting it on the record.

If you manage crawl budgets or troubleshoot indexing on content-heavy pages, note that Google’s docs now describe the limits differently depending on where you look.

The crawling infrastructure overview lists 15 MB as the default for all crawlers and fetchers. The Googlebot page lists 2 MB for HTML and supported text-based files, and 64 MB for PDFs. Google’s changelog does not explain how these figures relate to one another.

Default limits now live in the crawler overview documentation, while Googlebot-specific limits are on the Googlebot page.
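For a quick audit of large files, the figures from the two pages can be turned into a simple check. This sketch rests on several assumptions: that a megabyte here means 1024 × 1024 bytes, that PDFs can be spotted by file extension, and that each limit applies per fetched file. Google's docs, as summarized above, don't spell out how the 15 MB and 2 MB figures interact, so the check uses only the Googlebot page's numbers.

```python
MB = 1024 * 1024  # Assumption: binary megabytes. The source does not say which unit applies.

# Figures quoted from Google's documentation in the article.
CRAWLER_DEFAULT_LIMIT = 15 * MB       # default for Google's crawlers and fetchers
GOOGLEBOT_SEARCH_LIMIT = 2 * MB       # Googlebot, supported file types, for Google Search
GOOGLEBOT_SEARCH_PDF_LIMIT = 64 * MB  # Googlebot, PDFs, for Google Search


def googlebot_search_limit(filename: str) -> int:
    """Pick the Googlebot limit for a file, detecting PDFs by extension (an assumption)."""
    if filename.lower().endswith(".pdf"):
        return GOOGLEBOT_SEARCH_PDF_LIMIT
    return GOOGLEBOT_SEARCH_LIMIT


def exceeds_googlebot_limit(filename: str, size_bytes: int) -> bool:
    """True when a single file is larger than the limit listed for its type."""
    return size_bytes > googlebot_search_limit(filename)


print(exceeds_googlebot_limit("index.html", 3 * MB))
print(exceeds_googlebot_limit("report.pdf", 3 * MB))
```

Because CSS, JavaScript, and other referenced resources are fetched separately, the check would run once per file rather than on the combined page weight.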

Looking Ahead

Google’s documentation reorganization suggests there will likely be more updates to the crawling infrastructure site in the coming months. By separating crawler-wide defaults from product-specific documentation, Google can more easily document new crawlers and fetchers as they are introduced.

OpenAI Search Crawler Passes 55% Coverage In Hostinger Study via @sejournal, @MattGSouthern

Hostinger analyzed 66 billion bot requests across more than 5 million websites and found that AI crawlers are following two different paths.

LLM training bots are losing access to the web as more sites block them. Meanwhile, AI assistant bots that power search tools like ChatGPT are expanding their reach.

The analysis draws on anonymized server logs from three 6-day windows, with bot classification mapped to AI.txt project classifications.

Training Bots Are Getting Blocked

The starkest finding involves OpenAI’s GPTBot, which collects data for model training. Its website coverage dropped from 84% to 12% over the study period.

Meta’s ExternalAgent was the largest training-category crawler by request volume in Hostinger’s data. Hostinger says this training-bot group shows the strongest declines overall, driven in part by sites blocking AI training crawlers.

These numbers align with patterns I’ve tracked through multiple studies. BuzzStream found that 79% of top news publishers now block at least one training bot. Cloudflare’s Year in Review showed GPTBot, ClaudeBot, and CCBot had the highest number of full disallow directives across top domains.

The data quantifies what those studies suggested. Hostinger interprets the drop in training-bot coverage as a sign that more sites are blocking those crawlers, even when request volumes remain high.

Assistant Bots Tell a Different Story

While training bots face resistance, the bots that power AI search tools are expanding access.

OpenAI’s OAI-SearchBot, which fetches content for ChatGPT’s search feature, reached 55.67% average coverage. TikTok’s bot grew to 25.67% coverage with 1.4 billion requests. Apple’s bot reached 24.33% coverage.

These assistant crawls are user-triggered and more targeted. They serve users directly rather than collecting training data, which may explain why sites treat them differently.

Classic Search Remains Stable

Traditional search engine crawlers held steady throughout the study. Googlebot maintained 72% average coverage with 14.7 billion requests. Bingbot stayed at 57.67% coverage.

The stability contrasts with changes in the AI category. Google’s main crawler faces a unique position since blocking it affects search visibility.

SEO Tools Show Decline

SEO and marketing crawlers saw declining coverage. Ahrefs maintained the largest footprint at 60% coverage, but the category overall shrank. Hostinger attributes this to two factors: these tools increasingly focus on sites actively doing SEO work, and website owners are blocking resource-intensive crawlers.

I reported on the resource concerns when Vercel data showed GPTBot generating 569 million requests in a single month. For some publishers, the bandwidth costs became a business problem.

Why This Matters

The data confirms a pattern that’s been building over the past year. Site operators are drawing a line between AI crawlers they’ll allow and those they won’t.

The decision comes down to function. Training bots collect content to improve models without sending traffic back. Assistant bots fetch content to answer specific user questions, which means they can surface your content in AI search results.

Hostinger suggests a middle path: block training bots while allowing assistant bots that drive discovery. This lets you participate in AI search without contributing to model training.
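In robots.txt terms, that middle path could look like the rules embedded in the sketch below, which uses Python's standard-library parser to confirm how the two OpenAI user agents would be treated. The rules and the example.com URL are illustrative, and whether a given crawler honors them is up to its operator.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block the training crawler, allow the search crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # placeholder URL
gptbot_allowed = parser.can_fetch("GPTBot", url)
searchbot_allowed = parser.can_fetch("OAI-SearchBot", url)

print("GPTBot allowed:", gptbot_allowed)
print("OAI-SearchBot allowed:", searchbot_allowed)
```

Running a check like this before deploying a robots.txt change is a cheap way to catch a rule that blocks more than intended.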

Looking Ahead

OpenAI recommends allowing OAI-SearchBot if you want your site to appear in ChatGPT search results, even if you block GPTBot.

OpenAI’s documentation clarifies the difference. OAI-SearchBot controls inclusion in ChatGPT search results and respects robots.txt. ChatGPT-User handles user-initiated browsing and may not be governed by robots.txt in the same way.

Hostinger recommends checking server logs to see what’s actually hitting your site, then making blocking decisions based on your goals. If you’re concerned about server load, you can use CDN-level blocking. If you want to potentially increase your AI visibility, review current AI crawler user agents and allow only the specific bots that support your strategy.
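A first pass at that log review can be as simple as counting which AI crawler names appear in the user-agent field. The sketch below assumes the crawler name appears somewhere in each log line. The sample lines are invented, and the token list only covers bots named in this article.

```python
from collections import Counter

# Crawler names mentioned in the article. Extend the tuple to match your own logs.
AI_CRAWLER_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "CCBot")


def count_ai_crawlers(log_lines):
    """Count log lines per AI crawler, matching names case-insensitively."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break  # Attribute each line to at most one crawler.
    return counts


# Invented sample lines in a common access-log shape.
sample_log = [
    '203.0.113.5 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - "GET /a HTTP/1.1" 200 "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.5 - - "GET /b HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.9 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 10.0)"',
]
counts = count_ai_crawlers(sample_log)
print(counts.most_common())
```

User-agent strings can be spoofed, so counts like these are a starting point for blocking decisions rather than proof of who is crawling.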



Google’s Mueller Explains ‘Page Indexed Without Content’ Error via @sejournal, @MattGSouthern

Google Search Advocate John Mueller responded to a question about the “Page Indexed without content” error in Search Console, explaining the issue typically stems from server or CDN blocking rather than JavaScript.

The exchange took place on Reddit after a user reported their homepage dropped from position 1 to position 15 following the error’s appearance.

What’s Happening?

Mueller clarified a common misconception about the cause of “Page Indexed without content” in Search Console.

Mueller wrote:

“Usually this means your server / CDN is blocking Google from receiving any content. This isn’t related to anything JavaScript. It’s usually a fairly low level block, sometimes based on Googlebot’s IP address, so it’ll probably be impossible to test from outside of the Search Console testing tools.”

The Reddit user had already attempted several diagnostic steps. They ran curl commands to fetch the page as Googlebot, checked for JavaScript blocking, and tested with Google’s Rich Results Test. Desktop inspection tools returned “Something went wrong” errors while mobile tools worked normally.

Mueller noted that standard external testing methods won’t catch these blocks.

He added:

“Also, this would mean that pages from your site will start dropping out of the index (soon, or already), so it’s a good idea to treat this as something urgent.”

The affected site uses Webflow as its CMS and Cloudflare as its CDN. The user reported the homepage had been indexing normally with no recent changes to the site.

Why This Matters

I’ve covered this type of problem repeatedly over the years. CDN and server configurations can inadvertently block Googlebot without affecting regular users or standard testing tools. The blocks often target specific IP ranges, which means curl tests and third-party crawlers won’t reproduce the problem.

I covered the “indexed without content” status when Google first added it to the Index Coverage report. Google’s help documentation at the time noted the status means “for some reason Google could not read the content” and specified “this is not a case of robots.txt blocking.” The underlying cause is almost always something lower in the stack.

The Cloudflare detail caught my attention. I reported on a similar pattern when Mueller advised a site owner whose crawling stopped across multiple domains simultaneously. All affected sites used Cloudflare, and Mueller pointed to “shared infrastructure” as the likely culprit. The pattern here looks familiar.

More recently, I covered a Cloudflare outage in November that triggered 5xx spikes affecting crawling. That was a widespread incident. This case appears to be something more targeted, likely a bot protection rule or firewall setting that treats Googlebot’s IP addresses differently from other traffic.

Search Console’s URL Inspection tool and Live URL test remain the primary ways to identify these blocks. When those tools return errors while external tests pass, server-level blocking becomes the likely cause. Mueller made a similar point in August when advising on crawl rate drops, suggesting site owners “double-check what actually happened” and verify “if it was a CDN that actually blocked Googlebot.”

Looking Ahead

If you’re seeing the “Page Indexed without content” error, check the CDN and server configurations for rules that affect Googlebot’s IP ranges. Google publishes its crawler IP addresses, which can help identify whether security rules are targeting them.
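Once you have the published ranges, checking whether an address from a firewall log falls inside them takes a few lines. The sketch below assumes the list arrives as a JSON object with a prefixes array whose entries use ipv4Prefix or ipv6Prefix keys. The ranges shown are documentation placeholders, not Google's real ones, so load the actual file before relying on the result.

```python
import ipaddress


def load_prefixes(published: dict) -> list:
    """Turn a published prefix list into ip_network objects (key names are assumed)."""
    networks = []
    for entry in published.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_listed(ip: str, networks) -> bool:
    """True when the address falls inside any of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Placeholder data using documentation-only ranges, NOT Google's real prefixes.
sample = {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}
networks = load_prefixes(sample)

print(is_listed("192.0.2.44", networks))
print(is_listed("203.0.113.10", networks))
```

If a blocked address in your CDN or firewall logs matches a listed range, that rule is a strong candidate for the cause of the error.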

The Search Console URL Inspection tool is the most reliable way to see what Google receives when crawling a page. External testing tools won’t catch IP-based blocks that only affect Google’s infrastructure.

For Cloudflare users specifically, check bot management settings, firewall rules, and any IP-based access controls. The configuration may have changed through automatic updates or new default settings rather than manual changes.