Google’s Mueller Calls Markdown-For-Bots Idea ‘A Stupid Idea’ via @sejournal, @MattGSouthern

Some developers have been experimenting with bot-specific Markdown delivery as a way to reduce token usage for AI crawlers.

Google Search Advocate John Mueller pushed back on the idea of serving raw Markdown files to LLM crawlers, raising technical concerns on Reddit and calling the concept “a stupid idea” on Bluesky.

What’s Happening

A developer posted on r/TechSEO, describing plans to use Next.js middleware to detect AI user agents such as GPTBot and ClaudeBot. When those bots hit a page, the middleware intercepts the request and serves a raw Markdown file instead of the full React/HTML payload.
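The middleware itself wasn’t shared in the thread; as a minimal sketch of the user-agent check such middleware would perform (the token list and function name here are illustrative assumptions, not the developer’s actual code):

```typescript
// Illustrative user-agent check for AI crawlers. In a Next.js app,
// this function would be called from middleware.ts against the
// request's User-Agent header, and a match would rewrite the
// request to a Markdown version of the page.
const AI_BOT_TOKENS = ["gptbot", "claudebot", "perplexitybot", "ccbot"];

function isAICrawler(userAgent: string): boolean {
  const ua = userAgent.toLowerCase();
  return AI_BOT_TOKENS.some((token) => ua.includes(token));
}

console.log(isAICrawler("Mozilla/5.0 (compatible; GPTBot/1.1)")); // true
console.log(isAICrawler("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0")); // false
```

Note that Mueller’s objection applies downstream of this check: even if detection works, it’s unclear the crawler will treat the Markdown response as anything more than a plain text file.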

The developer claimed early benchmarks showed a 95% reduction in token usage per page, which they argued should increase the site’s ingestion capacity for retrieval-augmented generation (RAG) bots.

Mueller responded with a series of questions.

“Are you sure they can even recognize MD on a website as anything other than a text file? Can they parse & follow the links? What will happen to your site’s internal linking, header, footer, sidebar, navigation? It’s one thing to give it a MD file manually, it seems very different to serve it a text file when they’re looking for a HTML page.”

On Bluesky, Mueller was more direct. Responding to technical SEO consultant Jono Alderson, who argued that flattening pages into Markdown strips out meaning and structure, Mueller wrote:

“Converting pages to markdown is such a stupid idea. Did you know LLMs can read images? WHY NOT TURN YOUR WHOLE SITE INTO AN IMAGE?”

Alderson framed Markdown-fetching as a convenience play rather than a lasting strategy.

Other voices in the Reddit thread echoed the concerns. One commenter questioned whether the effort could limit crawling rather than enhance it. They noted that there’s no evidence that LLMs are trained to favor documents that are less resource-intensive to parse.

The original poster defended the theory, arguing LLMs are better at parsing Markdown than HTML because they’re heavily trained on code repositories. That claim is untested.

Why This Matters

Mueller has been consistent on this. In a previous exchange, he responded to a question from Lily Ray about creating separate Markdown or JSON pages for LLMs. His position then was the same: focus on clean HTML and structured data rather than building bot-only content copies.

That response followed SE Ranking’s analysis of 300,000 domains, which found no connection between having an llms.txt file and how often a domain gets cited in LLM answers. Additionally, Mueller has compared llms.txt to the keywords meta tag, a format major platforms haven’t documented as something they use for ranking or citations.

So far, public platform documentation hasn’t shown that bot-only formats, such as Markdown versions of pages, improve ranking or citations. Mueller raised the same objections across multiple discussions, and SE Ranking’s data found nothing to suggest otherwise.

Looking Ahead

Until an AI platform publishes a spec requesting Markdown versions of web pages, the best practice remains as it is. Keep HTML clean, reduce unnecessary JavaScript that blocks content parsing, and use structured data where platforms have documented schemas.

Google’s Crawl Team Filed Bugs Against WordPress Plugins via @sejournal, @MattGSouthern

Google’s crawl team has been filing bugs directly against WordPress plugins that waste crawl budget at scale.

Gary Illyes, Analyst at Google, shared the details on the latest Search Off the Record podcast. His team filed an issue against WooCommerce after identifying its add-to-cart URL parameters as a top source of crawl waste. WooCommerce picked up the bug and fixed it quickly.

Not every plugin developer has been as responsive. An issue filed against a separate action-parameter plugin is still sitting unclaimed. And Google says its outreach to the developer of a commercial calendar plugin that generates infinite URL paths fell on deaf ears.

What Google Found

The details come from Google’s internal year-end crawl issue report, which Illyes reviewed during the podcast with fellow Google Search Relations team member Martin Splitt.

Action parameters accounted for roughly 25% of all crawl issues reported in 2025. Only faceted navigation ranked higher, at 50%. Together, those two categories represent about three-quarters of every crawl issue Google flagged last year.

The problem with action parameters is that each one creates what appears to be a new URL by adding text like ?add_to_cart=true. Parameters can stack, doubling or tripling the crawlable URL space on a site.

Illyes said these parameters are often injected by CMS plugins rather than built intentionally by site owners.

The WooCommerce Fix

Google’s crawl team filed a bug report against the plugin, flagging the add-to-cart parameter behavior as a source of crawl waste affecting sites at scale.

Illyes described how they identified the issue:

“So we would try to dig into like where are these coming from and then sometimes you can identify that perhaps these action parameters are coming from WordPress plug-ins because WordPress is quite a popular CMS content management system. And then you would find that yes, these plugins are the ones that add to cart and add to wish list.”

“And then what you would do if you were a Gary is to try to see if they are open source in the sense that they have a repository where you can report bugs and issues and in both of these cases the answer was yes. So we would file issues against these uh plugins.”

WooCommerce responded and shipped a fix. Illyes noted the turnaround was fast, but other plugin developers with similar issues haven’t responded. Illyes didn’t name the other plugins.

He added:

“What I really, really loved is that the good folks at WooCommerce almost immediately picked up the issue and they solved it.”

Why This Matters

This is the same URL parameter problem Illyes warned about before and continued flagging. Google then formalized its faceted navigation guidelines into official documentation and revised its URL parameter best practices.

The data shows those warnings and documentation updates didn’t solve the problem because the same issues still dominate crawl reports.

The crawl waste is often baked into the plugin layer. That creates a real bind for websites with ecommerce plugins. Your crawl problems may not be your fault, but they’re still your responsibility to manage.

Illyes said Googlebot can’t determine whether a URL space is useful “unless it crawled a large chunk of that URL space.” By the time you notice the server strain, the damage is already happening.

Google consistently recommends robots.txt, as blocking parameter URLs proactively is more effective than waiting for symptoms.
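For the add-to-cart pattern described above, that proactive blocking can be a few robots.txt rules. A sketch, with parameter names taken from the article’s example; match the patterns to the parameters that actually appear in your own logs:

```
User-agent: *
Disallow: /*?add_to_cart=
Disallow: /*&add_to_cart=
Disallow: /*?add_to_wishlist=
Disallow: /*&add_to_wishlist=
```

Google’s crawlers support the * wildcard in robots.txt paths, so these rules block the parameter whether it appears first in the query string or after other parameters.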

Looking Ahead

Google filing bugs against open-source plugins could help reduce crawl waste at the source. The full podcast episode with Illyes and Splitt is available with a transcript.

Google Updates Googlebot File Size Limit Docs via @sejournal, @MattGSouthern

Google updated its Googlebot documentation to clarify information about file size limits.

The change involves moving information about default file size limits from the Googlebot page to Google’s broader crawler documentation. Google also updated the Googlebot page to be more specific about Googlebot’s own limits.

What’s New

Google’s documentation changelog describes the update as a two-part clarification.

The default file size limits that previously lived on the Googlebot page now appear in the crawler documentation. Google said the original location wasn’t the most logical place because the limits apply to all of Google’s crawlers and fetchers, not just Googlebot.

With the defaults now housed in the crawler documentation, Google updated the Googlebot page to describe Googlebot’s specific file size limits more precisely.

The crawling infrastructure docs list a 15 MB default for Google’s crawlers and fetchers, while the Googlebot page now lists 2 MB for supported file types and 64 MB for PDFs when crawling for Google Search.

The crawler overview describes a default limit across Google’s crawling infrastructure, while the Googlebot page describes Google Search–specific limits for Googlebot. Each resource referenced in the HTML, such as CSS and JavaScript, is fetched separately.

Why This Matters

This fits a pattern Google has been running since late 2025. In November, Google migrated its core crawling documentation to a standalone site, separating it from Search Central. The reasoning was that Google’s crawling infrastructure serves products beyond Search, including Shopping, News, Gemini, and AdSense.

In December, more documentation followed, including faceted navigation guidance and crawl budget optimization.

The latest update continues that reorganization. The 15 MB file size limit was first documented in 2022, when Google added it to the Googlebot help page. Mueller confirmed at the time that the limit wasn’t new. It had been in effect for years. Google was just putting it on the record.

When managing crawl budgets or troubleshooting indexing on content-heavy pages, Google’s docs now describe the limits differently depending on where you look.

The crawling infrastructure overview lists 15 MB as the default for all crawlers and fetchers. The Googlebot page lists 2 MB for HTML and supported text-based files, and 64 MB for PDFs. Google’s changelog does not explain how these figures relate to one another.

Default limits now live in the crawler overview documentation, while Googlebot-specific limits are on the Googlebot page.

Looking Ahead

Google’s documentation reorganization suggests there will likely be more updates to the crawling infrastructure site in the coming months. By separating crawler-wide defaults from product-specific documentation, Google can more easily document new crawlers and fetchers as they are introduced.

WordPress Publishes AI Guidelines To Combat AI Slop via @sejournal, @martinibuster

WordPress published guidelines for using AI for coding plugins, themes, documentation, and media assets. The purpose of the guidelines, guided by five principles, is to keep WordPress contributions transparent, GPL-compatible, and human-accountable, while maintaining high quality standards for AI-assisted work.

The new guidelines list the following five principles:

  1. “You are responsible for your contributions (AI can assist, but it isn’t a contributor).
  2. Disclose meaningful AI assistance in your PR description and/or Trac ticket comment.
  3. License compatibility matters: contributions must remain compatible with GPLv2-or-later, including AI-assisted output.
  4. Non-code assets count too (docs, screenshots, images, educational materials).
  5. Quality over volume: avoid low-signal, unverified “AI slop”; reviewers may close or reject work that doesn’t meet the bar.”

Transparency

The purpose of the transparency guidelines is to encourage contributors to disclose that AI was used and how it was used so that reviewers can be aware when evaluating the work.

License Compatibility And Tool Choice

Licensing is a big deal with WordPress because it’s designed to be a fully open source publishing platform under the GPLv2 licensing framework. Everything made for WordPress, including plugins and themes, must also be open source.

The guidelines specify that AI cannot be used if the output is not licensable under GPLv2-or-later.

It also states:

“Do not use tools whose terms forbid using their output in GPL-licensed projects or impose additional restrictions on redistribution.

Do not rely on tools to “launder” incompatible licenses. If an AI output reproduces non-free or incompatible code, it cannot be included.”

AI Slop

Of course, the guidelines address the issue of AI slop. In this case, AI slop is defined as hallucinated references (such as links or APIs that do not exist), overly complicated code where simpler solutions exist, and GitHub PRs that are generic or do not reflect actual testing or experience.

The AI slop guidelines include recommendations for what is expected from contributors:

“Use AI to draft, then review yourself.

Submit PRs (or patches) that are small, concise and with atomic and well defined commit messages to make reviewing easier.

Run and document real tests.

Link to real Trac tickets, GitHub issues, or documentation that you have verified.”

The guidelines are clear that the WordPress contributors who are responsible for overseeing, reviewing, and deciding whether changes are accepted into a specific part of the project may close or reject contributions that they determine to be AI slop “with little added human insight.”

Takeaways

The new WordPress AI guidelines appear to be about preserving trust in the contribution process as AI becomes more common across development, documentation, and media creation. They in no way discourage the use of AI, but rather encourage its responsible use.

By requiring disclosure, enforcing GPL compatibility, and giving maintainers the authority to reject low-quality submissions, the guidelines set boundaries that protect both the legal integrity of the WordPress project and the time of its reviewers.

Featured Image by Shutterstock/Ivan Moreno sl

LinkedIn Shares What Works For AI Search Visibility via @sejournal, @MattGSouthern

LinkedIn published findings from its internal testing on what drives visibility in AI-generated search results.

The company, reportedly among the most-cited sources in AI responses, shared what worked for improving its presence in LLMs and AI Overviews. For practitioners adjusting to AI search, this is a rare look at what a heavily-cited source tested and measured.

In a blog post, Inna Meklin, Director of Digital Marketing at LinkedIn, and Cassie Dell, Group Manager, Organic Growth at LinkedIn, detailed the tactics that got results.

Content Structure And Markup

LinkedIn found that how you organize content affects whether LLMs can extract and surface it. The authors wrote that headings and information hierarchy matter because “the more structured and logical your content is, the easier it is for LLMs to understand and surface.”

Semantic HTML markup also played a role, with clear structure helping LLMs interpret what each section is for. The authors called this “AI readability.”
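LinkedIn didn’t publish markup samples, but as an illustration of the kind of structure the authors describe, assuming conventional semantic HTML:

```html
<article>
  <h1>One clear topic per page</h1>
  <p>A direct answer up front that can stand alone if extracted.</p>
  <section>
    <h2>A descriptive subheading</h2>
    <p>A self-contained passage an LLM can surface without the rest of the page.</p>
  </section>
</article>
```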

The takeaway is that content structure isn’t just a UX consideration anymore. Proper heading hierarchy and clean markup may affect whether your content gets cited.

Expert Authorship And Timestamps

LinkedIn’s testing also pointed to credibility signals. The authors wrote:

“LLMs favor content that signals credibility and relevance, authored by real experts, clearly time-stamped, and written in a conversational, insight-driven style.”

Named authors with visible credentials and clear publication dates appeared to perform better in LinkedIn’s testing than anonymous or undated content.

The Measurement Change

LinkedIn added new KPIs alongside traffic for awareness-stage content, tracking citation share, visibility rate, and LLM mentions using AI visibility software. The company also said it’s creating a new traffic source in its internal analytics specifically for LLM-driven visits, and monitoring LLM bot behavior in CMS logs.

The authors acknowledged the measurement challenge:

“We simply couldn’t quantify how visibility within LLM responses impacts the bottom line.”

For teams still reporting traffic as the primary SEO metric, there’s a gap here. If non-brand informational content is increasingly consumed inside AI answers rather than on your site, traffic may undercount your actual reach.

Why This Matters

What caught my attention is how much this overlaps with what AI platforms themselves are saying.

SEJ’s Roger Montti recently interviewed Jesse Dwyer from Perplexity about what drives AI search visibility. Dwyer explained that Perplexity retrieves content at the sub-document level, pulling granular fragments rather than reasoning over full pages. That means how you structure content affects whether it gets extracted at all.

LinkedIn’s findings point in the same direction from the publisher side. Structure and markup matter because LLMs parse content in fragments. The credibility signals LinkedIn identified, like expert authorship and timestamps, appear to affect which fragments get surfaced.

When a heavily-cited source and an AI search platform land on the same conclusions independently, you have something to work with beyond speculation.

Looking Ahead

The authors are adopting a different mindset that practitioners can learn from:

“We are moving away from ‘search, click, website’ thinking toward a new model: Be seen, be mentioned, be considered, be chosen.”

LinkedIn indicated Part 3 of the series will include a guide on optimizing owned content for AI search, covering answer blocks and explicit definitions.

Controversial Proposal To Label Sections Of AI Generated Content via @sejournal, @martinibuster

A new proposal calls for an HTML attribute that notifies crawlers which parts of a web page are generated by AI. The proposal is quickly becoming relevant because of new rules coming into effect in Europe this summer, but some are questioning whether it is the right solution to that problem.

AI Disclosure

The proposal was created by David E. Weekly, who noted that existing proposals provide a more general signal that an entire web page is AI generated, but nothing labels only a section of a page that is otherwise authored by a human.

Weekly’s proposal acknowledges the reality that many web pages are partially AI generated. One example is the AI generated summaries of news content. The proposal specifically mentions news sites that contain a sidebar with AI generated summaries.

The proposal suggests creating an HTML attribute that can be applied at the section level.
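The attribute’s exact name and syntax aren’t quoted in this coverage; purely as a hypothetical illustration of section-level labeling, such an attribute might look like:

```html
<!-- Attribute name is hypothetical, for illustration only -->
<aside data-ai-generated="true">
  <p>AI-generated summary of the human-written article above.</p>
</aside>
```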

Google Shows How To Get More Traffic From Top Stories Feature via @sejournal, @martinibuster

Google added new documentation to Search Central covering its Preferred Sources program, which helps news websites get into the Top Stories feature. The documentation explains what publishers can do to improve their chances of ranking in Top Stories and earning more traffic.

Top Stories

Given that Top Stories is about breaking news, freshness may be a factor in ranking. Top Stories surfaces local news as well as breaking news. Structured data is not required to rank in Top Stories, but adding Schema.org Article structured data helps Google better understand what the page is about. While the Top Stories display resembles Google’s carousel feature, ItemList structured data for carousel displays has no effect.
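A minimal sketch of the Schema.org Article markup Google’s documentation describes, using NewsArticle (a subtype of Article) with placeholder values:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example breaking-news headline",
  "datePublished": "2026-02-01T08:00:00Z",
  "dateModified": "2026-02-01T09:30:00Z",
  "author": [{ "@type": "Person", "name": "Jane Reporter" }]
}
</script>
```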

Source Preferences Tool

The Preferred Sources program is available globally, but only for English-language web pages. Google also states that sites already listed in the source preferences tool can link users directly to it at https://www.google.com/preferences/source to encourage readers to add the site as a preferred source.

According to Google:

If your site appears in the source preferences tool, you can use the following methods to guide your readers to select your site as a preferred source:

Add the deeplink to your social posts or promotions. Use the following URL format, which takes users directly to your site in the source preferences tool:

https://google.com/preferences/source?q=Your_Website's_URL

For example, if your site is https://example.com, use the following URL:

https://google.com/preferences/source?q=example.com

Do What You Can For More Traffic From Top Stories

Getting traffic out of Google appears to be getting increasingly difficult. So it’s useful to take advantage of every available opportunity.

Featured Image by Shutterstock/RealPeopleStudio


WordPress Announces AI Agent Skill For Speeding Up Development via @sejournal, @martinibuster

WordPress announced wp-playground, a new AI agent skill designed for use with the Playground CLI so that AI agents can run WordPress for testing and check their work as they write code.

Playground CLI

Playground is a WordPress sandbox that enables users to run a full WordPress site without setting it all up on a traditional server. It is used for testing plugins, creating and adjusting themes, and experimenting safely without affecting a live site.

The new AI agent skill is for use with Playground CLI, which runs locally and requires knowledge of terminal commands, Node.js, and npm to manage local WordPress environments.

The wp-playground skill starts WordPress automatically and determines where generated code should live inside the installation. The skill then mounts the code into the correct directory, which allows the agent to move directly from generated code to a running WordPress site without manual setup.

Once WordPress is running, the agent can test behavior and verify results using common tools. In testing, agents interacted with WordPress through tools like curl and Playwright, checked outcomes, applied fixes, and then re-tested using the same environment. This process creates a repeatable loop where the agent can confirm whether a change works before making further changes.

The skill also includes helper scripts that manage startup and shutdown. These scripts reduce the time it takes for WordPress to become ready for testing from about a minute to only a few seconds. The Playground CLI can also log into WP-Admin automatically, which removes another manual step during testing.

The creator of the AI agent skill, Brandon Payton, is quoted explaining how it works:

“AI agents work better when they have a clear feedback loop. That’s why I made the wp-playground skill. It gives agents an easy way to test WordPress code and makes building and experimenting with WordPress a lot more accessible.”

The release also introduces a new GitHub repository dedicated to hosting WordPress agent skills. Planned ideas include persistent Playground sites tied to a project directory, running commands against existing Playground instances, and Blueprint generation.

Featured Image by Shutterstock/Here

AI Recommendations Change With Nearly Every Query: SparkToro via @sejournal, @MattGSouthern

AI tools produce different brand recommendation lists nearly every time they answer the same question, according to a new report from SparkToro.

The data showed a less than 1-in-100 chance that ChatGPT or Google would return the same list of brand recommendations twice for the same question.

Rand Fishkin, SparkToro co-founder, conducted the research with Patrick O’Donnell from Gumshoe.ai, an AI tracking startup. The team ran 2,961 prompts across ChatGPT, Claude, and Google Search AI Overviews (with AI Mode used when Overviews didn’t appear) using hundreds of volunteers over November and December.

What The Data Found

The authors tested 12 prompts requesting brand recommendations across categories, including chef’s knives, headphones, cancer care hospitals, digital marketing consultants, and science fiction novels.

Each prompt was run 60-100 times per platform. Nearly every response was unique in three ways: the list of brands presented, the order of recommendations, and the number of items returned.

Fishkin summarized the core finding:

“If you ask an AI tool for brand/product recommendations a hundred times nearly every response will be unique.”

Claude showed slightly higher consistency in producing the same list twice, but was less likely to produce the same ordering. None of the platforms came close to the authors’ definition of reliable repeatability.

The Prompt Variability Problem

The authors also examined how real users write prompts. When 142 participants were asked to write their own prompts about headphones for a traveling family member, almost no two prompts looked similar.

The semantic similarity score across those human-written prompts was 0.081. Fishkin compared the relationship to:

“Kung Pao Chicken and Peanut Butter.”

The prompts shared a core intent but little else.
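SparkToro’s scoring code isn’t published; as a sketch of how a semantic similarity score like 0.081 is commonly computed, here is cosine similarity over embedding vectors (the vectors below are toy stand-ins, not real embeddings):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Scores near 1 mean the
// vectors point the same way; scores near 0 (like SparkToro's 0.081)
// mean the embedded prompts are almost unrelated in direction.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for two prompt embeddings.
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0.1])); // 0: unrelated
console.log(cosineSimilarity([1, 2, 3], [1, 2, 3]));   // ~1: identical
```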

Despite the prompt diversity, the AI tools returned brands from a relatively consistent consideration set. Bose, Sony, Sennheiser, and Apple appeared in 55-77% of the 994 responses to those varied headphone prompts.

What This Means For AI Visibility Tracking

The findings question the value of “AI ranking position” as a metric. Fishkin wrote: “any tool that gives a ‘ranking position in AI’ is full of baloney.”

However, the data suggests that how often a brand appears across many runs of similar prompts is more consistent. In tight categories like cloud computing providers, top brands appeared in most responses. In broader categories like science fiction novels, the results were more scattered.

This aligns with other reports we’ve covered. In December, Ahrefs published data showing that Google’s AI Mode and AI Overviews cite different sources 87% of the time for the same query. That report focused on a different question: the same platform but with different features. This SparkToro data examines the same platform and prompt, but with different runs.

The pattern across these studies points in the same direction. AI recommendations appear to vary at every level, whether you’re comparing across platforms, across features within a platform, or across repeated queries to the same feature.

Methodology Notes

The research was conducted in partnership with Gumshoe.ai, which sells AI tracking tools. Fishkin disclosed this and noted that his starting hypothesis was that AI tracking would prove “pointless.”

The team published the full methodology and raw data on a public mini-site. Survey respondents used their normal AI tool settings without standardization, which the authors said was intentional to capture real-world variation.

The report is not peer-reviewed academic research. Fishkin acknowledged methodological limitations and called for larger-scale follow-up work.

Looking Ahead

The authors left open questions about how many prompt runs are needed to obtain reliable visibility data and whether API calls yield the same variation as manual prompts.

When assessing AI tracking tools, the findings suggest you should ask providers to demonstrate their methodology. Fishkin wrote:

“Before you spend a dime tracking AI visibility, make sure your provider answers the questions we’ve surfaced here and shows their math.”


Featured Image: NOMONARTS/Shutterstock

Google Analytics To Become A Growth Engine For Business via @sejournal, @brookeosmundson

On the first episode of the Google Ads Decoded podcast, host Ginny Marvin sat down with Eleanor Stribling, Group Product Manager for Google Analytics.

In the episode, Stribling noted an ambitious two-phase vision for the GA4 platform.

After acknowledging GA4’s rough transition from Universal Analytics, especially for marketers, she shared where the platform is headed over the next few years.

What Stribling Shared on Google Ads Decoded

After discussing the importance of data strength, Stribling broke down the vision for GA4 into two timelines.

Over the next year or two, GA4 will focus on becoming a cross-channel, full-funnel measurement platform. She states the goal is:

“To be that one place where you can really understand the impact of your media with data that makes sense and resonates and that you can take and make a business decision with.”

This means moving beyond outdated siloed channel reporting to understand how all your media works together across the complete customer journey.

The longer-term vision she shared looks 3+ years beyond what GA4 is capable of today.

Stribling says GA4 will become a decision-making platform for businesses, essentially a growth engine that translates data into business outcomes.

“Making a world-class analyst available to every single person,” is how Stribling described this vision. AI will be the layer that makes this shift possible.

It will be interesting to see how Google builds out this vision over the next few years. Considering Google already has a reporting visualization tool in Looker Studio, my prediction is that there will be better or easier integration with it.

Beyond better Looker Studio integration, aiming to become a growth engine or decision-making platform sounds like an attempt to set GA4 apart from other reporting platforms on the market today, such as Funnel or Power BI.

What’s Coming in the Advertising Workspace

Stribling pointed to the Advertising Workspace in GA4 as an area where marketers will see significant changes over the next year.

Expect improvements to reporting that better illustrate the user journey. Google is also building out budgeting and planning tools that let you upload cost data from other media buys and create spend plans based on your goals.

The platform will also suggest optimizations for in-flight campaigns, offering AI-powered recommendations to help you get closer to your campaign objectives.

Personally, I’m excited to see whether they make building Exploration reports any more intuitive for marketers. I think the feature is highly under-utilized right now because you’re essentially starting from a blank slate. It takes time, effort, and the right mindset to sit down and re-learn an analytics platform.

Why This Matters & Looking Ahead

GA4’s reputation amongst marketers hasn’t been stellar since it replaced Universal Analytics. In the podcast episode, Marvin, a long-time marketer herself, reiterated that the platform felt designed for developers rather than marketers, and that the transition left many advertisers frustrated.

Stribling’s comments signal that Google has been listening. Google seems to be heavily investing in making GA4 more accessible, while simultaneously building towards a future where the platform goes beyond its traditional reporting.

The two-phase vision shared is ambitious, particularly the long-term view of GA4 as a business decision engine. Whether Google will move full steam ahead remains up in the air, but the direction suggests GA4 is becoming more than just a measurement tool.

For now, the practical move for marketers is to keep working on data strength. This includes auditing your tagging setup, testing the AI features that exist today, and reviewing key conversion and event data.