AI Bots Don’t Need Markdown Pages

Markdown is a lightweight, text-only language easily readable by both humans and machines. One of the newest search visibility tactics is to serve a Markdown version of web pages to generative AI bots. The aim is to assist the bots in fetching the content by reducing crawl resources, thereby encouraging them to access the page.

I’ve seen isolated tests by search optimizers showing an increase in visits from AI bots after serving Markdown, although none translated into better visibility. A few off-the-shelf tools, such as Cloudflare’s, make implementing Markdown easier.
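
For illustration only, here’s a minimal sketch of how such user-agent negotiation is typically wired up, assuming a small Flask app; the bot user-agent substrings and the sample content are placeholders, and real sites usually implement this at the CDN or web server rather than in application code.

```python
# Minimal sketch of serving Markdown to AI bots via user-agent negotiation.
# Assumptions: Flask is installed; the bot markers and page content are illustrative.
from flask import Flask, Response, request

app = Flask(__name__)

# Illustrative substrings associated with AI crawlers (not an exhaustive list).
AI_BOT_MARKERS = ("gptbot", "claudebot", "perplexitybot", "ccbot")

# Stand-in content store; a real site would render templates and Markdown exports.
PAGES = {
    "widget-guide": {
        "html": "<html><body><h1>Widget Guide</h1><p>Full page...</p></body></html>",
        "markdown": "# Widget Guide\n\nFull page...\n",
    }
}

def is_ai_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in AI_BOT_MARKERS)

@app.route("/articles/<slug>")
def article(slug: str):
    page = PAGES.get(slug)
    if page is None:
        return Response("Not found", status=404)
    if is_ai_bot(request.headers.get("User-Agent", "")):
        # AI crawlers get the lightweight Markdown rendition.
        return Response(page["markdown"], mimetype="text/markdown")
    # Everyone else gets the normal HTML page.
    return Response(page["html"], mimetype="text/html")
```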

Serving separate versions of a page to people and bots is not new. Called “cloaking,” the tactic has long been considered spam under Google’s Search Central guidelines.

The AI scenario is different, however, because it’s not an attempt to manipulate algorithms, but rather an effort to make it easier for bots to access and read a page.

Effective?

That doesn’t make the tactic effective, however. Think carefully before implementing it, for the following reasons.

  • Functionality. The Markdown version of a page may not function correctly. Buttons, in particular, could fail.
  • Architecture. Markdown pages can lose essential elements, such as a footer, header, internal links (“related products”), and user-generated reviews via third-party providers. The effect is to remove critical context, which serves as a trust signal for large language models.
  • Abuse. If the Markdown tactic becomes mainstream, sites will inevitably inject unique product data, instructions, or other elements for AI bots only.

Creating unique pages for bots often dilutes essential signals, such as link authority and branding. A much better approach has always been to create sites that are equally friendly to humans and bots.

Moreover, a goal of LLM agents is to interact with the web as humans do. Serving them a different version accomplishes nothing.

Representatives of Google and Bing echoed this sentiment a few weeks ago. John Mueller is Google’s senior search analyst:

LLMs have trained on – read & parsed – normal web pages since the beginning, it seems a given that they have no problems dealing with HTML. Why would they want to see a page that no user sees?

Fabrice Canel is Bing’s principal product manager:

… really want to double crawl load? We’ll crawl anyway to check similarity. Non-user versions (crawlable AJAX and like) are often neglected, broken. Human eyes help fix people- and bot-viewed content.

AI-SEO Is A Change Management Problem via @sejournal, @Kevin_Indig


AI-SEO transformation will fail at the alignment layer, not the tactics layer. 25 years of transformation research, spanning 10,800+ participants across industries, reveals that the gap between successful and failed initiatives isn’t technical skill. It’s organizational readiness.

What you’ll get:

  • Why AI SEO implementation challenges are people and process problems, not technical ones.
  • The specific alignment failures that kill AI-SEO initiatives before tactics ever get tested.
  • A sequenced approach that transforms you from channel executor to organizational translator.

The underlying infrastructure of AI SEO – retrieval-augmented generation, citation selection, answer synthesis – operates on different principles than the crawl-index-rank paradigm SEO teams previously mastered. And unlike past shifts, the old playbook doesn’t bend to fit the new reality.

AI SEO is different. It’s not just an algorithm update: This is a search product change and a user behavior shift.

Our classic instinct is to respond with tactics: prompt optimization, increased entity markup, LLM-specific structured data, citation acquisition strategies.

These aren’t wrong. But long-term, it’s likely AI SEO strategies will fail, and the reason isn’t tactical incompetence or a failure to stay current and flexible. It’s internal organizational misalignment.

Organizations with structured change management are 8× more likely to meet transformation objectives. The same principle applies to AI-SEO. (Image Credit: Kevin Indig)

Your marketing team – and your executive team – is being asked to transform their understanding of SEO during a period of unprecedented change fatigue. Those who have survived two decades of algorithm updates are expertly adaptable, but reeducation is required because LLMs are a new product, not just another layer of search.

And this, of course, is the alignment-layer failure.

Image Credit: Kevin Indig

In AI SEO, misalignment has specific symptoms:

  1. Conflicting definitions of success: One stakeholder wants “rankings in ChatGPT.” Another wants brand mentions. A third wants citation links. A fourth wants traffic recovery. Every experiment gets judged against a different standard, and no one has agreed which matters most or how they’ll be measured. (Although our AI Overview and AI Mode studies confirm brand mentions are more valuable than citations.)
  2. Metrics mismatch with leadership expectations: Executives ask for increased traffic in a growing zero-click environment. Classic SEO reports on influence metrics; leadership sees declining sessions and questions the investment. In our December 2025 Growth Memo reader survey, 84% of respondents said they feel their current LLM visibility measurement approach is inaccurate. Teams can’t prove value because no one has agreed on how value would be proven.
  3. Turf fragmentation: AI SEO touches SEO, content, brand, product, PR, and (at times) legal. Without explicit ownership and a baseline, agreed-upon understanding of your brand’s AI SEO approach, each team runs experiments in its silo. No one synthesizes learning. Conflicting tactics cancel each other out.
  4. Premature tactics without a shared foundation: This looks like “Let’s test prompts” without agreeing on what success means; “Let’s scale AI content to mitigate click loss” without understanding AI-assisted versus AI-generated content limits; “Let SEO handle AI” while product, PR, and legal stay uninvolved.
  5. Panic-testing instead of strategic reorientation: Teams deploy short-term tactics reactively rather than reorienting the whole ship for better long-term outcomes.

This is classic change management failure: unclear mandate, fragmented ownership, mismatched incentives. No amount of tactical excellence or smart strategy pivots can fix it.

Layering AI SEO tactics + tools on top without structured change management compounds fatigue and accelerates burnout. The “scrappy resilience” that has carried the industry in the past can’t be assumed to instantly apply to this new channel without a strategic transition.

A baseline understanding of organizational change management matters in the AI SEO era … because most organizational transformations fail or underperform.

Your AI-SEO initiative is no different, even if changes in SEO seem contained to your marketing and product teams and stakeholders, rather than the larger organization or brand as a whole.

I’d argue that AI SEO falls into the category of industry transformation that affects your brand and org. And from decades of research, failure and underperformance are the statistical norm for these big transitions – seasoned leaders know this already. No wonder they’re skeptical of your AI SEO plans.

One McKinsey survey found fewer than one-third of teams succeed at both improving performance and sustaining improvements during significant shifts. BCG’s forensic analysis of 825 executives across 70 companies found transformation success at 30%.

Multiple major consulting firms’ independent research shows that most change transformations underperform.

Assuming that tactical excellence alone will carry you – without strategic reeducation and thoughtful change management as our industry shifts – is assuming you’re the exception to the rule.

The correlation between the quality of managing a big shift and your project’s success is dramatic:

Image Credit: Kevin Indig

The gap between excellent and poor represents a nearly 8x improvement. Even the jump from poor to fair quadruples success rates.

BCG’s 2020 analysis reinforces this from a different angle, noting six critical factors that increase successful transformation odds from 30% to 80%:

  • Integrated strategy with clear goals: This is where a carefully crafted AI SEO strategy comes in, one that not only outlines growth goals, but also clear testing and what successful outcomes look like.
  • Leadership commitment from the CEO through middle management: If you’re a consultant or agency, this step can’t be skipped, especially if the client has an in-house team assisting in executing the strategy.
  • High-caliber talent deployment: Or I would argue, high-quality reeducation of existing talent – make sure all operators have a baseline shared understanding of what has changed about SEO, how LLM outputs work, what the brand’s goals are, and how it will be executed.
  • Flexible, agile governance: Teams should have the ability to deal with individual challenges without losing sight of the broader goals, including removing barriers quickly.
  • Effective monitoring: Establish core, agreed-upon KPIs to measure what winning would look like, and note what actions were taken when.
  • Modern/updated technology: Your SEO team needs the right tools to succeed, but they also need to know how to use them effectively. Don’t skip allotting time for integration of new workflows and AI monitoring systems.

Marketing teams that treat AI-SEO simply as a technical project to execute or tactics to update are leaving an 8× multiplier on the table.

  • BCG’s 2024 AI implementation study found that roughly 70% of change implementation hurdles relate to people and processes. Only about 10% of challenges were purely technical.
  • A 2024 Kyndryl survey found that while 95% of senior executives reported investing in AI, only 14% felt they had successfully aligned workforce strategies.

Your brand’s ability to test, update tactics, learn AI workflows, implement structured data, and optimize for LLM retrieval is not the bottleneck you need to be concerned about.

The real concern is whether your team – leadership, cross-functional team partners, and frontline executors/operators – is aligned on what AI SEO means, why and how you’re making changes from your classic SEO approach, what success looks like, and who owns outcomes.

Active and visible executive sponsorship is the No. 1 contributor to change success, cited 3-to-1 more frequently than any other factor, according to 25 years of benchmarking research by Prosci. Your first step as the person leading the AI SEO charge for your brand (or across your clients) is to earn executive buy-in.

But the head of SEO cannot transform a brand’s understanding and approach to AI SEO alone. Bain’s 2024 research emphasized that successful transformations “drive change from the middle of the organization out.”

Keep in mind, financial benefits can compound quickly: One research analysis of 600 organizations found “change accelerators” experience greater revenue growth than companies with below-average change effectiveness.

Image Credit: Kevin Indig

Alignment isn’t just a feeling; it’s observable. You’ll know when you get there:

  • Stakeholders can talk through AI SEO without hyperfocusing on tools.
  • Teams agree on what to stop prioritizing (not just what to start).
  • Cross-functional partners have explicit ownership stakes.

Alignment isn’t happening when:

  • Everyone is good with “experimenting with” or “investing in” LLM visibility, but no one owns outcomes.
  • Success gets retroactively defined, or
  • Leadership asks, “What happened to traffic?” when you report influence metrics.

Noah Greenberg, CEO at Stacker, outlined this pretty clearly in a recent LinkedIn post: Step 0 in your AI SEO transformation is to become the expert.

Screenshot from LinkedIn by Kevin Indig, February 2026

New responsibilities:

  • Translating new, confusing AI-based search concepts into plain language (see this clever LinkedIn post by Lily Ray as a perfect illustration).
  • Educating stakeholders on the structural differences between classic search engines and LLM retrieval – for example, explaining why your CEO doesn’t see the same LLM output you’re reporting when they look up the brand.
  • Explaining the tradeoffs, not just opportunities.
  • Setting expectations executives won’t like at first, but need to hear (traffic loss or slower growth than in years prior).

This is uncomfortable. Less direct control. More indirect influence. Higher stakes.

Your mindset – as the change agent for your clients or organization – centers on three principles:

  1. Honesty over confidence. What we don’t know: the precise value of an AI mention. What we do know: your brand not appearing for related topics is a measurable miss.
  2. Progress over perfection. Alignment doesn’t require certainty. It requires shared uncertainty, agreeing on what you’re testing and how you’ll learn.
  3. Translation over broadcasting. The same strategic message needs adaptation for ICs (how their work changes), managers (how they report success), and executives (how budgets should shift). Uniform communication fails; translated communication scales.

Do this in order:

  1. Write the one-sentence AI SEO mandate for your organization. If you can’t explain AI SEO in one sentence to leadership, you’re not ready to execute.
  2. Complete a high-level SWOT. Identify where your organization has existing strengths and gaps. The Brand SEO scorecard from The Great Decoupling will walk you through.
  3. Replace or supplement legacy KPIs. Add LLM visibility estimates alongside classic KPIs (rankings, sessions) to start the transition. Reporting both builds the case for the shift without abandoning the old model cold.
  4. Name cross-functional owners explicitly. Who owns brand mentions in LLM outputs: SEO, PR, or brand? Who owns citation link acquisition: SEO or content? Ambiguity is the enemy.
  5. Provide baseline education at every level. ICs need to understand how LLM retrieval differs from crawl-index-rank. Executives need to understand why slowed organic traffic or zero-click growth doesn’t mean zero impact.
  6. Kill one SEO practice without a fight. Success means everyone understands why, and you don’t receive pushback. If you can’t retire one outdated tactic without internal conflict, you haven’t achieved alignment.
  7. Only then change workflows and tactics. Tactics deployed on an unaligned organization waste resources and burn credibility. Tactics deployed on an aligned organization compound advantage.

Featured Image: Paulo Bobita/Search Engine Journal

Web Almanac Data Reveals CMS Plugins Are Setting Technical SEO Standards (Not SEOs) via @sejournal, @chrisgreenseo

If more than half the web runs on a content management system, then the majority of technical SEO standards are being positively shaped before an SEO even starts work on a site. That’s the lens I took into the 2025 Web Almanac SEO chapter (for clarity, I co-authored the chapter referenced in this article).

Rather than asking how individual optimization decisions influence performance, I wanted to understand something more fundamental: How much of the web’s technical SEO baseline is determined by CMS defaults and the ecosystems around them?

SEO often feels intensely hands-on – perhaps too much so. We debate canonical logic, structured data implementation, crawl control, and metadata configuration as if each site were a bespoke engineering project. But when 50%+ of pages in the HTTP Archive dataset sit on CMS platforms, those platforms become the invisible standard-setters. Their defaults, constraints, and feature rollouts quietly define what “normal” looks like at scale.

This piece explores that influence using 2025 Web Almanac and HTTP Archive data, specifically:

  • How CMS adoption trends track with core technical SEO signals.
  • Where plugin ecosystems appear to shape implementation patterns.
  • And how emerging standards like llms.txt are spreading as a result.

The question is not whether SEOs matter. It’s whether we’ve been underestimating who sets the baseline for the modern web.

The Backbone Of Web Design

The 2025 CMS chapter of the Web Almanac marked a milestone in CMS adoption: over 50% of pages now sit on a CMS. In case you were unsold on how much of the web is carried by CMSs, more than half of roughly 16 million websites is a significant amount.

Screenshot from Web Almanac, February 2026

As for which CMSs are the most popular, this again may not be surprising, but it is worth reflecting on which platforms have the most impact.

Image by author, February 2026

WordPress is still the most used CMS, by a long way, even if it has dropped marginally in the 2024 data. Shopify, Wix, Squarespace, and Joomla trail a long way behind, but they still have a significant impact, especially Shopify in ecommerce.

SEO Functions That Ship As Defaults In CMS Platforms

CMS platform defaults are important because, I believe, most basic technical SEO standards come either from default setups or from the relatively small number of websites with dedicated SEOs, or at least people who build to and work with SEO best practice.

When we talk about “best practice,” we’re on slightly shaky ground for some, as there isn’t a universal, prescriptive view on this one, but I would consider:

  • Descriptive “SEO-friendly” URLs.
  • Editable title and meta description.
  • XML sitemaps.
  • Canonical tags.
  • Editable meta robots directives.
  • Structured data – at least a basic level.
  • Robots.txt editing.

Of the main CMS platforms, here is what they – self-reportedly – have as “default.” Note: For some platforms – like Shopify – they would say they’re SEO-friendly (and to be honest, it’s “good enough”), but many SEOs would argue that they’re not friendly enough to pass this test. I’m not weighing into those nuances, but I’d say both Shopify and those SEOs make some good points.

| CMS | SEO-friendly URLs | Title & meta description UI | XML sitemap | Canonical tags | Robots meta support | Basic structured data | Robots.txt |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WordPress | Yes | Partial (theme-dependent) | Yes | Yes | Yes | Limited (Article, BlogPosting) | No (plugin or server access required) |
| Shopify | Yes | Yes | Yes | Yes | Limited | Product-focused | Limited (editable via robots.txt.liquid, constrained) |
| Wix | Yes | Guided | Yes | Yes | Limited | Basic | Yes (editable in UI) |
| Squarespace | Yes | Yes | Yes | Yes | Limited | Basic | No (platform-managed, no direct file control) |
| Webflow | Yes | Yes | Yes | Yes | Yes | Manual JSON-LD | Yes (editable in settings) |
| Drupal | Yes | Partial (core) | Yes | Yes | Yes | Minimal (extensible) | Partial (module or server access) |
| Joomla | Yes | Partial | Yes | Yes | Yes | Minimal | Partial (server-level file edit) |
| Ghost | Yes | Yes | Yes | Yes | Yes | Article | No (server/config level only) |
| TYPO3 | Yes | Partial | Yes | Yes | Yes | Minimal | Partial (config or extension-based) |

Based on the above, I would say that most SEO basics can be covered by most CMSs “out of the box.” Whether they work well for you, or whether you can achieve the exact configuration your specific circumstances require, are two other important questions – ones which I am not taking on. However, it often comes down to these points:

  1. It is possible for these platforms to be used badly.
  2. It is possible that the business logic you need will break/not work with the above.
  3. There are many more advanced SEO features that aren’t available out of the box but are just as important.

We are talking about foundations here, but when I reflect on what shipped as “default” 15+ years ago, progress has been made.

Fingerprints Of Defaults In The HTTP Archive Data

Given that a lot of CMSs ship with these standards, do these SEO defaults correlate with CMS adoption? In many ways, yes. Let’s explore this in the HTTP Archive data.

Canonical Tag Adoption Correlates With CMS

Combining canonical tag adoption data with (all) CMS adoption over the last four years, we can see that for both mobile and desktop, the trends seem to follow each other pretty closely.

Image by author, February 2026
Image by author, February 2026

Running a simple Pearson correlation over these elements makes the relationship even clearer, both for canonical tag implementation and for the presence of self-canonical URLs.

Image by author, February 2026

What differs is the correlation for canonicalized URLs: it appears negative on mobile and lower (but still positive) on desktop. A drop in canonicalized pages is largely driving the negative correlation, and the reasons behind it could be many (and harder to be sure of).

Canonical tags are a crucial element for technical SEO; their continued adoption does certainly seem to track the growth in CMS use, too.
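
For readers who want to run this kind of check themselves, here’s a minimal sketch of a Pearson correlation in Python; the yearly figures and column names are placeholders, not the actual HTTP Archive numbers.

```python
# Sketch: correlate CMS adoption with canonical tag adoption over a few years.
# The values below are placeholders; substitute the real Web Almanac / HTTP Archive figures.
import pandas as pd
from scipy.stats import pearsonr

data = pd.DataFrame({
    "year": [2022, 2023, 2024, 2025],
    "cms_adoption_pct": [45.0, 47.2, 48.9, 50.4],      # placeholder values
    "canonical_tag_pct": [58.1, 59.4, 60.8, 62.0],     # placeholder values
})

r, p_value = pearsonr(data["cms_adoption_pct"], data["canonical_tag_pct"])
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```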

Schema.org Data Types Correlate With CMS

Schema.org types against CMS adoption show similar trends, but are less definitive overall. There are many different types of Schema.org, but if we plot CMS adoption against the ones most common to SEO concerns, we can observe a broadly rising picture.

Image by author, February 2026

With the exception of Schema.org WebSite, we can see CMS growth and structured data following similar trends.

But we must note that Schema.org adoption is considerably lower than CMS adoption overall. This could be because most CMS defaults are far less comprehensive with Schema.org. When we look at specific CMS examples (shortly), we’ll see far stronger links.

Schema.org implementation is still mostly intentional, specialist, and not as widespread as it could be. If I were a search engine or creating an AI Search tool, would I rely on universal adoption of these, seeing the data like this? Possibly not.

Robots.txt

Given that robots.txt is a single file that has some agreed standards behind it, its implementation is far simpler, so we could anticipate higher levels of adoption than Schema.org.

The presence of a robots.txt is pretty important, mostly to limit search engine crawling to specific areas of the site. We are starting to see an evolution, as we noted in the 2025 Web Almanac SEO chapter: the robots.txt is increasingly used as a governance piece, rather than just housekeeping. A key sign that we’re using our core tools differently in the AI search world.
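
As a rough illustration of that governance role, here’s what such a robots.txt might look like; the crawler tokens and paths are examples, not recommendations.

```
# Classic housekeeping: keep all crawlers out of low-value areas.
User-agent: *
Disallow: /cart/
Disallow: /search/

# Governance layer: explicit choices about AI crawlers (illustrative).
User-agent: GPTBot
Disallow: /premium/

User-agent: Google-Extended
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
```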

But before we consider the more advanced implementations, how much of a part does a CMS play in ensuring a robots.txt is present? It looks like, over the last four years, CMS platforms are driving significantly more robots.txt files that serve a 200 response:

Image by author, February 2026

What is more curious, however, is the size of the robots.txt files. Non-CMS platforms have robots.txt files that are significantly larger.

Image by author, February 2026

Why could this be? Are robots.txt files on non-CMS platforms more advanced, with longer files and more bespoke rules? Most probably in some cases, but we’re missing another impact of a CMS’s standards: compliant (valid) robots.txt files.

A lot of robots.txt URLs serve a 200 response, but often they’re not actually txt files, or they redirect to 404-style pages or similar. When we limit this list to only files that contain user-agent declarations (as a proxy for validity), we see a different story.
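
Here’s a rough sketch of that kind of validity proxy in Python: keep only files that return 200, look like plain text, and declare at least one user-agent. The URL is a placeholder.

```python
# Proxy check for "is this actually a robots.txt?" as described above.
import requests

def looks_like_robots_txt(url: str) -> bool:
    response = requests.get(url, timeout=10, allow_redirects=True)
    if response.status_code != 200:
        return False
    content_type = response.headers.get("Content-Type", "")
    if "text/plain" not in content_type.lower():
        # HTML error pages and similar often masquerade as robots.txt.
        return False
    # A usable robots.txt declares at least one user-agent.
    return any(line.strip().lower().startswith("user-agent:")
               for line in response.text.splitlines())

print(looks_like_robots_txt("https://www.example.com/robots.txt"))
```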

Image by author, February 2026

Approaching 14% of robots.txt files served on non-CMS platforms are likely not even robots.txt files.

A robots.txt is easy to set up, but it is a conscious decision. If it’s forgotten/overlooked, it simply won’t exist. A CMS makes it more likely to have a robots.txt, and what’s more, when it is in place, it makes it easier to manage/maintain – which IS key.

WordPress Specific Defaults

CMS platforms, it seems, cover the basics, but more advanced options – which arguably should also be defaults – often need additional SEO tools to enable.

Interrogating WordPress-specific sites with the HTTP Archive data will be easiest, as we get the largest sample, and the Wappalyzer data gives a reliable way to judge the impact of WordPress-specific SEO tools.

From the Web Almanac, we can see which SEO tools are the most installed on WordPress sites.

Screenshot from Web Almanac, February 2026

For anyone working within SEO, this is unlikely to be surprising. If you are an SEO and have worked on WordPress, there is a high chance you have used one of the top three. What IS worth considering right now is that while Yoast SEO is by far the most prevalent within the data, it is seen on barely over 15% of sites. Even the most popular SEO plugin on the most popular CMS is still a relatively small share.

Of these top three plugins, let’s first consider how their “defaults” differ. These are similar to some of WordPress’s, but we can see many more advanced features that come as standard.

| SEO Capability | All-in-One SEO | Yoast SEO | Rank Math |
| --- | --- | --- | --- |
| Title tag control | Yes (global + per-post) | Yes | Yes |
| Meta description control | Yes | Yes | Yes |
| Meta robots UI | Yes (index/noindex etc.) | Yes | Yes |
| Default meta robots output | Explicit index,follow | Explicit index,follow | Explicit index,follow |
| Canonical tags | Auto self-canonical | Auto self-canonical | Auto self-canonical |
| Canonical override (per URL) | Yes | Yes | Yes |
| Pagination canonical handling | Limited | Historically opinionated | More configurable |
| XML sitemap generation | Yes | Yes | Yes |
| Sitemap URL filtering | Basic | Basic | More granular |
| Inclusion of noindex URLs in sitemap | Possible by default | Historically possible | Configurable |
| Robots.txt editor | Yes (plugin-managed) | Yes | Yes |
| Robots.txt comments/signatures | Yes | Yes | Yes |
| Redirect management | Yes | Limited (free) | Yes |
| Breadcrumb markup | Yes | Yes | Yes |
| Structured data (JSON-LD) | Yes (templated) | Yes (templated) | Yes (templated, broad) |
| Schema type selection UI | Yes | Limited | Extensive |
| Schema output style | Plugin-specific | Plugin-specific | Plugin-specific |
| Content analysis/scoring | Basic | Heavy (readability + SEO) | Heavy (SEO score) |
| Keyword optimization guidance | Yes | Yes | Yes |
| Multiple focus keywords | Paid | Paid | Free |
| Social metadata (OG/Twitter) | Yes | Yes | Yes |
| Llms.txt generation | Yes – enabled by default | Yes – one-check enable | Yes – one-check enable |
| AI crawler controls | Via robots.txt | Via robots.txt | Via robots.txt |

Editable metadata, structured data, robots.txt, sitemaps, and, more recently, llms.txt are the most notable. It is worth noting that a lot of the functionality is more “back-end,” so not something we’d be as easily able to see in the HTTP Archive data.

Structured Data Impact From SEO Plugins

We can see (above) that structured data implementation and CMS adoption do correlate; what is more interesting here is understanding what the key drivers are.

Viewing the HTTP Archive data with a simple segmentation (SEO plugins vs. no SEO plugins) from the most recent dataset paints a stark picture.

Image by author, February 2026

When we limit the Schema.org @types to those most associated with SEO, it is really clear that some structured data types are pushed really hard by SEO plugins. They are not completely absent on sites without them; people may be using lesser-known plugins or coding their own solutions, but the ease of implementation shows in the data.
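
To make the mechanics concrete, here’s a minimal sketch of how Schema.org @types can be pulled from a page’s JSON-LD, roughly the way a crawl-based analysis would detect them; the URL is a placeholder, and microdata/RDFa are ignored for brevity.

```python
# Extract Schema.org @type values from a page's JSON-LD blocks.
import json
import requests
from bs4 import BeautifulSoup

def jsonld_types(url: str) -> set:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    types = set()
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                value = item["@type"]
                types.update(value if isinstance(value, list) else [value])
    return types

print(jsonld_types("https://www.example.com/"))
```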

Robots Meta Support

Another finding from the SEO Web Almanac 2025 chapter was that “follow” and “index” directives were the most prevalent, even though they’re technically redundant, as having no meta robots directives is implicitly the same thing.

Screenshot from Web Almanac 2025, February 2026

Within the chapter number crunching itself, I didn’t dig in much deeper, but knowing that all major SEO WordPress plugins have “index,follow” as default, I was eager to see if I could make a stronger connection in the data.

Where SEO plugins were present on WordPress, “index, follow” was set on over 75% of root pages, vs. less than 5% of root pages on WordPress sites without SEO plugins.

Image by author, February 2026

Given the ubiquity of WordPress and SEO plugins, this is likely a huge contributor to this particular configuration. While the directive is redundant, it isn’t wrong, but it is – again – a key example of how, when one or more of the main plugins establish a de facto standard like this, it really shapes a significant portion of the web.

Diving Into LLMs.txt

Another key area of change from the 2025 Web Almanac was the introduction of the llms.txt file. Covering it is not an explicit endorsement of the file, but rather a tacit acknowledgment that it is an important data point in the AI Search age.
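
For context, here’s a minimal sketch of what an llms.txt file looks like, following the structure in the llms.txt proposal; the brand, URLs, and descriptions are invented for illustration.

```
# Example Store

> Example Store sells widgets and publishes buying guides. This file points
> LLMs at the pages most useful for answering questions about the brand.

## Guides

- [Widget buying guide](https://www.example.com/guides/widgets.md): how to choose a widget
- [Shipping and returns](https://www.example.com/help/shipping.md): policies and timelines

## Products

- [Widget Pro](https://www.example.com/products/widget-pro.md): flagship product page
```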

From the 2025 data, just over 2% of sites had a valid llms.txt file and:

  • 39.6% of llms.txt files are related to All-in-One SEO.
  • 3.6% of llms.txt files are related to Yoast SEO.

This is not necessarily an intentional act by all those involved, especially as All-in-One SEO enables this by default (not an opt-in like Yoast and Rank Math).

Image by author, February 2026

The first data was gathered on July 25, 2025, and a month-by-month view shows further growth since then. It is hard not to see this as growing confidence in the file, or at least a sign that it’s so easy to enable that more people are hedging their bets.

Conclusion

The Web Almanac data suggests that SEO, at a macro level, moves less because of individual SEOs and more because WordPress, Shopify, Wix, or a major plugin ships a default.

  • Canonical tags correlate with CMS growth.
  • Robots.txt validity improves with CMS governance.
  • Redundant “index,follow” directives proliferate because plugins make them explicit.
  • Even llms.txt is already spreading through plugin toggles before it has anything close to full consensus.

This doesn’t diminish the impact of SEO; it reframes it. Individual practitioners still create competitive advantage, especially in advanced configuration, architecture, content quality, and business logic. But the baseline state of the web, the technical floor on which everything else is built, is increasingly set by product teams shipping defaults to millions of sites.

Perhaps we should consider that if CMSs are the infrastructure layer of modern SEO, then plugin creators are de facto standards setters. They deploy “best practice” before it becomes doctrine.

This is how it should work, but I am also not entirely comfortable with this. They normalize implementation and even create new conventions simply by making them zero-cost. Standards that are redundant have the ability to endure because they can.

So the question is less about whether CMS platforms impact SEO. They clearly do. The more interesting question is whether we, as SEOs, are paying enough attention to where those defaults originate, how they evolve, and how much of the web’s “best practice” is really just the path of least resistance shipped at scale.

An SEO’s value should not be measured by the number of hours they spend discussing canonical tags, meta robots, and rules of sitemap inclusion. This should be standard and default. If you want to have an outsized impact on SEO, lobby an existing tool, create your own plugin, or drive interest to influence change in one.



Featured Image: Prostock-studio/Shutterstock

Google Discover Update: Early Data Shows Fewer Domains In US via @sejournal, @MattGSouthern

NewzDash published an analysis comparing Discover visibility before and after Google’s February 2026 Discover core update, using panel data from millions of US users tracked through its DiscoverPulse tool.

It compared pre-update (Jan 25-31) and post-update (Feb 8-14) windows across the top 1,000 domains and top 1,000 articles in the US, California, and New York.

For transparency, NewzDash is a news SEO tracking platform that sells Discover monitoring tools.

What The Data Shows

Google said the update targeted more locally relevant content, less sensational and clickbait content, and more in-depth, timely content from sites with topic expertise. The NewzDash data has early readings on all three.

NewzDash compared Discover feeds in California, New York, and the US as a whole. The three feeds mostly overlapped, but each state got local stories the others didn’t. New York-local domains appeared roughly five times more often in the New York feed than in the California feed, and vice versa.

In California, local articles in the top 100 placements rose from 10 to 16 in the post-update window. The local layer included content from publishers like SFGate and LA Times that didn’t appear in the national top 100 during the same period.

Clickbait reduction was harder to confirm. NewzDash acknowledged that headline markers alone can’t prove clickbait decreased. It did find that what it called ‘templated curiosity-gap patterns’ appeared to lose visibility. Yahoo’s presence in the US top 1,000 dropped from 11 to 6 articles, with zero items in the top 100 post-update.

Unique content categories grew across all three geographic views, but unique publishers shrank in the US (172 to 158 domains) and California (187 to 177). That combination suggests Discover is covering more topics but concentrating distribution among a narrower set of publishers.

This pattern aligns with what early December core update analysis showed about specialized sites gaining ground over generalists.

X.com’s Growing Discover Presence

X.com posts from institutional accounts climbed from 3 to 13 items in the US top 100 Discover placements and from 2 to 14 in New York’s top 100.

NewzDash noted it had tracked X.com’s Discover growth since November and said the update appeared to accelerate the trend. Most top-performing X items came from established media brands.

The analysis noted it couldn’t prove or disprove whether X posts are cannibalizing publisher traffic in Discover, calling the data a “directional sanity check.” The open question is whether routing through X adds friction that could reduce click-through to owned pages.

Why This Matters

As we continue to monitor the Discover core update, we now have early data on what it seems to favor. Regional publishers with locally relevant content showed up more often in NewzDash’s post-update top lists.

Discover covered more topics in the post-update window, but fewer sites were getting that traffic in the US and California. Publishers without a clear topic focus could be on the wrong side of that trend.

Looking Ahead

This analysis covers an early window while the rollout is still being completed. The post-update measurement period overlaps with the Super Bowl, Winter Olympics, and ICC Men’s T20 World Cup, any of which could independently inflate News and Sports category visibility.

Google said it plans to expand the Discover core update beyond English-language US users in the months ahead.


Featured Image: joingate/Shutterstock

4 Sites That Recovered From Google’s December 2025 Core Update – What They Changed via @sejournal, @marie_haynes

The December 2025 core update had a significant impact on a large number of sites. Each of the sites below that have done well is either a long-term client, a past client, or a site I have done a site review for. While we can never say with certainty what changed as the result of a change to Google’s core algorithms and systems, I’ll share some observations on what I think helped these sites improve.

1. Trust Matters Immensely

This first client, a medical eCommerce site, reached out to me in mid-2024, and we started on a long-term engagement. A few days into our relationship, they were strongly negatively impacted by the August 2024 core update. It was devastating.

When you are impacted by a core update, in most cases, you remain suppressed until another core update happens. It usually takes several core updates. And given that these only happen a few times a year, this site remained suppressed for quite some time.

We worked on a lot of things:

  • Improving blog post quality so it was not “commodity content”.
  • Improving page load time.
  • Optimizing images.
  • Improving FAQ content on product pages to help answer customer questions.
  • Creating helpful guides.
  • Improving product descriptions to better answer questions their customers have.
  • Adding more information about the E-E-A-T of authors.
  • Adding more authors with medical E-E-A-T.
  • Getting more reviews from satisfied customers.

While I think that all of the above helped contribute to a better assessment of quality for this site, I actually think that what helped the most had very little to do with SEO, but rather, was the result of the business working hard to truly improve upon customer service.

Core updates are tightly connected to E-E-A-T. Google says that trust is the most important aspect of E-E-A-T. The quality rater guidelines, which guide the quality raters whose evaluations help Google train the AI systems behind its search algorithms, mention “trust” 191 times.

For online stores, the raters are told that reliable customer service is vitally important.

Image Credit: Marie Haynes

A few bad reviews aren’t likely to tank your rankings, but this business had previously had significant logistical problems with shipping. They had been working hard to rectify these. Yet, if I asked AI Mode to tell me about the reputation of this company compared to their competitors, it would always tell me that there were serious concerns.

Here’s an interesting prompt you can use in AI Mode:

Make a chart showing the perceived trust in [url or brand] over time.

You can see that finally in 2025 the overall trust in this brand improved.

Image Credit: Marie Haynes

My suspicion is that these trust issues were the main driver in their core update suppression. I can’t say whether it was the improvement in customer trust that made a difference, the improvements in quality we made, or perhaps both. But these results were so good to see.

Image Credit: Marie Haynes

They continue to improve. Google recommends them more often in Popular Products carousels, ranks them more highly for many important terms and, more importantly, drives far more sales for them now.

2. Original Content Takes A Lot Of Work

The next site is another one that was impacted by a core update.

This site is an affiliate site that writes about a big-ticket product. They have a lot of competition from some big players in their industry. When I reviewed their site, one thing was obvious to me. While they had a lot of content, most of it offered essentially the same value as everyone else’s. This was frustrating considering they actually did purchase and review these products. What they were writing was mostly a collection of known facts about these products rather than their personal experience. And what was experiential was buried in massive walls of text that were difficult for readers to navigate.

Google’s guidance on core updates recommends that if you were impacted, you should consider rewriting or restructuring your content to make it easier for your audience to read and navigate the page.

Image Credit: Marie Haynes

This site put an incredible amount of work into improving their content quality:

  • They purchased the products they reviewed and took detailed photos of everything they discussed. And videos. Really helpful videos.
  • The blog posts were written by an expert in their field. This already was the case, but we worked on making it more clear what their expertise was and why it was helpful.
  • We brainstormed with AI to help us come up with ideas for adding helpful, unique information that was born of their experience and not likely to be found on other sites.
  • We used Microsoft Clarity to identify aspects on pages that were frustrating users and worked to improve them.
  • We added interactive quizzes to help readers and drive engagement.
  • We worked on improving freshness for every important post, ensuring they were up to date with the latest information.
  • We worked to really get in the shoes of a searcher and understand what they wanted to see. We made sure that this information was easy to find even if a reader was skimming.
  • We broke up large walls of text into chunks with good headings that were easy to skim and navigate.
  • We noindexed pages on YMYL topics for which they lacked expertise.
  • We worked on improving core web vitals. (Note: I don’t think this is a huge ranking factor, but in this case the largest contentful paint was taking forever and likely frustrated users.)

Once again, it took many months of tireless work before improvements were seen! Rankings improved to the first page for many important keywords, and some moved from page 4 to positions 1-3.

Image Credit: Marie Haynes

3. Work To Improve User Experience

This next site was not a long-term client, but rather a site review I did for an eCommerce site in a YMYL niche. The SEO working on this site applied many of my recommendations and made some other smart changes as well, including:

  • Improved site navigation and hierarchy.
  • Improved UX. They have a nicer, more modern font. The site looks more professional.
  • Improved the customer checkout flow, which reduced checkout abandonment.
  • Improved their About Us page to add more information to demonstrate the brand’s experience and history. Note: I don’t think this matters immensely to Google’s algorithms, as most of their assessment of trust is made from off-site signals, but it may help users feel more comfortable with engaging.
  • Produced content around some topics that were gaining public attention. This did help to truly earn some new links and mentions from authoritative sources.

After making these changes, the site was able to earn a knowledge panel for brand searches. And search traffic is climbing.

Image Credit: Marie Haynes

4. First Hand Experience Can Really Help

This next site is another one that I did a site review for. It is a city guide that monetizes through affiliate links and sponsors. For every page I looked at I came to the same conclusion: There was nothing on this page that couldn’t be covered by an AI Overview. Almost every piece of information was essentially paraphrased from somewhere else on the web.

The most recent update to the rater guidelines increased the use of the word “paraphrased” from 3 mentions to 25. I think this applies to a lot of sites!

Image Credit: Marie Haynes

and

Image Credit: Marie Haynes

and also,

Image Credit: Marie Haynes

Yet, when I spoke with the site owner, she shared with me that they had on-site writers who were truly writing from their experience.

While I don’t know specifically what changes this site owner has made, I looked at several pages that had seen nice improvements in conjunction with the core update and noticed the following improvements:

  • They’ve added video to some posts – filmed by their team.
  • There’s original photography from their team – not taken from elsewhere on the web. Not every photo is original, but quite a few of them are.
  • Added information to help readers make their decision, like “This place is best for…” or, “Must try dishes include…”
  • They wrote about their actual experiences. Rather than just sharing what dishes were available at a restaurant, they share which ones they tried and how they felt they stood out compared to other restaurants.
  • They’ve worked to keep content updated and fresh.

This site saw some nice improvements. However, they still have ground to gain as they previously were doing much better in the days before the helpful content updates.

Image Credit: Marie Haynes

Some Thoughts For Sites That Have Not Done Well

The December 2025 core update had a devastating negative impact on many sites. If you were impacted, your answer is unlikely to lie in technical SEO fixes, disavowing links or building new links. Google’s ranking systems are a collection of AI systems that work together with one goal in mind – to present searchers with pages that they are likely to find helpful. Many components of the ranking systems are deep learning systems which means that they improve on these recommendations over time.

I’d recommend the following for you:

1. Consider Whether The Brand Has Trust Issues

You can try the AI Mode prompt I used above. A few bad reviews are not going to cause a core update suppression. But a prolonged history of repeated customer service frustrations, fraud, or anything else that significantly impacts your reputation can seriously impact your ability to rank. This is especially true if you are writing on YMYL topics.

2. Look At How Your Content Is Structured

It is a helpful exercise to look at which pages Google’s algorithms are ranking for your queries. If they don’t seem to make sense to you, look at how quickly they get people to the answer they are trying to find. I have found that often sites that are impacted make their readers scroll through a lot of fluff or ads to get to the important bits. Improve your headings – not for search engines, but for readers who are skimming. Put the important parts at the top. Or, if that’s not feasible, make it really easy for people to find the “main content”.

Here’s a good exercise – Open up the rater guidelines. These are guidelines for human raters who help Google understand if the AI systems are producing good, helpful rankings. CTRL-F for “main content” and see what you can learn.

3. Really Ask Yourself Whether Your Content Is Mostly “Commodity Content”

Commodity content is information that is widely available in many places on the web. There was a time when a business could thrive by writing pages that aggregate known information on a topic. Now that Google has AI Overviews and AI Mode, this type of page is much less valuable. You will still see some pages cited in AI Overviews that essentially parrot what is already in the AIO. Usually these are authoritative sites which are helpful for readers who want to see information from an authority rather than an AI answer.

Liz Reid from Google said these interesting words in an interview with the WSJ:

“What people click on in AI Overviews is content that is richer and deeper. That surface level AI generated content, people don’t want that, because if they click on that they don’t actually learn that much more than they previously got. They don’t trust the result any more across the web. So what we see with AI Overviews is that we sort of surface these sites and get fewer, what we call bounced clicks. A bounced click is like, you click on this site and you’re like, “Ah, I didn’t want that” and you go back. And so AI Overviews give some content and then we get to surface sort of deeper, richer content, and we’ll look to continue to do that over time so that we really do get that creator content and not AI generated.”

Here is a good exercise to try on some of the pages that have declined with the core update. Give your URL to your favourite LLM, or copy your page’s content into it, and use this prompt:

“What are 10 concepts that are discussed in this page? For each concept tell me whether this topic has been widely written about online. Does this content I am sharing with you add anything truly uniquely interesting and original to the body of knowledge that already exists? Your goal here is to be brutally honest and not just flatter me. I want to know if this page is likely to be considered commodity content or whether it truly is content that is richer and deeper than other pages available on the web.”

You can follow this up with this prompt:

“Give me 10 ideas that I can use to truly create content that goes deeper on these topics? How can I draw from my real world experience to produce this kind of content?”

Concluding Thoughts

I’ve been studying Google updates for a long time – since the early days of the Panda and Penguin updates. I built a business on helping sites recover from Google update hits. However, over the years I have found it is increasingly difficult for a site that is impacted by a Google update to recover. This is why today, although I do still love doing site reviews to give you ideas for improving, I generally decline work with sites that have been strongly impacted by Google updates. While recovery is possible, it generally takes a year or more of hard work, and even then, recovery is not guaranteed as Google’s algorithms and people’s preferences are continually changing.

The sites that saw nice recovery with this Google update were sites that worked on things like:

  • Truly improving the world’s perception of their customer service.
  • Creating original and insightful content that was substantially better than other pages that exist.
  • Using their own imagery and videos in many cases.
  • Working hard to improve user experience.

If you missed it, I recently published a video that talks about what we learned about the role of user satisfaction signals in Google’s algorithms. Traditional ranking factors create an initial pool of results. AI systems rerank them, working to predict what the searcher will find most helpful. And the quality raters, as well as live users in live user tests, help fine-tune these systems.


Ultimately, Google’s systems work to reward content that users are likely to find satisfying. Your goal is to be the most helpful result there is!



Read Marie’s newsletter AI News You Can Use, subscribe now.


Featured Image: Jack_the_sparow/Shutterstock

SEO Fundamental: Google Explains Why It May Not Use A Sitemap via @sejournal, @martinibuster

Google’s John Mueller answered a question about why Search Console was reporting a sitemap fetch error even though server logs showed that Googlebot successfully fetched the sitemap.

The question was asked on Reddit. The person who started the discussion shared a comprehensive list of technical checks they ran to confirm that the sitemap returns a 200 response code, uses a valid XML structure, allows indexing, and so on.

The sitemap is technically valid in every way but Google Search Console keeps displaying an error message about it.

The Redditor explained:

“I’m encountering very tricky issue with sitemap submission immediately resulted `Couldn’t fetch` status and `Sitemap could not be read` error in the detail view. But i have tried everything I can to ensure the sitemap is accessible and also in server logs, can confirm that GoogleBot traffic successfully retrieved sitemap with 200 success code and it is a validated sitemap with URL – loc and lastmod tags.

…The configuration was initially setup and sitemap submitted in Dec 2025 and for many months, there’s no updates to sitemap crawl status – multiple submissions throughout the time all result the same immediate failure. Small # of pages were submitted manually and all were successfully crawled, but none of the rest URLs listed in sitemap.xml were crawled.”

Google’s John Mueller answered the question, implying that the error message is triggered by an issue related to the content.

Mueller responded:

“One part of sitemaps is that Google has to be keen on indexing more content from the site. If Google’s not convinced that there’s new & important content to index, it won’t use the sitemap.”

While Mueller did not use the phrase “site quality,” site quality is implied because he says that Google has to be “keen on indexing more content from the site” that is “new and important.”

That implies two things: that maybe the site doesn’t produce much new content, and that the content might not be important. The part about content being important is a very broad description that can mean a lot of things, and not all of those reasons necessarily mean that the content is low quality.

Sometimes a site is missing an important form of content or a structure that makes it easier for users to understand a topic or come to a decision. It could be an image, it could be a step-by-step, it could be a video; it could be a lot of things, but not necessarily all of them. When in doubt, think like a site visitor and try to imagine what would be the most helpful for them. Or it could be that the content is trivial because it’s thin or not unique. Mueller was broad, but I think circling back to what makes a site visitor happy is the way to identify ways to improve content.

Featured Image by Shutterstock/Asier Romero

Information Retrieval Part 3: Vectorization And Transformers (Not The Film)

Information retrieval systems are designed to satisfy a user. To make a user happy with the quality of their recall. It’s important we understand that. Every system and its inputs and outputs are designed to provide the best user experience.

From the training data to similarity scoring and the machine’s ability to “understand” our tired, sad bullshit – this is the third in a series I’ve titled, information retrieval for morons.

Image Credit: Harry Clarkson-Bennett

TL;DR

  1. In the vector space model, the distance between vectors represents the relevance (similarity) between the documents or items.
  2. Vectorization has allowed search engines to perform concept searching instead of word searching. It is the alignment of concepts, not letters or words.
  3. Longer documents contain more similar terms. To combat this, document length is normalized, and relevance is prioritized.
  4. Google has been doing this for over a decade. Maybe for over a decade, you have too.

Things You Should Know Before We Start

Some concepts and systems you should be aware of before we dive in.

I don’t remember all of these, and neither will you. Just try to enjoy yourself and hope that through osmosis and consistency, you vaguely remember things over time.

  • TF-IDF stands for term frequency-inverse document frequency. It is a numerical statistic used in NLP and information retrieval to measure a term’s relevance within a document corpus.
  • Cosine similarity measures the cosine of the angle between two vectors, ranging from -1 to 1. A smaller angle (closer to 1) implies higher similarity.
  • The bag-of-words model is a way of representing text data when modelling text with machine learning algorithms.
  • Feature extraction/encoding models are used to convert raw text into numerical representations that can be processed by machine learning models.
  • Euclidean distance measures the straight-line distance between two points in vector space to calculate data similarity (or dissimilarity).
  • Doc2Vec is an extension of Word2Vec, designed to represent the similarity (or lack of it) between documents as opposed to words.

What Is The Vector Space Model?

The vector space model (VSM) is an algebraic model that represents text documents or items as “vectors.” This representation allows systems to create a distance between each vector.

The distance calculates the similarity between terms or items.

Commonly used in information retrieval, document ranking, and keyword extraction, vector models create structure. This structured, high-dimensional numerical space enables the calculation of relevance via similarity measures like cosine similarity.

Terms are assigned values. If a term appears in the document, its value is non-zero. Worth noting that terms are not just individual keywords. They can be phrases, sentences, and entire documents.

Once queries, phrases, and sentences are assigned values, the document can be scored. It has a physical place in the vector space as chosen by the model.

In this case, words, represented on a graph to denote relationships between them (Image Credit: Harry Clarkson-Bennett)

Based on these scores, documents can be compared to one another against the inputted query. You generate similarity scores at scale. This is known as semantic similarity, where a set of documents is scored and positioned in the index based on their meaning.

Not just their lexical similarity.

I know this sounds a bit complicated, but think of it like this:

Words on a page can be manipulated. Keyword stuffed. They’re too simple. But if you can calculate meaning (of the document), you’re one step closer to a quality output.
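
Here’s a minimal sketch of that idea in practice, using TF-IDF vectors and cosine similarity via scikit-learn; the documents and query are toy examples, and production systems use far richer representations than this.

```python
# Score documents against a query in a shared vector space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "How to bowl a yorker in cricket",
    "Caring for fruit bats and other nocturnal mammals",
    "Choosing a cricket bat for the new season",
]
query = "best cricket bat"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)   # one vector per document
query_vector = vectorizer.transform([query])        # query mapped into the same space

# Cosine similarity between the query vector and each document vector.
scores = cosine_similarity(query_vector, doc_vectors)[0]
for doc, score in sorted(zip(documents, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {doc}")
```

The winning document wins not because it repeats the query verbatim, but because its vector sits closest to the query’s vector in the shared space.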

Why Does It Work So Well?

Machines don’t just like structure. They bloody love it.

Fixed-length (or styled) inputs and outputs create predictable, accurate results. The more informative and compact a dataset, the better quality classification, extraction, and prediction you will get.

The problem with text is that it doesn’t have much structure. At least not in the eyes of a machine. It’s messy. This is why the vector space model has such an advantage over the classic Boolean retrieval model.

In Boolean Retrieval Models, documents are retrieved based on whether they satisfy the conditions of a query that uses Boolean logic. It treats each document as a set of words or terms and uses AND, OR, and NOT operators to return all results that fit the bill.

Its simplicity has its uses, but it cannot interpret meaning.

Think of it more like data retrieval than identifying and interpreting information. We fall into the term frequency (TF) trap too often with more nuanced searches. Easy, but lazy in today’s world.

Whereas the vector space model interprets actual relevance to the query and doesn’t require exact match terms. That’s the beauty of it.

It’s this structure that creates much more precise recall.

The Transformer Revolution (Not Michael Bay)

Unlike Michael Bay’s series, the real transformer architecture replaced older, static embedding methods (like Word2Vec) with contextual embeddings.

While static models assign one vector to each word, transformers generate dynamic representations that change based on the surrounding words in a sentence.

And yes, Google has been doing this for some time. It’s not new. It’s not GEO. It’s just modern information retrieval that “understands” a page.

I mean, obviously not. But you, as a hopefully sentient, breathing being, understand what I mean. But transformers, well, they fake it:

  1. Transformers weight input data by significance.
  2. The model pays more attention to words that demand or provide extra context.

Let me give you an example.

“The bat’s teeth flashed as it flew out of the cave.”

Bat is an ambiguous term. Ambiguity is bad in the age of AI.

But transformer architecture links bat with “teeth,” “flew,” and “cave,” signaling that bat is far more likely to be a bloodsucking rodent* than something a gentleman would use to caress the ball for a boundary in the world’s finest sport.

*No idea if a bat is a rodent, but it looks like a rat with wings.
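You can watch this disambiguation happen with an off-the-shelf model. The sketch below uses the Hugging Face transformers library and bert-base-uncased – purely illustrative choices, not what Google runs – to pull the contextual vector for “bat” out of two different sentences and compare them:

```python
# pip install transformers torch
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bat_vector(sentence):
    # Run the whole sentence through BERT in one pass, then pull out
    # the contextual embedding for the "bat" token.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return outputs.last_hidden_state[0, tokens.index("bat")]

animal = bat_vector("The bat's teeth flashed as it flew out of the cave.")
cricket = bat_vector("He swung the bat and hit the ball to the boundary.")

# A static embedding (Word2Vec) would give both sentences the same "bat"
# vector; contextual embeddings diverge because the surroundings differ.
similarity = torch.cosine_similarity(animal, cricket, dim=0)
print(f"Similarity between the two 'bat' vectors: {similarity.item():.3f}")
```

The two “bat” vectors should come out measurably different, because the surrounding words (“teeth” and “cave” versus “ball” and “boundary”) pull them apart.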

BERT Strikes Back

BERT. Bidirectional Encoder Representations from Transformers. Shrugs.

This is how Google has worked for years. By applying this type of contextually aware understanding to the semantic relationships between words and documents. It’s a huge part of the reason why Google is so good at mapping and understanding intent and how it shifts over time.

More recent BERT-style models (such as DeBERTa) allow words to be represented by two vectors – one for meaning and one for position in the document. This is known as Disentangled Attention. It provides more accurate context.

Yep, sounds weird to me, too.

BERT processes the entire sequence of words simultaneously. This means context is applied from the entirety of the page content (not just the few surrounding terms).

Synonyms Baby

Launched in 2015, RankBrain was Google’s first deep learning system. Well, the first that I know of, anyway. It was designed to help the search algorithm understand how words relate to concepts.

This was kind of the peak search era. Anyone could start a website about anything. Get it up and ranking. Make a load of money. Not need any kind of rigor.

Halcyon days.

With hindsight, these days weren’t great for the wider public. Getting advice on funeral planning and commercial waste management from a spotty 23-year-old’s bedroom in Halifax.

As new and evolving queries surged, RankBrain and the subsequent neural matching were vital.

Then there was MUM: Google’s ability to “understand” text, images, and visual content across multiple languages simultaneously.

Document length was an obvious problem 10 years ago. Maybe less. Longer articles, for better or worse, always did better. I remember writing 10,000-word articles on some nonsense about website builders and sticking them on a homepage.

Even then that was a rubbish idea…

In a world where queries and documents are mapped to numbers, you could be forgiven for thinking that longer documents will always be surfaced over shorter ones.

Remember 10-15 years ago, when everyone was obsessed with every article being 2,000 words?

“That’s the optimal length for SEO.”

If you see another “What time is X” 2,000-word article, you have my permission to shoot me.

You can’t knock the fact this is a better experience (Image Credit: Harry Clarkson-Bennett)

Longer documents will – as a result of containing more terms – have higher TF values. They also contain more distinct terms. These factors can conspire to raise the scores of longer documents.

Hence why, for a while, they were the zenith of our crappy content production.

Longer documents can broadly be lumped into two categories:

  1. Verbose documents that essentially repeat the same content (hello, keyword stuffing, my old friend).
  2. Documents covering multiple topics, in which the search terms probably match small segments of the document, but not all of it.

To combat this obvious issue, a form of compensation for document length is used, known as Pivoted Document Length Normalization. This adjusts scores to counteract the natural bias longer documents have.

Pivoted normalization rescales term weights using a linear adjustment around the average document length (Image Credit: Harry Clarkson-Bennett)
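If it helps, here is the rough shape of the idea in code. The slope value is just an illustrative choice, not a number Google has published:

```python
def pivoted_norm(term_weight, doc_len, avg_doc_len, slope=0.2):
    """Dampen the advantage long documents get from raw term frequency.

    Rather than dividing by the document's own length, we divide by a value
    'pivoted' around the average document length: longer-than-average pages
    are penalized a little, shorter ones get a small boost.
    """
    return term_weight / ((1 - slope) * avg_doc_len + slope * doc_len)

# Same raw term weight, corpus averaging 1,000 words per document:
print(pivoted_norm(12.0, doc_len=3000, avg_doc_len=1000))  # ~0.0086 (penalized)
print(pivoted_norm(12.0, doc_len=600, avg_doc_len=1000))   # ~0.0130 (boosted)
```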

The cosine distance should be used because we do not want to favour longer (or shorter) documents, but to focus on relevance. Leveraging this normalization prioritizes relevance over term frequency.

It’s why cosine similarity is so valuable. It is robust to document length. A short and long answer can be seen as topically identical if they point in the same direction in the vector space.
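A quick way to convince yourself: scale a document’s term vector up, as if the page simply repeated itself ten times over, and its cosine similarity to the query doesn’t budge. A tiny NumPy sketch:

```python
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([1.0, 0.5, 0.0])
short_doc = np.array([2.0, 1.0, 0.2])
long_doc = short_doc * 10  # same content, ten times as much of it

print(cosine(query, short_doc))  # ~0.996
print(cosine(query, long_doc))   # identical: the direction hasn't changed
```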

But how does any of this actually work at the scale of the web? Great question.

Well, no one’s expecting you to understand the intricacies of a vector database. You don’t really need to know that these databases create specialized indices to find close neighbors without checking every single record.

That’s for companies like Google to worry about, striking the right balance between performance, cost, and operational simplicity.

Kevin Indig’s latest excellent research shows that 44.2% of all citations in ChatGPT originate from the first 30% of the text. The probability of citation drops significantly after this initial section, creating a “ski ramp” effect.

Image Credit: Harry Clarkson-Bennett

Even more reason not to mindlessly create massive documents because someone told you to.

In “AI search,” a lot of this comes down to tokens. According to Dan Petrovic’s always excellent work, each query has a fixed grounding budget of approximately 2,000 words total, distributed across sources by relevance rank.

In Google, at least. And your rank determines your score. So get SEO-ing.

Position 1 gives you double the prominence of position 5 (Image Credit: Harry Clarkson-Bennett)

Metehan’s study on what 200,000 Tokens Reveal About AEO/GEO really highlights how important this is. Or will be. Not just for our jobs, but for biases and cultural implications, too.

As text is tokenized (compressed and converted into a sequence of integer IDs), this has cost and accuracy implications.

  • Plain English prose is the most token-efficient format at 5.9 characters per token. Let’s call it 100% relative efficiency. A baseline.
  • Turkish prose manages just 3.6 characters per token, or 61% as efficient.
  • Markdown tables come in at 2.7 characters per token, or 46% as efficient.

Languages are not created equal. In an era where capital expenditure (CapEx) costs are soaring and AI firms have struck deals I’m not sure they can cash, this matters.
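You can sanity-check these sorts of numbers against your own content with any tokenizer. The sketch below uses OpenAI’s tiktoken library as one example – the exact characters-per-token figures will vary by model, language, and sample:

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "english prose": "Search engines map queries and documents into a shared vector space.",
    "markdown table": "| term | weight |\n|------|--------|\n| bat | 0.42 |\n| cave | 0.38 |",
}

for label, text in samples.items():
    tokens = enc.encode(text)
    # Higher characters-per-token means the format is more token-efficient.
    print(f"{label}: {len(text) / len(tokens):.1f} characters per token")
```

Run that across your own templates and you’ll quickly see which formats burn through a grounding budget fastest.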

So what should you actually do about it? Well, as Google has been doing this for some time, the same things should work across both traditional search and AI search.

  1. Answer the flipping question. My god. Get to the point. I don’t care about anything other than what I want. Give it to me immediately (spoken as a human and a machine).
  2. So frontload your important information. I have no attention span. Neither do transformer models.
  3. Disambiguate. Entity optimization work. Connect the dots online. Claim your knowledge panel. Authors, social accounts, structured data, building brands and profiles.
  4. Excellent E-E-A-T. Deliver trustworthy information in a manner that sets you apart from the competition.
  5. Create keyword-rich internal links that help define what the page and content are about. Part disambiguation. Part just good UX.
  6. If you want something focused on LLMs, be more efficient with your words.
    • Using structured lists can reduce token consumption by 20-40% because they remove fluff. Not because they’re more efficient*.
    • Use commonly known abbreviations to also save tokens.

*Interestingly, they are less efficient than traditional prose.

Almost all of this is about giving people what they want quickly and removing any ambiguity. In an internet full of crap, doing this really, really works.

Last Bits

There is some discussion around whether Markdown for agents can help strip out the fluff from HTML on your site, so agents could bypass the cluttered markup and get straight to the good stuff.

How much of this could be solved by having a less fucked up approach to semantic HTML, I don’t know. Anyway, one to watch.

Very SEO. Much AI.


Read Leadership in SEO. Subscribe now.


Featured Image: Anton Vierietin/Shutterstock

Google AI Mode Link Update, Click Share Data & ChatGPT Fan-Outs – SEO Pulse via @sejournal, @MattGSouthern

Welcome to the week’s SEO Pulse: updates affecting how links appear in AI search results, where organic clicks are going, and which languages ChatGPT uses to find sources.

Here’s what matters for you and your work.

Google Redesigns Links In AI Overviews And AI Mode

Robby Stein, VP of Product for Google Search, announced on X that AI Overviews and AI Mode are getting a redesigned link experience on both desktop and mobile.

Key Facts: On desktop, groups of links will now appear in a pop-up when you hover over them, showing site names, favicons, and short descriptions. Google is also rolling out more descriptive and prominent link icons across desktop and mobile.

Why This Matters

This is the latest in a series of link-visibility updates Stein has announced since last summer, when he called showing more inline links Google’s “north star” for AI search. The pattern is consistent. Google keeps iterating on how links surface inside AI-generated responses.

The hover pop-up is a new interaction pattern for AI Overviews. Instead of small inline citations that are easy to miss, users now get a preview card with enough context to decide whether to click. That changes the calculus for publishers wondering how much traffic AI results actually send.

What The Industry Is Saying

SEO consultant Lily Ray (Amsive) wrote on X that she had been seeing the new link cards and was “REALLY hoping it sticks.”

Read our full coverage: Google Says Links Will Be More Visible In AI Overviews

43% Of ChatGPT Fan-Out Queries For Non-English Prompts Run In English

A report from AI search analytics firm Peec AI found that a large share of ChatGPT’s fan-out queries run in English, even when the original prompt was in another language.

Key Facts: Peec AI analyzed over 10 million prompts and 20 million fan-out queries from its platform data. Across non-English prompts analyzed, 43% of the fan-out queries ran in English. Nearly 78% of non-English prompt sessions included at least one English-language fan-out query.

Why This Matters

When ChatGPT Search builds an answer, it can rewrite the user’s prompt into “one or more targeted queries,” according to OpenAI’s documentation. OpenAI does not describe how language is chosen for those rewritten queries. Peec AI’s data suggests that English gets inserted into the process even when the user and their location are clearly non-English.

SEO and content teams working in non-English markets may face a disadvantage in ChatGPT’s source selection that doesn’t map to traditional ranking signals. Language filtering appears to happen before citation signals come into play.

Read our full coverage: ChatGPT Search Often Switches To English In Fan-Out Queries: Report

Google’s Search Relations Team Can’t Say You Still Need A Website

Google’s Search Relations team was asked directly whether you still need a website in 2026. They didn’t give a definitive yes.

Key Facts: In a new episode of the Search Off the Record podcast, Gary Illyes and Martin Splitt spent about 28 minutes exploring the question. Both acknowledged that websites still offer advantages, including data sovereignty, control over monetization, and freedom from platform content moderation. But neither argued that the open web offers something irreplaceable.

Why This Matters

Google Search is built around crawling and indexing web content. The fact that Google’s own Search Relations team treats “do I need a website?” as a business decision rather than an obvious yes is worth noting.

Illyes offered the closest thing to a position. He said that if you want to make information available to as many people as possible, a website is probably still the way to go. But he called it a personal opinion, not a recommendation.

The conversation aligns with increasingly fragmented user journeys, now spanning AI chatbots, social feeds, community platforms, and traditional search. For practitioners advising clients on building websites, the answer increasingly depends on where the audience is, not where it used to be.

Read our full coverage: Google’s Search Relations Team Debates If You Still Need A Website

Theme Of The Week: The Ground Keeps Moving Under Organic

Each story this week shows a different force pulling attention, clicks, or visibility away from the organic channel as practitioners have known it.

Google is redesigning how links appear in AI responses, acknowledging the traffic concern. ChatGPT’s background queries introduce a language filter that can exclude non-English content before relevance signals even apply. And Google’s own team won’t say that websites are the default answer for visibility anymore.

These stories reinforce the idea of spreading your content across different platforms to reach more people, and of tracking where your clicks are really coming from.


Featured Image: TippaPatt/Shutterstock; Paulo Bobita/Search Engine Journal

35-Year SEO Veteran: Great SEO Is Good GEO — But Not Everyone’s Been Doing Great SEO via @sejournal, @theshelleywalsh

As SEOs, we are used to being adaptable to changing algorithms, so LLM optimization should be a simple extension of that process.

To discuss the industry debates surrounding the differences between SEO and GEO and clarify whether they are the same or different, I spoke with SEO veteran Grant Simmons.

Grant has over 30 years of experience helping brands grow and has spent decades focused on meaning, intent, and topical authority long before LLMs entered the conversation.

I spoke with Grant about signal alignment, how Google’s latest continuation patents reveal the mechanics of LLM citations, and what SEOs are getting wrong about topical focus.

“We talk about writing for the machines, but we’re really writing for human need because it’s all driven by the prompt or the query.” – Grant Simmons

You can watch the full interview with Grant on IMHO below, or continue reading the article summary.

Great SEO Is Good GEO

At Google Search Live in December 2025, John Mueller said, “Good SEO is good GEO.”

I asked Grant what he thought were the differences between optimizing for search engines and for machines, and if he thought there were any overlaps.

Grant’s approach echoes what John Mueller said, but “Not everyone has been doing great SEO,” he explained. “Great SEO was always about building topical authority.”

He continued to say, “Essentially, machines (whether it’s Google or whether it’s an LLM) have to understand the underlying meaning of the content so they can present the best answer.

They have to understand the query or the prompt, then they have to send the best answer. So in that way, it’s very similar.”

Where Grant sees divergence is in how the systems evaluate content. Google has historically ranked pages, and even with passage ranking, it still considers the page and the site as a whole. LLMs operate differently.

“LLMs are looking more at that passage side, you know, something that’s easily extractable, something that has value semantically related to the query or the prompt. And so there’s that fundamental difference.”

Grant also stressed that great SEO has always been holistic, touching social media, PR, content, and brand messaging. Having brand awareness, brand visibility, and brand consistency across all channels is a significant factor in LLM representation. And this is exactly the kind of work that the best SEOs do.

“We’re marketers. We should make sure, not just from a standpoint of what we do in SEO and GEO for our clients, which is connecting a need and intent to the product or service that satisfies that intent, we’re also doing the same in our own marketing. We have to understand what our clients are looking for.

“[GEO] is the same [as SEO] if you’re doing it well. It’s not the same if you weren’t. And of course, there’s nuance.”

My thoughts are that SEOs who have been in the industry the longest are experiencing less disruption because they have seen it all before. They learned to be adaptable in the early years, when there was so much flux as we progressed from multiple search engines to just one. Anyone new to the industry, by contrast, doesn’t have the same background points of reference.

Why Consensus Matters To Be Surfaced By LLMs

I went on to ask Grant about Google’s latest continuation patents, which describe two distinct systems that work together.

The first is what Grant describes as a response confidence engine. This system evaluates whether a passage can be corroborated, whether the information has consensus across the web.

“If they return a passage and they can corroborate that it is true, and when we say true, it’s true in the sense that more than one person is saying it, that doesn’t mean it’s true, but it means the consensus is there,” Grant explained. “The consensus generally wins out.”

The second system is what Grant calls a linkifying engine. Once a passage has been confirmed through consensus, this engine determines whether a specific sentence or sub-element within that passage, what Grant calls a “chunklet,” can be matched and linked to a source.

“Consensus decides whether it’s surfaced in the first place. The linkify engine actually decides whether it’s linkable, whether a citation is actually going to happen,” Grant said.

Getting mentioned by an LLM is one thing. Getting an actual link back to your content requires that the specific passage is both verifiable through consensus and uniquely attributable to your source.

Golden Knowledge Content Wins

So, what kind of content earns this kind of AI visibility? Grant described it as “golden knowledge,” content that is unique in some meaningful way.

“Generally, data-driven, your own data, your own opinion that’s proof-backed, evidence-backed. Taking a different view of things,” Grant said. “But in the same way of taking a different view, there still has to be some kind of consensus. If other people are agreeing with you, that is really important. Your content needs the uniqueness and the data-driven aspect, but it still has to align with the overall consensus on the web.”

Grant was also clear that while we often talk about writing for machines, the orientation should remain human-centered: “We talk about writing for the machines, but we’re really writing for human need because it’s all driven by the prompt or the query.”

This balance between uniqueness and consensus is perhaps the most actionable takeaway. Content that simply restates what everyone else is saying won’t stand out. But content that takes a position without corroboration elsewhere won’t pass the confidence threshold to be surfaced. The sweet spot is original, data-driven insight that others can and do validate.

The Biggest Mistakes SEOs Make With Topical Focus

When I asked Grant about the most common mistakes he sees with topical diversification on pages, his answer was clear: trying to be everything to everyone.

“When you think about intent, suddenly you understand that pages have a right to exist,” Grant said. “I call it path to satisfaction. Understanding who the audience is and what they need to find, you have to provide a path to that satisfaction.”

Grant pointed out that most SEOs inherit existing sites rather than building from scratch. The temptation is to focus on the surface-level optimizations, such as title tags, meta descriptions, and headers, without reviewing whether a page is actually focused on a specific intent or whether it has what he calls “drift.”

“What they won’t do is fundamentally review the page and understand whether that page is focused on a specific intent or whether it has this drift,” Grant explained. “Cleaning out those outliers, topics that you’re covering when you don’t really mean to, is essentially diffusing what the page means. Those are the things that I think SEOs miss out on.”

This ties directly back to LLM citability. If a page lacks clear topical focus, it becomes harder for AI systems to extract a self-contained passage that answers a specific query. Tightening that focus isn’t just good SEO; it’s the foundation of being visible in AI-generated responses.

Grant’s Strategy Recommendation For 2026

I finished by asking Grant what he’s recommending to his clients right now.

“Let’s double down on what’s working,” Grant said. “LLM traffic is so small today that optimizing for LLMs is important for the future but not for today’s metrics. Let’s improve our SEO. Let’s get to that great SEO level. And as we’re doing that, we are incorporating the elements that will help you show up for GEO, that will help show up on these other surfaces.”

His focus is on great content, topical authority, uniqueness, data-driven approaches, citations, and digital PR. In Grant’s words: “Getting content so good that LLMs can’t ignore you, Google can’t ignore you, and publications can’t ignore you.”

It’s the Steve Martin philosophy applied to SEO: “Be so good they can’t ignore you,” and, coincidence or not, the rule I have applied for the last 15 years in SEO.

Watch the full interview with Grant Simmons here:

Thank you to Grant Simmons for offering his insights and being my guest on IMHO.


Featured Image: Shelley Walsh/Search Engine Journal

Are Citations In AI Search Affected By Google Organic Visibility Changes? via @sejournal, @lilyraynyc

I recently wrote about an unconfirmed Google algorithm update that rolled out in mid-January 2026, which negatively impacted the organic search visibility of dozens of major brands. For most of the impacted sites I analyzed, the impact was disproportionately concentrated in the company’s blog, or another folder containing informational articles and resources.

That same organic trajectory has continued into mid-February for all of the subfolders I analyzed, using the Sistrix U.S. Visibility Index:

Image Credit: Lily Ray

Zooming out, this is what the drops look like when you look at the visibility trends across the whole domains, not just the blogs:

Image Credit: Lily Ray

Here is another example of the visibility impact on the company blog for the biggest company in the list (in terms of both ARR and organic visibility):

Image Credit: Lily Ray

And this is what the impact looks like when you look at the company’s full domain’s visibility in organic search:

Image Credit: Lily Ray

Needless to say, these recent organic visibility drops were extreme, relative to the sites’ overall SEO trajectories over the past few years. Drilling down into 11 of the sites that saw extreme declines over the last month, I wanted to see if this new data could help answer another question:

Do drops in Google organic search visibility coincide with similar drops in AI search citations?

My working hypothesis is that these drops are no longer just isolated to traditional search. Instead, I suspect we will find that, for most LLMs, AI search citation trends mirror what happens in Google’s organic search results, for two reasons:

1. The Direct Pipeline: Google’s AI Ecosystem

For Google’s own AI products – AI Mode and Gemini – the correlation should be strongest. Presumably, Google is using its own index and top-ranking search results to formulate AI search responses; therefore, dropping in organic rankings should logically cause those pages to be cited and referenced less frequently in generative answers.

2. The Downstream Effects: Third-Party LLMs (ChatGPT & Perplexity)

The link between Google organic rankings and third-party LLMs like ChatGPT and Perplexity is more nuanced, as we don’t know exactly which search engines these LLMs are surfacing for web search.

While there is a growing body of evidence (and industry reporting) suggesting that ChatGPT likely scrapes Google during live web searches, we still technically lack official confirmation from the source. Perplexity, on the other hand, is currently believed to utilize the Brave Search API as a core part of its retrieval process, alongside its own specialized “PerplexityBot” crawler.

To test this out, I wanted to drill down into the subfolders that saw substantial visibility drops on Google in recent weeks, to see whether the trend line for AI search citations followed suit.

To start, I homed in on a list of 11 sites whose subfolders saw substantial organic traffic drops between January 20, 2026 and February 16, 2026.

I used the Ahrefs MCP server with Claude Cowork to pull in estimated global monthly organic traffic numbers for each path (subfolder) in the list. Because most of the traffic declines started around January 21, 2026, I pulled the projected monthly organic numbers for January 20, 2026 and the most recent date, February 16, 2026.

I also redacted the site names, leaving the name of the subfolder and a brief, anonymized summary about the company type and the subfolder’s purpose:

Image Credit: Lily Ray

These subfolders experienced anywhere from a -5.7% to -53.1% drop in estimated monthly organic search traffic since January 20th, 2026.

Using Ahrefs Brand Radar, you can drill down to see the number of AI search citations that a given subfolder has received across various LLMs over time. For example, here is the ChatGPT citation trend line for the first subfolder listed in the above table (U.S. data):

Image Credit: Lily Ray

This is the corresponding chart showing the organic traffic trend for this same subfolder, which began dropping around January 21, 2026:

Image Credit: Lily Ray

I used the Ahrefs MCP server with Claude Cowork to pull global traffic and citation data, and to analyze this same pattern for 11 of the subfolders that saw big drops.

Note on methodology: While 11 subfolders are a small sample size, I was specifically looking for a “clean” data set – subfolders experiencing a similar algorithmic demotion on Google during the unconfirmed January 2026 update. By narrowing the scope, I could better isolate whether a loss of traditional search visibility translates directly into a citation drop in AI search.
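If you want to replicate this kind of check on your own sites, the analysis itself is simple once the numbers are exported. Here is a minimal sketch; the CSV and column names are hypothetical stand-ins for whatever your own traffic and citation exports look like:

```python
import pandas as pd

# Hypothetical export: one row per subfolder, with before/after values for
# estimated organic traffic and total AI search citations.
df = pd.read_csv("subfolder_visibility.csv")

df["organic_change_pct"] = (df["organic_after"] - df["organic_before"]) / df["organic_before"] * 100
df["citation_change_pct"] = (df["citations_after"] - df["citations_before"]) / df["citations_before"] * 100

# How tightly do the two moves track each other across subfolders?
correlation = df["organic_change_pct"].corr(df["citation_change_pct"])
print(df[["subfolder", "organic_change_pct", "citation_change_pct"]])
print(f"Correlation between organic and citation changes: {correlation:.2f}")
```

A correlation near +1 would mean the two moves track each other closely, which is broadly what the data below suggests.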

Below are the high-level summaries of how organic traffic and citation counts changed across Google and various LLMs, including AI Mode, ChatGPT, Perplexity, and Gemini:

Image Credit: Lily Ray
Image Credit: Lily Ray

Findings:

  • The data shows a broad decline in both SEO traffic & AI search citations: Every subfolder in the study (11 of 11) experienced a drop in both Google organic traffic and total AI search citations, with a significant average citation decline of -22.5%.
  • Google’s AI Mode (-23.8%) and ChatGPT (-27.8%) showed the most severe declines, closely mirroring the -26.7% average drop in organic traffic.
  • While Gemini also saw broad declines (10 of 11 sites), Perplexity proved to be the most resilient, with only 4 of the 11 sites seeing a drop and a much milder average change of -2.9%.
    • This data supports the theory that Perplexity is primarily using non-Google search surfaces to generate its responses.

Looking at the changes in estimated organic traffic for each subfolder compared to total AI search citations between January 20 and February 16, 2026, the correlation is clear: Significant losses in organic search visibility are almost universally mirrored by a corresponding decline in AI search citations.

Image Credit: Lily Ray

Drilling down into specific LLMs, including Google’s AI Mode, ChatGPT, Perplexity, and Gemini, shows how the decline was nearly universal for most platforms. Perplexity, however, frequently displayed a significant divergence, showing positive citation growth for the majority of the subfolders despite their organic traffic losses.

Image Credit: Lily Ray

ChatGPT (green) consistently shows the deepest declines across almost every subfolder – often exceeding AI Mode and Gemini. This is intriguing because ChatGPT isn’t a Google product, yet it appears more sensitive to these organic ranking shifts than Google’s own Gemini.

This appears to be another clue that ChatGPT is reliant on Google’s search index during retrieval.

AI Mode and Gemini tend to move in the same direction but not the same magnitude. Despite both being Google products, AI Mode declines are generally steeper than Gemini’s. This could suggest they weight or source from Google’s organic index differently – perhaps AI Mode is more tightly coupled to live SERP rankings while Gemini draws from a broader or cached knowledge base.

The few sites where Perplexity did decline (e.g., Site J, Site K) are also the ones showing relatively smaller organic drops. So even in the cases where Perplexity tracked downward, it doesn’t appear to be correlated with the severity of the Google organic loss – further evidence that Perplexity is likely pulling from a different retrieval pipeline.

The table below shows all of the organic search vs. AI search citation data in one place:

Image Credit: Lily Ray

The table reveals a clear pattern: every subfolder that lost organic visibility on Google also saw a decline in total AI search citations, with an average drop of -22.5% across all LLMs.

ChatGPT was the most severely impacted platform, with citation declines reaching as high as -42.3% (Site E) and exceeding -34% for five of the eleven subfolders – often surpassing even the organic traffic loss itself.

Google’s AI Mode followed a similar trajectory, while Gemini showed more moderate declines across the board.

The most notable outlier is Perplexity, which actually showed citation growth for 7 of the 11 subfolders, reinforcing the theory that it retrieves from a non-Google search index.

Perhaps the most interesting finding is that ChatGPT – a non-Google product – appears more tightly coupled to Google’s organic rankings than Google’s own Gemini, suggesting that ChatGPT’s web retrieval pipeline is heavily dependent on Google’s search results.

One recommendation I’ve been making since AI search entered the SEO conversation is that you shouldn’t invest in AEO/GEO tactics that could be detrimental to SEO performance. For example, using hidden prompt injections, cloaking, or self-promotional listicles (tactics that some have advocated for to boost AI search visibility) might be temporarily beneficial for AI search, but could cause massive headaches with Google and Bing’s organic search ranking algorithms down the line.

Now, we have even more evidence that AI search is fundamentally connected to SEO performance: If you drop in organic search, you can likely expect a corresponding drop in citations not only from Google’s own AI search products, but from other LLMs like ChatGPT, which appear to also be heavily reliant on Google’s search results.

The one notable exception is Perplexity, which showed citation growth for the majority of subfolders hit by the recent algorithm update. That said, it’s important to weigh this against the scale of traffic and LLM usage at stake. According to a recent article by Similarweb, ChatGPT received 5.8 billion web visits in August 2025, compared to 148.2 million for Perplexity.

To add to this, when you factor in Google’s organic search traffic – which still dwarfs all the AI search platforms combined – the vast majority of your search-driven visibility across both search engines and AI chatbots is still flowing through a pipeline where Google’s rankings dictate the outcome.

For the past year, the SEO industry has been asking how closely traditional SEO and AEO/GEO are really tied together. I think this data helps answer that question: Not only is a strong SEO foundation critical for AI search visibility, but tactics that hurt your organic rankings can have a cascading negative impact on your AI search citations as well. In other words, the fastest way to lose visibility in AI search might be to lose it in Google first.


This post was originally published on Lily Ray NYC Substack.


Featured Image: PeopleImages/Shutterstock