Web Almanac Data Reveals CMS Plugins Are Setting Technical SEO Standards (Not SEOs) via @sejournal, @chrisgreenseo

If more than half the web runs on a content management system, then the majority of technical SEO standards are being positively shaped before an SEO even starts work on a site. That’s the lens I took into the 2025 Web Almanac SEO chapter (for clarity, I co-authored the 2025 Web Almanac SEO chapter referenced in this article).

Rather than asking how individual optimization decisions influence performance, I wanted to understand something more fundamental: How much of the web’s technical SEO baseline is determined by CMS defaults and the ecosystems around them?

SEO often feels intensely hands-on – perhaps too much so. We debate canonical logic, structured data implementation, crawl control, and metadata configuration as if each site were a bespoke engineering project. But when 50%+ of pages in the HTTP Archive dataset sit on CMS platforms, those platforms become the invisible standard-setters. Their defaults, constraints, and feature rollouts quietly define what “normal” looks like at scale.

This piece explores that influence using 2025 Web Almanac and HTTP Archive data, specifically:

  • How CMS adoption trends track with core technical SEO signals.
  • Where plugin ecosystems appear to shape implementation patterns.
  • And how emerging standards like llms.txt are spreading as a result.

The question is not whether SEOs matter. It’s whether we’ve been underestimating who sets the baseline for the modern web.

The Backbone Of Web Design

The 2025 CMS chapter of the Web Almanac recorded a milestone in CMS adoption: over 50% of pages now run on a CMS. In case you were unsold on how much of the web is carried by CMSs, over 50% of 16 million websites is a significant amount.

Screenshot from Web Almanac, February 2026

As for which CMSs are the most popular, this again may not be surprising, but it is worth reflecting on which has the most impact.

Image by author, February 2026

WordPress is still the most used CMS, by a long way, even if it has dropped marginally in the 2024 data. Shopify, Wix, Squarespace, and Joomla trail a long way behind, but they still have a significant impact, especially Shopify, on ecommerce specifically.

SEO Functions That Ship As Defaults In CMS Platforms

CMS platform defaults are important because – I believe – a lot of basic technical SEO standards exist either as default setups or thanks to the relatively small number of websites that have dedicated SEOs, or people who at least build to and work with SEO best practice.

When we talk about “best practice,” we’re on slightly shaky ground, as there isn’t a universal, prescriptive view on this one, but I would consider:

  • Descriptive “SEO-friendly” URLs.
  • Editable title and meta description.
  • XML sitemaps.
  • Canonical tags.
  • Editable meta robots directives.
  • Structured data – at least a basic level.
  • Robots.txt editing.

Of the main CMS platforms, here is what they – self-reportedly – have as “default.” Note: Some platforms – like Shopify – would say they’re SEO-friendly (and, to be honest, “good enough”), but many SEOs would argue that they’re not friendly enough to pass this test. I’m not weighing in on those nuances, but I’d say both Shopify and those SEOs make some good points.

| CMS | SEO-friendly URLs | Title & meta description UI | XML sitemap | Canonical tags | Robots meta support | Basic structured data | Robots.txt |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WordPress | Yes | Partial (theme-dependent) | Yes | Yes | Yes | Limited (Article, BlogPosting) | No (plugin or server access required) |
| Shopify | Yes | Yes | Yes | Yes | Limited | Product-focused | Limited (editable via robots.txt.liquid, constrained) |
| Wix | Yes | Guided | Yes | Yes | Limited | Basic | Yes (editable in UI) |
| Squarespace | Yes | Yes | Yes | Yes | Limited | Basic | No (platform-managed, no direct file control) |
| Webflow | Yes | Yes | Yes | Yes | Yes | Manual JSON-LD | Yes (editable in settings) |
| Drupal | Yes | Partial (core) | Yes | Yes | Yes | Minimal (extensible) | Partial (module or server access) |
| Joomla | Yes | Partial | Yes | Yes | Yes | Minimal | Partial (server-level file edit) |
| Ghost | Yes | Yes | Yes | Yes | Yes | Article | No (server/config level only) |
| TYPO3 | Yes | Partial | Yes | Yes | Yes | Minimal | Partial (config or extension-based) |

Based on the above, I would say that most SEO basics can be covered by most CMSs “out of the box.” Whether they work well for you, and whether you can achieve the exact configuration your specific circumstances require, are two other important questions – ones which I am not taking on. However, it often comes down to these points:

  1. It is possible for these platforms to be used badly.
  2. It is possible that the business logic you need will break/not work with the above.
  3. There are many more advanced SEO features that aren’t out of the box but are just as important.

We are talking about foundations here, but when I reflect on what shipped as “default” 15+ years ago, progress has been made.

Fingerprints Of Defaults In The HTTP Archive Data

Given that a lot of CMSs ship with these standards, do these SEO defaults correlate with CMS adoption? In many ways, yes. Let’s explore this in the HTTP Archive data.

Canonical Tag Adoption Correlates With CMS

Combining canonical tag adoption data with (all) CMS adoption over the last four years, we can see that for both mobile and desktop, the trends seem to follow each other pretty closely.

Image by author, February 2026
Image by author, February 2026

Running a simple Pearson correlation over these elements, we can see this strong correlation even more clearly, for both canonical tag implementation and the presence of self-canonical URLs.
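For anyone who wants to reproduce this kind of check, the Pearson calculation is simple. The sketch below uses only the standard library; the yearly adoption figures are illustrative placeholders, not actual HTTP Archive values.

```python
from statistics import mean, stdev

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with 2+ points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Illustrative yearly adoption percentages (placeholders, not HTTP Archive data).
cms_adoption = [45.0, 47.0, 49.0, 50.1]        # % of pages on a CMS
canonical_adoption = [58.0, 60.5, 62.0, 65.0]  # % of pages with a canonical tag

print(round(pearson(cms_adoption, canonical_adoption), 3))
```

Two series that rise together, as these do, produce a coefficient near +1; a declining series against a rising one produces a negative value, as seen with mobile canonicalized URLs.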

Image by author, February 2026

What differs is the mobile correlation of canonicalized URLs; that seems to be a negative correlation on mobile and a lower (but still positive) correlation on desktop. A drop in canonicalized pages is largely causing this negative correlation, and the reasons behind this could be many (and harder to be sure of).

Canonical tags are a crucial element for technical SEO; their continued adoption does certainly seem to track the growth in CMS use, too.
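The self-canonical vs. canonicalized distinction used in this analysis can be sketched as a small classifier. This is a simplified illustration (a real pipeline would normalize URLs far more carefully), and the URLs are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag on a page."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical" and a.get("href"):
            self.canonicals.append(a["href"])

def canonical_status(page_url, html):
    """Classify a page as 'none', 'self', or 'canonicalized' (pointing elsewhere)."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "none"
    norm = lambda u: u.rstrip("/")  # minimal normalization, for illustration only
    return "self" if norm(finder.canonicals[0]) == norm(page_url) else "canonicalized"

html = '<html><head><link rel="canonical" href="https://example.com/page"></head></html>'
print(canonical_status("https://example.com/page", html))
```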

Schema.org Data Types Correlate With CMS

Schema.org types against CMS adoption show similar trends, but are less definitive overall. There are many different types of Schema.org, but if we plot CMS adoption against the ones most common to SEO concerns, we can observe a broadly rising picture.

Image by author, February 2026

With the exception of Schema.org WebSite, we can see CMS growth and structured data following similar trends.

But we must note that Schema.org adoption is considerably lower than CMS adoption overall. This could be because most CMS defaults are far less comprehensive with Schema.org. When we look at specific CMS examples (shortly), we’ll see far stronger links.

Schema.org implementation is still mostly intentional, specialist, and not as widespread as it could be. If I were a search engine or creating an AI Search tool, would I rely on universal adoption of these, seeing the data like this? Possibly not.
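For reference, a “basic level” of structured data usually means a small JSON-LD block in the page head, along the lines of the sketch below. This is a generic illustration with placeholder values, not any particular CMS’s actual output.

```python
import json

# A minimal schema.org Article block, similar in shape to what CMS defaults emit.
# All values are hypothetical placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "datePublished": "2026-02-01",
    "author": {"@type": "Person", "name": "Example Author"},
}
script_tag = '<script type="application/ld+json">%s</script>' % json.dumps(article)
print(script_tag)
```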

Robots.txt

Given that robots.txt is a single file that has some agreed standards behind it, its implementation is far simpler, so we could anticipate higher levels of adoption than Schema.org.

The presence of a robots.txt is pretty important, mostly to limit search engine crawling to specific areas of the site. We are starting to see an evolution – as we noted in the 2025 Web Almanac SEO chapter – with robots.txt used more as a governance piece rather than just housekeeping. A key sign that we’re using our core tools differently in the AI search world.

But before we consider the more advanced implementations, how much of a part does a CMS play in ensuring a robots.txt is present? Over the last four years, CMS platforms appear to be driving significantly more robots.txt files that serve a 200 response:

Image by author, February 2026

What is more curious, however, is the size of the robots.txt files. Non-CMS platforms have robots.txt files that are significantly larger.

Image by author, February 2026

Why could this be? Are non-CMS robots.txt files more advanced – longer files, more bespoke rules? Probably in some cases, but we’re missing another impact of a CMS’s standards: compliant (valid) robots.txt files.

A lot of robots.txt files serve a valid 200 response, but often they’re not txt files, or they’re redirecting to 404 pages or similar. When we limit this list to only files that contain user-agent declarations (as a proxy), we see a different story.
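The user-agent proxy used here can be sketched roughly as below. It is a heuristic, not a full robots.txt parser, and the sample responses are invented.

```python
def looks_like_robots_txt(body, content_type=""):
    """Heuristic: does a 200 response actually look like a robots.txt file?

    A genuine robots.txt should contain at least one user-agent declaration;
    HTML error pages and soft 404s served at /robots.txt will not.
    """
    if "html" in content_type.lower():
        return False
    return any(
        line.split(":", 1)[0].strip().lower() == "user-agent"
        for line in body.splitlines()
    )

valid = "User-agent: *\nDisallow: /private/\n"
invalid = "<html><body><h1>Page not found</h1></body></html>"
print(looks_like_robots_txt(valid), looks_like_robots_txt(invalid, "text/html"))
```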

Image by author, February 2026

Approaching 14% of robots.txt files served on non-CMS platforms are likely not even robots.txt files.

A robots.txt is easy to set up, but it is a conscious decision. If it’s forgotten or overlooked, it simply won’t exist. A CMS makes a robots.txt more likely to exist and, once it is in place, easier to manage and maintain – which IS key.

WordPress Specific Defaults

CMS platforms, it seems, cover the basics, but more advanced options – which arguably should also ship as defaults – often require additional SEO tools to enable.

Interrogating WordPress-specific sites within the HTTP Archive data is easiest, as we get the largest sample, and the Wappalyzer data gives a reliable way to judge the impact of WordPress-specific SEO tools.

From the Web Almanac, we can see which SEO tools are the most installed on WordPress sites.

Screenshot from Web Almanac, February 2026

For anyone working within SEO, this is unlikely to be surprising. If you are an SEO and have worked on WordPress, there is a high chance you have used one of the top three. What IS worth considering right now is that while Yoast SEO is by far the most prevalent within the data, it is seen on barely over 15% of sites. Even the most popular SEO plugin on the most popular CMS still has a relatively small share.

Of these top three plugins, let’s first consider how their “defaults” differ. These are similar to some of WordPress’s own, but we can see many more advanced features that come as standard.

| SEO Capability | All-in-One SEO | Yoast SEO | Rank Math |
| --- | --- | --- | --- |
| Title tag control | Yes (global + per-post) | Yes | Yes |
| Meta description control | Yes | Yes | Yes |
| Meta robots UI | Yes (index/noindex etc.) | Yes | Yes |
| Default meta robots output | Explicit index,follow | Explicit index,follow | Explicit index,follow |
| Canonical tags | Auto self-canonical | Auto self-canonical | Auto self-canonical |
| Canonical override (per URL) | Yes | Yes | Yes |
| Pagination canonical handling | Limited | Historically opinionated | More configurable |
| XML sitemap generation | Yes | Yes | Yes |
| Sitemap URL filtering | Basic | Basic | More granular |
| Inclusion of noindex URLs in sitemap | Possible by default | Historically possible | Configurable |
| Robots.txt editor | Yes (plugin-managed) | Yes | Yes |
| Robots.txt comments/signatures | Yes | Yes | Yes |
| Redirect management | Yes | Limited (free) | Yes |
| Breadcrumb markup | Yes | Yes | Yes |
| Structured data (JSON-LD) | Yes (templated) | Yes (templated) | Yes (templated, broad) |
| Schema type selection UI | Yes | Limited | Extensive |
| Schema output style | Plugin-specific | Plugin-specific | Plugin-specific |
| Content analysis/scoring | Basic | Heavy (readability + SEO) | Heavy (SEO score) |
| Keyword optimization guidance | Yes | Yes | Yes |
| Multiple focus keywords | Paid | Paid | Free |
| Social metadata (OG/Twitter) | Yes | Yes | Yes |
| Llms.txt generation | Yes – enabled by default | Yes – one-check enable | Yes – one-check enable |
| AI crawler controls | Via robots.txt | Via robots.txt | Via robots.txt |

Editable metadata, structured data, robots.txt, sitemaps, and, more recently, llms.txt are the most notable. It is worth noting that a lot of the functionality is more “back-end,” so not something we’d be as easily able to see in the HTTP Archive data.

Structured Data Impact From SEO Plugins

We can see (above) that structured data implementation and CMS adoption do correlate; what is more interesting here is to understand where the key drivers themselves are.

Viewing the HTTP Archive data with a simple segment (SEO plugins vs. no SEO plugins), the most recent data paints a stark picture.

Image by author, February 2026

When we limit the Schema.org @types to those most associated with SEO, it is really clear that some structured data types are pushed hard by SEO plugins. They are not completely absent elsewhere – people may be using lesser-known plugins or coding their own solutions – but ease of implementation is implicit in the data.

Robots Meta Support

Another finding from the SEO Web Almanac 2025 chapter was that “follow” and “index” directives were the most prevalent, even though they’re technically redundant, as having no meta robots directives is implicitly the same thing.

Screenshot from Web Almanac 2025, February 2026

Within the chapter’s number crunching itself, I didn’t dig much deeper, but knowing that all major WordPress SEO plugins have “index,follow” as the default, I was eager to see if I could make a stronger connection in the data.

Where SEO plugins were present on WordPress, “index, follow” was set on over 75% of root pages, vs. under 5% of WordPress sites without SEO plugins.

Image by author, February 2026

Given the ubiquity of WordPress and SEO plugins, this is likely a huge contributor to this particular configuration. While redundant, it isn’t wrong, but it is – again – a key example of how, when one or more of the main plugins establishes a de facto standard like this, it really shapes a significant portion of the web.
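To make the redundancy concrete: a meta robots tag whose directives merely restate crawler defaults adds nothing, because omitting the tag entirely has the same effect. A minimal sketch of that check:

```python
# Directives that merely restate crawler defaults; omitting the meta robots
# tag entirely has the same effect as declaring these.
DEFAULT_DIRECTIVES = {"index", "follow"}

def is_redundant_meta_robots(content):
    """True if a meta robots content attribute only restates the defaults."""
    directives = {d.strip().lower() for d in content.split(",") if d.strip()}
    return bool(directives) and directives <= DEFAULT_DIRECTIVES

print(is_redundant_meta_robots("index, follow"))   # restates defaults
print(is_redundant_meta_robots("noindex, follow")) # carries a real directive
```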

Diving Into LLMs.txt

Another key area of change in the 2025 Web Almanac was the introduction of the llms.txt file – not an explicit endorsement of the file, but rather a tacit acknowledgment that it is an important data point in the AI Search age.

From the 2025 data, just over 2% of sites had a valid llms.txt file and:

  • 39.6% of llms.txt files are related to All-in-One SEO.
  • 3.6% of llms.txt files are related to Yoast SEO.

This is not necessarily an intentional act by all those involved, especially as All-in-One SEO enables this by default (not an opt-in like Yoast and Rank Math).

Image by author, February 2026

Since the first data was gathered on July 25, 2025, a month-by-month view shows further growth. It is hard not to see this as growing confidence in this markup or, at least, that it is so easy to enable that more people are likely hedging their bets.
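For context, llms.txt is a plain Markdown file served from the site root. A minimal, hypothetical example is below; the exact sections and links a plugin generates will vary, and the site and URLs here are placeholders.

```markdown
# Example Store

> A short, plain-language summary of what this site is about and who it serves.

## Key pages

- [Products](https://example.com/products): Full catalog overview
- [Shipping policy](https://example.com/shipping): Delivery times and costs
```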

Conclusion

The Web Almanac data suggests that SEO, at a macro level, moves less because of individual SEOs and more because WordPress, Shopify, Wix, or a major plugin ships a default.

  • Canonical tags correlate with CMS growth.
  • Robots.txt validity improves with CMS governance.
  • Redundant “index,follow” directives proliferate because plugins make them explicit.
  • Even llms.txt is already spreading through plugin toggles before it has anything close to full consensus.

This doesn’t diminish the impact of SEO; it reframes it. Individual practitioners still create competitive advantage, especially in advanced configuration, architecture, content quality, and business logic. But the baseline state of the web, the technical floor on which everything else is built, is increasingly set by product teams shipping defaults to millions of sites.

Perhaps we should consider that if CMSs are the infrastructure layer of modern SEO, then plugin creators are de facto standard-setters. They deploy “best practice” before it becomes doctrine.

This is how it should work, but I am also not entirely comfortable with it. Plugins normalize implementation and even create new conventions simply by making them zero-cost. Redundant standards can endure simply because they can.

So the question is less about whether CMS platforms impact SEO. They clearly do. The more interesting question is whether we, as SEOs, are paying enough attention to where those defaults originate, how they evolve, and how much of the web’s “best practice” is really just the path of least resistance shipped at scale.

An SEO’s value should not be measured by the number of hours they spend discussing canonical tags, meta robots, and rules of sitemap inclusion. This should be standard and default. If you want to have an outsized impact on SEO, lobby an existing tool, create your own plugin, or drive interest to influence change in one.



Featured Image: Prostock-studio/Shutterstock

Agentic Commerce Optimization: A Technical Guide To Prepare For Google’s UCP via @sejournal, @alexmoss

In January, I wrote about the birth of agentic commerce through both Agentic Commerce Protocol (ACP) and Universal Commerce Protocol (UCP), and how this could impact us all as consumers, business owners, and SEOs. As we still sit on waitlists for both, this doesn’t mean that we can’t prepare for it.

UCP fixes a real-life problem for many, minimizing the fragmented commerce journey. Instead of building separate integrations for every agent platform, as we have mostly been doing in the past, you can now (theoretically) integrate once and work seamlessly with other tools and platforms.

But note here that, as opposed to ACP, which focuses more on the checkout → fulfillment → payment journey, UCP goes beyond this with six capabilities covering the entire commerce lifecycle.

This, of course, will impact an SEO’s ambit. As we shift from optimizing for clicks to optimizing for selection, we also need to ensure that it’s you/your client that is selected through data integrity, product signals, and AI-readable commerce capabilities. Structured data has always served an important role for the internet as a whole and will continue to be the driving force on how you can serve agents, crawlers, and humans in the best way possible.

I allude to a possible new acronym, “ACO” – Agentic Commerce Optimization – and the following could be considered the closest we can get to guidelines on how to undertake it.

UCP Isn’t Coming, It’s Here

UCP was only announced in January, but there’s already confirmation that its capabilities are rolling out. On Feb. 11, 2026, Vidhya Srinivasan (VP/GM of Advertising & Commerce at Google) announced that Wayfair and Etsy now use UCP so that you can purchase directly within AI Mode; the rollout was observed in the wild the next day by Brodie Clark.

UCP’s Six Layered Capabilities

On the day UCP was released, Google explained its methodology.

From this, I defined six core capabilities:

  1. Product Discovery – how agents find and surface your inventory during research.
  2. Cart Management – multi-item baskets, dynamic pricing, complex basket rules.
  3. Identity Linking – OAuth 2.0 authorization for personalized experiences and loyalty.
  4. Checkout – session creation, tax calculation, payment handling.
  5. Order Management – webhook-based lifecycle and logistical updates.
  6. Vertical Capabilities – extensible modules for specialized use cases like travel booking windows or subscription schedules.

UCP’s schema authoring guide shows how capabilities are defined through versioned JSON schemas, which act as the foundation of the protocol. When it comes to considering this as an SEO, properties such as offers, aggregateRating, and shippingDetails aren’t just for surfacing rich snippets, etc., for product discovery, they’re now what agents query during the entire process.

Schema Is, And Will Continue To Be, Essential

UCP’s technical specification uses its own JSON schema-based vocabulary. While UCP doesn’t build on schema.org directly, schema.org remains critical in the broader ecosystem. As Pascal Fleury said at Google Search Central Live in December, “schema is the glue that binds all these ontologies together.” UCP handles the transaction; schema.org helps agents decide who to transact with.

Ensure you’re on top of product schema and populate it as fully as you can. It may seem like SEO 101; regardless, audit all of this now to ensure you’re not missing anything when UCP fully rolls out.

This includes checks on:

  • Product schema (with complete coverage): All core fields: name, description, SKU, GTIN, brand, related images, and offers.
  • Offers must include: Price, priceCurrency, availability, URL, seller. Add aggregateRating and review to ensure you have positive third-party perspective.
  • Ensure all product variants output correctly.
  • Include shippingDetails with delivery estimates.
  • Organization and Brand: Assists with “Merchant of Record” verification. If you’re not an Organization, then fall back to Person.
  • Designated FAQPage: Ensure you have an FAQPage, as these can be incorporated alongside product-level FAQs and used as part of the agent’s decision-making.
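The checklist above can be pulled together into a single JSON-LD Product block. The sketch below is illustrative only – every value is a hypothetical placeholder – but the property names follow schema.org’s Product and Offer vocabulary.

```python
import json

# Hypothetical values throughout; swap in your real product data.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Cloud Runner Shoe",
    "description": "Lightweight road-running shoe.",
    "sku": "CRS-001",
    "gtin13": "0123456789012",
    "brand": {"@type": "Brand", "name": "ExampleBrand"},
    "image": ["https://example.com/img/crs-001.jpg"],
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.6", "reviewCount": "128"},
    "offers": {
        "@type": "Offer",
        "price": "129.99",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "url": "https://example.com/products/crs-001",
        "seller": {"@type": "Organization", "name": "Example Store"},
        "shippingDetails": {
            "@type": "OfferShippingDetails",
            "deliveryTime": {
                "@type": "ShippingDeliveryTime",
                "transitTime": {
                    "@type": "QuantitativeValue",
                    "minValue": 2, "maxValue": 5, "unitCode": "DAY",
                },
            },
        },
    },
}
print(json.dumps(product, indent=2))
```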

Prepare Your Merchant Center Feed

UCP will utilize your existing Merchant Center feed as the discovery layer. This means that beyond the normal on-site schema you provide, Merchant Center itself requires more details that you can populate within its platform.

  • Return policies (required to be a Merchant of Record): Complete all return costs, return windows, and policy links. These are used not just within the checkout and transactional areas, but are again a consideration for being selected at all. Advanced accounts need policies at each sub-account level.
  • Customer support information: Not only would initial information be offered to the customer, but entry-level support queries may be handled entirely by the agent, increasing customer satisfaction while freeing up customer support capacity.
  • Agentic checkout eligibility: Add the native_commerce attribute to your feed, as products are only eligible here if this is set up.
  • Product identifiers: Each product must have an ID that correlates to the product ID used with the checkout API.
  • Product consumer warnings: Any product warning should assert the consumer_notice attribute.

Google recommends that this be done through a supplemental data source in Merchant Center rather than modifying your primary feed, which would prevent incorrect formatting or other invalidation.
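A supplemental source is typically a small feed keyed on product ID. The sketch below is an assumption-laden illustration: the `native_commerce` attribute name comes from the text above, but the accepted values, column names, and exact format should be verified against Google’s Merchant Center documentation before uploading anything.

```python
import csv
import io

# Build a hypothetical tab-delimited supplemental feed that layers the agentic
# checkout attribute onto existing products without touching the primary feed.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["id", "native_commerce"])
for product_id in ["CRS-001", "CRS-002"]:  # hypothetical product IDs
    writer.writerow([product_id, "true"])   # value format is an assumption
print(buf.getvalue())
```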

Lastly, double-check that the products you’re selling aren’t included in its product restrictions list; if you do offer restricted items, consider how to manage them alongside the abilities of UCP.

Optimizing Conversational Commerce Attributes

Within the UCP blog post announcement, Srinivasan introduced a way for more clarity with conversational commerce attributes:

“…we’re announcing dozens of new data attributes in Merchant Center designed for easy discovery in the conversational commerce era, on surfaces like AI Mode, Gemini and Business Agent. These new attributes complement retailers’ existing data feeds and go beyond traditional keywords to include things like answers to common product questions, compatible accessories or substitutes.”

These provide further clarity (and therefore minimize hallucinations) during the discovery process in order to be selected or ruled out.

Not only would this incorporate product and brand-related FAQs, but take this a step further to also consider:

  • Compatibility: Potential up-sell opportunities.
  • Substitution: An opportunity for dealing with out-of-stock items.
  • Related products: Great for cross-sell opportunities.

Furthermore, this can be used to become even more specific, moving beyond basic attributes to agent-parseable details. Now, if a product is “purple” on a basic level, “dark purple” or even something unobvious, such as “Wolf” (real example below), may be more appropriate for finer detail while still falling under “purple.” The same can be considered for sizes, materials (or a mixture of materials), etc.

Multi-Modal Fan-Out Selection

When executed well, optimizing for conversational commerce attributes will increase the possibility of selection within fan-out query results. When considering some of these attributes, it is worth looking at tools, such as WordLift’s Visual Fan-Out simulator, which illustrates how a single image decomposes into multiple search intents, revealing which attributes agents may prioritize when performing query fan-out. But how would this look?

As an example, I used one product image and drilled down through three horizons, using On’s Cloudsurfer Max (used with permission):

Cloudsurfer Max in the colour “Wolf”
Image credit: On

Using just one product image, this is what is presented on the surface:

Screenshot from WordLift’s Visual Fan-Out simulator, February 2026

The simulator immediately recognized that the product was On’s, and specifically from the Cloudsurfer range. Great start! Now let’s see what it sees over the horizon:

Screenshot from WordLift’s Visual Fan-Out simulator, February 2026
Screenshot from WordLift’s Visual Fan-Out simulator, February 2026
Screenshot from WordLift’s Visual Fan-Out simulator, February 2026

Here, you can draw inspiration or direction on how best to place yourself for potential and likely fan-out queries. With this example, I found it interesting that Horizon 2 mentions performance running gear as a large category; performing fan-out on that then surfaced related products around gear in general. This shows how widely LLMs consider selection and how you can present attributes to attract selection.

UCP’s Roadmap Is Expanding Into Multi-Verticals

UCP already plans to go beyond a single purchase, expanding beyond retail into travel, services, and other verticals. Its roadmap details several priorities over the coming year, including:

  • Multi‑item carts and complex baskets: Moving beyond single‑item checkout to native multi‑item carts, bundling, promotions, tax/shipping logic, and more realistic fulfillment handling.
  • Loyalty and account linking: Standardized loyalty program management and account linking so agents can apply points, member pricing, and benefits across merchants.
  • Post‑purchase support: Support for order tracking, returns, and customer‑service handoff so agents can manage customer support post-sale.
  • Personalization signals: Richer signals for cross‑sell/upsell, wishlists, history, and context‑based recommendations.
  • New verticals: Expansion beyond retail into travel, services, digital goods, and food/restaurant use cases via extensions to the protocol.

Each of the points above is worth further reading and consideration if this is something your brand may offer. And if you’re working within travel, services, digital goods, or hospitality, you need to be even more prepared to ensure eligibility as the protocol expands beyond retail.

Social Proof And Third-Party Perspective

Regardless of how well you may optimize on-site to prepare for UCP, all this data integrity still needs to be validated by trusted third-party sources.

Third-party platforms, such as Trustpilot and G2, appear to be frequently cited and trusted among most of the LLMs, so I’d still advise that you continue to collect those positive brand and product reviews in order to satisfy consensus, resulting in more opportunities to be selected during product discovery.

TL;DR – Prepare Now

If you own or manage any form of ecommerce site, now is the time to ensure you’re preparing for UCP’s rollout as soon as possible. It’s only a matter of time, and with AI Mode spreading into default experiences, getting ahead of the rollout is essential.

  1. Join the UCP waitlist.
  2. Prepare Merchant Center: return policies, native_commerce attribute.
  3. Ensure your developers research and understand the UCP documentation.
  4. Populate conversational attributes: question-answers, compatibility, substitutes.
  5. Audit and improve any schema where applicable.

This is moving faster than most previous commerce shifts, and brands that wait for full rollout signals will already be behind. This isn’t a short-term LLM gimmick but is part of the largest change in the ecommerce space.



Featured Image: Roman Samborskyi/Shutterstock

4 Pillars To Turn Your “Sticky-Taped” Tech Stack Into a Modern Publishing Engine

This post was sponsored by WP Engine. The opinions expressed in this article are the sponsor’s own.

In the race for audience attention, digital marketers at media companies often have one hand tied behind their backs. The mission is clear: drive sustainable revenue, increase engagement, and stay ahead of technological disruptions such as LLMs and AI agents.

Yet, for many media organizations, execution is throttled by a “sticky-taped stack”: a fragile patchwork of legacy CMS structures and ad-hoc plugins. For a digital marketing leader, this isn’t just a technical headache; it’s a direct hit to the bottom line.

It’s time to examine the Fragmentation Tax, and why a new publishing standard is required to reclaim growth.

Fragmentation Tax: How A Siloed CMS, Disconnected Data & Tech Debt Are Costing You Growth

The Fragmentation Tax is the hidden cost of operational inefficiency. It drains budgets, burns out teams, and stunts the ability to scale. For digital marketing and growth leads, this tax is paid in three distinct “currencies”:

1. Siloed Data & Strategic Blindness.

When your ad server, subscriber database, and content tools exist as siloed work streams, you lose the ability to see the full picture of the reader’s journey.

Without integrated attribution, marketers are forced to make strategic pivots based on vanity metrics like generic pageviews rather than true business intelligence, such as conversion funnels or long-term reader retention.

2. The Editorial Velocity Gap.

In the era of breaking news, being second is often the same as being last. If an editorial team is forced into complex, manual workflows because of a fragmented tech stack, content reaches the market too late to capture peak search volume or social trends. This friction creates a culture of caution precisely when marketing needs a culture of velocity to capture organic traffic.

3. Tech Debt vs. Innovation.

Tech debt is the future cost of rework created by choosing “quick-and-dirty” solutions. This is a silent killer of marketing budgets. Every hour an engineering team spends fixing plugin conflicts or managing security fires caused by a cobbled-together infrastructure is an hour stolen from innovation.

The 4 Publishing Pillars That Improve SEO & Monetization

To stop paying this tax, media organizations are moving away from treating their workflows as a collection of disparate parts. Instead, they are adopting a unified system that eliminates the friction between engineering, editorial, and growth.

A modern publishing standard addresses these marketing hurdles through four key operational pillars:

Pillar 1: Automated Governance (Built-In SEO & Tracking Integrity)

Marketing integrity relies on consistency.

In a fragmented system, SEO metadata, tracking pixels, and brand standards are often managed manually, leading to human error.

A unified approach embeds governance directly into the workflow.

By using automated checklists, organizations ensure that no article goes live until it meets defined standards, protecting the brand and ensuring every piece of content is optimized for discovery from the moment of publication.

Pillar 2: Fearless Iteration (Continuous SEO & CRO Optimization Without Risk)

High-traffic articles are a marketer’s most valuable asset. However, in a legacy stack, updating a live story to include, for instance, a Call-to-Action (CTA), is often a high-risk maneuver that could break site layouts.

A modern unified approach allows for “staged” edits, enabling teams to draft and review iterations on live content without forcing those changes live immediately. This allows for a continuous improvement cycle that protects the user experience and site uptime.

Pillar 3: Cross-Functional Collaboration (Reducing Workflow Bottlenecks Between Editorial, SEO & Engineering)

Any type of technology disruption requires a team to collaborate in real-time. The “Sticky-taped” approach often forces teams to work in separate tools, creating bottlenecks.

A modern unified standard utilizes collaborative editing, separating editorial functions into distinct areas for text, media, and metadata. This allows an SEO specialist or a growth marketer to optimize a story simultaneously with the journalist, ensuring the content is “market-ready” the instant it’s finished.

Pillar 4: Native Breaking News Capabilities (Capturing Real-Time Search Demand)

Late-breaking or real-time events, such as global geopolitical shifts or live sports, require in-the-moment storytelling to keep audiences informed, engaged, and on-site. Traditionally, “Live Blogs” relied on clunky third-party embeds that fragmented user data and slowed page loads.

A unified standard treats breaking news as a native capability, enabling rapid-fire updates that keep the audience glued to the brand’s own domain, maximizing ad impressions and subscription opportunities.

Conclusion: Trading Toil for Agility

Ultimately, shifting to a unified standard is about reducing inefficiencies caused by “fighting the tools.” By removing the technical toil that typically hides insights in siloed tools, media organizations can finally trade operational friction for strategic agility.

When your site’s foundation is solid and fast, editors can hit “publish” without worrying about things breaking. At the same time, marketers can test new ways to grow the audience without waiting weeks for developers to update code. This setup clears the way for everyone to move faster and focus on what actually matters: telling great stories and connecting with readers.

The era of stitching software together with “sticky tape” is over. For modern media companies to thrive amid constant digital disruption, infrastructure must be a launchpad, not a hindrance. By eliminating the Fragmentation Tax, marketing leaders can finally stop surviving and start growing.

Jason Konen is director of product management at WP Engine, a global web enablement company that empowers companies and agencies of all sizes to build, power, manage, and optimize their WordPressⓇ websites and applications with confidence.

Image Credits

Featured Image: Image by WP Engine. Used with permission.

In-Post Images: Image by WP Engine. Used with permission.

Hidden HTTP Page Can Cause Site Name Problems In Google via @sejournal, @MattGSouthern

Google’s John Mueller shared a case where a leftover HTTP homepage was causing unexpected site-name and favicon problems in search results.

The issue, which Mueller described on Bluesky, is easy to miss because Chrome automatically upgrades HTTP requests to HTTPS, hiding the HTTP version from normal browsing.

What Happened

Mueller described the case as “a weird one.” The site used HTTPS, but a server-default HTTP homepage was still accessible at the HTTP version of the domain.

Mueller wrote:

“A hidden homepage causing site-name & favicon problems in Search. This was a weird one. The site used HTTPS, however there was a server-default HTTP homepage remaining.”

The tricky part is that Chrome can upgrade HTTP navigations to HTTPS, which makes the HTTP version easy to miss in normal browsing. Googlebot doesn’t follow Chrome’s upgrade behavior.

Mueller explained:

“Chrome automatically upgrades HTTP to HTTPS so you don’t see the HTTP page. However, Googlebot sees and uses it to influence the sitename & favicon selection.”

Google’s site name system pulls the name and favicon from the homepage to determine what to display in search results. The system reads structured data from the website, title tags, heading elements, og:site_name, and other signals on the homepage. If Googlebot is reading a server-default HTTP page instead of the actual HTTPS homepage, it’s working with the wrong signals.

How To Check For This

Mueller suggested two ways to see what Googlebot sees.

First, he joked that you could use AI. Then he corrected himself.

Mueller wrote:

“No wait, curl on the command line. Or a tool like the structured data test in Search Console.”

Running curl http://yourdomain.com from the command line would show the raw HTTP response without Chrome’s auto-upgrade. If the response returns a server-default page instead of your actual homepage, that’s the problem.

If you want to see what Google retrieved and rendered, use the URL Inspection tool in Search Console and run a Live Test. Google’s site name documentation also notes that site names aren’t supported in the Rich Results Test.
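The comparison Mueller describes can be scripted. A minimal sketch (the regexes are deliberately simplified, not a full HTML parser) that flags when the page served at the HTTP URL would send Google different site-name signals than the HTTPS homepage:

```python
import re


def extract_site_signals(html: str) -> dict:
    """Pull the signals Google's site-name system reads from a homepage:
    the <title> and the og:site_name meta property."""
    title = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
    og = re.search(
        r'<meta[^>]+property=["\']og:site_name["\'][^>]+content=["\']([^"\']+)',
        html,
        re.I,
    )
    return {
        "title": title.group(1).strip() if title else None,
        "og:site_name": og.group(1).strip() if og else None,
    }


def signals_mismatch(http_html: str, https_html: str) -> bool:
    """True when the HTTP and HTTPS homepages would send Google
    different site-name signals."""
    return extract_site_signals(http_html) != extract_site_signals(https_html)
```

Feed it the response bodies from curl http://yourdomain.com and curl https://yourdomain.com; a mismatch means the two versions are advertising different names to Googlebot.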

Why This Matters

The display of site names and favicons in search results is something we’ve been documenting since Google first replaced title tags with site names in 2022. Since then, the system has gone through multiple growing pains. Google expanded site name support to subdomains in 2023, then spent nearly a year fixing a bug where site names on internal pages didn’t match the homepage.

This case introduces a new complication. The problem wasn’t in the structured data or the HTTPS homepage itself. It was a ghost page in the HTTP version, which you’d have no reason to check because your browser never showed it.

Google’s site name documentation explicitly mentions duplicate homepages, including HTTP and HTTPS versions, and recommends using the same structured data for both. Mueller’s case shows what can go wrong when an HTTP version contains content different from the HTTPS homepage you intended to serve.

The takeaway for troubleshooting site-name or favicon problems in search results is to check the HTTP version of your homepage directly. Don’t rely on what Chrome shows you.

Looking Ahead

Google’s site name documentation specifies that WebSite structured data must be on “the homepage of the site,” defined as the domain-level root URI. For sites running HTTPS, that means the HTTPS homepage is the intended source.

If your site name or favicon looks wrong in search results and your HTTPS homepage has the correct structured data, check whether an HTTP version of the homepage still exists. Use curl or the URL Inspection tool’s Live Test to view it directly. If a server-default page is sitting there, removing it or redirecting HTTP to HTTPS at the server level should resolve the issue.
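If a server-default page is indeed answering on port 80, a server-level redirect removes it. A sketch for nginx (Apache and other servers have equivalents; example.com is a placeholder for your domain):

```nginx
# Listen on port 80 only to redirect; the server-default page never renders.
server {
    listen 80;
    listen [::]:80;
    server_name example.com www.example.com;
    return 301 https://$host$request_uri;
}
```

After deploying, re-run the curl check to confirm the HTTP URL now returns a 301 to the HTTPS homepage rather than a page body.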

Google’s Crawl Team Filed Bugs Against WordPress Plugins via @sejournal, @MattGSouthern

Google’s crawl team has been filing bugs directly against WordPress plugins that waste crawl budget at scale.

Gary Illyes, Analyst at Google, shared the details on the latest Search Off the Record podcast. His team filed an issue against WooCommerce after identifying its add-to-cart URL parameters as a top source of crawl waste. WooCommerce picked up the bug and fixed it quickly.

Not every plugin developer has been as responsive. An issue filed against a separate action-parameter plugin is still sitting unclaimed. And Google says its outreach to the developer of a commercial calendar plugin that generates infinite URL paths fell on deaf ears.

What Google Found

The details come from Google’s internal year-end crawl issue report, which Illyes reviewed during the podcast with fellow Google Search Relations team member Martin Splitt.

Action parameters accounted for roughly 25% of all crawl issues reported in 2025. Only faceted navigation ranked higher, at 50%. Together, those two categories represent about three-quarters of every crawl issue Google flagged last year.

The problem with action parameters is that each one creates what appears to be a new URL by adding text like ?add_to_cart=true. Parameters can stack, doubling or tripling the crawlable URL space on a site.

Illyes said these parameters are often injected by CMS plugins rather than built intentionally by site owners.

The WooCommerce Fix

Google’s crawl team filed a bug report against the plugin, flagging the add-to-cart parameter behavior as a source of crawl waste affecting sites at scale.

Illyes describes how they identified the issue:

“So we would try to dig into like where are these coming from and then sometimes you can identify that perhaps these action parameters are coming from WordPress plug-ins because WordPress is quite a popular CMS content management system. And then you would find that yes, these plugins are the ones that add to cart and add to wish list.

“And then what you would do if you were a Gary is to try to see if they are open source in the sense that they have a repository where you can report bugs and issues and in both of these cases the answer was yes. So we would file issues against these plugins.”

WooCommerce responded and shipped a fix. Illyes noted the turnaround was fast, but other plugin developers with similar issues haven’t responded. Illyes didn’t name the other plugins.

He added:

“What I really, really loved is that the good folks at WooCommerce almost immediately picked up the issue and they solved it.”

Why This Matters

This is the same URL parameter problem Illyes warned about before and continued flagging. Google then formalized its faceted navigation guidelines into official documentation and revised its URL parameter best practices.

The data shows those warnings and documentation updates didn’t solve the problem because the same issues still dominate crawl reports.

The crawl waste is often baked into the plugin layer. That creates a real bind for websites with ecommerce plugins. Your crawl problems may not be your fault, but they’re still your responsibility to manage.

Illyes said Googlebot can’t determine whether a URL space is useful “unless it crawled a large chunk of that URL space.” By the time you notice the server strain, the damage is already happening.

Google consistently recommends robots.txt, as blocking parameter URLs proactively is more effective than waiting for symptoms.
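A robots.txt sketch for blocking action-parameter URL spaces proactively. The parameter names below mirror the add-to-cart example from the podcast and are illustrative; check your own crawl logs for the exact parameters your plugins emit:

```text
# robots.txt — keep crawlers out of action-parameter URL spaces.
# Parameter names are examples; match them to what your plugins generate.
User-agent: *
Disallow: /*?*add_to_cart=
Disallow: /*?*add-to-cart=
Disallow: /*?*add_to_wishlist=
```

Because parameters can stack, wildcard rules like these catch the parameter anywhere in the query string, not just as the first one.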

Looking Ahead

Google filing bugs against open-source plugins could help reduce crawl waste at the source. The full podcast episode with Illyes and Splitt is available with a transcript.

What is NLWeb (Natural Language Web)?

Natural language is quickly becoming the default way people interact with online tools. Instead of typing a few keywords, users now ask full questions, give detailed instructions, and are starting to expect clear, conversational answers. So, how can you make sure your content provides the answer to their question? Or better yet, how can you make it possible for them to interact with your website in a similar way? That’s where Microsoft’s NLWeb comes in. 

Meet NLWeb, Microsoft’s new open project

NLWeb, short for Natural Language Web, is an open project recently launched by Microsoft. The aim of this project is to bring conversational interfaces directly to websites, rather than users having to use an external chatbot that’s in control of what’s shown. Instead of relying on traditional navigation or search bars, NLWeb is designed to allow users to ask questions and explore content in a more personal, conversational way. 

At its core, NLWeb connects website content to AI-powered tools. It enables AI to understand what a website is about, what information it contains, and how that information should be interpreted for the purpose of returning personalized results. With this project, Microsoft is moving toward a more interoperable, standards-based, and open web that allows everyone to prepare their website for the future of search.  

This project was initiated and realized by R.V. Guha, CVP and Technical Fellow at Microsoft. Guha is one of the creators of widely used web standards such as RSS and Schema.org.  

How NLWeb works

NLWeb works by combining structured data, standardized APIs and AI models capable of understanding natural language. Every NLWeb instance acts as a Model Context Protocol (MCP) server, which makes your content discoverable for all the agents operating in the MCP ecosystem. This makes it easy for these agents to find your website.  

Using structured data, website owners then present their content in a machine-readable way. AI applications can then consume this data and answer user questions accurately by matching them to the most relevant information. The result is a conversational experience powered by existing content, delivered either directly on a website or through an online search tool: a conversational interface for both human users and the AI agents collecting information.
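The structured data involved is the same Schema.org markup many sites already publish. A minimal JSON-LD sketch (the product details are invented) of the kind of machine-readable content an AI application could consume:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Trail Runner Waterproof",
  "description": "Lightweight trail-running shoe for wet conditions.",
  "offers": {
    "@type": "Offer",
    "price": "129.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```

Markup like this lets an agent answer a natural-language question such as “what are the best trail shoes for wet conditions?” with your product rather than guessing from page text.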

An important thing to note is that NLWeb is an open project. It’s not a closed ecosystem, meaning that Microsoft wants to make it accessible to everyone. The idea is to make it easy for any website owner to create an intelligent, natural language experience for their site, while also preparing their content to interact with and be discovered by other online agents, such as AI tools and search engines.  

How does natural language work? 

Natural language simply refers to the way we speak and write. This means using full sentences that allow room for intent, context and nuance. More than keywords or short commands, natural language reflects how people think and what they are looking for exactly. 

To give you an example: a focus keyphrase might be running shoes trail. But using natural language, the request would look more like this: What are the best running shoes for trail running in wet conditions? 

Natural language in AI tools 

Modern AI tools are designed to understand this kind of input. The large language models behind these tools can analyze intent and context to generate responses that fulfill the given request. This is why conversational interfaces feel more intuitive than traditional search or forms. 

Tools like AI chat assistants, voice search, and even traditional search engines rely heavily on natural language understanding and users have quickly adapted to it. 

The current state of search 

The way people find information online is changing fast, a shift heavily influenced by AI-powered tools. We now expect personalized answers instead of a list of results to sort through ourselves. AI chatbots also give us the option to follow up on our original search query, which turns search into a conversation instead of a series of clicks.

Research from McKinsey & Company shows that AI adoption and natural language interfaces are becoming mainstream, with 50% of consumers already using AI-driven tools for information discovery. The majority even say it’s the top digital source they use to make buying decisions. As these habits continue to grow, websites that aren’t optimized for natural language risk becoming invisible in AI-generated answers. 

Why this is interesting for you 

The shift to natural language isn’t just a technical trend. As discussed above, it directly impacts your online visibility and competitive position. 

If users ask an AI system for information, only a handful of sources will be referenced in the response. This is because, like search engines, AI platforms also need to be able to read the information on your website. Being one of those sources can be the difference between being discovered or being overlooked. 

NLWeb collaborates with Yoast 

With NLWeb, you are communicating your website’s content clearly and in a standardized way. That means your brand, products, or expertise can appear in AI-powered answers instead of your competitors. To help as many website owners as possible benefit from this shift, Yoast is collaborating with NLWeb.   

The best part? If you’re a user of any of our Yoast plans designed for WordPress, you’re well ahead here. Yoast’s integration with NLWeb will roll out in phases, starting with functionality that helps our users using WordPress express their content in ways AI systems can interpret accurately, without any additional setup required. So sit tight and let us help you prepare your website for the new world of search! 

NLWeb aims to make your content understandable not just for people, but for the AI systems that are increasingly relevant to your website’s discovery. 

Read more: Yoast collaborates with Microsoft to help AI understand Open Web »

What is the open web?

The open web is the part of the internet built on open standards that anyone can use. This concept creates a democratic digital space where people can build on each other’s work without restrictions, just like how WordPress.org is built. For website owners, understanding and leveraging the open web is increasingly crucial. Especially with the rise of AI-powered systems and the general direction that online search is taking. So, let’s explore what the open web is and what it means for your website.

What is the open web?

The open web refers to the part of the internet built on open, shared standards that are available to everyone. It’s powered by technologies like HTTP, HTML, RSS, and Schema.org, which make it easy for websites and online systems to interact with each other. But it is more than just technical protocols. It also includes open-source code, public APIs, and the free flow of data and content across sites, services, and devices, creating a democratic digital space where people can build on each other’s work without heavy restrictions.

Because these standards are not owned or patented, the open web remains largely decentralized. This allows content to be accessed, understood, and reused across devices and platforms. This not only encourages innovation but also ensures that information is discoverable without being locked behind proprietary ecosystems.

The benefits of an open web

The open web is built on publicly available protocols that enable access, collaboration, and innovation at a global scale. 

The most important benefits include:

  • Collaboration and innovation: Open protocols enable developers to build on each other’s work without proprietary restrictions.
  • Accessibility: Users and AI agents alike can access and interact with web content regardless of device, platform, or underlying technology.
  • Democratization: No single company controls access to information, giving publishers greater autonomy.
  • Inclusion: The open web creates a more level playing field, where everyone gets a chance to participate in the digital economy.

The open web vs the deep web

To give you a better idea of what the open web is, it helps to know about the “deep web” and closed or “walled garden” platforms. The deep web covers content not indexed by search engines, while closed systems or walled gardens restrict access and keep data siloed.

On the open web, anyone can access information freely. A good example of that is Wikipedia. Accessible to anyone looking for information on a topic and anyone who wants to contribute to its content. Closed-off platforms, like proprietary apps or social media ecosystems, create places where content is only available if you pay or use a specific service. Well-known examples of this are social media platforms such as Facebook and Instagram. Another example is a news website that requires a paid subscription to get access.

In essence, the open web keeps information discoverable, accessible, and interoperable – instead of locked inside a handful of platforms.

AI and the open web

The popularity of AI-powered search makes open web principles more important than ever. Decentralized and accessible information allows AI tools to interact with content directly and use it freely to generate an answer for a user. 

“We believe the future of AI is grounded in the open web.” 

Ramanathan Guha, CVP and Technical Fellow at Microsoft. 

Microsoft’s open project NLWeb is a prime example. It provides a standardized layer that enables AI agents to discover, understand, and interact with websites efficiently, without needing separate integrations for every platform. 

What this means for website owners

For website owners, including small business owners, embracing the open web means making your content freely available in ways that AI can interpret. By using structured data standards like Schema.org, your website becomes discoverable to AI tools. Increasing your reach and ensuring that your content remains part of the future of search. 

Yoast and Microsoft: collaborating towards a more open web

Yoast is proud to collaborate with NLWeb, a Microsoft project that makes your content easier for AI agents to understand without extra effort from website owners, allowing your content to remain discoverable, reach a wider audience, and show up in AI-powered search results.

The open web strives toward an accessible web where content is available for everyone: a web where it doesn’t matter how big your website or marketing budget is, giving everyone the chance to be found and represented in AI-powered search. NLWeb helps turn this vision into reality by connecting today’s open web with tomorrow’s AI-driven search ecosystem.

Read more: Yoast collaborates with Microsoft to help AI understand Open Web »

The Hidden SEO Cost Of A Slow WordPress Site & How It Affects AI Visibility via @sejournal, @wp_rocket

This post was sponsored by WP Media. The opinions expressed in this article are the sponsor’s own.

You’ve built a WordPress site you’re proud of. The design is sharp, the content is solid, and you’re ready to compete. But there’s a hidden cost you might not have considered: a slow site doesn’t just hurt your SEO; it now affects your AI visibility too.

With AI-powered search platforms such as ChatGPT and Google’s AI Overviews and AI Mode reshaping how people discover information, speed has never mattered more. And optimizing for it might be simpler than you think.

The conventional wisdom? “Speed optimization is technical and complicated.” “It requires a developer.” “It’s not that big a deal anyway.” These myths spread because performance optimization is genuinely challenging. But dismissing it because it’s hard? That’s leaving lots of untapped revenue on the table.

Here’s what you need to know about the speed-SEO-AI connection, and how to get your site up to speed without having to reinvent yourself as a performance engineer.

Why Visitors Won’t Wait For Your Site To Load (And What It Costs You)

Let’s start with the basics. When’s the last time you waited patiently for a slow website to load? Exactly.


Google’s research shows that as page load time increases from one second to three seconds, the probability of a visitor bouncing increases by 32%. Push that to five seconds, and bounce probability jumps to 90%.

Think about it. You’re spending money on ads, content, and SEO to get people to your site, and then losing nearly half of them before they see anything because your pages load too slowly.

For e-commerce, the stakes are even higher:

  • A site loading in 1 second has a conversion rate 5x higher than one loading in 5 seconds.
  • 79% of shoppers who experience performance issues say they won’t return to buy again.
  • Every 1-second delay reduces customer satisfaction by 16%.

A slow site isn’t just losing one sale. It’s potentially losing you customers for life.

Website Speeds That AI and Visitors Expect

Google stopped being subtle about this in 2020. With the introduction of Core Web Vitals, page speed became an official ranking factor. If your WordPress site meets these benchmarks, you’re signaling quality to Google. If it doesn’t, you’re handing competitors an advantage.

Here’s the challenge: only 50% of WordPress sites currently meet Google’s Core Web Vitals standards.

That means half of WordPress websites have room to improve, and an opportunity to gain ground on competitors who haven’t prioritized performance.

The key metric to watch is Largest Contentful Paint (LCP): how quickly your main content loads. Google wants this under 2.5 seconds. Hit that target, and you’re in good standing.

What most site owners miss: speed improvements compound. Better Core Web Vitals leads to better rankings, which leads to more traffic, which leads to more conversions. The sites that optimize first capture that momentum.

The AI Visibility Advantage: Why Speed Matters More Than Ever

Here’s where it gets really interesting, and where early movers have an edge.

The rise of AI-powered search tools like ChatGPT, Perplexity, and Google’s AI Overviews is fundamentally changing how people discover information. And here’s what most haven’t realized yet: page speed influences AI visibility too.

A recent study by SE Ranking analyzed 129,000 domains across over 216,000 pages to identify what factors influence ChatGPT citations. The findings on page speed were striking:

  • Fast pages (FCP under 0.4 seconds): averaged 6.7 citations from ChatGPT
  • Slow pages (FCP over 1.13 seconds): averaged just 2.1 citations

That’s a threefold difference in AI visibility based largely on how fast your pages load.

Why does this matter? Because 50% of consumers use AI-powered search today in purchase decisions. Sites that load fast are more likely to be cited, recommended, and discovered by a growing audience that starts their search with AI.

The opportunity: Speed optimization now serves double duty. It boosts your traditional SEO and positions you for visibility in an AI-first search landscape.

How To Improve Page Speed Metrics & Increase AI Citations

Speed, SEO, and AI visibility are now deeply connected.

Every day your site underperforms, you’re missing opportunities.

Your Page Speed Optimization Roadmap

Here’s your action plan:

  1. Audit your current speed.
  2. Identify the bottlenecks.
  3. Implement a comprehensive solution. Rather than patching issues one plugin at a time, use an all-in-one performance tool that addresses caching, code optimization, and media loading together.
  4. Monitor and maintain. Speed isn’t a one-time fix. Track your metrics regularly to ensure you’re maintaining performance as you add content and features.

Step 1: Audit Your Current Website Speed

To identify the source of your slow website and build a baseline to test against, first perform a website speed test audit.

  1. Visit Google’s PageSpeed Insights tool.
  2. Compare your Core Web Vitals results scores to your industry’s CWV baseline.
  3. Identify which scores are lowest before moving to step 2.
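This audit can also be scripted against the public PageSpeed Insights v5 API, which returns the same Core Web Vitals data as the web tool. A minimal sketch (the endpoint is real; the helper name and the mobile default are our assumptions):

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def psi_request_url(page: str, strategy: str = "mobile", api_key: str = "") -> str:
    """Build a PageSpeed Insights v5 API request URL for auditing a page.
    An API key is optional for light use; automated, heavy use needs one."""
    params = {"url": page, "strategy": strategy, "category": "performance"}
    if api_key:
        params["key"] = api_key
    return f"{PSI_ENDPOINT}?{urlencode(params)}"
```

Fetching the resulting URL (e.g., with urllib.request.urlopen) returns JSON whose lighthouseResult.audits section includes the largest-contentful-paint audit, so you can log your LCP baseline over time.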

Step 2: Identify Your Page Speed Bottlenecks

Is it unoptimized images? Render-blocking JavaScript? Too many plugins? Understanding the issue helps you choose the right solution.

In fact, this is where most of your competitors drop the ball, allowing you to pick it up and outperform their websites on SERPs. For business owners focused on running their company, this often falls to the bottom of the priority list.

Why? Because traditional website speed optimization involves a daunting technical website testing checklist that includes, but isn’t limited to:

  • Implementing caching
  • Minifying CSS and JavaScript files
  • Lazy loading images and videos
  • Removing unused CSS
  • Delaying JavaScript execution
  • Optimizing your database
  • Configuring a CDN
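Some checklist items need nothing more than native HTML attributes. A sketch (file names are placeholders):

```html
<!-- Lazy-load below-the-fold media so it doesn't compete with the LCP element -->
<img src="gallery-photo.jpg" loading="lazy" width="800" height="450" alt="Product photo">

<!-- Defer non-critical JavaScript so it doesn't block rendering -->
<script src="analytics.js" defer></script>
```

Caching, unused-CSS removal, and CDN configuration, by contrast, usually require server or plugin support, which is where the tooling in step 3 comes in.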

Step 3: Implement Fixes & Best Practices

From here, each potential cause of a slow website and low CWV scores can be fixed:

The Easy Way: Use The WP Rocket Performance Plugin

Time To Implement: 3 minutes | Download WP Rocket

Rather than piecing together multiple plugins and manually tweaking settings, you get an all-in-one approach that handles the heavy lifting automatically. This is where purpose-built performance technology can change the game.

The endgame is to remove the complexity from WordPress optimization:

  • Instant results. For example, upon activation, WP Rocket implements 80% of web performance best practices without requiring any configuration. Page caching, GZIP compression, CSS and JS minification, and browser caching are just a few of the many optimizations that run in the background for you.
  • No coding required. Advanced features such as lazy-loading images, removing unused CSS, and delaying JavaScript are available via simple toggles.
  • Built-in compatibility. It’s designed to work with popular themes, plugins, page builders, and WooCommerce.
  • Performance tracking included. A built-in tool lets you monitor your speed improvements and Core Web Vitals scores without leaving your dashboard.

The goal isn’t to become a performance expert. It’s to have a fast website that supports your business objectives. When optimization happens in the background, you’re free to focus on what you actually do best.

For many, shifting tactics can cause confusion and unnecessary complexity. The right technology makes implementing them much easier and ensures you maximize AI visibility and website revenue.

A three-minute fix can make a huge difference to how your WordPress site performs.

Ready to get your site up to speed?


Image Credits

Featured Image: Image by WP Media. Used with permission.

In-Post Images: Image by WP Media. Used with permission.

Google’s Core Updates, Explained

Google released another Core Update to its search algorithm over the holidays. It was the most comprehensive update of 2025.

Google changes its algorithm frequently. Some are more widespread than others. Unlike Spam Updates, Core Updates generally do not penalize but, instead, alter how the algorithm treats certain queries and their intent.

For example, a Core Update may result in more “best of” listings (rather than product categories) in search results. Ecommerce sites may lose traffic, but not because of anything they’ve done, so no fix is required.

Yet a Core Update may result in higher rankings for certain types of content, which could prompt merchants to add those pages.

Core Updates can elevate a wide range of queries. The recent holiday update lowered the listings of large publishers and elevated niche sites. Search Engine Journal reported that Macy’s rankings decreased, while those of Columbia, The North Face, and Fragrance Market increased.

Content helpfulness

Google’s infamous Helpful Content algorithm is now part of its Core Updates and can, in theory, target an “unhelpful” site.

Google provides guidelines to human evaluators for what makes content helpful. It’s the best indicator for search optimizers as to Google’s definition of that term. To paraphrase from the guidelines:

  • Websites should place the most useful portions at the top of a page.
  • The amount of effort, originality, and skill determines the quality of the content.
  • Avoid unnecessary fluff or “filler” content that obscures what visitors are looking for.
  • Use clear titles and headings that inform, not oversell.

If a Core Update resulted in lost traffic, scrutinize your content helpfulness and on-page engagement.

How to recover

It’s often difficult to know why a Core Update lowered a site’s rankings. To diagnose, I typically start with the helpfulness of its pages and its overall engagement.

The first step is always to identify what was lost. Search Console will reveal the impacted queries:

  • Go to the full “Performance” report.
  • Choose “Compare” in the “More” filter.
  • Choose “Custom” and set start and end dates to expose the week before the change (early December for the most recent update) and the week after (beginning of January). Click “Apply.”
  • Sort the ensuing “Queries” column and the “Clicks Difference” column to see queries that now generate fewer clicks.

Select a before and after date range in Search Console to identify queries that generate fewer organic clicks.
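The before/after comparison from the steps above boils down to a per-query subtraction. A sketch of the “Clicks Difference” math, using invented query data:

```python
def query_click_deltas(before: dict, after: dict) -> list:
    """Replicate Search Console's 'Clicks Difference' column: for each query,
    clicks after the update minus clicks before, sorted worst-hit first."""
    queries = set(before) | set(after)
    deltas = [(q, after.get(q, 0) - before.get(q, 0)) for q in queries]
    return sorted(deltas, key=lambda item: item[1])
```

Exporting the two date ranges from Search Console as CSVs and feeding the click counts into a function like this surfaces the same losers the UI sort shows, but in a form you can archive and re-run after each update.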

Next, manually search Google for each affected query to determine if results shifted broadly or only for your page. The appearance of many new listings that answer a query in a new way may indicate a broad shift.

Semrush provides monthly snapshots of ranking URLs for each query. Refer to its archive to see how your overall SERPs have changed. If you see a widespread shift (i.e., 80% of listings are new for a given query), there is likely no fix needed. It’s Google changing its algorithm.

If only your site is downranked, examine the impacted pages and find ways to make them more helpful and engaging, such as:

  • Move the main portion, such as a quick answer to a search query, to the top.
  • Improve page structure and subheadings.
  • Remove ads, such as intrusive pop-ups, that block users from interacting with a page.
  • Add jump-to links that help visitors navigate the page.
  • Include social proof on the page.
  • Show the author’s name and bio.
  • Link to trusted sources.
  • Add helpful images and videos.
  • Update the page with recent data, trends, and stats (with sources).
  • Add explanatory sections, such as FAQs and definitions, tailored to the page’s purpose.

Helpfulness is subjective and vague. Nonetheless, consider your target audience and tailor your content accordingly.

Google announces only substantial Core Updates, those that affect many users. Lesser, unannounced updates occur more often and can result in recoveries.

Google On Phantom Noindex Errors In Search Console via @sejournal, @martinibuster

Google’s John Mueller recently answered a question about phantom noindex errors reported in Google Search Console. Mueller asserted that these reports may be real.

Noindex In Google Search Console

A noindex robots directive is one of the few commands that Google must obey, and one of the few ways a site owner can exercise control over Googlebot, Google's crawler.

And yet it’s not uncommon for Search Console to report being unable to index a page because of a noindex directive on a page that seemingly has no such directive, at least none that is visible in the HTML code.

When Google Search Console (GSC) reports “Submitted URL marked ‘noindex’,” it is reporting a seemingly contradictory situation:

  • The site asked Google to index the page via an entry in a Sitemap.
  • The page sent Google a signal not to index it (via a noindex directive).

It’s a confusing message: Search Console says a page is preventing Google from indexing it, yet the publisher or SEO can’t observe any such directive at the code level.
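As a refresher, a noindex can arrive two ways: via a meta tag in the HTML (including a Googlebot-specific one) or via an X-Robots-Tag HTTP header. A quick sketch of checking both, using only the standard library; the sample page and headers are made up:

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects robots/googlebot meta tags, where a noindex can hide."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(a.get("content", "").lower())

def has_noindex(html, headers):
    """True if the HTML or the HTTP headers carry a noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    meta_hit = any("noindex" in content for content in finder.directives)
    # The header check matters because a noindex sent via the
    # X-Robots-Tag header never appears in the page source.
    header_hit = "noindex" in headers.get("X-Robots-Tag", "").lower()
    return meta_hit or header_hit

# Hypothetical example: no meta noindex, but the header carries one.
sample_html = "<html><head><meta name='robots' content='index,follow'></head></html>"
sample_headers = {"X-Robots-Tag": "noindex"}
print(has_noindex(sample_html, sample_headers))  # True
```

A check like this, run against what the server actually serves, is the starting point before assuming the report is wrong.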

The person asking the question posted on Bluesky:

“For the past 4 months, the website has been experiencing a noindex error (in ‘robots’ meta tag) that refuses to disappear from Search Console. There is no noindex anywhere on the website nor robots.txt. We’ve already looked into this… What could be causing this error?”

Noindex Shows Only For Google

Google’s John Mueller answered the question, sharing that there was always a noindex showing to Google on the pages he examined where this kind of thing was happening.

Mueller responded:

“The cases I’ve seen in the past were where there was actually a noindex, just sometimes only shown to Google (which can still be very hard to debug). That said, feel free to DM me some example URLs.”

While Mueller didn’t elaborate on the possible causes, there are ways to troubleshoot this issue and find out what’s going on.

How To Troubleshoot Phantom Noindex Errors

It’s possible that something in the stack is serving a noindex only to Google. For example, a page may at one time have had a noindex on it, and a server-side cache (like a caching plugin) or a CDN (like Cloudflare) cached the HTTP headers from that time. The old noindex header would then be served to Googlebot (because it frequently visits the site) while a fresh version is served to the site owner.

Checking the HTTP headers is easy; there are many HTTP header checkers, like this one at KeyCDN or this one at SecurityHeaders.com.
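You can also check the headers yourself with a few lines of Python. A sketch using only the standard library; the live fetch is commented out because it requires network access, and the `interpret` helper is just an illustrative summary of the cases discussed here:

```python
import urllib.request

def fetch_robots_header(url, user_agent="Mozilla/5.0"):
    """Send a HEAD request and return (status code, X-Robots-Tag header)."""
    req = urllib.request.Request(url, method="HEAD")
    req.add_header("User-Agent", user_agent)
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.headers.get("X-Robots-Tag")

def interpret(status, robots_header):
    """Summarize what the response means for indexing."""
    if robots_header and "noindex" in robots_header.lower():
        return "noindex sent via HTTP header"
    if status == 520:
        return "Cloudflare 520 - request may be blocked"
    return "no noindex in headers"

# Live usage (requires network access):
# status, robots = fetch_robots_header("https://example.com/")
# print(interpret(status, robots))

print(interpret(200, None))       # no noindex in headers
print(interpret(200, "noindex"))  # noindex sent via HTTP header
```

Running your own check complements the online tools, since you control the user agent being sent.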

A 520 response code is an error Cloudflare returns when it receives an unexpected response from the origin, and it can show up when a user agent is being blocked.

Screenshot: 520 Cloudflare Response Code

Screenshot showing a 520 error response code

Below is a screenshot of a 200 server response code generated by Cloudflare:

Screenshot: 200 Server Response Code

I checked the same URL using two different header checkers: one returned a 520 (blocked) server response code and the other returned a 200 (OK) response code. That shows how differently Cloudflare can respond to something like a header checker. Ideally, check with several header checkers to see if there’s a consistent 520 response from Cloudflare.

When a web page is showing something exclusively to Google that is otherwise invisible to someone looking at the code, you need to get Google to look at the page for you, using an actual Google crawler from a Google IP address. The way to do this is by dropping the URL into Google’s Rich Results Test. Google will dispatch a crawler from a Google IP address, and if there’s something on the server (or a CDN) that’s showing a noindex, this will catch it. In addition to the structured data, the Rich Results Test will also provide the HTTP response and a snapshot of the web page showing exactly what the server serves to Google.

When you run a URL through the Google Rich Results Test, the request:

  • Originates from Google’s Data Centers: The bot uses an actual Google IP address.
  • Passes Reverse DNS Checks: If the server, security plugin, or CDN checks the IP, it will resolve back to googlebot.com or google.com.
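That reverse DNS check can also be done by hand when your logs show a visitor claiming to be Googlebot. A sketch of the standard verify-by-DNS pattern; the actual lookups require network access, so the hostname check is split out as a pure function:

```python
import socket

def is_google_hostname(hostname):
    """True if a hostname belongs to Google's crawl infrastructure."""
    return hostname.endswith(".googlebot.com") or hostname.endswith(".google.com")

def verify_googlebot_ip(ip):
    """Reverse-DNS the IP, check the domain, then forward-confirm it.
    Requires network access."""
    hostname = socket.gethostbyaddr(ip)[0]
    if not is_google_hostname(hostname):
        return False
    # Forward-confirm: the hostname must resolve back to the same IP;
    # otherwise the PTR record could be spoofed.
    return ip in socket.gethostbyname_ex(hostname)[2]

print(is_google_hostname("crawl-66-249-66-1.googlebot.com"))  # True
print(is_google_hostname("fake-googlebot.example.com"))       # False
```

The forward-confirmation step matters because anyone can set a PTR record claiming to be googlebot.com; only Google controls the forward resolution.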

If the page is blocked by noindex, the tool will be unable to provide any structured data results. It should show a status like “Page not eligible” or “Crawl failed.” If you see that, click the “View Details” link or expand the error section. It should show something like “Robots meta tag: noindex” or “‘noindex’ detected in ‘robots’ meta tag.”

This approach does not send the Googlebot user agent; it uses the Google-InspectionTool/1.0 user agent string. That means if the server is serving the noindex based on IP address, this method will catch it.

Another angle covers the situation where a rogue noindex is specifically served to the Googlebot user agent. You can spoof (mimic) the Googlebot user agent string with Google’s own User Agent Switcher extension for Chrome, or configure an app like Screaming Frog to identify itself as Googlebot; either should catch it.
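A Googlebot user-agent spoof can also be scripted in a few lines. A sketch with urllib; the request is only built here (the live fetch is commented out because it requires network access), and the UA string is Googlebot’s published desktop string:

```python
import urllib.request

# Googlebot's published desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def request_as_googlebot(url):
    """Build a request that identifies itself as Googlebot."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", GOOGLEBOT_UA)
    return req

req = request_as_googlebot("https://example.com/")
print(req.get_header("User-agent"))  # prints the Googlebot UA string

# To actually fetch and scan the served HTML for a rogue noindex
# (requires network access):
# html = urllib.request.urlopen(req).read().decode()
# print("noindex" in html.lower())
```

Note that this catches user-agent-based cloaking only; servers that key off Google’s IP addresses will still see your real address, which is why the Rich Results Test step above remains necessary.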

Screenshot: Chrome User Agent Switcher

Phantom Noindex Errors In Search Console

These kinds of errors can be a pain to diagnose, but before you throw your hands up in the air, take some time to see if any of the steps outlined here help identify the hidden cause of the issue.

Featured Image by Shutterstock/AYO Production