OpenAI Search Crawler Passes 55% Coverage In Hostinger Study via @sejournal, @MattGSouthern

Hostinger analyzed 66 billion bot requests across more than 5 million websites and found that AI crawlers are following two different paths.

LLM training bots are losing access to the web as more sites block them. Meanwhile, AI assistant bots that power search tools like ChatGPT are expanding their reach.

The analysis draws on anonymized server logs from three 6-day windows, with bot classification mapped to AI.txt project classifications.

Training Bots Are Getting Blocked

The starkest finding involves OpenAI’s GPTBot, which collects data for model training. Its website coverage dropped from 84% to 12% over the study period.

Meta’s ExternalAgent was the largest training-category crawler by request volume in Hostinger’s data. Hostinger says this training-bot group shows the strongest declines overall, driven in part by sites blocking AI training crawlers.

These numbers align with patterns I’ve tracked through multiple studies. BuzzStream found that 79% of top news publishers now block at least one training bot. Cloudflare’s Year in Review showed GPTBot, ClaudeBot, and CCBot had the highest number of full disallow directives across top domains.

The data quantifies what those studies suggested. Hostinger interprets the drop in training-bot coverage as a sign that more sites are blocking those crawlers, even when request volumes remain high.

Assistant Bots Tell a Different Story

While training bots face resistance, the bots that power AI search tools are expanding access.

OpenAI’s OAI-SearchBot, which fetches content for ChatGPT’s search feature, reached 55.67% average coverage. TikTok’s bot grew to 25.67% coverage with 1.4 billion requests. Apple’s bot reached 24.33% coverage.

These assistant crawls are user-triggered and more targeted. They serve users directly rather than collecting training data, which may explain why sites treat them differently.

Classic Search Remains Stable

Traditional search engine crawlers held steady throughout the study. Googlebot maintained 72% average coverage with 14.7 billion requests. Bingbot stayed at 57.67% coverage.

The stability contrasts with changes in the AI category. Google’s main crawler faces a unique position since blocking it affects search visibility.

SEO Tools Show Decline

SEO and marketing crawlers saw declining coverage. Ahrefs maintained the largest footprint at 60% coverage, but the category overall shrank. Hostinger attributes this to two factors: these tools increasingly focus on sites actively doing SEO work, and website owners are blocking resource-intensive crawlers.

I reported on the resource concerns when Vercel data showed GPTBot generating 569 million requests in a single month. For some publishers, the bandwidth costs became a business problem.

Why This Matters

The data confirms a pattern that’s been building over the past year. Site operators are drawing a line between AI crawlers they’ll allow and those they won’t.

The decision comes down to function. Training bots collect content to improve models without sending traffic back. Assistant bots fetch content to answer specific user questions, which means they can surface your content in AI search results.

Hostinger suggests a middle path: block training bots while allowing assistant bots that drive discovery. This lets you participate in AI search without contributing to model training.
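For illustration, here's a minimal sketch of what that middle path could look like as a robots.txt policy, checked with Python's built-in robotparser. The user-agent tokens match the crawler names discussed above, but the policy itself is a hypothetical example, not Hostinger's recommendation verbatim:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt policy: block the training crawler (GPTBot)
# while allowing the assistant/search crawler (OAI-SearchBot).
policy = """
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(policy)

# Check how each crawler would be treated under this policy.
for bot in ("GPTBot", "OAI-SearchBot", "Googlebot"):
    allowed = parser.can_fetch(bot, "https://example.com/article/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```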

Looking Ahead

OpenAI recommends allowing OAI-SearchBot if you want your site to appear in ChatGPT search results, even if you block GPTBot.

OpenAI’s documentation clarifies the difference. OAI-SearchBot controls inclusion in ChatGPT search results and respects robots.txt. ChatGPT-User handles user-initiated browsing and may not be governed by robots.txt in the same way.

Hostinger recommends checking server logs to see what’s actually hitting your site, then making blocking decisions based on your goals. If you’re concerned about server load, you can use CDN-level blocking. If you want to potentially increase your AI visibility, review current AI crawler user agents and allow only the specific bots that support your strategy.
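If you want a quick read on which AI crawlers are actually hitting your site before making those calls, a rough pass over an access log is enough. Below is a sketch that assumes a combined-format log at a hypothetical path; the user-agent substrings are illustrative, not an exhaustive list:

```python
import re
from collections import Counter

# Hypothetical path; adjust to your server's access log location and format.
LOG_PATH = "/var/log/nginx/access.log"

# Illustrative substrings for common AI crawler user agents (not exhaustive).
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "CCBot",
               "meta-externalagent", "Bytespider", "Applebot"]

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        # The user agent is the last quoted field in a combined-format log line.
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        user_agent = quoted[-1]
        for bot in AI_CRAWLERS:
            if bot.lower() in user_agent.lower():
                counts[bot] += 1

for bot, hits in counts.most_common():
    print(f"{bot}: {hits} requests")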


Featured Image: BestForBest/Shutterstock

Perplexity AI Interview Explains How AI Search Works via @sejournal, @martinibuster

I recently spoke with Jesse Dwyer of Perplexity about SEO and AI search, and about what SEOs should be focusing on when optimizing for AI search. His answers offered useful guidance for publishers and SEOs right now.

AI Search Today

An important takeaway that Jesse shared is that personalization is completely changing how answers are generated:

“I’d have to say the biggest/simplest thing to remember about AEO vs SEO is it’s no longer a zero sum game. Two people with the same query can get a different answer on commercial search, if the AI tool they’re using loads personal memory into the context window (Perplexity, ChatGPT).

A lot of this comes down to the technology of the index (why there actually is a difference between GEO and AEO). But yes, it is currently accurate to say (most) traditional SEO best practices still apply.”

The takeaway from Dwyer’s response is that search visibility is no longer about a single consistent search result. Personal context plays a role in AI answers, which means two users can receive significantly different answers to the same query, possibly drawn from different underlying content sources.

While the underlying infrastructure is still a classic search index, SEO still plays a role in determining whether content is eligible to be retrieved at all. Perplexity AI is said to use a form of PageRank, which is a link-based method of determining the popularity and relevance of websites, so that provides a hint about some of what SEOs should be focusing on.

However, as you’ll see, what is retrieved is vastly different from what happens in classic search.

I followed up with the following question:

So what you’re saying (and correct me if I’m wrong or slightly off) is that Classic Search tends to reliably show the same ten sites for a given query. But for AI search, because of the contextual nature of AI conversations, they’re more likely to provide a different answer for each user.

Jesse answered:

“That’s accurate yes.”

Sub-document Processing: Why AI Search Is Different

Jesse continued his answer by talking about what goes on behind the scenes to generate an answer in AI search.

He continued:

“As for the index technology, the biggest difference in AI search right now comes down to whole-document vs. “sub-document” processing.

Traditional search engines index at the whole document level. They look at a webpage, score it, and file it.

When you use an AI tool built on this architecture (like ChatGPT web search), it essentially performs a classic search, grabs the top 10–50 documents, then asks the LLM to generate a summary. That’s why GPT search gets described as “4 Bing searches in a trenchcoat” —the joke is directionally accurate, because the model is generating an output based on standard search results.

This is why we call the optimization strategy for this GEO (Generative Engine Optimization). That whole-document search is essentially still algorithmic search, not AI, since the data in the index is all the normal page scoring we’re used to in SEO. The AI-first approach is known as “sub-document processing.”

Instead of indexing whole pages, the engine indexes specific, granular snippets (not to be confused with what SEO’s know as “featured snippets”). A snippet, in AI parlance, is about 5-7 tokens, or 2-4 words, except the text has been converted into numbers, (by the fundamental AI process known as a “transformer”, which is the T in GPT). When you query a sub-document system, it doesn’t retrieve 50 documents; it retrieves about 130,000 tokens of the most relevant snippets (about 26K snippets) to feed the AI.

Those numbers aren’t precise, though. The actual number of snippets always equals a total number of tokens that matches the full capacity of the specific LLM’s context window. (Currently they average about 130K tokens). The goal is to completely fill the AI model’s context window with the most relevant information, because when you saturate that window, you leave the model no room to ‘hallucinate’ or make things up.

In other words, it stops being a creative generator and delivers a more accurate answer. This sub-document method is where the industry is moving, and why it is more accurate to be called AEO (Answer Engine Optimization).

Obviously this description is a bit of an oversimplification. But the personal context that makes each search no longer a universal result for every user is because the LLM can take everything it knows about the searcher and use that to help fill out the full context window. Which is a lot more info than a Google user profile.

The competitive differentiation of a company like Perplexity, or any other AI search company that moves to sub-document processing, takes place in the technology between the index and the 26K snippets. With techniques like modulating compute, query reformulation, and proprietary models that run across the index itself, we can get those snippets to be more relevant to the query, which is the biggest lever for getting a better, richer answer.

Btw, this is less relevant to SEO’s, but this whole concept is also why Perplexity’s search API is so legit. For devs building search into any product, the difference is night and day.”

Dwyer contrasts two fundamentally different indexing and retrieval approaches:

  • Whole-document indexing, where pages are retrieved and ranked as complete units.
  • Sub-document indexing, where meaning is stored and retrieved as granular fragments.

In the first version, AI sits on top of traditional search and summarizes ranked pages. In the second, the AI system retrieves fragments directly and never reasons over full documents at all.

He also described how answer quality is constrained by context-window saturation: accuracy emerges from filling the model’s entire context window with relevant fragments. When retrieval succeeds at saturating that window, the model has little capacity to invent facts or hallucinate.

Lastly, he says that “modulating compute, query reformulation, and proprietary models” is part of their secret sauce for retrieving snippets that are highly relevant to the search query.
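To make the sub-document idea concrete, here's a toy sketch of the retrieval step Dwyer describes: split content into small snippets, score them against the query, and keep adding the best ones until a token budget (standing in for the context window) is saturated. The scoring and token counting here are deliberately crude placeholders, not Perplexity's actual pipeline:

```python
from typing import List, Tuple

def snippets(doc: str, size: int = 4) -> List[str]:
    """Split a document into small word-level fragments ("snippets")."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(snippet: str, query: str) -> int:
    """Toy relevance score: count of query words present in the snippet."""
    return sum(1 for w in query.lower().split() if w in snippet.lower())

def fill_context(docs: List[str], query: str, budget_tokens: int) -> List[Tuple[int, str]]:
    """Pick the highest-scoring snippets until the token budget is saturated."""
    candidates = [(score(s, query), s) for d in docs for s in snippets(d)]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    selected, used = [], 0
    for sc, snip in candidates:
        cost = len(snip.split())  # crude stand-in for a real tokenizer
        if used + cost > budget_tokens:
            break
        selected.append((sc, snip))
        used += cost
    return selected

docs = [
    "Perplexity retrieves granular snippets rather than whole documents.",
    "Traditional engines rank and return complete pages for a query.",
]
for sc, snip in fill_context(docs, "snippets rather than whole documents", budget_tokens=20):
    print(sc, "|", snip)
```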

Featured Image by Shutterstock/Summit Art Creations

Why Agentic AI May Flatten Brand Differentiators via @sejournal, @martinibuster

James LePage, Director of AI Engineering at Automattic and co-lead of the WordPress AI Team, described the future of the agentic AI web, where websites become interactive interfaces and data sources, and the value add that any individual site brings becomes flattened. Although he describes a way to avoid brand and voice getting flattened, the outcome for informational, service, and media sites may be “complex.”

Evolution To Autonomy

One of the points LePage makes concerns agentic autonomy and how it will change what it means to have an online presence. He maintains that humans will still be in the loop, but at a higher and less granular level: agentic AI handles the tree-level details of interacting with websites, while humans operate at the forest level, dictating the outcomes they’re looking for.

LePage writes:

“Instead of approving every action, users set guidelines and review outcomes.”

He sees agentic AI progressing on an evolutionary course toward greater freedom with less external control, also known as autonomy. This evolution is in three stages.

He describes the three levels of autonomy:

  1. What exists now is essentially Perplexity-style web search with more steps: gather content, generate synthesis, present to user. The user still makes decisions and takes actions.
  2. Near-term, users delegate specific tasks with explicit specifications, and agents can take actions like purchases or bookings within bounded authority.
  3. Further out, agents operate more autonomously based on standing guidelines, becoming something closer to economic actors in their own right.

AI Agents May Turn Sites Into Data Sources

LePage sees the web in terms of control, with agentic AI experiences taking control of how the data is represented to the user. The user experience and branding are stripped away, and the experience itself is refashioned by the AI agent.

He writes:

“When an agent visits your website, that control diminishes. The agent extracts the information it needs and moves on. It synthesizes your content according to its own logic. It represents you to its user based on what it found, not necessarily how you’d want to be represented.

This is a real shift. The entity that creates the content loses some control over how that content is presented and interpreted. The agent becomes the interface between you and the user.

Your website becomes a data source rather than an experience.”

Does it sound problematic that websites will turn into data sources? As you’ll see further down, LePage’s answer for that situation is to double down on interactions and personalization via AI, so that users can interact with the data in ways that are not possible with a static website.

These are important insights because they’re coming from the person who is the director of AI engineering at Automattic and co-leads the team in charge of coordinating AI integration within the WordPress core.

AI Will Redefine Website Interactions

LePage, who is the co-lead of WordPress’s AI Team, which coordinates AI-related contributions to the WordPress core, said that AI will enable websites to offer increasingly personalized and immersive experiences. Users will be able to interact with the website as a source of data refined and personalized for the individual’s goals, with website-side AI becoming the differentiator.

He explained:

“Humans who visit directly still want visual presentation. In fact, they’ll likely expect something more than just content now. AI actually unlocks this.

Sites can create more immersive and personalized experiences without needing a developer for every variation. Interactive data visualizations, product configurators, personalized content flows. The bar for what a “visit” should feel like is rising.

When AI handles the informational layer, the experiential layer becomes a differentiator.”

That’s an important point right there because it means that if AI can deliver the information anywhere (in an agent user interface, an AI generated comparison tool, a synthesized interactive application), then information alone stops separating you from everyone else.

In this kind of future, what becomes the differentiator, your value add, is the website experience itself.

How AI Agents May Negatively Impact Websites

LePage says that Agentic AI is a good fit for commercial websites because they are able to do comparisons and price checks and zip through the checkout. He says that it’s a different story for informational sites, calling it “more complex.”

Regarding the phrase “more complex,” I think that’s a euphemism that engineers use instead of what they really mean: “You’re probably screwed.”

Judge for yourself. Here’s how LePage explains how websites lose control over the user experience:

“When an agent visits your website, that control diminishes. The agent extracts the information it needs and moves on. It synthesizes your content according to its own logic. It represents you to its user based on what it found, not necessarily how you’d want to be represented.

This is a real shift. The entity that creates the content loses some control over how that content is presented and interpreted. The agent becomes the interface between you and the user. Your website becomes a data source rather than an experience.

For media and services, it’s more complex. Your brand, your voice, your perspective, the things that differentiate you from competitors, these get flattened when an agent summarizes your content alongside everyone else’s.”

For informational websites, the website experience can be the value add, but agentic AI eliminates that advantage. And unlike ecommerce transactions, where the sale is the value exchange, there is no value exchange at all, since nobody is clicking on ads, much less viewing them.

Alternative To Flattened Branding

LePage goes on to present an alternative to brand flattening by imagining a scenario where websites themselves wield AI Agents so that users can interact with the information in ways that are helpful, engaging, and useful. This is an interesting thought because it represents what may be the biggest evolutionary step in website presence since responsive design made websites engaging regardless of device and browser.

He explains how this new paradigm may work:

“If agents are going to represent you to users, you might need your own agent to represent you to them.

Instead of just exposing static content and hoping the visiting agent interprets it well, the site could present a delegate of its own. Something that understands your content, your capabilities, your constraints, and your preferences. Something that can interact with the visiting agent, answer its questions, present information in the most effective way, and even negotiate.

The web evolves from a collection of static documents to a network of interacting agents, each representing the interests of their principal. The visiting agent represents the user. The site agent represents the entity. They communicate, they exchange information, they reach outcomes.

This isn’t science fiction. The protocols are being built. MCP is now under the Linux Foundation with support from Anthropic, OpenAI, Google, Microsoft, and others. Agent2Agent is being developed for agent-to-agent communication. The infrastructure for this kind of web is emerging.”

What do you think about the part where a site’s AI agent talks to a visitor’s AI agent and communicates “your capabilities, your constraints, and your preferences,” as well as how your information will be presented? There might be something here, and depending on how this is worked out, it may be something that benefits publishers and keeps them from becoming just a data source.
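For a sense of what a “site agent” could look like in practice, here is a hypothetical sketch using the MCP Python SDK’s FastMCP helper. The package and API follow the SDK’s published quickstart, but the tool names and content are invented for illustration, not anything LePage or the WordPress team has shipped:

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical "site agent": exposes a site's content and capabilities as
# structured tools a visiting agent can call, instead of leaving that agent
# to scrape and reinterpret the pages on its own.
site_agent = FastMCP("example-site-agent")

# Stand-in content; a real site would pull this from its CMS.
ARTICLES = {
    "agentic-web": "A summary of what the agentic web means for publishers.",
    "pricing": "Current service tiers and what each includes.",
}

@site_agent.tool()
def list_topics() -> list[str]:
    """List the topics this site can speak to."""
    return sorted(ARTICLES)

@site_agent.tool()
def get_summary(topic: str) -> str:
    """Return the site's own framing of a topic, rather than a scraped guess."""
    return ARTICLES.get(topic, "No coverage of that topic.")

if __name__ == "__main__":
    site_agent.run()
```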

AI Agents May Force A Decision: Adaptation Versus Obsolescence

LePage insists that publishers (which he calls entities) that evolve along with the agentic AI revolution will be the ones able to have the most effective agent-to-agent interactions, while those that stay behind will become data waiting to be scraped.

He paints a bleak future for sites that decline to move forward with agent-to-agent interactions:

“The ones that don’t will still exist on the web. But they’ll be data to be scraped rather than participants in the conversation.”

What LePage describes is a future in which product and professional service sites can extract value from agent-to-agent interactions. But the same is not necessarily true for informational sites that users depend on for expert reviews, opinions, and news. The future for them looks “complex.”

Head Of WordPress AI Team Explains SEO For AI Agents via @sejournal, @martinibuster

James LePage, Director of AI Engineering at Automattic, is the founder and co-lead of the WordPress Core AI Team, which is tasked with coordinating AI-related projects within WordPress, including how AI agents will interact within the WordPress ecosystem. He shared insights into what’s coming to the web in the context of AI agents, some of the implications for SEO, and what publishers should be thinking about.

AI Agents And Infrastructure

The first observation that he made was that AI agents will use the same web infrastructure as search engines. The main point he makes is that the data that the agents are using comes from the regular classic search indexes.

He writes, somewhat provocatively:

“Agents will use the same infrastructure the web already has.

  • Search to discover relevant entities.
  • “Domain authority” and trust signals to evaluate sources.
  • Links to traverse between entities.
  • Content to understand what each entity offers.

I find it interesting how much money is flowing into AIO and GEO startups when the underlying way agents retrieve information is by using existing search indexes. ChatGPT uses Bing. Anthropic uses Brave. Google uses Google. The mechanics of the web don’t change. What changes is who’s doing the traversing.”

AI SEO = Longtail Optimization

LePage also said that schema structured data, semantic density, and interlinking between pages are essential for optimizing for AI agents. Notably, he said that the AI optimization AIO and GEO companies are doing is just basic long-tail query optimization.

He explained:

“AI intermediaries doing synthesis need structured, accessible content. Clear schemas, semantic density, good interlinking. This is the challenge most publishers are grappling with now. In fact there’s a bit of FUD in this industry. Billions of dollars flowing into AIO and GEO when much of what AI optimization really is is simply long-tail keyword search optimization.”

What Optimized Content Looks Like For AI Agents

LePage, who is involved in AI within the WordPress ecosystem, said that content should be organized in an “intentional” manner for agent consumption, by which he means structured markdown, semantic markup, and content that’s easy to understand.

A little further on, he explains what he believes content should look like for AI agent consumption:

“Presentations of content that prioritize what matters most. Rankings that signal which information is authoritative versus supplementary. Representations that progressively disclose detail, giving agents the summary first with clear paths to depth. All of this still static, not conversational, not dynamic, but shaped with agent traversal in mind.

Think of it as the difference between a pile of documents and a well-organized briefing. Both contain the same information. One is far more useful to someone trying to quickly understand what you offer.”

A little later in the article, he offers a seemingly contradictory prediction about the role of content in an agentic AI future, reversing today’s formula of a well-organized briefing over a pile of documents: agentic AI will not need a website, just the content, a pile of documents.

Nevertheless, he recommends that content have structure: well organized at the page level, with a clear hierarchy, and at the site level, where interlinking makes the relationships between documents clearer. He emphasizes that the content must communicate what it’s for.
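As one hypothetical illustration of that kind of structure, here is a sketch that builds schema.org Article markup as JSON-LD, the sort of “clear schema” LePage describes. Every field value is a placeholder:

```python
import json

# Hypothetical example of a "clear schema": schema.org Article markup
# expressed as JSON-LD, with placeholder values.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How Our Service Works",
    "description": "A one-paragraph summary an agent can use before deciding to go deeper.",
    "author": {"@type": "Person", "name": "Jane Example"},
    "datePublished": "2025-01-15",
    "isPartOf": {"@type": "WebSite", "name": "Example Site", "url": "https://example.com/"},
}

# Typically rendered into a <script type="application/ld+json"> block in the page head.
print(json.dumps(article_schema, indent=2))
```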

He then adds that in the future websites will have AI agents that communicate with external AI agents, which gets into the paradigm he mentioned of content being split off from the website so that the data can be displayed in ways that make sense for a user, completely separated from today’s concept of visiting a website.

He writes:

“Think of this as a progression. What exists now is essentially Perplexity-style web search with more steps: gather content, generate synthesis, present to user. The user still makes decisions and takes actions. Near-term, users delegate specific tasks with explicit specifications, and agents can take actions like purchases or bookings within bounded authority. Further out, agents operate more autonomously based on standing guidelines, becoming something closer to economic actors in their own right.

The progression is toward more autonomy, but that doesn’t mean humans disappear from the loop. It means the loop gets wider. Instead of approving every action, users set guidelines and review outcomes.

…Before full site delegates exist, there’s a middle ground that matters right now.

The content an agent has access to can be presented in a way that makes sense for how agents work today. Currently, that means structured markdown, clean semantic markup, content that’s easy to parse and understand. But even within static content, there’s room to be intentional about how information is organized for agent consumption.”

His article, titled Agents & The New Internet (3/5), provides useful ideas of how to prepare for the agentic AI future.

Featured Image by Shutterstock/Blessed Stock

Google’s Mueller: Free Subdomain Hosting Makes SEO Harder via @sejournal, @MattGSouthern

Google’s John Mueller warns that free subdomain hosting services create unnecessary SEO challenges, even for sites doing everything else right.

The advice came in response to a Reddit post from a publisher whose site shows up in Google but doesn’t appear in normal search results. The site uses Digitalplat Domains, a free subdomain service that is on the Public Suffix List.

What’s Happening

Mueller told the site owner that they likely aren’t making technical mistakes. The problem is the environment they chose to publish in.

He wrote:

“A free subdomain hosting service attracts a lot of spam & low-effort content. It’s a lot of work to maintain a high quality bar for a website, which is hard to qualify if nobody’s getting paid to do that.”

The issue comes down to association. Sites on free hosting platforms share infrastructure with whatever else gets published there. Search engines struggle to differentiate quality content from the noise surrounding it.

Mueller added:

“For you, this means you’re basically opening up shop on a site that’s filled with – potentially – problematic ‘flatmates’. This makes it harder for search engines & co to understand the overall value of the site – is it just like the others, or does it stand out in a positive way?”

He also cautioned against cheap TLDs for similar reasons. The same dynamics apply when entire domain extensions become overrun with low-quality content.

Beyond domain choice, Mueller pointed to content competition as a factor. The site in question publishes on a topic already covered extensively by established publishers with years of work behind them.

“You’re publishing content on a topic that’s already been extremely well covered. There are sooo many sites out there which offer similar things. Why should search engines show yours?”

Why This Matters

Mueller’s advice here fits a pattern I’ve covered repeatedly over the years. Previously, Google’s Gary Illyes warned against cheap TLDs for the same reason. Illyes put it bluntly at the time, telling publishers that when a TLD is overrun by spam, search engines might not want to pick up sitemaps from those domains.

The free subdomain situation creates a unique problem. While the Public Suffix List theoretically tells Google to treat these subdomains as separate sites, the neighborhood signal remains strong. If the vast majority of subdomains on that host are spam, Google’s systems may struggle to identify your site as the one diamond in the rough.

This matters for anyone considering free hosting as a way to test an idea before investing in a real domain. The test environment itself becomes the test. Search engines evaluate your site in the context of everything else published under that same domain.

The competitive angle also deserves attention. New sites on well-covered topics face a high bar regardless of domain choice. Mueller’s point about established publishers having years of work behind them is a reality check about where the effort needs to go.

Looking Ahead

Mueller suggested that search visibility shouldn’t be the first priority for new publishers.

“If you love making pages with content like this, and if you’re sure that it hits what other people are looking for, then I’d let others know about your site, and build up a community around it directly. Being visible in popular search results is not the first step to becoming a useful & popular web presence, and of course not all sites need to be popular.”

For publishers starting out, focus on building direct traffic through promotion and community engagement. Search visibility tends to follow after a site establishes itself through other channels.


Featured Image: Jozef Micic/Shutterstock

Google On Phantom Noindex Errors In Search Console via @sejournal, @martinibuster

Google’s John Mueller recently answered a question about phantom noindex errors reported in Google Search Console. Mueller said that, in the cases he has seen, the noindex was real, even if it was only being shown to Google.

Noindex In Google Search Console

A noindex robots directive is one of the few commands that Google must obey, and one of the few ways a site owner can exercise control over Googlebot, Google’s crawler.

And yet it’s not totally uncommon for Search Console to report that it can’t index a page because of a noindex directive, even though the page seemingly doesn’t have one, at least none visible in the HTML code.

When Google Search Console (GSC) reports “Submitted URL marked ‘noindex’,” it is reporting a seemingly contradictory situation:

  • The site asked Google to index the page via an entry in a Sitemap.
  • The page sent Google a signal not to index it (via a noindex directive).

It’s a confusing message from Search Console: the page is telling Google not to index it, but the publisher or SEO can’t see anything at the code level that would do that.

The person asking the question posted on Bluesky:

“For the past 4 months, the website has been experiencing a noindex error (in ‘robots’ meta tag) that refuses to disappear from Search Console. There is no noindex anywhere on the website nor robots.txt. We’ve already looked into this… What could be causing this error?”

Noindex Shows Only For Google

Google’s John Mueller answered the question, sharing that in the cases he has examined where this kind of thing was happening, there was always a noindex being shown to Google.

Mueller responded:

“The cases I’ve seen in the past were where there was actually a noindex, just sometimes only shown to Google (which can still be very hard to debug). That said, feel free to DM me some example URLs.”

While Mueller didn’t elaborate on what might be happening, there are ways to troubleshoot this issue and find out.

How To Troubleshoot Phantom Noindex Errors

It’s possible that something in the code or server configuration is causing a noindex to show just for Google. For example, a page may have had a noindex on it at one time, and a server-side cache (like a caching plugin) or a CDN (like Cloudflare) cached the HTTP headers from that time, which would cause the old noindex header to be served to Googlebot (because it frequently visits the site) while a fresh version is served to the site owner.

Checking the HTTP headers is easy; there are many HTTP header checkers, like the ones at KeyCDN or SecurityHeaders.com.

A 520 response code is a Cloudflare-specific error that can show up when a request is blocked or mishandled, for example when the origin rejects a particular user agent.

Screenshot: 520 Cloudflare Response Code

Below is a screenshot of a 200 server response code generated by Cloudflare:

Screenshot: 200 Server Response Code

I checked the same URL using two different header checkers, with one returning a 520 (blocked) server response code and the other returning a 200 (OK) response code. That shows how differently Cloudflare can respond to something like a header checker. Ideally, try checking with several header checkers to see if there’s a consistent 520 response from Cloudflare.
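You can also check directly with a short script that fetches the page with a regular browser user agent and a Googlebot-style one, then compares the status code and any X-Robots-Tag header. This is a rough sketch using the requests library; the URL and user-agent strings are placeholders, and a CDN may still treat a genuine Google IP differently than this test:

```python
import requests

URL = "https://example.com/some-page/"  # replace with the affected URL

USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for label, ua in USER_AGENTS.items():
    resp = requests.get(URL, headers={"User-Agent": ua}, timeout=10, allow_redirects=True)
    # A noindex can be delivered in the HTTP response headers instead of the HTML.
    x_robots = resp.headers.get("X-Robots-Tag", "(none)")
    print(f"{label}: status={resp.status_code} X-Robots-Tag={x_robots}")
```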

When a web page is showing something exclusively to Google that isn’t visible to someone looking at the code, you need to get Google to look at the page for you, using an actual Google crawler from a Google IP address. The way to do this is to drop the URL into Google’s Rich Results Test. Google will dispatch a crawler from a Google IP address, and if something on the server (or a CDN) is showing a noindex, this will catch it. In addition to the structured data, the Rich Results Test will also provide the HTTP response and a snapshot of the web page showing exactly what the server shows to Google.

When you run a URL through the Google Rich Results Test, the request:

  • Originates from Google’s Data Centers: The bot uses an actual Google IP address.
  • Passes Reverse DNS Checks: If the server, security plugin, or CDN checks the IP, it will resolve back to googlebot.com or google.com.

If the page is blocked by noindex, the tool will be unable to provide any structured data results. It should provide a status saying “Page not eligible” or “Crawl failed”. If you see that, click the “View Details” link or expand the error section. It should show something like “Robots meta tag: noindex” or “‘noindex’ detected in ‘robots’ meta tag”.

This approach does not send the Googlebot user agent; it uses the Google-InspectionTool/1.0 user agent string. That means if the server block is by IP address, this method will catch it.

Another angle to check is the situation where a rogue noindex is specifically targeted at the Googlebot user agent. You can spoof (mimic) the Googlebot user agent string with Google’s own User Agent Switcher extension for Chrome, or configure an app like Screaming Frog to identify itself with the Googlebot user agent, and that should catch it.

Screenshot: Chrome User Agent Switcher

Phantom Noindex Errors In Search Console

These kinds of errors can feel like a pain to diagnose, but before you throw your hands up in the air, take some time to see if any of the steps outlined here help identify the hidden cause.

Featured Image by Shutterstock/AYO Production

ChatGPT To Begin Testing Ads In The United States via @sejournal, @brookeosmundson

Just today, OpenAI confirmed it will begin testing advertising in the United States for ChatGPT Free and ChatGPT Go users in the coming weeks, marking the first time ads will appear inside the ChatGPT experience.

The test coincides with the U.S. launch of ChatGPT Go, a low-cost subscription tier priced at $8 per month that has been available internationally since August.

The details reveal a cautious approach, with clear limits on where ads can appear, who will see them, and how they will be separated from ChatGPT’s responses.

Here’s what OpenAI shared, how the tests will work, and why this shift matters for users and advertisers alike.

What OpenAI Is Testing

ChatGPT ads are not being introduced as part of a broader redesign or monetization overhaul. Instead, OpenAI is framing this as a limited test, with narrow placement rules and clear separation from ChatGPT’s core function.

Ads will appear at the bottom of a response, only when there is a relevant sponsored product or service tied to the active conversation. They will be clearly labeled, visually distinct from organic answers, and dismissible.

Users will also be able to see why a particular ad is being shown and turn off ad personalization entirely if they choose.

Just as important is where ads will not appear.

OpenAI stated that ads will not be shown to users under 18 and will not be eligible to run near sensitive or regulated topics, including health, mental health, and politics. Conversations will not be shared with advertisers, and user data will not be sold.

Timing Ad Testing with the Go Tier Launch

The timing of the announcement doesn’t seem accidental.

Alongside the ad testing plans, OpenAI confirmed that ChatGPT Go is now available in the United States.

Priced at $8 per month, Go sits between the free tier and higher-cost subscriptions, offering expanded access to messaging, image generation, file uploads, and memory.

Ads are positioned as a way to support both the free tier and Go users, allowing more people to use ChatGPT with fewer restrictions without forcing an upgrade.

At the same time, OpenAI made it clear that Pro, Business, and Enterprise subscriptions will remain ad-free, reinforcing that paid tiers are still the preferred path for users who want an uninterrupted experience.

Explaining the Guardrails of Early Ad Testing

OpenAI spent as much time explaining what ads will not do as what they will.

The company was explicit that advertising will not influence ChatGPT’s responses. Answers are optimized for usefulness, not commercial outcomes. There is no intent to optimize for time spent, engagement loops, or other metrics commonly associated with ad-driven platforms.

This is a notable departure from how advertising has historically been introduced elsewhere on the internet. Rather than retrofitting ads into an existing product and adjusting incentives later, OpenAI is attempting to define the rules up front.

Whether those rules hold over time is an open question. But the clarity of the initial framework suggests OpenAI understands the risk of getting this wrong.

What Early Ad Formats Tell Us

OpenAI shared two examples of the ad formats it plans to test inside ChatGPT.

In the first example, a ChatGPT response provides recipe ideas for a Mexican dinner party. Below the response, a sponsored product recommendation appears for a grocery item. The ad is clearly labeled and visually separated from the organic answer.

Image credit: openai.com

In the second example, ChatGPT responds to a conversation about traveling to Santa Fe, New Mexico. A sponsored lodging listing appears below the response, labeled as sponsored. The example also shows a follow-up chat screen, indicating that users can continue interacting with ChatGPT after seeing the ad.

Image credit: openai.com

In both examples, the ads appear at the bottom of ChatGPT’s responses and are presented as separate from the main answer. OpenAI stated that these formats are part of its initial ad testing and may change as testing progresses.

Why This Matters for Advertisers

This is not something advertisers can plan for just yet.

There are no announced buying models, no targeting details, no measurement framework, and no indication of when access might expand beyond testing. OpenAI has been clear that this is not an open marketplace at the moment.

Still, the implications are hard to ignore. Ads placed alongside high-intent, problem-solving conversations could eventually represent a different kind of discovery environment. One where usefulness matters more than volume, and where poor creative or loose targeting would feel immediately out of place.

If this becomes a real channel, it is unlikely to reward the same tactics that work in search or social today.

How Marketers Are Reacting So Far

Early industry reaction has been measured, not alarmist.

Most commentary acknowledges that advertising inside ChatGPT was inevitable at this scale.

Lily Ray said she is curious to “see how this change impacts user experience.”

Most people in the comments of her post are not shocked by this.

There is also skepticism, particularly around whether relevance can be maintained over time without pressure to expand inventory. That skepticism is warranted. History suggests that once ads work, the temptation to scale them follows.

For now, though, this feels less like an ad platform launch and more like OpenAI testing whether ads can exist inside a conversational interface without changing how people trust the product.

The Bigger Signal for AI Platforms

For users, OpenAI is expanding access while trying to preserve the trust that has made ChatGPT widely used. Introducing ads without blurring the line between answers and monetization sets a high bar, especially for a product people rely on for personal and professional tasks.

Outside of ChatGPT itself, this update shows how AI-first products may think about revenue differently than search or social networks. Ads are positioned as a way to support access, not as the product, with paid tiers remaining central.

OpenAI says it will adjust how ads appear based on user feedback once testing begins in the U.S.

For now, this is a limited test rather than a full advertising launch. Whether those boundaries hold will matter, not just for ChatGPT, but for how monetization inside conversational interfaces is expected to work.

WordPress Membership Plugin Flaw Exposes Sensitive Stripe Data via @sejournal, @martinibuster

An advisory was published about a vulnerability discovered in the Membership Plugin by StellarWP that exposes sensitive Stripe payment setup data on WordPress sites using the plugin. The flaw can be exploited by unauthenticated attackers and is rated 8.2 (High).

Membership Plugin By StellarWP

The Membership Plugin – Restrict Content By StellarWP is used by WordPress sites to manage paid and private content. It enables site owners to restrict access to pages, posts, or other resources so that only logged-in users or paying members can view them and manage what non-paying site visitors can see. The plugin is commonly deployed on membership and subscription-based sites.

Vulnerable to Unauthenticated Attackers

The Wordfence advisory states that the vulnerability can be exploited by unauthenticated attackers, meaning no login or WordPress user account is required to launch an attack. User permission roles do not factor into whether the issue can be triggered, and that’s part of what makes this particular vulnerability more dangerous: it’s easier to trigger.

What the Vulnerability Is

The issue stems from missing security checks related to Stripe payment handling. Specifically, the plugin failed to properly protect Stripe SetupIntent data.

A Stripe SetupIntent is used during checkout to collect and save a customer’s payment method for future use. Each SetupIntent includes a client_secret value that is intended to be shared only with that customer’s browser during a checkout or account setup flow.

The official Wordfence advisory explains:

“The Membership Plugin – Restrict Content plugin for WordPress is vulnerable to Missing Authentication in all versions up to, and including, 3.2.16 via the ‘rcp_stripe_create_setup_intent_for_saved_card’ function due to missing capability check.

Additionally, the plugin does not check a user-controlled key, which makes it possible for unauthenticated attackers to leak Stripe SetupIntent client_secret values for any membership.”

According to Stripe’s official documentation, the Setup Intents API is used to set up a payment method for future charges without creating an immediate payment. A SetupIntent includes a client_secret. Stripe’s documentation states that client_secret values should not be stored, logged, or exposed to anyone other than the intended customer.

This is how Stripe’s documentation explains what the purpose is for the Setup Intents API:

“Use the Setup Intents API to set up a payment method for future payments. It’s similar to a payment, but no charge is created.

The goal is to have payment credentials saved and optimized for future payments, meaning the payment method is configured correctly for any scenario. When setting up a card, for example, it may be necessary to authenticate the customer or check the card’s validity with the customer’s bank. Stripe updates the SetupIntent object throughout that process.”

Stripe documentation also explains that client_secret values are used client-side to complete payment-related actions and are intended to be passed securely from the server to the browser. Stripe states that these values should not be stored, logged, or exposed to anyone other than the relevant customer.

This is how Stripe’s documentation explains the client_secret value:

“client_secret
The client secret of this Customer Session. Used on the client to set up secure access to the given customer.

The client secret can be used to provide access to customer from your frontend. It should not be stored, logged, or exposed to anyone other than the relevant customer. Make sure that you have TLS enabled on any page that includes the client secret.”
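For context on what’s being leaked, here is a minimal sketch of how a SetupIntent is typically created server-side with Stripe’s Python library. The API key and customer ID are placeholders; the point is that the client_secret it returns is meant to reach only that customer’s browser:

```python
import stripe

stripe.api_key = "sk_test_..."  # placeholder secret key

# Create a SetupIntent to save a customer's payment method for later use.
setup_intent = stripe.SetupIntent.create(
    customer="cus_placeholder123",
    payment_method_types=["card"],
)

# The client_secret authorizes the *front end* to complete setup for this
# customer. Per Stripe's guidance it should never be logged, stored, or
# exposed to anyone other than that customer, which is why leaking it via
# an unauthenticated endpoint is a problem.
client_secret = setup_intent.client_secret
print(client_secret[:15] + "...")  # truncated for illustration only
```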

Because the plugin did not enforce the appropriate protections, Stripe SetupIntent client_secret values could be exposed.

What this means in real life is that Stripe payment setup data associated with memberships was accessible beyond its intended scope.

Affected Versions

The vulnerability affects all versions of the plugin up to and including version 3.2.16. Wordfence assigned the issue a CVSS score of 8.2, reflecting the sensitivity of the exposed data and the fact that no authentication is required to trigger the issue.

A score in this range indicates a high-severity vulnerability that can be exploited remotely without special access, increasing the importance of timely updates for sites that rely on the plugin for managing paid memberships or restricted content.

Patch Availability

The plugin has been updated with a patch and is available now. The issue was fixed in version 3.2.17 of the plugin. The update adds missing nonce and permission checks related to Stripe payment handling, addressing the conditions that allowed SetupIntent client_secret values to be exposed. A nonce is a temporary security token that ensures a specific action on a WordPress website was intentionally requested by the user and not by a malicious attacker.

The official Membership Plugin changelog responsibly discloses the updates:

“3.2.17
Security: Added nonce and permission checks for adding Stripe payment methods.
3.2.16
Security: Improved escaping and sanitization for [restrict] and [register_form] shortcode attributes.”

What Site Owners Should Do

Sites using Membership Plugin – Restrict Content should update to version 3.2.17 or newer.

Failure to update the plugin will leave the Stripe SetupIntent client_secret data exposed to unauthenticated attackers.

Featured Image by Shutterstock/file404

Google Health AI Overviews Cite YouTube More Than Any Hospital Site via @sejournal, @MattGSouthern

Google’s AI Overviews may be relying on YouTube more than official medical sources when answering health questions, according to new research from SEO platform SE Ranking.

The study analyzed 50,807 German-language health prompts and keywords, captured in a one-time snapshot from December using searches run from Berlin.

The report lands amid renewed scrutiny of health-related AI Overviews. Earlier this month, The Guardian published an investigation into misleading medical summaries appearing in Google Search. The outlet later reported Google had removed AI Overviews for some medical queries.

What The Study Measured

SE Ranking’s analysis focused on which sources Google’s AI Overviews cite for health-related queries. In that dataset, the company says AI Overviews appeared on more than 82% of health searches, making health one of the categories where users are most likely to see a generated summary instead of a list of links.

The report also cites consumer survey findings suggesting people increasingly treat AI answers as a substitute for traditional search, including in health. It cites figures including 55% of chatbot users trusting AI for health advice and 16% saying they’ve ignored a doctor’s advice because AI said otherwise.

YouTube Was The Most Cited Source

Across SE Ranking’s dataset, YouTube accounted for 4.43% of all AI Overview citations, or 20,621 citations out of 465,823.

The next most cited domains were ndr.de (14,158 citations, 3.04%) and MSD Manuals (9,711 citations, 2.08%), according to the report.

The authors argue that the ranking matters because YouTube is a general-purpose platform with a mixed pool of creators. Anyone can publish health content there, including licensed clinicians and hospitals, but also creators without medical training.

To check what the most visible YouTube citations looked like, SE Ranking reviewed the 25 most-cited YouTube videos in its dataset. It found 24 of the 25 came from medical-related channels, and 21 of the 25 clearly noted the content was created by a licensed or trusted source. It also warned that this set represents less than 1% of all YouTube links cited by AI Overviews.

Government & Academic Sources Were Rare

SE Ranking categorized citations into “more reliable” and “less reliable” groups based on the type of organization behind each source.

It reports that 34.45% of citations came from the more reliable group, while 65.55% came from sources “not designed to ensure medical accuracy or evidence-based standards.”

Within the same breakdown, academic research and medical journals accounted for 0.48% of citations, German government health institutions accounted for 0.39%, and international government institutions accounted for 0.35%.

AI Overview Citations Often Point To Different Pages Than Organic Search

The report compared AI Overview citations to organic rankings for the same prompts.

While SE Ranking found that 9 out of 10 domains overlapped between AI citations and frequent organic results, it says the specific URLs frequently diverged. Only 36% of AI-cited links appeared in Google’s top 10 organic results, 54% appeared in the top 20, and 74% appeared somewhere in the top 100.

The biggest domain-level exception in its comparison was YouTube. YouTube ranked first in AI citations but only 11th in organic results in its analysis, appearing 5,464 times as an organic link compared to 20,621 AI citations.

How This Connects To The Guardian Reporting

The SE Ranking report explicitly frames its work as broader than spot-checking individual responses.

“The Guardian investigation focused on specific examples of misleading advice. Our research shows a bigger problem,” the authors wrote, arguing that AI health answers in their dataset relied heavily on YouTube and other sites that may not be evidence-based.

Following The Guardian’s reporting, the outlet reported that Google removed AI Overviews for certain medical queries.

Google’s public response, as reported by The Guardian, emphasized ongoing quality work while also disputing aspects of the investigation’s conclusions.

Why This Matters

This report adds a concrete data point to a problem that’s been easier to talk about in the abstract.

I covered The Guardian’s investigation earlier this month, and it raised questions about accuracy in individual examples. SE Ranking’s research tries to show what the source mix looks like at scale.

Visibility in AI Overviews may depend on more than being the most prominent “best answer” in organic search. SE Ranking found many cited URLs didn’t match top-ranking pages for the same prompts.

The source mix also raises questions about what Google’s systems treat as “good enough” evidence for health summaries at scale. In this dataset, government and academic sources barely showed up compared to media platforms and a broad set of less reliability-focused sites.

That’s relevant beyond SEO. The Guardian reporting showed how high-stakes the failure modes can be, and Google’s pullback on some medical queries suggests the company is willing to disable certain summaries when the scrutiny gets intense.

Looking Ahead

SE Ranking’s findings are limited to German-language queries in Germany and reflect a one-time snapshot, which the authors acknowledge may vary over time, by region, and by query phrasing.

Even with that caveat, the combination of this source analysis and the recent Guardian investigation puts more focus on two open questions. The first is how Google weights authority versus platform-level prominence in health citations. The second is how quickly it can reduce exposure when specific medical query patterns draw criticism.


Featured Image: Yurii_Yarema/Shutterstock

All In One SEO WordPress Vulnerability Affects Over 3 Million Sites via @sejournal, @martinibuster

A security vulnerability was discovered in the popular All in One SEO (AIOSEO) WordPress plugin that made it possible for low-privileged users to access a site’s global AI access token, potentially allowing attackers to misuse the plugin’s AI features, generate content, or consume the affected site’s AI credits. The plugin is installed on more than 3 million WordPress websites, making the exposure significant.

All in One SEO WordPress Plugin (AIOSEO)

All in One SEO is one of the most widely used WordPress SEO plugins, installed on over 3 million websites. It helps site owners manage search engine optimization tasks such as generating metadata, creating XML sitemaps, adding structured data, and providing AI-powered tools that assist with writing titles, descriptions, blog posts, FAQs, and social media posts, and with generating images.

Those AI features rely on a site-wide AI access token that allows the plugin to communicate with AIOSEO’s external AI services.

Missing Capability Check

According to Wordfence, the vulnerability was caused by a missing permission check on a specific REST API endpoint used by the plugin, which enabled users with Contributor-level access to view the global AI access token.

In the context of a WordPress website, an API (Application Programming Interface) is like a bridge between the WordPress website and other software applications (including external apps like AIOSEO’s AI content generator), enabling them to securely communicate and share data with one another. A REST endpoint is a URL that exposes an interface to functionality or data.

The flaw was in the following REST API endpoint:

/aioseo/v1/ai/credits

That endpoint is meant to return information about a site’s AI usage and remaining credits. However, it failed to verify whether the user making the request was actually allowed to see that data; the plugin didn’t perform a capability check to confirm whether someone logged in with Contributor-level access should have access to it.

Because of that, any logged-in user with Contributor-level access or higher could call the endpoint and retrieve the site’s global AI access token.

Wordfence describes the flaw like this:

“This makes it possible for authenticated attackers, with Contributor-level access and above, to disclose the global AI access token.”

The problem was that the implementation of the REST API endpoint did not do a permission check, which enabled someone with contributor level access to see sensitive data.

In WordPress, REST API routes are supposed to include capability checks that ensure only authorized users can access them. In this case, that check was missing, so the plugin treated Contributors the same as administrators when returning the AI token.

Why The Vulnerability Is Problematic

In WordPress, the Contributor level role is one of the lowest privilege levels. Many sites grant Contributor level access to multiple people so that they can submit article drafts for review and publication.

By exposing the global AI token to those users, the plugin may have effectively handed out a site-wide credential that controls access to its AI features. That token could be misused in the following ways:

1. Unauthorized AI Usage
The token functions as a site wide credential that authorizes AI requests. If an attacker obtains it, they could potentially use it to generate AI content through the affected site’s account, consuming whatever credits or usage limits are associated with that token.

2. Service Depletion
An attacker could automate requests using the exposed token to exhaust the site’s available AI quota. That would prevent site administrators from using the AI features they rely on, effectively creating a denial of service for the plugin’s AI tools.

Even though the vulnerability does not allow direct code execution, leaking a site-wide API token still represents a possible billing risk.

Part Of A Broader Pattern Of Vulnerabilities

This is not the first time All In One SEO has shipped with vulnerabilities related to missing authorization or low-privilege access. According to Wordfence, the plugin has had six vulnerabilities disclosed in 2025 alone, many of which allowed Contributor or Subscriber level users to access or modify data they should not have been able to access.

Those issues included SQL injection, information disclosure, arbitrary media deletion, missing authorization checks, sensitive data exposure, and stored cross-site scripting. The recurring theme across those reports is improper permission enforcement for low-privilege users, the same underlying class of flaw that led to the AI token exposure in this case.

Six vulnerabilities in one year is a high number for an SEO plugin. By comparison, the Yoast SEO plugin had zero vulnerabilities in 2025, RankMath had four, and Squirrly SEO had only three.

Screenshot Of Six AIOSEO Vulnerabilities In 2025

How The Vulnerability Was Fixed

The vulnerability affects all versions of All in One SEO up to and including 4.9.2. It was addressed in version 4.9.3, which included a security update described in the official plugin changelog by the plugin developers as:

“Hardened API routes to prevent AI access token from being exposed.”

That change corresponds directly to the REST API flaw identified by Wordfence.

What Site Owners Should Do

Anyone running All in One SEO should update to version 4.9.3 or newer as soon as possible. Sites that allow multiple external contributors are especially exposed since low-privilege accounts could access the site’s AI token on vulnerable versions.

Featured Image by Shutterstock/Shutterstock AI Generator