How Recommender Systems Like Google Discover May Work via @sejournal, @martinibuster

Google Discover is largely a mystery to publishers and the search marketing community, even though Google has published official guidance about what it is and what it feels publishers should know. It is so mysterious, in fact, that it’s generally not even thought of as a recommender system, yet that is exactly what it is. This is a review of a classic research paper that shows how to scale a recommender system. Although the paper is about YouTube, it’s not hard to imagine how this kind of system could be adapted to Google Discover.

Recommender Systems

Google Discover belongs to the class of systems known as recommender systems. A classic example is MovieLens, a university research project from back in 1997 that let users rate movies and then used those ratings to recommend other movies to watch. The underlying logic was collaborative filtering: people who like these kinds of movies tend to also like those other kinds of movies. But algorithms like that have limitations that make them fall short of the scale necessary to personalize recommendations for YouTube or Google Discover.
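To make that idea concrete, here is a minimal item-based collaborative filtering sketch in Python (my own illustration with made-up ratings, not MovieLens code): movies whose rating patterns correlate across users get recommended together.

```python
import numpy as np

# Toy user x movie rating matrix (rows: users, columns: movies, 0 = unrated).
# Entirely made-up data for illustration.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Item-item similarity: movies rated similarly by the same users score high.
n_items = ratings.shape[1]
item_sim = np.array([[cosine_sim(ratings[:, i], ratings[:, j])
                      for j in range(n_items)] for i in range(n_items)])

# Recommend for user 0: score unrated movies by similarity to movies they liked.
user = ratings[0]
scores = {j: sum(item_sim[i, j] * user[i] for i in range(n_items) if user[i] > 0)
          for j in range(n_items) if user[j] == 0}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```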

Two-Tower Recommender System Model

The modern style of recommender system is sometimes referred to as the Two-Tower architecture or the Two-Tower model. The Two-Tower model came about as a solution for YouTube, even though the original research paper (Deep Neural Networks for YouTube Recommendations) does not use this term.

It may seem counterintuitive to look to YouTube to understand how the Google Discover algorithm works, but the fact is that the system Google developed for YouTube became the foundation for how to scale a recommender system for an environment where massive amounts of content are generated every hour of the day, 24 hours a day.

It’s called the Two-Tower architecture because there are two representations that are matched against each other, like two towers.

In this model, which handles the initial “retrieval” of content from the database, a neural network processes user information to produce a user embedding, while content items are represented by their own embeddings. These two representations are matched using similarity scoring rather than being combined inside a single network.

To repeat: the research paper does not call this a Two-Tower architecture; that label for the approach came later. So, while the paper doesn’t use the word tower, I’ll continue using it because it makes it easier to visualize what’s going on in this kind of recommender system.

User Tower
The User Tower processes things like a user’s watch history, search tokens, location, and basic demographics. It uses this data to create a vector representation that maps the user’s specific interests in a mathematical space.

Item Tower
The Item Tower represents content using learned embedding vectors. In the original YouTube implementation, these were trained alongside the user model and stored for fast retrieval. This allows the system to compare a user’s “coordinates” against millions of video “coordinates” instantly, without having to run a complex analysis on every single video each time you refresh your feed.
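As a rough sketch of how the two towers come together at retrieval time (my simplification, not the paper’s code, with random vectors standing in for trained embeddings): the user embedding is scored against every item embedding, and the highest-scoring items become candidates. In production the dot-product search would run against an approximate nearest-neighbor index rather than a dense matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64  # embedding dimension (illustrative)

# Item tower output: precomputed embeddings for a catalog of items.
item_embeddings = rng.normal(size=(100_000, dim)).astype(np.float32)

# User tower output: one embedding built from watch history, search tokens, etc.
user_embedding = rng.normal(size=(dim,)).astype(np.float32)

# Retrieval: score every item with a dot product and keep the top candidates.
scores = item_embeddings @ user_embedding
top_k = np.argsort(-scores)[:10]
print("Candidate item ids:", top_k.tolist())
```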

The Fresh Content Problem

Google’s research paper offers an interesting take on freshness. The problem of freshness is described as a tradeoff between exploitation and exploration. The YouTube recommendation system has to balance between showing users content that is already known to be popular (exploitation) versus exposing them to new and unproven content (exploration). What motivates Google to show new but unproven content, at least for the context of YouTube, is that users show a strong preference for new and fresh content.

The research paper explains why fresh content is important:

“Many hours worth of videos are uploaded each second to YouTube. Recommending this recently uploaded (“fresh”) content is extremely important for YouTube as a product. We consistently observe that users prefer fresh content, though not at the expense of relevance.”

This tendency to show fresh content seems to hold true for Google Discover, where Google tends to show fresh content on topics that users are personally trending with. Have you ever noticed how Google Discover tends to favor fresh content? The insights that the researchers had about user preferences probably carry over to the Google Discover recommendation system. The takeaway here is that producing content on a regular basis could be helpful for getting web pages surfaced in Google Discover.

An interesting insight in this research paper (I don’t know if it’s still true, but it’s still interesting) is that the researchers state that machine learning algorithms show an implicit bias toward older, existing content because they are trained on historical data.

They explain:

“Machine learning systems often exhibit an implicit bias towards the past because they are trained to predict future behavior from historical examples.”

The neural network is trained on past videos, so it learns that things from one or two days ago were popular, which creates a bias toward what happened in the past. The researchers solved the freshness issue by feeding the age of each training example to the model as a feature and, when the system is recommending videos to a user (serving), setting that time-based feature to zero (or slightly negative). This signals to the model that it is making a prediction at the very end of the training window, essentially forcing it to predict what is popular right now rather than what was popular on average in the past.
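A minimal sketch of that idea, paraphrasing the paper’s “example age” feature rather than reproducing its implementation: the age of each training example is appended as a feature during training and zeroed out at serving time.

```python
import numpy as np

def add_example_age(features: np.ndarray, example_age_days: np.ndarray,
                    serving: bool = False) -> np.ndarray:
    """Append the 'example age' feature to each row of features.

    During training, the feature is the age of the training example relative
    to the end of the training window. At serving time it is forced to zero
    (the paper notes it can also be slightly negative) so the model predicts
    behavior 'now' rather than the historical average.
    """
    age = np.zeros_like(example_age_days) if serving else example_age_days
    return np.column_stack([features, age])

# Illustrative use: three examples with ages of 5, 2, and 0 days.
X = np.ones((3, 4))
ages = np.array([5.0, 2.0, 0.0])
print(add_example_age(X, ages, serving=False))  # ages kept for training
print(add_example_age(X, ages, serving=True))   # ages zeroed at serving time
```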

Accuracy Of Click Data

Google’s foundational research paper also provides insights about implicit user feedback signals, which is a reference to click data. The researchers say that this kind of data rarely provides accurate user satisfaction information.

The researchers write:

“Noise: Historical user behavior on YouTube is inherently difficult to predict due to sparsity and a variety of unobservable external factors. We rarely obtain the ground truth of user satisfaction and instead model noisy implicit feedback signals. Furthermore, metadata associated with content is poorly structured without a well defined ontology. Our algorithms need to be robust to these particular characteristics of our training data.”

The researchers conclude the paper by stating that this approach to recommender systems helped increase user watch time and proved to be more effective than other systems.

They write:

“We have described our deep neural network architecture for recommending YouTube videos, split into two distinct problems: candidate generation and ranking.
Our deep collaborative filtering model is able to effectively assimilate many signals and model their interaction with layers of depth, outperforming previous matrix factorization approaches used at YouTube.

We demonstrated that using the age of the training example as an input feature removes an inherent bias towards the past and allows the model to represent the time-dependent behavior of popular videos. This improved offline holdout precision results and increased the watch time dramatically on recently uploaded videos in A/B testing.

Ranking is a more classical machine learning problem yet our deep learning approach outperformed previous linear and tree-based methods for watch time prediction. Recommendation systems in particular benefit from specialized features describing past user behavior with items. Deep neural networks require special representations of categorical and continuous features which we transform with embeddings and quantile normalization, respectively.”

Although this research paper is ten years old, it still offers insights into how recommender systems work and takes a little of the mystery out of recommender systems like Google Discover. Read the original research paper: Deep Neural Networks for YouTube Recommendations

Featured Image by Shutterstock/Andrii Iemelianenko

The Great Decoupling via @sejournal, @Kevin_Indig

SEO died as a traffic channel the moment pipeline stopped following page views. For many sites, traffic is either down or growing at nowhere near the rates of 2019-2022, yet demos and pipeline are up for brands that shifted from chasing clicks to building authority.

What you’ll get in today’s memo:

  • Why traffic and pipeline decoupled.
  • What brand strength actually means in AI search.
  • How to reframe SEO with executives.
The old funnel has holes. Traffic and pipeline no longer move together. (Image Credit: Kevin Indig)


1. We’ve Hit Peak Search Volume For Traditional Queries

Image Credit: Kevin Indig

Short-head keyword demand is in permanent decline and likely contributing to slowed traffic growth or decline.

An analysis of roughly 10,000 short-head keywords shows that collective search volume grew only 1.2% over the last 12 months and is forecasted to decline by 0.74% over the next 12 months.

Two forces are driving it:

  • Fragmentation into long-tail: Demand did not disappear; it atomized into thousands of specific queries.
  • Bypass behavior: More users start in AI interfaces (AIOs, AI Mode, ChatGPT) instead of classic search.

This shift is irreversible for four structural reasons:

1. AI Overviews are here to stay. Google’s revenue model depends on keeping users inside the SERP. Zero-click search protects Google’s ad business. The company is not reverting to the 10 blue links.

2. LLM outputs are preferred starting points. Many users have conditioned themselves to expect direct answers. The behavior change is complete.

3. Zero-click is now the default expectation. Clicking through now feels like friction, not value. If the answer or solution isn’t easily acquired, the search experience failed.

4. Content supply exploded. There is significantly more content competing for the same queries than three years ago. AI-generated articles, Reddit threads, YouTube videos, and newsletters all compete for visibility. Even if visibility or “rankings” hold, CTR collapses under the weight of infinite options.

Optimizing for traffic growth in this environment is like optimizing for fax machine usage in 2010. The channel is structurally shifting – the products that people use to find answers have fundamentally changed.

2. Traffic And Pipeline Decoupled Because AI Ate The Click

The correlation between organic traffic and pipeline has broken down. We’re seeing this across the industry, but it takes a bit more work to convince stakeholders and executives.

In December, Maeva Cifuentes reported traffic growth of 32% for one of her clients, while signups grew 75% over the same six-month period. Her post was in response to one from Gaetano DiNardi, who found no correlation between traffic and pipeline across multiple B2B SaaS companies he advises. Maeva’s client data shows you can grow pipeline 2.3x faster than traffic. Gaetano’s data shows you can grow pipeline while traffic stays flat or even declines.

Image Credit: Kevin Indig

The classic SEO model assumed a linear relationship: More rankings meant more clicks, more clicks meant more traffic, more traffic meant more leads.

Now, AI answers queries without sending clicks. The Growth Memo AI Mode Study found that when the search task was informational and non-transactional, the number of external clicks to sources outside the AI Mode output was nearly zero across all user tasks. Users get the information they need – directly in their interface of choice – without ever visiting your site.

But buying intent didn’t disappear with the clicks.

SEO creates influence. It can still shape which brands buyers trust. It just doesn’t deliver the click anymore.

Education happens inside the AI interface. Brand selection happens after. Your traffic vanished, but the demand for your product/services didn’t.

This explains why Maeva noted she has clients whose traffic is declining, but demos are growing by double digits month-over-month.

Image Credit: Kevin Indig

The SEO work didn’t stop working. The measurement broke. Teams that optimized for clicks are being judged on a metric that no longer predicts business outcomes.

3. Strong Brands Still Win In AI Search, But “Brand Strength” Has A New Definition

In AI search, performance depends less on “more pages” and more on whether AI systems can confidently understand, trust, and cite you for a specific audience and context.

Brand strength in AI search has four components:

  1. Topical Authority: Complete ownership of the conceptual map (see topic-first SEO), not just keyword coverage.
  2. ICP Alignment: Answers tailored to specific buyer questions, prioritizing relevance over volume. Read Personas are critical for AI search to learn more.
  3. Third-Party Validation: Citations from category-defining sources matter more than high-DA links (see the data in How AI weighs your links).
  4. Positioning Clarity: LLMs must recognize what a brand is known for. Vague positioning gets skipped; sharp positioning gets cited (covered in State of AI Search).

SEO teams that are structured for traffic optimization are now misaligned with business outcomes.

The conversation you need to have is “traffic and pipeline decoupled, here’s the data proving it, and here’s what we’re measuring instead.”

Move from keyword-first workflows to ICP-first workflows. Start with ICP research (what questions do your buyers ask and where do they ask them), positioning (what are you known for), and omnichannel distribution (SEO + Reddit + YouTube + earned media). SEO is no longer a standalone channel. It’s one input in a brand-building system.

Move from traffic reporting to influence reporting. Stop leading stakeholder conversations with sessions, impressions, and rankings. Report on brand lift (are more people searching for you by name?), pipeline influence (what percentage of demos started with organic touchpoints?), and LLM visibility rates (how often do AI systems mention your brand vs cite your content?).

4. The Uncomfortable Question: If SEO Doesn’t Drive Traffic Anymore, What Does It Do?

Here’s what SEO actually does and always did: It shapes mental availability and brand recognition, builds topic/category authority, frames the problem (and the solution), and reduces buyer uncertainty.

Traffic was a proxy for those things. The click was the observable action, but the trust was the outcome that mattered.

LLM-based search has removed the click but kept the trust-building. Users still learn from your content. It just happens inside an LLM interface instead of on your domain. Your content can still influence which brands buyers trust. Yes, it’s harder to measure because it’s invisible to analytics. But the outcome – buyers choosing your brand when they’re ready to buy – is the same.

SEO influences brand preference within the category. When buyers are in-market and researching solutions, SEO determines whether your brand is in the consideration set and whether AI systems recommend you.

Traffic was never the point. It was just the easiest thing to measure.


Featured Image: Paulo Bobita/Search Engine Journal

Perplexity AI Interview Explains How AI Search Works via @sejournal, @martinibuster

I recently spoke with Jesse Dwyer of Perplexity about SEO, AI search, and what SEOs should be focusing on when optimizing for AI search. His answers offered useful feedback about what publishers and SEOs should prioritize right now.

AI Search Today

An important takeaway that Jesse shared is that personalization is completely changing how answers are generated:

“I’d have to say the biggest/simplest thing to remember about AEO vs SEO is it’s no longer a zero sum game. Two people with the same query can get a different answer on commercial search, if the AI tool they’re using loads personal memory into the context window (Perplexity, ChatGPT).

A lot of this comes down to the technology of the index (why there actually is a difference between GEO and AEO). But yes, it is currently accurate to say (most) traditional SEO best practices still apply.”

The takeaway from Dwyer’s response is that search visibility is no longer about a single consistent search result. Because personal context plays a role in AI answers, two users can receive significantly different answers to the same query, possibly drawn from different underlying content sources.

While the underlying infrastructure is still a classic search index, SEO still plays a role in determining whether content is eligible to be retrieved at all. Perplexity AI is said to use a form of PageRank, which is a link-based method of determining the popularity and relevance of websites, so that provides a hint about some of what SEOs should be focusing on.

However, as you’ll see, what is retrieved is vastly different than in classic search.

I followed up with the following question:

So what you’re saying (and correct me if I’m wrong or slightly off) is that Classic Search tends to reliably show the same ten sites for a given query. But for AI search, because of the contextual nature of AI conversations, they’re more likely to provide a different answer for each user.

Jesse answered:

“That’s accurate yes.”

Sub-document Processing: Why AI Search Is Different

Jesse continued his answer by talking about what goes on behind the scenes to generate an answer in AI search.

He continued:

“As for the index technology, the biggest difference in AI search right now comes down to whole-document vs. “sub-document” processing.

Traditional search engines index at the whole document level. They look at a webpage, score it, and file it.

When you use an AI tool built on this architecture (like ChatGPT web search), it essentially performs a classic search, grabs the top 10–50 documents, then asks the LLM to generate a summary. That’s why GPT search gets described as “4 Bing searches in a trenchcoat” —the joke is directionally accurate, because the model is generating an output based on standard search results.

This is why we call the optimization strategy for this GEO (Generative Engine Optimization). That whole-document search is essentially still algorithmic search, not AI, since the data in the index is all the normal page scoring we’re used to in SEO. The AI-first approach is known as “sub-document processing.”

Instead of indexing whole pages, the engine indexes specific, granular snippets (not to be confused with what SEO’s know as “featured snippets”). A snippet, in AI parlance, is about 5-7 tokens, or 2-4 words, except the text has been converted into numbers, (by the fundamental AI process known as a “transformer”, which is the T in GPT). When you query a sub-document system, it doesn’t retrieve 50 documents; it retrieves about 130,000 tokens of the most relevant snippets (about 26K snippets) to feed the AI.

Those numbers aren’t precise, though. The actual number of snippets always equals a total number of tokens that matches the full capacity of the specific LLM’s context window. (Currently they average about 130K tokens). The goal is to completely fill the AI model’s context window with the most relevant information, because when you saturate that window, you leave the model no room to ‘hallucinate’ or make things up.

In other words, it stops being a creative generator and delivers a more accurate answer. This sub-document method is where the industry is moving, and why it is more accurate to be called AEO (Answer Engine Optimization).

Obviously this description is a bit of an oversimplification. But the personal context that makes each search no longer a universal result for every user is because the LLM can take everything it knows about the searcher and use that to help fill out the full context window. Which is a lot more info than a Google user profile.

The competitive differentiation of a company like Perplexity, or any other AI search company that moves to sub-document processing, takes place in the technology between the index and the 26K snippets. With techniques like modulating compute, query reformulation, and proprietary models that run across the index itself, we can get those snippets to be more relevant to the query, which is the biggest lever for getting a better, richer answer.

Btw, this is less relevant to SEO’s, but this whole concept is also why Perplexity’s search API is so legit. For devs building search into any product, the difference is night and day.”

Dwyer contrasts two fundamentally different indexing and retrieval approaches:

  • Whole-document indexing, where pages are retrieved and ranked as complete units.
  • Sub-document indexing, where meaning is stored and retrieved as granular fragments.

In the first version, AI sits on top of traditional search and summarizes ranked pages. In the second, the AI system retrieves fragments directly and never reasons over full documents at all.

He also explained that answer quality is governed by context-window saturation: accuracy emerges from filling the model’s entire context window with relevant fragments. When retrieval succeeds at saturating that window, the model has little room to invent facts or hallucinate.
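Here is a rough sketch of what retrieval with context-window saturation could look like in principle (my own simplification based on Dwyer’s description, not Perplexity’s code; the token counting is deliberately crude): snippets are ranked by relevance and packed into a fixed token budget standing in for the model’s context window.

```python
from typing import List, Tuple

def fill_context_window(scored_snippets: List[Tuple[float, str]],
                        token_budget: int = 130_000) -> List[str]:
    """Greedily pack the most relevant snippets into a fixed token budget.

    scored_snippets: (relevance_score, snippet_text) pairs, e.g. from a
    vector index. Token counts are approximated as word counts here
    purely for illustration.
    """
    selected, used = [], 0
    for score, text in sorted(scored_snippets, key=lambda p: p[0], reverse=True):
        cost = len(text.split())
        if used + cost > token_budget:
            continue  # skip snippets that would overflow the window
        selected.append(text)
        used += cost
    return selected

# Illustrative use with made-up scores and snippets.
snippets = [(0.91, "snippet about mortgage rates"), (0.42, "unrelated snippet"),
            (0.88, "snippet about fixed vs variable rates")]
print(fill_context_window(snippets, token_budget=10))
```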

Lastly, he says that “modulating compute, query reformulation, and proprietary models” are part of Perplexity’s secret sauce for retrieving snippets that are highly relevant to the search query.

Featured Image by Shutterstock/Summit Art Creations

Case Study: How Entity Linking Can Support Local Search Success via @sejournal, @marthavanberkel

Search has changed dramatically, including local search. Search engines and AI systems now incorporate semantic understanding to generate citations and results. To gain semantic understanding, they need to know which topics appear in the content and how they relate to one another so that they can identify your areas of authority.

For brands with multiple locations, this shift can create challenges. Search engines often misinterpret place names or the services a location offers, which can lead to the wrong landing page appearing for a near-me query. At the same time, it gives local SEOs a new opportunity to add needed semantic clarity.

To support clarity and semantic understanding, SEOs should adopt an entity SEO approach. The topics, also known as entities, are like keywords with multiple dimensions. When defined within your content and with schema markup, entities can bring clarity to AI and search engines.

In Microsoft’s recent article titled “Optimizing Your Content for Inclusion in AI Search Answers,” Krishna Madhavan, Bing’s Principal Product Manager, stated:

“Schema can label your content as a product, review, FAQ, or event, turning plain text into structured data that machines can interpret with confidence.”

This semantic understanding is what adds clarity to AI.

With more than 47 locations, one of our clients, Brightview Senior Living, needed a way to scale SEO across dozens of markets. Entity linking helped them do exactly that. Their strategy shows what SEOs can start doing today to gain clarity, authority, and better local performance.

Why Entity Linking Matters For Local SEO Today

In the world of Entity SEO, search engines now look beyond keywords for:

  • What entities are mentioned on a page.
  • How those entities relate to the user’s search queries.
  • Whether the content provides meaningful context and clarity.

Entities include locations, services, products, people, or anything else with a definable meaning. But identifying an entity is only the first step. Search engines also need to understand the entity’s context, which is where properties in schema markup come in and help disambiguate what the entity actually represents.

When you optimize a page, you describe its main entity. By using the schema.org vocabulary, you can leverage its properties to provide search engines and AI with a structured way to understand the entity.

For example, if you’re describing a location, you’d define the physical location as a LocalBusiness entity, use schema properties to describe the business and its service area, and then add the properties that map to the content on the page.

Now that you’ve defined the entity using properties, it’s time to add entity linking.

There are two types of entity linking: external entity linking and internal entity linking.

Internal entity linking is the process of linking to entities within your own website. External entity linking is the process of linking entities on your site to their definitions in authoritative knowledge bases such as Wikipedia, Wikidata, or industry-specific glossaries. This is done using schema.org properties such as “sameAs”, “mentions”, and “areaServed”, though entity linking can use any property within schema.org.

Today, we’ll focus on external entity linking.

By linking the entities mentioned in your website content to authoritative external sources, you provide search engines with clear, explicit definitions. This reduces ambiguity, improves the relevance of your rankings, and can help your content’s performance in AI summaries and intent-based search experiences.

For organizations looking to optimize for local search, place-based entity linking is particularly impactful.

Brightview’s Challenge: Scaling Hyperlocal SEO Across 47+ Communities

Brightview Senior Living’s marketing team was responsible for performance across more than 47 community pages, each with its own name, local context, and service mix. Search engines often struggled to interpret these pages correctly, especially when the location name overlapped with a more prominent city elsewhere.

A prime example was Phoenix, Maryland, being confused with Phoenix, Arizona. This kind of misunderstanding can derail visibility for queries such as “assisted living near me” or “assisted living in Phoenix.”

To improve search engines’ understanding of what Brightview offered and where, they needed a future-proof strategy grounded in semantic clarity.

The Solution: Place-Based And Topical Entity Linking At Scale

Brightview shifted from keyword-first SEO to entity-first SEO. Their strategy focused on identifying the entities that defined each location and service offering, then linking them to authoritative definitions to eliminate ambiguity.

1. Disambiguating Place Names

On each community page, Brightview explicitly defined the location entity and linked it to its authoritative source. For example:

  • Using mentions within the schema markup to identify the specific place referenced on the community page.
  • Using areaServed on community pages to clarify the geographic region that the location serves.
  • Using sameAs to link each location entity to authoritative sources like Wikipedia, Wikidata, and Google’s Knowledge Graph to disambiguate places with similar or identical names.
Location-based schema markup with entity linking example.
Image from author, December 2025

This resolved issues such as the Phoenix, Maryland, confusion by telling search engines exactly which Phoenix the content referred to. It also provided a clear geographic signal for near me and geo-modified queries.
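As a simplified illustration of what that markup might look like, here is Python emitting JSON-LD with areaServed, sameAs, and mentions. The property names come from schema.org, but the business name, URLs, and Wikidata identifier are placeholders to verify and replace, not Brightview’s actual markup.

```python
import json

# Hypothetical example values; substitute your own verified entity URLs.
community_page_markup = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Senior Living Community",
    # areaServed clarifies the geographic region, and its sameAs links
    # disambiguate which "Phoenix" the page refers to.
    "areaServed": {
        "@type": "Place",
        "name": "Phoenix, Maryland",
        "sameAs": [
            "https://en.wikipedia.org/wiki/Phoenix,_Maryland",  # example URL, verify
            "https://www.wikidata.org/wiki/Q_EXAMPLE",          # replace with the real item
        ],
    },
    # mentions declares the service entities discussed on the page.
    "mentions": [
        {"@type": "Thing", "name": "Assisted living",
         "sameAs": "https://en.wikipedia.org/wiki/Assisted_living"},
    ],
}

print(json.dumps(community_page_markup, indent=2))
```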

2. Mapping Key Services As Entities

Brightview applied entity linking to core service terms, including assisted living and independent living. These concepts were linked to authoritative sources using “sameAs” and “mentions”.

This helped Brightview show up more consistently for non-branded, high-intent searches like “assisted living communities” or “independent living options,” which are critical touchpoints early in the customer journey.

By linking assisted living to a known entity, search engines recognized Brightview’s content as authoritative on the topic. This moved Brightview beyond brand-dependent queries and into the realm of broader, category-level search visibility.

3. Scaling Entity Linking Across All Content Types

Entity linking was applied across community pages, blog posts, and informational resources. This built a connected content knowledge graph that reinforced Brightview’s authority across both topics and locations that mattered most to their organization.

The result was a site where search engines could clearly understand what each page was about, what locations it represented, and how those pages related to Brightview’s broader expertise.

By disambiguating locations and services, Brightview made it easier for AI systems to return correct answers when users searched for care options in specific regions.

The Result: Stronger Local Visibility And More Accurate Search Interpretation

After implementing entity linking, Brightview saw measurable gains in both local and non-branded visibility.

Stronger Non-Branded Search Performance

Non-branded queries often indicate users who have not yet chosen a provider and who are actively evaluating options.

By clearly defining their service entities using schema markup, Brightview achieved:

  • 25% increase in clicks for non-branded queries featuring the “assisted living” entity.
  • 30% increase in impressions for those same queries.

This shift shows how entity linking helps organizations rank for what they do and where they do it, not just who they are.

Higher Discoverability For Community Pages

With place-based external entity linking in place, Brightview’s community pages performed better for high-intent local searches. Search engines better understood the connection between each community and its service area.

Across community pages, Brightview saw:

  • 16% year-over-year increase in clicks (despite industry-wide drops in clicks).
  • 26% year-over-year increase in impressions.

Pages that used clear, linked location data were more reliably served for near-me and city-based queries.

Stable CTR Despite Industry Declines

As AI Overviews reshape the SERP with zero-click search, many brands have seen their click-through rate drop. Brightview’s CTR remained strong relative to benchmarks. Clear entity definitions helped search engines and AI models surface their content accurately, even as the search landscape shifted.

Ryan Pitcheralle, Brightview’s SEO consultant, noted that the strength of their schema markup implementation was a direct driver of performance. As he put it, their results showed “complete causation, not just correlation. This is why we’ve stayed competitive in clickthrough rate and performance while everyone else is sliding.”

How To Use Entity Linking Strategically

Entity linking is not only a technical tactic. It is a strategic opportunity to clarify what your organization should be known for. Here is how to apply it effectively.

1. Identify The Entities That Define Your Authority

Your website contains many entities, but you do not need to link them all. Focus on the ones that support clarity and strategic differentiation.

For example:

  • Locations you want to rank for.
  • Core service offerings.
  • Product categories.
  • Regulated terms or industry definitions.
  • Topics you want to be recognized as authoritative on.

Consistently linking these entities signals to search engines where your expertise lies.

2. Build A Connected Content Knowledge Graph

Entity linking is a key part of creating a content knowledge graph that shows search engines the relationships between your locations, offerings, resources, and brand. Your content knowledge graph helps machines infer meaning, understand context, and deliver more accurate results about your organization that can make or break conversions.

3. Prioritize Place-Based Entity Linking If You Have Multiple Locations

Local search hinges on clarity. Search engines need explicit signals about:

  • Which location your page refers to.
  • What services are available there.
  • Which geographic region that page serves.

Place-based entity linking provides that clarity and increases your chances of ranking for geo-modified and near-me queries.

4. Prepare For AI Search

AI search experiences rely on correctly interpreted entities. When locations, services, and concepts are linked to authoritative sources, AI systems can return more accurate, helpful answers and are more likely to reference your content correctly.

Entity Linking Is A Clear Path To Local SEO Accuracy

Brightview’s success shows that entity linking is a practical, high-impact way to strengthen local search performance. By clarifying locations, services, and key concepts, you can help search engines and AI systems understand exactly what your content represents.

Entity linking improves semantic accuracy and builds the foundation for long-term authority. For SEO and marketing leaders, it is one of the most actionable ways to prepare for the future of semantic and AI-driven search.



Featured Image: optimarc/Shutterstock

Data Shows AI Overviews Disappears On Certain Kinds Of Finance Queries via @sejournal, @martinibuster

New data from BrightEdge shows how finance-related queries perform on AI Overviews, identifying clear areas that continue to show AIO while Google is pulling back from others. The deciding factor is whether the query benefits from explanation and synthesis versus direct data retrieval or action.

AI Overviews In Finance Are Query-Type Driven

Finance queries with an educational component, such as “what is” queries, trigger a high level of AI Overviews, generating an AIO response as much as 91% of the time.

According to the data:

  • Educational queries (“what is an IRA”): 91% have AI Overviews
  • Rate and planning queries: 67% have AI Overviews
  • Stock tickers and real-time prices: 7% have AI Overviews

Examples of finance educational queries that generate AI Overviews:

  • ebitda meaning
  • how does compound interest work
  • what is an IRA
  • what is dollar cost averaging
  • what is a derivative
  • what is a bond

Finance Queries Where AIO Stays Out

Two areas where AIO stays out are local-type queries and queries where real-time accuracy is of the essence. Local queries were initially a part of the original Search Generative Experience results in 2023, showing AI answers 90% of the time. That has since dropped to about 10%.

The data also shows that “brand + near me” and other “near me” queries are dominated by local pack results and Maps integrations.

Tool and real-time information needs are no longer triggering AI Overviews. Finance calculator queries show AI Overviews only 9% of the time. Other similar queries show no AI Overviews at all, such as:

  • 401k calculator
  • compound interest calculator
  • investment calculator
  • mortgage calculator

The BrightEdge data shows that these real-time data topics generate few or no AI Overviews:

  • Individual stock tickers: 7% have AI Overviews
  • Live price queries: Traditional results dominate
  • Market indices: Low AI coverage

Examples of queries Google AI generally keeps out of:

  • AAPL stock
  • Tesla price
  • dow jones industrial average today
  • S&P 500 futures

Takeaway

The direction Google takes for virtually anything search-related depends on user feedback and the ability to show relevant results. It’s not uncommon for some in SEO to underestimate the power of implicit and explicit user feedback as a force that moves Google’s hand on when to show certain kinds of search features. Thus, it may be that users are simply not satisfied with synthesized answers for real-time, calculator and tool, and local “near me” types of queries.

AIO Stays Out Of Brand Queries

Another area where AI Overviews are rarely if ever shown is finance queries that include a brand name. Brand login queries show AIO only 0% to 4% of the time. Brand navigational queries do not show any AI search results.

Where AI Overviews Dominate Finance Results

The finance queries where AIO tends to dominate are those with an educational or explanatory intent, where users are seeking to understand concepts, compare options, or receive general guidance rather than retrieve live data, use tools, or complete a navigational task.

The data shows AIO dominating these kinds of queries:

  • Rate and planning queries (mortgages, retirement): 67% have AI Overviews.
  • Retirement planning queries: 61% have AI Overviews.
  • Tax-related queries: 55% have AI Overviews.

Takeaway

As previously noted, Google doesn’t arbitrarily decide to show AI answers based on its own judgment. User behavior and satisfaction signals play a large role. The fact that AI answers dominate these kinds of queries shows that AIO tends to satisfy users for finance queries with a strong learning intent. This means that showing up as a citation for these kinds of queries requires carefully crafted content that provides precise answers. In my opinion, focusing on creating unique content, and doing it on a predictable and regular basis, sends a signal of authoritativeness and trustworthiness. Definitely stay away from tactic-of-the-month approaches to content.

Visibility And Competition Takeaways

Educational and guidance content has high visibility in AI responses, not just organic rankings, and visibility increasingly depends on being cited or referenced. It may be useful to focus not just on text but to offer audio, image, and video content as well. Graphs and tables may also be useful ways of communicating data; anything that can be referenced as an answer, or to support the answer, may help.

Traditional ranking factors still hold for high-volume local, tool, and real-time data queries. Live prices, calculators, and local searches continue to operate under conventional SEO factors.

Finance search behavior is increasingly segmented by intent and topic. Each query type follows a different path toward AI or organic results. The underlying infrastructure is still the same classic search index, which means that focusing on the fundamentals of SEO, plus expanding beyond simple text content to see what works, is a path forward.

Read BrightEdge’s data on finance queries and AI: Finance and AI Overviews: How Google Applies YMYL Principles to Financial Search

Featured Image by Shutterstock/Mix and Match Studio

Why Agentic AI May Flatten Brand Differentiators via @sejournal, @martinibuster

James LePage, Director of AI Engineering at Automattic and co-lead of the WordPress AI Team, described the future of the agentic AI web, where websites become interactive interfaces and data sources and the value add that any site brings becomes flattened. Although he describes a way out of brand and voice getting flattened, the outcome for informational, service, and media sites may be “complex.”

Evolution To Autonomy

One of the points that LePage makes is about agentic autonomy and how it will impact what it means to have an online presence. He maintains that humans will still be in the loop, but at a higher and less granular level: agentic AI handles the tree-level interactions with websites, dealing with the details, while humans operate at the forest level, dictating the outcome they’re looking for.

LePage writes:

“Instead of approving every action, users set guidelines and review outcomes.”

He sees agentic AI progressing on an evolutionary course toward greater freedom with less external control, also known as autonomy. This evolution is in three stages.

He describes the three levels of autonomy:

  1. What exists now is essentially Perplexity-style web search with more steps: gather content, generate synthesis, present to user. The user still makes decisions and takes actions.
  2. Near-term, users delegate specific tasks with explicit specifications, and agents can take actions like purchases or bookings within bounded authority.
  3. Further out, agents operate more autonomously based on standing guidelines, becoming something closer to economic actors in their own right.

AI Agents May Turn Sites Into Data Sources

LePage sees the web in terms of control, with agentic AI experiences taking control of how data is presented to the user. The user experience and branding are stripped away, and the experience itself is refashioned by the AI agent.

He writes:

“When an agent visits your website, that control diminishes. The agent extracts the information it needs and moves on. It synthesizes your content according to its own logic. It represents you to its user based on what it found, not necessarily how you’d want to be represented.

This is a real shift. The entity that creates the content loses some control over how that content is presented and interpreted. The agent becomes the interface between you and the user.

Your website becomes a data source rather than an experience.”

Does it sound problematic that websites will turn into data sources? As you’ll see in the next paragraph, LePage’s answer for that situation is to double down on interactions and personalization via AI, so that users can interact with the data in ways that are not possible with a static website.

These are important insights because they’re coming from the person who is the director of AI engineering at Automattic and co-leads the team in charge of coordinating AI integration within the WordPress core.

AI Will Redefine Website Interactions

LePage, who is the co-lead of WordPress’s AI Team, which coordinates AI-related contributions to the WordPress core, said that AI will enable websites to offer increasingly personalized and immersive experiences. Users will be able to interact with the website as a source of data refined and personalized for the individual’s goals, with website-side AI becoming the differentiator.

He explained:

“Humans who visit directly still want visual presentation. In fact, they’ll likely expect something more than just content now. AI actually unlocks this.

Sites can create more immersive and personalized experiences without needing a developer for every variation. Interactive data visualizations, product configurators, personalized content flows. The bar for what a “visit” should feel like is rising.

When AI handles the informational layer, the experiential layer becomes a differentiator.”

That’s an important point right there because it means that if AI can deliver the information anywhere (in an agent user interface, an AI generated comparison tool, a synthesized interactive application), then information alone stops separating you from everyone else.

In this kind of future, what becomes the differentiator, your value add, is the website experience itself.

How AI Agents May Negatively Impact Websites

LePage says that agentic AI is a good fit for commercial websites because agents can do comparisons, check prices, and zip through the checkout. He says it’s a different story for informational sites, calling the situation “more complex.”

Regarding the phrase “more complex,” I think that’s a euphemism that engineers use instead of what they really mean: “You’re probably screwed.”

Judge for yourself. After repeating that the agent becomes the interface and your website becomes a data source, LePage explains what that means for sites whose value is their perspective:

“For media and services, it’s more complex. Your brand, your voice, your perspective, the things that differentiate you from competitors, these get flattened when an agent summarizes your content alongside everyone else’s.”

For informational websites, the website experience can be the value add, but that advantage is eliminated by agentic AI. And unlike ecommerce transactions, where sales are the value exchange, there is zero value exchange for informational sites, since nobody is clicking on ads, much less viewing them.

Alternative To Flattened Branding

LePage goes on to present an alternative to brand flattening by imagining a scenario where websites themselves wield AI Agents so that users can interact with the information in ways that are helpful, engaging, and useful. This is an interesting thought because it represents what may be the biggest evolutionary step in website presence since responsive design made websites engaging regardless of device and browser.

He explains how this new paradigm may work:

“If agents are going to represent you to users, you might need your own agent to represent you to them.

Instead of just exposing static content and hoping the visiting agent interprets it well, the site could present a delegate of its own. Something that understands your content, your capabilities, your constraints, and your preferences. Something that can interact with the visiting agent, answer its questions, present information in the most effective way, and even negotiate.

The web evolves from a collection of static documents to a network of interacting agents, each representing the interests of their principal. The visiting agent represents the user. The site agent represents the entity. They communicate, they exchange information, they reach outcomes.

This isn’t science fiction. The protocols are being built. MCP is now under the Linux Foundation with support from Anthropic, OpenAI, Google, Microsoft, and others. Agent2Agent is being developed for agent-to-agent communication. The infrastructure for this kind of web is emerging.”

What do you think about the part where a site’s AI agent talks to a visitor’s AI agent and communicates “your capabilities, your constraints, and your preferences,” as well as how your information will be presented? There might be something here, and depending on how this is worked out, it may be something that benefits publishers and keeps them from becoming just a data source.
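Purely as a conceptual sketch, and not the MCP or Agent2Agent protocols themselves, the shape of a site agent answering a visiting agent’s structured questions might look something like this (all names and fields here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class SiteAgent:
    """Represents the publisher: knows its own content, capabilities, constraints."""
    name: str
    topics: dict  # topic -> the canonical summary the site wants represented

    def answer(self, question: str) -> dict:
        """Return structured content plus how the site wants to be attributed."""
        topic = next((t for t in self.topics if t in question.lower()), None)
        return {
            "source": self.name,
            "answer": self.topics.get(topic, "No covered topic matches."),
            "attribution_required": True,  # a 'constraint' the site asserts
        }

@dataclass
class VisitingAgent:
    """Represents the user: gathers answers from site agents and synthesizes."""
    def research(self, sites: list, question: str) -> list:
        return [site.answer(question) for site in sites]

site = SiteAgent("example-publisher.com",
                 {"assisted living": "Our expert guide to assisted living costs..."})
print(VisitingAgent().research([site], "What does assisted living cost?"))
```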

AI Agents May Force A Decision: Adaptation Versus Obsolescence

LePage insists that publishers (which he calls entities) that evolve along with the agentic AI revolution will be the ones able to have the most effective agent-to-agent interactions, while those that stay behind will become data waiting to be scraped.

He paints a bleak future for sites that decline to move forward with agent-to-agent interactions:

“The ones that don’t will still exist on the web. But they’ll be data to be scraped rather than participants in the conversation.”

What LePage describes is a future in which product and professional service sites can extract value from agent-to-agent interactions. But the same is not necessarily true for informational sites that users depend on for expert reviews, opinions, and news. The future for them looks “complex.”

How To Analyze Google Discover

TL;DR

  1. To generate the most value from Discover, view it through an entity-focused lens. People, places, organisations, teams, et al.
  2. Your best chance of success in Discover with an individual article is to make sure it outperforms its expected performance early. So share, share, share.
  3. Then analyze the type of content you create. What makes it clickable? What resonates? What headline and image combination works?
  4. High CTR is key for success, but “curiosity gap” headlines that fail to deliver kill long-term credibility. User satisfaction trumps clickiness over time.

Discover isn’t a completely black box. We have a decent idea of how it works and can reverse engineer more value with some smart analysis.

Yes, there’s always going to be some surprises. It’s a bit mental at times. But we can make the most of the platform without destroying our credibility by publishing articles about vitamin B12 titled:

“Outlive your children with this one secret trick the government don’t want you to know about.”

Key Tenets Of Discover

Before diving in headfirst, let’s check the depth of the pool.

“Sustained presence on search helps maintain your status as a trustworthy publisher.”

  • Discover feeds off fresh content. While evergreen content pops up, it is very closely related to the news.
  • More lifestyle-y, engaging content tends to thrive on the clickless platform.
  • Just like news, Discover is very entity, click, and early engagement driven.
  • The personalized platform groups cohorts of people together. If you satiate one, more of that cohort will likely follow.
  • If your content outperforms its predicted early-stage performance, it is more likely to be boosted.
  • Once the groups of potentially interested doomscrollers have been saturated, content performance naturally drops off.
  • Google is empowering our ability to find individual creators and video content on the platform, because people trust people and like watching stuff. Stunned.

Obviously, loads of people know how to game the system and have become pretty rich by doing so. If you want to laugh and cry in equal measure, see the state of Google’s spam problems here.

No sign of it being fixed either (Image Credit: Harry Clarkson-Bennett)

Most algorithms follow the Golden Hour Rule. Not to be confused with the golden shower rule, it means the first 60 minutes after posting determine whether algorithms will amplify or bury your content.

If you want to go viral, your best bet is to drive early stage engagement.

What Data Points Should You Analyze?

This is focused more on how you, as an SEO or analyst, can get more value out of the platform. So, let’s take conversions and click/impression data as read. We’re going deeper. This isn’t amateur hour.

I think you need to track the below and I’ll explain why.

  • CTR.
  • Entities.
  • Subfolders.
  • Authorship.
  • Headlines and images.
  • Content type (just a simple breakdown of news, how-tos, interviews, evergreen guides, etc.).
  • Publishing performance.

You need to already get traffic from Discover to generate value from this analysis. If you don’t, go back to creating high-quality, unique content in your niche(s) and pushing it out to the wider world.

Create great content and get the right people sharing it.

Worth noting you can’t accurately identify Discover traffic in analytics platforms. You have to accept some of it will be mis-attributed. Most companies make an educated guess of sorts, using a combination of Google as the source and mobile/Android device data to group it together.

CTR

CTR is one of the foundational metrics of news SEO, Top Stories, Discover, and almost any form of real-time SEO. It carries far more weight in news than in traditional SEO because the algorithm is making decisions about what content should be promoted in almost real time.

Evergreen results are altered continuously, based on much longer-term engagement.

This is weighted alongside some kind of traditional Navboost engagement data – clicks, on-page interactions, session duration, et al. – to associate a clickable headline and image with content that serves the user effectively.

It’s also one of the reasons why clickbait has (broadly) started to die a death. Like rampant AI slop, even the mouth breathers will tire of it eventually.

To get the most out of CTR, you need to combine it with:

  • Image type.
  • Headline type (content type too).
  • And entity analysis.

Entity Analysis

Entities are more important in news than in any other part of SEO. While entity SEO has been growing in popularity for years, news sites have been obsessed with entities for far longer, arguably without knowing it.

Individual entity analysis based on the title and page content is perfect for Discover (Image Credit: Harry Clarkson-Bennett)

While it isn’t as easy to just frontload headlines with relevant entities to get traffic anymore, there’s still a real value in analyzing performance at an entity level.

Particularly in Discover.

You want to know what people, places, and organizations (arguably, these three make up 85%+ of all entities you need to care about) drive value for you and users in Discover.

You can’t run proper entity analysis manually. At least not well, or at scale.

My advice is to use a combination of your LLM of choice, an NER (Named Entity Recognition) tool and either Google’s Knowledge Graph or WikiData.

You can then extract the entity from the page in question (the title), disambiguate it using the on-page content (this helps you assess whether ‘apple’ is the computing company, the fruit, or an idiotic celebrity’s daughter), and confirm it with WikiData or Google’s Knowledge Graph.
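A minimal sketch of that workflow, assuming spaCy for named entity recognition and the public Wikidata search API for confirmation (both are real tools, but verify the model name and API parameters against current documentation before relying on them):

```python
import requests
import spacy

# Requires: pip install spacy requests && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_entities(headline: str, body: str):
    """Pull named entities from the headline, using body text for context."""
    doc = nlp(headline + " " + body)
    # People, places, and organizations cover most of what matters in Discover.
    return [(ent.text, ent.label_) for ent in doc.ents
            if ent.label_ in {"PERSON", "GPE", "ORG"}]

def wikidata_candidates(name: str):
    """Look up candidate Wikidata items to confirm/disambiguate an entity."""
    resp = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "wbsearchentities", "search": name,
        "language": "en", "format": "json"}, timeout=10)
    return [(hit["id"], hit.get("description", ""))
            for hit in resp.json().get("search", [])]

headline = "Apple unveils new MacBook at Cupertino event"
body = "The company announced updated laptops..."
for name, label in extract_entities(headline, body):
    print(name, label, wikidata_candidates(name)[:2])
```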

Bubble charts are a fantastic way of quickly visualizing opportunities for content, not just for Discover (Image Credit: Harry Clarkson-Bennett)

Subfolder

Relatively straightforward, but you want to know which subfolders tend to generate more impressions and clicks on average in Discover. This is particularly valuable if you work on larger sites with a lot of subfolders and high content production.

I like to break down entity performance at a subfolder level like so (Image Credit: Harry Clarkson-Bennett)

You want to make sure that everything you do maximizes value.

This becomes far more valuable when you combine this data with the type of headline and entities. If you begin to understand the type of headline (and content) that works for specific subfolders, you can help commissioners and writers make smarter decisions.

Subfolders that tend to perform better in Discover give individual articles a better chance of success.

Generate a list of all of your subfolders (or topics, if your site isn’t set up particularly effectively) and track clicks, impressions, and CTR over time. I’d use total clicks, impressions, and CTR, plus an average per article, as a starting point.
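A simple starting point with pandas, assuming you’ve exported per-article Discover clicks and impressions; the column names and numbers here are illustrative:

```python
import pandas as pd

# Illustrative export: one row per article with its Discover performance.
df = pd.DataFrame({
    "url": ["/money/a1", "/money/a2", "/news/b1", "/news/b2", "/sport/c1"],
    "clicks": [1200, 300, 5400, 900, 150],
    "impressions": [30000, 12000, 90000, 40000, 9000],
})

# Derive the subfolder from the URL path.
df["subfolder"] = df["url"].str.split("/").str[1]

summary = (df.groupby("subfolder")
             .agg(articles=("url", "count"),
                  clicks=("clicks", "sum"),
                  impressions=("impressions", "sum")))
summary["ctr"] = summary["clicks"] / summary["impressions"]
summary["clicks_per_article"] = summary["clicks"] / summary["articles"]
print(summary.sort_values("clicks_per_article", ascending=False))
```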

Authorship

Google tracks authorship in search. No ifs, no buts. The person who writes the content has significance when it comes to E-E-A-T, and good, reliable authorship makes a difference.

How much significance, I don’t know. And neither do you.

Breaking down all of the metrics from the leak that mention the word “author,” the image below shows how Google perceives and values authorship. As always, this is an imperfect science, but it’s interesting to note that of the 35 categories I reviewed, almost half relate purely to identifying the author.

Not just who authored the article, but how clear is their online footprint (Image Credit: Harry Clarkson-Bennett)

Disambiguation is one of the most important components of modern-day search. Semantic SEO. Knowledge graphs. Structured data. E-E-A-T. A huge amount of this is designed to counter false documents, AI slop, and misinformation.

So, it’s really important for search (and Discover) that you provide undeniable clarity.

For Discover specifically, you should see authors through the prism of:

  • How many articles have they written that make it onto Discover (and that perform in Search)?
  • What topic/entities do they perform best with?
  • Ditto headline type.

Headline Type

This is a really good way of viewing the type of content that tends to perform for you. For example, you want to know whether curiosity gap headlines work well for you and whether headlines with numbers have a higher or lower CTR on average.

  • Do headlines with celebrities in the headline work well for you?
  • Does this differ by subfolder?
  • Do first-person headlines have a higher CTR in Money than in News?

These are all questions and hypotheses that you should be asking. Although you can’t scrape Discover directly (trust me, I’ve tried), you can hypothesize which H1, page title, and OG title is the clickiest.

The top headline is a list that piques my curiosity (although I’d add in a number here), and the bottom is more of a straight “how-to.” (Image Credit: Harry Clarkson-Bennett)

What’s interesting in this example is that “how-to” headlines are not typically thought of as very Discover-friendly. But it’s the concept that sells it. It’s different.

Start by defining all the types of headlines you use – curiosity gap, localized, numbered lists, questions, how-to or utility type, emotional trigger, first person, et al. – and analyze how effective each one is.

Use a machine learning model (you can absolutely use ChatGPT’s API) to categorize each headline.

  • Train the model to identify place names, numbers, questions, and first-person style patterns.
  • Verify the quality of the categorization.
  • Break this down by subfolder, author, entity, or anything else you choose.
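As a rough sketch of that categorization step, here is one way to do it with the OpenAI Python client; the model name, category list, and prompt are placeholders to adapt, and you should spot-check a sample of the outputs by hand:

```python
from openai import OpenAI  # pip install openai; requires OPENAI_API_KEY to be set

client = OpenAI()

CATEGORIES = ["curiosity gap", "numbered list", "question", "how-to",
              "first person", "emotional trigger", "localized", "other"]

def categorize_headline(headline: str) -> str:
    """Ask the model to assign one category from a fixed taxonomy."""
    prompt = (f"Classify this headline into exactly one of {CATEGORIES}. "
              f"Reply with the category only.\nHeadline: {headline}")
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whichever model you have access to
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

print(categorize_headline("I tried the 7 best budget air fryers so you don't have to"))
```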

Worth noting that there are five different headline fields you and Google can and should be using to determine how content is perceived. Discover is known to use the OG title more frequently than traditional search.

It’s an opportunity to create a “clickier” headline than you would typically use in the H1 or page title.

Images

Images fall into a similar category as headlines. They’re crucial. You can’t definitively prove which image gets pulled through into Discover. But as long as your featured image is at least 1200 px wide, it’s safe-ish to assume this is the one that’s used.

CTR is arguably the single biggest factor in determining early success. Continued success, I believe, is more Navboost-related – more traditional-ish engagement.

And CTR in Discover is determined by two things:

  1. The headline.
  2. The image.

Well, two things in your control. You could be pedantic and say, “Ooo, your brand is an important factor in CTR, actually. Psychologically, people always click on…”

And I’d tell you to bore off. We’re talking about an individual article. We’ve done a significant amount of image testing and know that in straight news, people like seeing people looking sad. They like real-ness.

In money, they like people looking at the camera, looking happy. It makes them feel safe in a financial decision.

People looking evocatively miserable, looking directly at the camera. Probably clickable, but you need to test (Image Credit: Harry Clarkson-Bennett)

Stupid, I know. But we’re not an intelligent race. Sure, there are a few outliers. Galileo. Einstein. Noel Edmonds. But the rest of us are just trying not to throw stuff at each other outside Yates’s on a Friday night.

It is actually why clickbait headlines have worked for years. It works until it doesn’t.

You’ll need to upload a set of images to help train the model, and please don’t take it as gospel. Check the outputs. For the basics – whether people are present, where they’re looking, color schemes, etc. – great. For more nuanced decisions like trustworthiness or emotional meaning, you’ll need to do that yourself.

Worth noting that lots of publishers trial badges and logos on images. And for good reason. Images with logos consistently click higher for larger brands (to the best of my knowledge), and if you’re a paywalled site but have set live blogs to free, it’s worth telling people.

You should break down this image analysis into:

  • Human presence and gaze.
  • Facial expression.
  • Emotional resonance.
  • Composition and framing.
  • Color schemes.
  • Photo type.

Then you can use machine learning to bucket photos into groups to help determine CTR. For example, people looking directly at the camera + smiling could be one bucket; not looking at the camera + scowling could be another.
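A rough sketch of that bucketing, assuming each article row has already been annotated with image attributes (whether by a vision model or by hand); the filename and column names are placeholders:

```python
import pandas as pd

# Placeholder filename; assumes columns like gaze, expression, page, clicks, impressions.
df = pd.read_csv("discover_with_image_labels.csv")

# Combine a couple of the attributes above into a single bucket label,
# e.g. "looking at camera + smiling" or "not looking + scowling".
df["image_bucket"] = df["gaze"].fillna("unknown") + " + " + df["expression"].fillna("unknown")

bucket_ctr = (
    df.groupby("image_bucket")
    .agg(
        articles=("page", "nunique"),
        clicks=("clicks", "sum"),
        impressions=("impressions", "sum"),
    )
    .assign(ctr=lambda d: d["clicks"] / d["impressions"])
    .sort_values("ctr", ascending=False)
)

print(bucket_ctr)
```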

Publishing Performance

The more you publish, the more this matters.

Large newsrooms run analysis on publishing volumes, times, and content freshness fairly consistently and at a desk level. If you have 50 or fewer articles per month making it into Discover, you probably don’t need to do this.

But if we’re talking about hundreds or thousands of articles, these insights can be really useful to commissioners.

I would focus on the following (a rough sketch of the day and time breakdown follows the list):

  • Publishing days.
  • Publishing times.
  • Content freshness.
  • Republishing vs. publishing.
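Here’s a rough sketch of the day and time breakdown, assuming you’ve joined a publish timestamp from your CMS onto the per-article Discover export (the filename and column names are placeholders):

```python
import pandas as pd

# Placeholder filename; assumes a published_at column plus page and clicks.
df = pd.read_csv("discover_with_publish_times.csv", parse_dates=["published_at"])

df["publish_day"] = df["published_at"].dt.day_name()
df["publish_hour"] = df["published_at"].dt.hour

# Clicks per article by day of the week the article was published.
by_day = (
    df.groupby("publish_day")
    .agg(articles=("page", "nunique"), clicks=("clicks", "sum"))
    .assign(clicks_per_article=lambda d: d["clicks"] / d["articles"])
)

# Total Discover clicks by hour of publication.
by_hour = df.groupby("publish_hour")["clicks"].sum()

print(by_day)
print(by_hour)
```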
Breaking things down at a subfolder level is always crucial (Image Credit: Harry Clarkson-Bennett)

Day of the week data is always useful for larger publishers to get the most value out of their publishing (Image Credit: Harry Clarkson-Bennett)

Your output should give really clear guidance to desks, commissioners, and publishers around when is best to publish for peak Discover performance.

We never make direct recommendations solely for Discover for a number of reasons. Discover is a highly volatile platform and one that does reward nonsense. It can lead you down the garden path with all sorts of thin, curiosity-gap-style content if you just follow the numbers.

And it has limited direct impact on your bottom line.

How Do You Tie This All Together?

You need a clear set of goals. Goals that help you deliver analysis that directly impacts the value of your content in Discover. When you set up your analysis, focus on elements you have more control over.

For example, you might not be able to control what commissioners choose to publish, but you can change the headline (H1, title, and/or OG) and image before publishing.

  1. Set a clear goal around conversions and traffic.
  2. Understand what you have more control over.
  3. Deliver insights at a desk or subfolder level.

Understanding whether your role is more strategic or tactical is crucial. Strategic roles are more advisory in nature. You can offer some thoughts and advice on the type of headlines and entities to avoid or choose, but you may not be able to change them.

Tactical roles mean you have more say in the implementation of change. Headlines, publish times, entity targeting, etc.

Simple.


This post was originally published on Leadership in SEO.


Featured Image: Master1305/Shutterstock

Head Of WordPress AI Team Explains SEO For AI Agents via @sejournal, @martinibuster

James LePage, Director Engineering AI at Automattic, is the founder and co-lead of the WordPress Core AI Team, which is tasked with coordinating AI-related projects within WordPress, including how AI agents will interact within the WordPress ecosystem. He shared his insights into what’s coming to the web in the context of AI agents, some of the implications for SEO, and what publishers should be thinking about.

AI Agents And Infrastructure

His first observation was that AI agents will use the same web infrastructure that search engines do. The main point he makes is that the data agents rely on comes from the same classic search indexes.

He writes, somewhat provocatively:

“Agents will use the same infrastructure the web already has.

  • Search to discover relevant entities.
  • “Domain authority” and trust signals to evaluate sources.
  • Links to traverse between entities.
  • Content to understand what each entity offers.

I find it interesting how much money is flowing into AIO and GEO startups when the underlying way agents retrieve information is by using existing search indexes. ChatGPT uses Bing. Anthropic uses Brave. Google uses Google. The mechanics of the web don’t change. What changes is who’s doing the traversing.”

AI SEO = Longtail Optimization

LePage also said that schema structured data, semantic density, and interlinking between pages are essential for optimizing for AI agents. Notably, he said that the AI optimization AIO and GEO companies are doing is just basic long-tail query optimization.

He explained:

“AI intermediaries doing synthesis need structured, accessible content. Clear schemas, semantic density, good interlinking. This is the challenge most publishers are grappling with now. In fact there’s a bit of FUD in this industry. Billions of dollars flowing into AIO and GEO when much of what AI optimization really is is simply long-tail keyword search optimization.”

What Optimized Content Looks Like For AI Agents

LePage, who is involved in AI within the WordPress ecosystem, said that content should be organized in an “intentional” manner for agent consumption, by which he means structured markdown, semantic markup, and content that’s easy to understand.

A little further on, he explains what he believes content should look like for AI agent consumption:

“Presentations of content that prioritize what matters most. Rankings that signal which information is authoritative versus supplementary. Representations that progressively disclose detail, giving agents the summary first with clear paths to depth. All of this still static, not conversational, not dynamic, but shaped with agent traversal in mind.

Think of it as the difference between a pile of documents and a well-organized briefing. Both contain the same information. One is far more useful to someone trying to quickly understand what you offer.”

A little later in the article, he offers a seemingly contradictory prediction about the role of content in an agentic AI future, reversing today’s formula of a well-organized briefing over a pile of documents: agentic AI will not need a website, just the content, a pile of documents.

Nevertheless, he recommends that content have structure, so that information is well organized at the page level, with a clear hierarchy, and at the site level as well, where interlinking makes the relationships between documents clearer. He emphasizes that the content must communicate what it’s for.

He then adds that, in the future, websites will have AI agents that communicate with external AI agents. This gets into the paradigm he mentioned of content being split off from the website so that the data can be displayed in ways that make sense for a user, completely separate from today’s concept of visiting a website.

He writes:

“Think of this as a progression. What exists now is essentially Perplexity-style web search with more steps: gather content, generate synthesis, present to user. The user still makes decisions and takes actions. Near-term, users delegate specific tasks with explicit specifications, and agents can take actions like purchases or bookings within bounded authority. Further out, agents operate more autonomously based on standing guidelines, becoming something closer to economic actors in their own right.

The progression is toward more autonomy, but that doesn’t mean humans disappear from the loop. It means the loop gets wider. Instead of approving every action, users set guidelines and review outcomes.

…Before full site delegates exist, there’s a middle ground that matters right now.

The content an agent has access to can be presented in a way that makes sense for how agents work today. Currently, that means structured markdown, clean semantic markup, content that’s easy to parse and understand. But even within static content, there’s room to be intentional about how information is organized for agent consumption.”

His article, titled Agents & The New Internet (3/5), provides useful ideas on how to prepare for the agentic AI future.

Featured Image by Shutterstock/Blessed Stock

Google’s Mueller: Free Subdomain Hosting Makes SEO Harder via @sejournal, @MattGSouthern

Google’s John Mueller warns that free subdomain hosting services create unnecessary SEO challenges, even for sites doing everything else right.

The advice came in response to a Reddit post from a publisher whose site shows up in Google but doesn’t appear in normal search results, despite using Digitalplat Domains, a free subdomain service on the Public Suffix List.

What’s Happening

Mueller told the site owner that they likely aren’t making technical mistakes. The problem is the environment they chose to publish in.

He wrote:

“A free subdomain hosting service attracts a lot of spam & low-effort content. It’s a lot of work to maintain a high quality bar for a website, which is hard to qualify if nobody’s getting paid to do that.”

The issue comes down to association. Sites on free hosting platforms share infrastructure with whatever else gets published there. Search engines struggle to differentiate quality content from the noise surrounding it.

Mueller added:

“For you, this means you’re basically opening up shop on a site that’s filled with – potentially – problematic ‘flatmates’. This makes it harder for search engines & co to understand the overall value of the site – is it just like the others, or does it stand out in a positive way?”

He also cautioned against cheap TLDs for similar reasons. The same dynamics apply when entire domain extensions become overrun with low-quality content.

Beyond domain choice, Mueller pointed to content competition as a factor. The site in question publishes on a topic already covered extensively by established publishers with years of work behind them.

“You’re publishing content on a topic that’s already been extremely well covered. There are sooo many sites out there which offer similar things. Why should search engines show yours?”

Why This Matters

Mueller’s advice here fits a pattern I’ve covered repeatedly over the years. Previously, Google’s Gary Illyes warned against cheap TLDs for the same reason. Illyes put it bluntly at the time, telling publishers that when a TLD is overrun by spam, search engines might not want to pick up sitemaps from those domains.

The free subdomain situation creates a unique problem. While the Public Suffix List theoretically tells Google to treat these subdomains as separate sites, the neighborhood signal remains strong. If the vast majority of subdomains on that host are spam, Google’s systems may struggle to identify your site as the one diamond in the rough.

This matters for anyone considering free hosting as a way to test an idea before investing in a real domain. The test environment itself becomes the test. Search engines evaluate your site in the context of everything else published under that same domain.

The competitive angle also deserves attention. New sites on well-covered topics face a high bar regardless of domain choice. Mueller’s point about established publishers having years of work behind them is a reality check about where the effort needs to go.

Looking Ahead

Mueller suggested that search visibility shouldn’t be the first priority for new publishers.

“If you love making pages with content like this, and if you’re sure that it hits what other people are looking for, then I’d let others know about your site, and build up a community around it directly. Being visible in popular search results is not the first step to becoming a useful & popular web presence, and of course not all sites need to be popular.”

For publishers starting out, focus on building direct traffic through promotion and community engagement. Search visibility tends to follow after a site establishes itself through other channels.


Featured Image: Jozef Micic/Shutterstock

Google On Phantom Noindex Errors In Search Console via @sejournal, @martinibuster

Google’s John Mueller recently answered a question about phantom noindex errors reported in Google Search Console. Mueller asserted that these reports may be real.

Noindex In Google Search Console

A noindex robots directive is one of the few commands that Google must obey, and one of the few ways that a site owner can exercise direct control over Googlebot, Google’s crawler.

And yet it’s not totally uncommon for Search Console to report being unable to index a page because of a noindex directive when the page seemingly does not have a noindex directive on it, at least none that is visible in the HTML code.

When Google Search Console (GSC) reports “Submitted URL marked ‘noindex’,” it is reporting a seemingly contradictory situation:

  • The site asked Google to index the page via an entry in a Sitemap.
  • The page sent Google a signal not to index it (via a noindex directive).

It’s a confusing message from Search Console: the page is apparently preventing Google from indexing it, yet that’s not something the publisher or SEO can observe happening at the code level.

The person asking the question posted on Bluesky:

“For the past 4 months, the website has been experiencing a noindex error (in ‘robots’ meta tag) that refuses to disappear from Search Console. There is no noindex anywhere on the website nor robots.txt. We’ve already looked into this… What could be causing this error?”

Noindex Shows Only For Google

Google’s John Mueller answered the question, sharing that in the cases he has examined where this kind of thing was happening, there was always a noindex being shown to Google.

Mueller responded:

“The cases I’ve seen in the past were where there was actually a noindex, just sometimes only shown to Google (which can still be very hard to debug). That said, feel free to DM me some example URLs.”

While Mueller didn’t elaborate on what could be happening, there are ways to troubleshoot this issue and find out what’s going on.

How To Troubleshoot Phantom Noindex Errors

It’s possible that there is code somewhere that is causing a noindex to show just for Google. For example, it may be that a page at one time had a noindex on it, and a server-side cache (like a caching plugin) or a CDN (like Cloudflare) has cached the HTTP headers from that time, which in turn causes the old noindex header to be shown to Googlebot (because it frequently visits the site) while a fresh version is served to the site owner.

Checking the HTTP header is easy; there are many HTTP header checkers, like the ones at KeyCDN and SecurityHeaders.com.
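If you’d rather check from a script, here’s a minimal sketch using Python’s requests and BeautifulSoup; the URL is a placeholder. Keep in mind that what your machine sees may not be what Googlebot sees, which is the whole point of the steps that follow.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://example.com/some-article/"  # placeholder

resp = requests.get(URL, timeout=10)

# 1. Check the HTTP response headers for an X-Robots-Tag noindex.
print("Status code:", resp.status_code)
print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag", "not set"))

# 2. Check the HTML for robots/googlebot meta tags.
soup = BeautifulSoup(resp.text, "html.parser")
for tag in soup.find_all("meta", attrs={"name": ["robots", "googlebot"]}):
    print("Meta tag:", tag.get("name"), "->", tag.get("content"))
```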

A 520 server response code is one that Cloudflare sends when the origin returns an unknown or unexpected response, which can happen when it’s blocking a user agent.

Screenshot: 520 Cloudflare Response Code

Below is a screenshot of a 200 server response code generated by Cloudflare:

Screenshot: 200 Server Response Code

I checked the same URL using two different header checkers, with one returning a 520 (blocked) server response code and the other returning a 200 (OK) response code. That shows how differently Cloudflare can respond to something like a header checker. Ideally, check with several header checkers to see whether there’s a consistent 520 response from Cloudflare.

In the situation where a web page is showing something exclusively to Google that is otherwise not visible to someone looking at the code, what you need to do is to get Google to look at the page for you using an actual Google crawler and from a Google IP address. The way to do this is by dropping the URL into Google’s Rich Results Test. Google will dispatch a crawler from a Google IP address and if there’s something on the server (or a CDN) that’s showing a noindex, this will catch it. In addition to the structured data, the Rich Results test will also provide the HTTP response and a snapshot of the web page showing exactly what the server shows to Google.

When you run a URL through the Google Rich Results Test, the request:

  • Originates from Google’s Data Centers: The bot uses an actual Google IP address.
  • Passes Reverse DNS Checks: If the server, security plugin, or CDN checks the IP, it will resolve back to googlebot.com or google.com.

If the page is blocked by noindex, the tool will be unable to provide any structured data results. It should show a status saying “Page not eligible” or “Crawl failed”. If you see that, click the “View Details” link or expand the error section. It should show something like “Robots meta tag: noindex” or “‘noindex’ detected in ‘robots’ meta tag”.

This approach does not send the Googlebot user agent; it uses the Google-InspectionTool/1.0 user agent string. That means that if the server block is by IP address, this method will catch it.

Another angle covers the situation where a rogue noindex tag is written specifically to block Googlebot based on its user agent string. You can spoof (mimic) the Googlebot user agent with Google’s own User-Agent Switcher extension for Chrome, or configure an app like Screaming Frog to identify itself with the Googlebot user agent, and that should catch it.
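The same spoofing idea works in a script. Here’s a minimal sketch that fetches a URL twice, once with a generic user agent and once with a Googlebot-style user agent string, and compares the noindex signals. The URL and user agent strings are placeholders (check Google’s documentation for the current Googlebot string), and because the request comes from your own IP, this only catches cloaking keyed off the user agent, not IP-based cloaking.

```python
import requests

URL = "https://example.com/some-article/"  # placeholder

# Example Googlebot-style desktop UA string - check Google's docs for the current one.
GOOGLEBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) Chrome/120.0.0.0 Safari/537.36"
)


def robots_signals(user_agent: str) -> dict:
    """Fetch the URL with the given user agent and report noindex signals."""
    resp = requests.get(URL, headers={"User-Agent": user_agent}, timeout=10)
    return {
        "status": resp.status_code,
        "x_robots_tag": resp.headers.get("X-Robots-Tag", "not set"),
        # Crude check - a meta tag parser (as in the earlier sketch) is more precise.
        "noindex_in_html": "noindex" in resp.text.lower(),
    }


print("Generic UA:  ", robots_signals("Mozilla/5.0 (compatible; header-check/1.0)"))
print("Googlebot UA:", robots_signals(GOOGLEBOT_UA))
```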

Screenshot: Chrome User Agent Switcher

Phantom Noindex Errors In Search Console

These kinds of errors can feel like a pain to diagnose, but before you throw your hands up in the air, take some time to see whether any of the steps outlined here help identify the hidden reason responsible for the issue.

Featured Image by Shutterstock/AYO Production