How Google Discover REALLY Works

This is all based on the Google leak and tallies up with my experience of content that does well in Discover over time. I have pulled out what I think are the most prominent Discover proxies and grouped them into what seems like the appropriate workflow.

Like a disgraced BBC employee, thoughts are my own.

TL;DR

  1. Your site needs to be seen as a “trusted source” with a low spam score, evaluated by proxies like the publisher trust score, in order to be eligible.
  2. Discover is driven by a six-part pipeline, using good vs. bad clicks (long dwell time vs. pogo-sticking) and repeat visits to continuously score and re-score content quality.
  3. Fresh content gets an initial boost. Success hinges on a strong CTR and positive early-stage engagement (good clicks/shares from all channels count, not just Discover).
  4. Content that aligns with a user’s interests is prioritized. To optimize, focus on your areas of topical authority, use compelling headlines, be entity-driven, and use large (1200px+) images.
Image Credit: Harry Clarkson-Bennett

I count 15 different proxies that Google uses to satiate the doomscrollers’ desperate need for quality content in the Discover feed. It’s not that different from how traditional Google search works.

But traditional search (a high-quality pull channel) is worlds apart from Discover. Audiences killing time on trains. At their in-laws. The toilet. Yet because they’re part of the same ecosystem, they’re bundled together into one monolithic entity.

And here’s how it works.

Image Credit: Harry Clarkson-Bennett

Google’s Discover Guidelines

This section is boring, and Google’s guidelines around eligibility are exceptionally vague:

  • Content is automatically eligible to appear in Discover if it is indexed by Google and meets Discover’s content policies.
  • Any kind of dangerous, spammy, deceptive, or violent/vulgar content gets filtered out.

“…Discover makes use of many of the same signals and systems used by Search to determine what is… helpful, reliable, people-first content.”

Then they give some solid, albeit beige, advice: write quality titles (clicky, not baity, as John Shehata would say), ensure your featured image is at least 1200px wide, and create timely, value-added content.

But we can do better.

Discover’s Six-Part Content Pipeline

From cradle to grave, let’s review exactly how your content does or, in most cases, doesn’t appear in Discover. As always, remember that I have made these clusters up, albeit based on real proxies from the Google leak.

  1. Eligibility check and baseline filtering.
  2. Initial exposure and testing.
  3. User quality assessment.
  4. Engagement and feedback loop.
  5. Personalization layer.
  6. Decay and renewal cycles.

Eligibility And Baseline Filtering

For starters, your site has to be eligible for Google Discover. This means you are seen as a “trusted source” on the topic, and you have a low enough spam score that the threshold isn’t triggered.

There are three primary proxy scores to account for eligibility and baseline filtering:

  • is_discover_feed_eligible: a Boolean feature that filters non-eligible pages.
  • publisher_trustScore: a score that evaluates publisher reliability and reputation.
  • topicAuthority_discover: a score that helps Discover identify trusted sources at the topic level.

The site’s reputation and topical authority are scored for the topic at hand. Together, these three metrics help evaluate whether your site is eligible to appear in Discover.

Initial Exposure And Testing

This is very much the freshness stage, where fresh content is given a temporary boost (because contemporary content is more likely to satiate a dopamine-addicted mind).

  • freshnessBoost_discover: provides a temporary boost for fresh content to keep the feed alive.
  • discover_clicks: where early-stage article clicks are used as a predictor of popularity.
  • headlineClickModel_discover: a predictive CTR model based on the headline and image.

I would hypothesize that, using a Bayesian-style predictive model, Google applies learnings at a site and subfolder level to predict likely CTR. The more quality content you have published over time (presumably at a site, subfolder, and author level), the more likely you are to feature.

Because there is less ambiguity. A key feature of SEO now.
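To make that hypothesis concrete, here is a minimal, purely illustrative sketch of a Bayesian-style (Beta-Binomial) CTR prediction. The function, the prior_strength parameter, and every number below are my own assumptions for the example, not anything taken from the leak.

```python
# Hypothetical sketch: a Beta-Binomial prior for predicting early-stage CTR.
# Historical clicks/impressions at the site or subfolder level form the prior;
# a new article's first impressions update it. All numbers are illustrative.

def predicted_ctr(prior_clicks, prior_impressions, new_clicks, new_impressions,
                  prior_strength=100):
    """Blend historical CTR (the prior) with an article's early data."""
    # Scale the historical record down to a fixed "strength" so one subfolder's
    # huge archive doesn't completely drown out fresh evidence.
    prior_ctr = prior_clicks / prior_impressions
    alpha = prior_ctr * prior_strength + new_clicks                          # successes
    beta = (1 - prior_ctr) * prior_strength + (new_impressions - new_clicks)  # failures
    return alpha / (alpha + beta)                                             # posterior mean CTR

# A subfolder with a strong track record lifts the prediction for a new article
# that has only a handful of impressions so far.
print(predicted_ctr(prior_clicks=12_000, prior_impressions=200_000,
                    new_clicks=9, new_impressions=120))
```

The point is simply that a strong publishing track record gives a new article a head start before its own data arrives.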

User Quality Assessment

An article is ultimately judged by the quality of user engagement. Google uses the good and bad click style model from Navboost to establish what is and isn’t working for users. Low CTR and/or pogo-sticking style behavior downgrades an article’s chance of featuring.

Valuable content is decided by the good vs bad click ratio. Repeat visits are used to measure lasting satisfaction and re-rank top-performing content.

  • discover_blacklist_score: Penalty for spam, misinformation, or clickbait.
  • goodClicks_discover: Positive user interactions (long dwell time).
  • badClicks_discover: Negative interactions (bounces, short dwell).
  • nav_boosted_discover_clicks: Repeat or return engagement metric.
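As a rough, hypothetical illustration of how a good-vs-bad click ratio with a repeat-visit boost might be scored (the 30-second dwell threshold and the 0.1 boost weight are invented for the example, not documented values):

```python
# Hypothetical sketch: classify clicks by dwell time, then score an article on
# its good-vs-bad click ratio with a small boost for repeat visits.
# The 30-second threshold and 0.1 boost weight are illustrative assumptions.

def engagement_score(dwell_times_seconds, repeat_visits, good_dwell=30):
    good = sum(1 for t in dwell_times_seconds if t >= good_dwell)   # long dwell
    bad = len(dwell_times_seconds) - good                           # bounces / pogo-sticking
    if good + bad == 0:
        return 0.0
    ratio = good / (good + bad)
    boost = 0.1 * min(repeat_visits / max(good, 1), 1.0)            # capped repeat-visit boost
    return round(ratio + boost, 3)

# An article with mostly long dwell times and some returning readers scores well.
print(engagement_score([5, 45, 120, 8, 90, 60], repeat_visits=2))
```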

The quality of the article is then measured by its user engagement. As Discover is a personalized platform, this can be done accurately and at scale. Cohorts of users can be grouped together, and people with the same general interests are served the content the algorithm decides they should be interested in.

But if an overly clicky or misleading title delivers poor engagement (dwell time and on-page interactions), the article may be downgraded. Over time, this kind of practice can compound and nerf your site completely.

Headlines like this are a one-way ticket to devaluing your brand in the eyes of people and search engines (Image Credit: Harry Clarkson-Bennett)

It’s important to note that this click data doesn’t have to come from Discover. Once an article is out in the ether – it’s been published, shared on social, etc. – Chrome click data is stored and applied to the algorithm.

So, the more quality click data and shares you can generate early in an article’s lifecycle (accounting for the importance of freshness), the better your chance of success on Discover. Treat it like a viral platform. Make noise. Do marketing.

Engagement And Feedback Loop

Once the article enters the proverbial fray, a scoring and rescoring loop begins. Continuous CTR, impressions, and explicit user feedback (like, hate, and “don’t show me this again, please” style buttons) feed models like Navboost to refine what gets shown.

  • discover_impressions: The number of times an article appears in a Discover feed.
  • discover_ctr: Clicks divided by impressions. Impressions and click data feed CTR modeling.
  • discover_feedback_negative: Specific user feedback (e.g., “not interested”) suppresses content for individuals, for groups, and on the platform as a whole.

These behavioral signals define an article’s success. It lives or dies on relatively simple metrics. And the more you use it, the better it gets. Because it knows what you and your cohort are more likely to click and enjoy.

This is as true in Discover as it is in the main algorithm. Google admitted as such in the DoJ rulings. (Image Credit: Harry Clarkson-Bennett)

I imagine headline and image data are stored so that the algorithm can apply some rigorous standards to statistical modeling. Once it knows what types of headlines, images, and articles perform best for specific cohorts, personalization becomes effective faster.

Personalization Layer

Google knows a lot about us. It’s what its business is built on. It collects a lot of non-anonymized data (credit card details, passwords, contact details, etc.) alongside every conceivable interaction you have with webpages.

Discover takes personalization to the next level. I think it may offer an insight into how part of the SERP could look in the future: a personalized cluster of articles, videos, and social posts designed to hook you in, embedded somewhere alongside search results and AI Mode.

All of this is designed to keep you on Google’s owned properties for longer. Because they make more money that way.

Hint: They want to keep you around because they make more money (Image Credit: Harry Clarkson-Bennett)
  • contentEmbeddings_discover: Content embeddings determine how well the content aligns with the user’s interests. This powers Discover’s interest-matching engine.
  • personalization_vector_match: This module dynamically personalizes the user’s feed in real time. It identifies similarity between content and user interest vectors.

Content that matches well with your personal and your cohort’s interests will be boosted into your feed.
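For illustration, here is a toy sketch of the interest-matching idea: cosine similarity between a content embedding and a user (or cohort) interest vector. Real embeddings have hundreds or thousands of dimensions; the three-dimensional vectors and labels below are invented for the example.

```python
# Hypothetical sketch: match content against a user's interest vector with
# cosine similarity, the basic operation behind embedding-based matching.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

user_interests = [0.8, 0.1, 0.3]          # e.g. a politics-heavy reading history
article_politics = [0.9, 0.05, 0.2]
article_football = [0.1, 0.9, 0.4]

for name, vec in [("politics", article_politics), ("football", article_football)]:
    print(name, round(cosine_similarity(user_interests, vec), 3))
# The politics article scores far closer to 1.0, so it would be the better
# candidate for this user's (or cohort's) feed.
```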

You can see the sites you engage with frequently using the site engagement page in Chrome (type chrome://site-engagement/ into your address bar), and every stored interaction via Chrome’s histograms. This histogram data indirectly shows key interaction points you have with web pages by measuring the browser’s response and performance around those interactions.

It doesn’t explicitly say user A clicked X, but logs the technical impact, i.e., how long the browser spent processing said click or scroll.

Decay And Renewal Cycles

Discover boosts freshness because people are thirsty for it. Because fresh content is boosted, older or saturated stories naturally decay as the news cycle moves on and article engagement declines.

For successful stories, this decay comes through market saturation.

  • freshnessDecay_timer: This module measures recency decay after initial exposure, gradually reducing visibility to make way for fresher content.
  • content_staleness_penalty: Outdated content or topics are given a lower priority once engagement starts to decline to keep the feed current.
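A minimal, hypothetical sketch of how a freshness boost and a staleness penalty could combine, assuming a simple exponential decay. The 24-hour half-life and the penalty values are invented for illustration.

```python
# Hypothetical sketch: an exponential decay on a freshness boost, combined with
# a staleness penalty once engagement starts to fall. The half-life and penalty
# values are invented for illustration.
import math

def freshness_multiplier(hours_since_publish, half_life_hours=24):
    # The boost halves every `half_life_hours`, so day-old news is worth half as much.
    return math.exp(-math.log(2) * hours_since_publish / half_life_hours)

def staleness_penalty(current_ctr, peak_ctr, penalty=0.5):
    # Once CTR drops well below its peak, the story is treated as saturated.
    return penalty if peak_ctr > 0 and current_ctr < 0.25 * peak_ctr else 1.0

score = 1.0
score *= freshness_multiplier(hours_since_publish=36)
score *= staleness_penalty(current_ctr=0.01, peak_ctr=0.06)
print(round(score, 3))  # a 36-hour-old, fading story keeps only a fraction of its boost
```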

Discover is Google’s answer to a social network. None of us spend time in Google. It’s not fun. I use the word fun loosely. It isn’t designed to hook us in and ruin our attention spans with constant spiking of dopamine.

But Google Discover is clearly on the way to that. They want to make it a destination. Hence, all the recent changes where you can “catch up” with creators and publishers you care about across multiple platforms.

Videos, social posts, articles … the whole nine yards. I wish they’d stop summarizing literally everything with AI, however.

My 11-Step Workflow To Get The Most Out Of Google Discover

Follow basic principles and you will put yourself in good stead. Understand where your site is topically strong and focus your time on content that will drive value. There are multiple ways you can do this.

If you don’t feature much in Discover, you can use your Search Console click and impressions data to identify areas where you generate the highest value. Where you are topically authoritative. I would do this at a subfolder and entity level (e.g., politics and Rachel Reeves or the Labour Party).

It’s also worth breaking this down in total and by article. Or you can use something like Ahrefs’ Traffic Share report to determine your share of voice via third-party data.

Essentially share of voice data (Image Credit: Harry Clarkson-Bennett)

Then really focus your time on a) areas where you’re already authoritative and b) areas that drive value for your audience.

Assuming you’re not focusing on NSFW content and you’re vaguely eligible, here’s what I would do:

  1. Make sure you’re meeting basic image requirements. 1200 pixels wide as a minimum.
  2. Identify your areas of topical authority. Where do you already rank effectively at a subfolder level? Is there a specific author who performs best? Try to build on your valuable content hubs with content that should drive extra value in this area.
  3. Invest in content that will drive real value (links and engagement) in these areas. Do not chase clicks via Discover. It’s a one-way ticket to clickbait city.
  4. Make sure you’re plugged into the news cycle. Being first has a huge impact on your news visibility in search. If you’re not first on the scene, make sure you’re adding something additional to the conversation. Be bold. Add value. Understand how news SEO really works.
  5. Be entity-driven. In your headlines, first paragraph, subheadings, structured data, and image alt text. Your page should remove ambiguity. You need to make it incredibly clear who this page is about. A lack of clarity is partly why Google rewrites headlines.
  6. Use the Open Graph title. The OG title is a headline that doesn’t show on your page. Primarily designed for social media use, it is one of the most commonly picked up headlines in Discover. It can be jazzy. Curiosity-led. Rich. Interesting. But still entity-focused.
  7. Make sure you share content likely to do well on Discover across relevant push channels early in its lifecycle. It needs to outperform its predicted early-stage performance.*
  8. Create a good page experience. Your page (and site) should be fast, secure, ad-lite, and memorable for the right reasons.
  9. Try to drive quality onward journeys. If you can treat users from Discover differently to your main site users, think about how you would link effectively for them. Maybe you use a pop-up “we think you’ll like this next” section based on a user’s scroll depth or dwell time.
  10. Get the traffic to convert. While Discover is a personalized feed, the standard scroller is not very engaged. So, focus on easier conversions like registrations (if you’re a subscriber-first company) or advertising revenue.
  11. Keep a record of your best performers. Evergreen content can be refreshed and repubbed year after year. It can still drive value.

*What I mean here is if your content is predicted to drive three shares and two links, if you share it on social and in newsletters and it drives seven shares and nine links, it is more likely to go viral.

As such, the algorithm identifies it as ‘Discover-worthy.’


This was originally published on Leadership in SEO.


Featured Image: Roman Samborskyi/Shutterstock

How to Turn Every Campaign Into Lasting SEO Authority [Webinar] via @sejournal, @hethr_campbell

Capture Links, Mentions, and Citations That Make a Difference

Backlinks alone no longer move the authority needle. Brand mentions are just as critical for visibility, recognition, and long-term SEO success. Are your campaigns capturing both?

Join Michael Johnson, CEO of Resolve, for a webinar where he shares a replicable campaign framework that aligns media outreach, SEO impact, and brand visibility, helping your campaigns become long-term assets.

What You’ll Learn

  • The Resolve Campaign Framework: Step-by-step approach to ideating, creating, and pitching SEO-focused digital PR campaigns.
  • The Dual Outcome Strategy: How to design campaigns that earn both high-quality backlinks and brand mentions from top-tier media.
  • Real Campaign Case Studies: Examples of campaigns that created a compounding effect of links, mentions, and brand recognition.
  • Techniques for Measuring Success: How to evaluate the SEO and branding impact of your campaigns.

Why You Can’t Miss This Webinar

Successful SEO campaigns today capture authority on multiple fronts. This session provides actionable strategies for engineering campaigns that work hand in hand with SEO, GEO, and AEO to grow your brand.

📌 Register now to learn how to design campaigns that earn visibility, links, and citations.

🛑 Can’t attend live? Register anyway, and we’ll send you the recording so you don’t miss out.

The AI Search Visibility Audit: 15 Questions Every CMO Should Ask

This post was sponsored by IQRush. The opinions expressed in this article are the sponsor’s own.

Your traditional SEO is winning. Your AI visibility is failing. Here’s how to fix it.

Your brand dominates page one of Google. Domain authority crushes competitors. Organic traffic trends upward quarter after quarter. Yet when customers ask ChatGPT, Perplexity, or others about your industry, your brand is nowhere to be found.

This is the AI visibility gap, which causes missed opportunities in awareness and sales.

“SEO ranking on page one doesn’t guarantee visibility in AI search. The rules of ranking have shifted from optimization to verification.”

Raj Sapru, Netrush, Chief Strategy Officer

Recent analysis of AI-powered search patterns reveals a troubling reality: commercial brands with excellent traditional SEO performance often achieve minimal visibility in AI-generated responses. Meanwhile, educational institutions, industry publications, and comparison platforms consistently capture citations for product-related queries.

The problem isn’t your content quality. It’s that AI engines prioritize entirely different ranking factors than traditional search: semantic query matching over keyword density, verifiable authority markers over marketing claims, and machine-readable structure over persuasive copy.

This audit exposes 15 questions that separate AI-invisible brands from citation leaders.

We’re sharing the first 7 critical questions below, covering visibility assessment, authority verification, and measurement fundamentals. These questions will reveal your most urgent gaps and provide immediate action steps.

Question 1: Are We Visible in AI-Powered Search Results?

Why This Matters: Commercial brands with strong traditional SEO often achieve minimal AI citation visibility in their categories. A recent IQRush field audit found fewer than one in ten AI-generated answers included the brand, showing how limited visibility remains, even for strong SEO performers. Educational institutions, industry publications, and comparison sites dominate AI responses for product queries—even when commercial sites have superior content depth. In regulated industries, this gap widens further as compliance constraints limit commercial messaging while educational content flows freely into AI training data.

How to Audit:

  • Test core product or service queries through multiple AI platforms (ChatGPT, Perplexity, Claude)
  • Document which sources AI engines cite: educational sites, industry publications, comparison platforms, or adjacent content providers
  • Calculate your visibility rate: queries where your brand appears vs. total queries tested

Action: If educational/institutional sources dominate, implement their citation-driving elements:

  • Add research references and authoritative citations to product content
  • Create FAQ-formatted content with an explicit question-answer structure
  • Deploy structured data markup (Product, FAQ, Organization schemas)
  • Make commercial content as machine-readable as educational sources

IQRush tracks citation frequency across AI platforms. Competitive analysis shows which schema implementations, content formats, and authority signals your competitors use to capture citations you’re losing.
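As a rough illustration of the structured data point above, here is a small sketch that generates Product and FAQPage JSON-LD. The product details are placeholders; the property names come from schema.org, and each block would be embedded in the page in a script tag of type application/ld+json.

```python
# Illustrative sketch: generating the Product and FAQPage JSON-LD mentioned
# above. The product details are placeholders; schema.org defines the property
# names (@context, @type, name, offers, mainEntity, and so on).
import json

product_schema = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget Pro",
    "description": "A machine-readable description of the product.",
    "brand": {"@type": "Brand", "name": "Example Brand"},
    "offers": {
        "@type": "Offer",
        "price": "99.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "Which widget is best for a remote team of 10?",
        "acceptedAnswer": {"@type": "Answer",
                           "text": "Example Widget Pro supports up to 25 users."},
    }],
}

# Each block would sit in the page inside a <script type="application/ld+json"> tag.
print(json.dumps(product_schema, indent=2))
print(json.dumps(faq_schema, indent=2))
```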

Question 2: Are Our Expertise Claims Actually Verifiable?

Why This Matters: Machine-readable validation drives AI citation decisions: research references, technical standards, certifications, and regulatory documentation. Marketing claims like “industry-leading” or “trusted by thousands” carry zero weight. In one IQRush client analysis, more than four out of five brand mentions were supported by citations—evidence that structured, verifiable content is far more likely to earn visibility. Companies frequently score high on human appeal—compelling copy, strong brand messaging—but lack the structured authority signals AI engines require. This mismatch explains why brands with excellent traditional marketing achieve limited citation visibility.

How to Audit:

  • Review your priority pages and identify every factual claim made (performance stats, quality standards, methodology descriptions)
  • For each claim, check whether it links to or cites an authoritative source (research, standards body, certification authority)
  • Calculate verification ratio: claims with authoritative backing vs. total factual claims made

Action: For each unverified claim, either add authoritative backing or remove the statement:

  • Add specific citations to key claims (research databases, technical standards, industry reports)
  • Link technical specifications to recognized standards bodies
  • Include certification or compliance verification details where applicable
  • Remove marketing claims that can’t be substantiated with machine-verifiable sources

IQRush’s authority analysis identifies which claims need verification and recommends appropriate authoritative sources for your industry, eliminating research time while ensuring proper citation implementation.

Question 3: Does Our Content Match How People Query AI Engines?

Why This Matters: Semantic alignment matters more than keyword density. Pages optimized for traditional keyword targeting often fail in AI responses because they don’t match conversational query patterns. A page targeting “best project management software” may rank well in Google but miss AI citations if it doesn’t address how users actually ask: “What project management tool should I use for a remote team of 10?” In recent IQRush client audits, AI visibility clustered differently across verticals—consumer brands surfaced more frequently for transactional queries, while financial clients appeared mainly for informational intent. Intent mapping—informational, consideration, or transactional—determines whether AI engines surface your content or skip it.

How to Audit:

  • Test sample queries customers would use in AI engines for your product category
  • Evaluate whether your content is structured for the intent type (informational vs. transactional)
  • Assess if content uses conversational language patterns vs. traditional keyword optimization

Action: Align content with natural question patterns and semantic intent:

  • Restructure content to directly address how customers phrase questions
  • Create content for each intent stage: informational (education), consideration (comparison), transactional (specifications)
  • Use conversational language patterns that match AI engine interactions
  • Ensure semantic relevance beyond just keyword matching

IQRush maps your content against natural query patterns customers use in AI platforms, showing where keyword-optimized pages miss conversational intent.

Question 4: Is Our Product Information Structured for AI Recommendations?

Why This Matters: Product recommendations require structured data. AI engines extract and compare specifications, pricing, availability, and features from schema markup—not from marketing copy. Products with a comprehensive Product schema capture more AI citations in comparison queries than products buried in unstructured text. Bottom-funnel transactional queries (“best X for Y,” product comparisons) depend almost entirely on machine-readable product data.

How to Audit:

  • Check whether product pages include Product schema markup with complete specifications
  • Review if technical details (dimensions, materials, certifications, compatibility) are machine-readable
  • Test transactional queries (product comparisons, “best X for Y”) to see if your products appear
  • Assess whether pricing, availability, and purchase information is structured

Action: Implement comprehensive product data structure:

  • Deploy Product schema with complete technical specifications
  • Structure comparison information (tables, lists) that AI can easily parse
  • Include precise measurements, certifications, and compatibility details
  • Add FAQ schema addressing common product selection questions
  • Ensure pricing and availability data is machine-readable

IQRush’s ecommerce audit scans product pages for missing schema fields—price, availability, specifications, reviews—and prioritizes implementations based on query volume in your category.

Question 5: Is Our “Fresh” Content Actually Fresh to AI Engines?

Why This Matters: Recency signals matter, but timestamp manipulation doesn’t work. Pages with recent publication dates but outdated information underperform older pages with substantive updates: new research citations, current industry data, or refreshed technical specifications. Genuine content updates outweigh simple republishing with changed dates.

How to Audit:

  • Review when your priority pages were last substantively updated (not just timestamp changes)
  • Check whether content references recent research, current industry data, or updated standards
  • Assess if “evergreen” content has been refreshed with current examples and information
  • Compare your content recency to competitors appearing in AI responses

Action: Establish genuine content freshness practices:

  • Update high-priority pages with current research, data, and examples
  • Add recent case studies, industry developments, or regulatory changes
  • Refresh citations to include latest research or technical standards
  • Implement clear “last updated” dates that reflect substantive changes
  • Create update schedules for key content categories

IQRush compares your content recency against competitors capturing citations in your category, flagging pages that need substantive updates (new research, current data) versus pages where timestamp optimization alone would help.

Question 6: How Do We Measure What’s Actually Working?

Why This Matters: Traditional SEO metrics—rankings, traffic, CTR—miss the consideration impact of AI citations. Brand mentions in AI responses influence purchase decisions without generating click-through attribution, functioning more like brand awareness channels than direct response. CMOs operating without AI visibility measurement can’t quantify ROI, allocate budgets effectively, or report business impact to executives.

How to Audit:

  • Review your executive dashboards: Are AI visibility metrics present alongside SEO metrics?
  • Examine your analytics capabilities: Can you track how citation frequency changes month-over-month?
  • Assess competitive intelligence: Do you know your citation share relative to competitors?
  • Evaluate coverage: Which query categories are you blind to?

Action: Establish AI citation measurement:

  • Track citation frequency for core queries across AI platforms
  • Monitor competitive citation share and positioning changes
  • Measure sentiment and accuracy of brand mentions
  • Add AI visibility metrics to executive dashboards
  • Correlate AI visibility with consideration and conversion metrics

IQRush tracks citation frequency, competitive share, and month-over-month trends across AI platforms. No manual testing or custom analytics development is required.

Question 7: Where Are Our Biggest Visibility Gaps?

Why This Matters: Brands typically achieve citation visibility for a small percentage of relevant queries, with dramatic variation by funnel stage and product category. IQRush analysis showed the same imbalance: consumer brands often surfaced in purchase-intent queries, while service firms appeared mostly in educational prompts. Most discovery moments generate zero brand visibility. Closing these gaps expands reach at stages where competitors currently dominate.

How to Audit:

  • List queries customers would ask about your products/services across different funnel stages
  • Group them by funnel stage (informational, consideration, transactional)
  • Test each query in AI platforms and document: Does your brand appear?
  • Calculate what percentage of queries produce brand mentions in each funnel stage
  • Identify patterns in the queries where you’re absent

Action: Target the funnel stages with lowest visibility first:

  • If weak at informational stage: Build educational content that answers “what is” and “how does” queries
  • If weak at consideration stage: Create comparison content structured as tables or side-by-side frameworks
  • If weak at transactional stage: Add comprehensive product specs with schema markup
  • Focus resources on stages where small improvements yield largest reach gains

IQRush’s funnel analysis quantifies gap size by stage and estimates impact, showing which content investments will close the most visibility gaps fastest.
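A minimal sketch of the visibility-rate calculation described in the audit steps above, assuming you log your test queries by hand. The sample queries, stages, and results are invented.

```python
# Minimal sketch: log each test query, its funnel stage, and whether the brand
# was mentioned, then compute the visibility rate per stage. Sample data is invented.
from collections import defaultdict

test_log = [
    {"query": "what is X", "stage": "informational", "brand_mentioned": False},
    {"query": "X vs Y", "stage": "consideration", "brand_mentioned": True},
    {"query": "best X for small teams", "stage": "transactional", "brand_mentioned": False},
    {"query": "how does X work", "stage": "informational", "brand_mentioned": True},
]

totals, hits = defaultdict(int), defaultdict(int)
for row in test_log:
    totals[row["stage"]] += 1
    hits[row["stage"]] += row["brand_mentioned"]

for stage in totals:
    rate = hits[stage] / totals[stage]
    print(f"{stage}: {rate:.0%} of test queries mentioned the brand")
```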

The Compounding Advantage of Early Action

The first seven questions and actions highlight the differences between traditional SEO performance and AI search visibility. Together, they explain why brands with strong organic rankings often have zero citations in AI answers.

The remaining 8 questions in the comprehensive audit help you take your marketing further. They focus on technical aspects: the structure of your content, the backbone of your technical infrastructure, and the semantic strategies that signal true authority to AI. 

“Visibility in AI search compounds, making it harder for your competition to break through. The brands that make themselves machine-readable today will own the conversation tomorrow.”
Raj Sapru, Netrush, Chief Strategy Officer

IQRush data shows the same thing across industries: brands that adopt an AI answer engine optimization strategy early quickly start to lock in positions of trust that competitors can’t easily replace. Once your brand becomes the reliable answer source, AI engines will start to default to you for related queries, and the advantage snowballs.

The window to be an early adopter and capture AI visibility for your brand will not stay open forever. As more brands invest in AI visibility, the race is heating up.

Download the Complete AI Search Visibility Audit with detailed assessment frameworks, implementation checklists, and the 8 strategic questions covering content architecture, technical infrastructure, and linguistic optimization. Each question includes specific audit steps and immediate action items to close your visibility gaps and establish authoritative positioning before your market becomes saturated with AI-optimized competitors.

Image Credits

Featured Image: Image by IQRush. Used with permission.

In-Post Images: Image by IQRush. Used with permission.

Google’s Advice On Canonicals: They’re Case Sensitive via @sejournal, @martinibuster

Google’s John Mueller answered a question about canonicals, expressing his opinion that “hope” shouldn’t be part of your SEO strategy. The implication is that hoping Google will figure it out on its own misses the point of what SEO is about.

Canonicals And Case Sensitivity

Rel=canonical is an HTML tag that enables a publisher or SEO to tell Google what their preferred URL is. For example, it’s useful for suggesting the best URL when there are multiple URLs with the same or similar content. Google isn’t obligated to obey the rel=canonical declaration; it’s treated as a strong hint.

Someone on Reddit was in a situation where a website’s category names begin with a capitalized letter, but the canonical tag contains a lowercase version. There is currently a redirect from the lowercase version to the uppercase one.

They weren’t seeing any negative impact from this state of the website and asked whether it’s okay to leave it as-is, since it hasn’t affected search visibility.

The person asking the question wrote:

“…I’m running into something annoying on our blog and could use a sanity check before I push dev too hard to fix it. It’s been an issue for a month, after a redesign was launched.

All of our URLs resolve in this format: /site/Topic/topic-title/

…but the canonical tag uses a lowercase topic, like: /site/topic/topic-title/

So the canonical doesn’t exactly match the actual URL’s case. Lowercase topic 301 redirects to the correct, uppercase version.

I know that mismatched canonicals can send mixed signals to Google.

Dev is asking, “Are you seeing any real impact from this?” and technically, the answer is no — but I still think it’s worth fixing to follow best practices.”

If It Works Don’t Fix It?

This is an interesting case because, in many things related to SEO, if something’s working there’s little point in trying to fix a small detail for fear of triggering a negative response. Relying on Google to figure things out is another common fallback.

Google’s John Mueller has a different opinion. He responded:

“URL path, filename, and query parameters are case-sensitive, the hostname / domain name aren’t. Case-sensitivity matters for canonicalization, so it’s a good idea to be consistent there. If it serves the same content, it’ll probably be seen as a duplicate and folded together, but “hope” should not be a part of an SEO strategy.

Case-sensitivity in URLs also matters for robots.txt.”
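To illustrate Mueller’s point, here is a small sketch of a consistency check that treats the hostname as case-insensitive and the path and query as case-sensitive. This is my own illustration for auditing your pages, not Google’s logic.

```python
# Illustrative sketch of the consistency check Mueller describes: the hostname
# is compared case-insensitively, while the path and query must match exactly.
from urllib.parse import urlparse

def canonical_matches(page_url, canonical_url):
    page, canon = urlparse(page_url), urlparse(canonical_url)
    host_ok = page.hostname.lower() == canon.hostname.lower()        # case-insensitive
    path_ok = page.path == canon.path and page.query == canon.query  # case-sensitive
    return host_ok and path_ok

print(canonical_matches("https://Example.com/site/Topic/topic-title/",
                        "https://example.com/site/topic/topic-title/"))  # False: path case differs
print(canonical_matches("https://Example.com/site/Topic/topic-title/",
                        "https://example.com/site/Topic/topic-title/"))  # True: only the host differs
```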

Takeaway

I know that in highly competitive niches, SEO is executed at a generally flawless level. If there’s something to improve, it gets improved. And there’s a good reason for that. Someone at one of the search engines once told me that anything you can do to make it easier for the crawlers is a win. They advised me to make sites easy to crawl and content easy to understand. That advice is still useful, and it aligns with Mueller’s advice not to “hope” that Google figures things out: it’s best to make sure things work out.

Featured Image by Shutterstock/MyronovDesign

Does Schema Markup Help AI Visibility?

Google and Bing publish guidelines for traditional search engine optimization and provide tools to measure performance.

We have no such instruction from generative engine providers, making optimization much more challenging. The result is a slew of misleading and uninformed speculation.

The importance of Schema.org markup is an example.

Schema for LLMs?

I’ve seen no statement or indication from a large language model regarding structured data markup, including Schema.org’s.

Google has long advised using such markup for traditional organic search, stating:

Google Search works hard to understand the content of a page. You can help us by providing explicit clues about the meaning of a page to Google by including structured data on the page.

The search giant generates rich snippets from select structured data and gathers info on a business from additional markup types, such as Schema.org’s Organization, FAQPage, and Author.

While answers from Google’s AI Mode tend to come from top organic rankings, we don’t know the impact of structured data on AI agents or crawlers.

Unlike Google, LLMs have no native indexes. They generate answers based on their training data (which doesn’t store URLs or code) and from external search engines such as Google, Bing, Reddit, and YouTube.

To access a page, LLMs can (i) query traditional search engines, indirectly relying on structured data markup such as Schema.org, and (ii) crawl a page directly to fetch answers.

AI Visibility

Many businesses don’t understand Schema.org markup, and thus retain GEO services that claim implementing it will increase AI visibility.

Don’t be misled. I’ve seen no reputable case studies demonstrating that structured data improves AI mentions or citations. Implementation, moreover, is easy (and cheap) with apps and plugins.

Instead, focus on the proven long-term tactics:

  • Emphasize and invest in overall brand visibility, and track Google searches for your company and products.
  • Ensure your brand and its benefits appear alongside competitors in “best-of” listicles and recommendations.
  • Optimize your product feeds for conversational searches. Prompts are much more specific and diverse than search queries. Provide as much detail as possible to capture all kinds of conversations.

Low Priority

Structured data markup such as Schema.org likely drives organic search rankings and therefore helps AI visibility indirectly. Yet implementation is easy and almost certainly a low priority. What really matters for AI visibility is relevant content and long-term brand building.

Automattic’s Legal Claims About SEO… Is This Real? via @sejournal, @martinibuster

SEO plays a role in Automattic’s counterclaim against WP Engine. The legal document mentions search engine optimization six times and SEO once in counterclaims asserting that WP Engine excessively used words like “WordPress” to rank in search engines as part of an “infringement” campaign that uses WordPress trademarks in commerce. A close look at those claims shows that some of the evidence may be biased and that the claims about SEO rely on outdated information.

Automattic’s Claims About SEO

Automattic’s counterclaim asserts that WP Engine used SEO to rank for WordPress-related keywords and that this is causing confusion.

The counterclaim explains:

“WP Engine also has sown confusion in recent years by dramatically increasing the number of times Counterclaimants’ Marks appear on its websites. Starting in or around 2021, WP Engine began to sharply increase its use of the WordPress Marks, and starting in or around 2022, began to sharply increase its use of the WooCommerce Marks.”

Automattic next argues that the repetition of keywords on a web page is WP Engine’s SEO strategy. Here’s where their claims become controversial to those who know how search engines rank websites.

The counterclaim asserts:

“The increased number of appearances of the WordPress Marks on WP Engine’s website is particularly likely to cause confusion in the internet context.

On information and belief, internet search engines factor in the number of times a term appears in a website’s text in assessing the “relevance” of a website to the terms a user enters into the search engine when looking for websites.

WP Engine’s decision to increase the number of times the WordPress Marks appear on WP Engine’s website appears to be a conscious “search engine optimization” strategy to ensure that when internet users look for companies that offer services related to WordPress, they will be exposed to confusingly written and formatted links that take them to WP Engine’s sites rather than WordPress.org or WordPress.com.”

They call WP Engine’s strategy aggressive:

“WP Engine’s strategy included aggressive utilization of search engine optimization to use the WordPress and WooCommerce Marks extremely frequently and confuse consumers searching for authorized providers of WordPress and WooCommerce software;”

Is The Number Of Keywords Used A Ranking Factor?

I have twenty-five years of experience in search engine optimization and have a concomitantly deep understanding of how search engines rank content. The fact is that Automattic’s claim that search engines “factor in the number of times” a keyword is used in a website’s content is incorrect. Modern search engines don’t factor in the number of times a keyword appears on a web page as a ranking factor. Google’s algorithms use models like BERT to gain a semantic understanding of the meaning and intent of the keyword phrases used in search queries and content, resulting in the ability to rank content that doesn’t even contain the user’s keywords.

Those aren’t just my opinions; Google’s web page about how search works explicitly says that content is ranked according to the user’s intent, regardless of keywords, which directly contradicts Automattic’s claim about WPE’s SEO:

“To return relevant results, we first need to establish what you’re looking for – the intent behind your query. To do this, we build language models to try to decipher how the relatively few words you enter into the search box match up to the most useful content available.

This involves steps as seemingly simple as recognizing and correcting spelling mistakes, and extends to our sophisticated synonym system that allows us to find relevant documents even if they don’t contain the exact words you used.”

If Google’s documentation is not convincing enough, take a look at the search results for the phrase “Managed WordPress Hosting.” WordPress.com ranks #2, despite the phrase being completely absent from its web page.

Screenshot Of WordPress.com In Search Results

What Is The Proof?

Automattic provides a graph comparing WP Engine’s average monthly mentions of the word “WordPress” with mentions published by 18 other web hosts. The comparison of 19 total web hosts dramatically illustrates that WP Engine mentions WordPress more often than any of the other hosting providers, by a large margin.

Screenshot Of Graph

Here’s a close-up of the graph (with the values inserted) showing that WP Engine’s monthly mentions of “WordPress” far exceed the number of times words containing WordPress are used on the web pages of the other hosts.

Screenshot Of Graph Closeup

People say that numbers don’t lie, and the graph presents compelling evidence that WP Engine is aggressively using keywords containing the word WordPress. But leaving aside the debunked idea that keyword-term spamming actually works, a closer look at the graph comparison shows that the evidence is not so strong, because it is biased.

Automattic’s Comparison Is Arguably Biased

Automattic’s counterclaim compares eighteen web hosts against WP Engine. Of those eighteen hosts, only five (including WPE) are managed WordPress hosting platforms. The remaining fourteen are generalist hosting platforms that offer cloud hosting, VPS (virtual private servers), dedicated hosting, and domain name registrations.

The significance of this fact is that the comparison can be considered biased against WP Engine because the average mention of WordPress will naturally be lower across the entire website of a company that offers multiple services (like VPS, dedicated hosting, and domain name registrations) versus a site like WP Engine that offers only one service, managed WordPress hosting.

Two of the hosts listed in the comparison, Namecheap and GoDaddy, are primarily known as domain name registrars. Namecheap is the second biggest domain name registrar in the world. There’s no need to belabor the point that these two companies in Automattic’s comparison may be biased choices to compare against WP Engine.

Of the five hosts that offer WordPress hosting, two are plugin platforms: Elementor and WPMU Dev. Both are platforms built around their respective plugins, which means that the average number of mentions of WordPress is going to be lower because the average may be diluted by documentation and blog posts about the plugins. Those two companies are also arguably biased choices for this kind of comparison.

Of the eighteen hosts that Automattic chose to compare with WP Engine, only two of them are comparable in service to WP Engine: Kinsta and Rocket.net.

Comparison Of Managed WordPress Hosts

Automattic compares the monthly mentions of phrases with “WordPress” in them, and it’s clear that the choice of hosts in the comparison biases the results against WP Engine. A fairer comparison is to compare the top-ranked web page for the phrase “managed WordPress hosting.”

The following is a comparison of the top-ranked web page for each of the three managed WordPress hosts in Automattic’s comparison list, a straightforward one-to-one comparison. I used the phrase “managed WordPress hosting” plus the domain name appended to a search query in order to surface the top-ranked page from each website and then compared how many times the word “WordPress” is used on those pages.

Here are the results:

Rocket.net

The home page of Rocket.net ranks #1 for the phrase “rocket.net managed wordpress hosting.” The home page of Rocket.net contains the word “WordPress” 21 times.

Screenshot of Google’s Search Results

Kinsta

The top ranked Kinsta page is kinsta.com/wordpress-hosting/ and that page mentions the word “WordPress” 55 times.

WP Engine

The top ranked WP Engine web page is wpengine.com/managed-wordpress-hosting/ and that page mentions the word “WordPress” 27 times.

A fair one-to-one comparison of managed WordPress host providers, selected from Automattic’s own list, shows that WP Engine is not using the word “WordPress” more often than its competitors. Its use falls directly in the middle of a fair one-to-one comparison.

Number Of Times Page Mentions WordPress

  • Rocket.net: 21 times
  • WP Engine: 27 times
  • Kinsta: 55 times

What About Other Managed WordPress Hosts?

For the sake of comparison, I compared an additional five managed WordPress hosts that Automattic omitted from its comparison to see how often the word “WordPress” was mentioned on the top-ranked web pages of WP Engine’s direct competitors.

Here are the results:

  • WPX Hosting: 9
  • Flywheel: 16
  • InstaWP: 22
  • Pressable: 23
  • Pagely: 28

It’s apparent that WP Engine’s 27 mentions put it near the upper level in that comparison, but nowhere near the level at which Kinsta mentions “WordPress.” So far, we only see part of the story. As you’ll see, other web hosts use the word “WordPress” far more than Kinsta does, and it won’t look like such an outlier when compared to generalist web hosts.

A Comparison With Generalist Web Hosts

Next, we’ll compare the generalist web hosts listed in Automattic’s comparison.

I did the same kind of search for the generalist web hosts to surface their top-ranked pages for the query “managed WordPress hosting” plus the name of the website, which is a one-to-one comparison to WP Engine.

Other Web Hosts Compared To WP Engine:

  1. InMotion Hosting: 101 times
  2. Greengeeks: 97 times
  3. Jethost: 71 times
  4. Verpex: 52 times
  5. GoDaddy: 49 times
  6. Cloudways: 47 times
  7. Namecheap: 41 times
  8. Liquidweb: 40 times
  9. Pair: 40 times
  10. Hostwinds: 37 times
  11. KnownHost: 33 times
  12. Mochahost: 33 times
  13. Pantheon: 31 times
  14. Siteground: 30 times
  15. WP Engine: 27 times

Crazy, right? WP Engine uses the word “WordPress” less often than any of the other generalist web hosts. This one-to-one comparison contradicts Automattic’s graph.

And just for the record, WordPress.com’s top-ranked page wordpress.com/hosting/ uses the word “WordPress” 62 times, over twice as often as WP Engine’s web page.

Will Automattic’s SEO Claims Be Debunked?

Automattic’s claims about WP Engine’s use of SEO may be based on shaky foundations. The claims about how keywords work for SEO contradict Google’s own documentation, and the fact that WordPress.com’s own website ranks for the phrase “Managed WordPress Hosting” despite not using that exact phrase appears to debunk their assertion that search engines factor in the number of times a user’s keywords are used on a web page.

The graph that Automattic presents in their counterclaim does not represent a comparison of direct competitors, which may contribute to a biased impression that WP Engine is aggressively using the “WordPress” keywords more often than competitors. However, a one-to-one comparison of the actual web pages that compete against each other for the phrase “Managed WordPress Hosting” shows that many of the web hosts in Automattic’s own list use the word “WordPress” far more often than WP Engine, which directly contradicts Automattic’s narrative.

I ran WP Engine’s Managed WordPress Hosting URL through a keyword density tool, and it shows that WP Engine’s web page uses the word “WordPress” a mere 1.92% of the time, which, from an SEO point of view, could be considered a modest amount and far from excessive. It will be interesting to see how the judge decides the merits of Automattic’s SEO-related claims.
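For anyone who wants to sanity-check that kind of figure themselves, here is a minimal sketch of a keyword-density calculation: occurrences of the term divided by total words on the page. The sample sentence is invented; run it against real page copy for a meaningful number.

```python
# Minimal sketch of a keyword-density calculation: occurrences of the term
# divided by total words on the page. The sample text is a stand-in for real copy.
import re

def keyword_density(text, keyword):
    words = re.findall(r"[a-z0-9']+", text.lower())
    hits = sum(1 for w in words if keyword.lower() in w)  # counts words containing the term
    return hits / len(words) if words else 0.0

sample = "Managed WordPress hosting built for WordPress sites, by WordPress experts."
print(f"{keyword_density(sample, 'WordPress'):.2%}")  # 3 of 10 words -> 30.00%
```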

Featured Image by Shutterstock/file404

The Same But Different: Evolving Your Strategy For AI-Driven Discovery via @sejournal, @alexmoss

The web – and the way in which humans interact with it – has definitely changed since the early days of SEO and the emergence of search engines in the early to mid-90s. In that time, we’ve witnessed the internet turn from something that nobody understood to something most of the world cannot operate without. This interview between Bill Gates and David Letterman puts this 30-year phenomenon into perspective:

Thirty years ago, the internet was not understood at all, and neither was its potential influence. Today, the concept of AI entering our daily lives is taken much more seriously, to the point that many look upon it with fear – perhaps because we [think] we have an accurate outlook on how this may progress.

This transformation isn’t so much about the skills we’ve developed over time, but rather about the evolution of the technology and channels that surround them. Those technologies and channels are evolving at a fast pace and causing some to panic over whether their inherent technological skills will still apply within today’s Search ecosystem.

The Technological Rat Race

Right now, it may feel like there’s something new to learn or a new product to experiment with every day, and it can be difficult to decide where to focus your attention and priorities. This is, unfortunately, a phase that I believe will continue for a good couple of years as the dust settles over this wild west of change.

Because these changes are impacting nearly everything an SEO would be responsible for as part of organic visibility, it may feel overwhelming to digest all these things – all while we seemingly take on the challenge of communicating these changes to our clients or stakeholders/board members.

But change does not equal the end of days. This “change” relates to the technology around what we’ve been doing for over a generation, and not the foundation of the discipline itself.

Old Hat Is New Hat

The major search engines, including Google and Bing, have been actively telling you that core SEO principles should still be at the forefront of what we do moving forward. Danny Sullivan, former Search Liaison at Google, also made this clear during his recent keynote at WordCamp USA:

The consistent messages are clear:

  • Produce well-optimized sites that perform well.
  • Populate solid structured data and entity knowledge graphs.
  • Re-enforce brand sentiment and perspective.
  • Offer unique, valuable content for people.

The problem some may have is that the content we produce is more for agents than for people, and if this is true, what impact does that have?

The Web Is Splitting Into Two

The open web has been disrupted most of all, with some business models being uprooted by taking solved knowledge and serving it within their platform, appropriating the human visitor, which they rely on for income.

This has created a split from a complete open web into two – the “human” web and the “agentic” web – where these two audiences are both major considerations and will differ from site to site. SEOs will have to consider both sides of the web and how to serve both – which is where an SEO’s skill set becomes more valuable than it was before.

One example can be seen in the way agents now take charge of ecommerce transactions: OpenAI announced “Buy it in ChatGPT,” where the buying experience is even more seamless with instant checkouts. It also open-sourced the technology behind it, the Agentic Commerce Protocol (ACP), which is already being adopted by content management systems (CMS), including Shopify. This split between agentic and human engagement will still require optimization in order to ensure maximum discoverability.

When it comes to content, ensure everything is concise and avoid fluff, or what I refer to as “tokenization spam.” Content isn’t just crawled; it’s processed, chunked, and tokenized. Agents will give preference to well-structured and formatted text.

“Short-Term Wins” Sounds Like Black Hat

Of course, during any technological shift, there will be some bad actors who may tell you about a brand-new tactic that is guaranteed to work to help you “rank in AI.” Remember that the dust has not yet settled when it comes to the maturity of these assistance engines, and you should compare this to the pre-Panda/Penguin era of SEO, where black hat SEO techniques were easier to achieve.

These algorithm updates closed those loopholes, and the same will happen again as these platforms improve – with increased speed, as agents learn to identify what is truly honest with greater precision.

Success Metrics Will Change, Not The Execution To Influence Them

In reality, core SEO principles and foundations are still the same and have been throughout most changes in the past – including “the end of desktop” when mobiles became more dominant; and “the end of typing” when voice search started to grow with products such as Alexa, Google Home, and even Google Glass.

Is the emergence of AI going to render what I do as an SEO obsolete? No.

Technical SEO remains the same, and the attributes that agents look at are not dissimilar to what we would be optimizing if large language models (LLMs) weren’t around. Brand marketing remains the same. While “brand sentiment” is a term used more widely nowadays, it is something that should always have been part of our online marketing strategies when it comes to authority, relevance, and perspective.

That being said, our native metrics have been devalued within two years, and those metrics will continue to shift alongside the changes that are yet to come as these platforms deliver more stability. This has already skewed year-over-year data and will continue to skew it for the year ahead as more LLMs evolve. This, however, could be compared to events such as the replacement of granular organic keyword data with one (not provided) metric within Google Analytics, the deprecation of Yahoo! Site Explorer, or the devaluation of benchmark data such as Alexa Rank and Google PageRank.

Revise Your Success Metric Considerations

Success metrics now have to go beyond the SERP into visibility and discoverability as a whole, across multiple channels. There are now several tools and platforms available that can analyze and report on AI-focused visibility metrics, such as Yoast AI Brand Insights, which can provide better insight into how your brand is interpreted by LLMs.

If you’re more technical, make use of MCPs (Model Context Protocol) to explore data via natural-language dialogs. MCP is an open-source standard that lets AI applications connect to external systems like databases, tools, or workflows (you can visualize this as a USB-C port for AI) so they can access information and perform tasks using a simple, unified connection. There are several MCP servers you can work with already, including:

You can take this a step further by coupling these with a vibe coding tool such as Claude Code, using it to create a reporting app that combines the above MCP servers to extract the best data and create visuals and interactive charts for you and your clients/stakeholders.

The Same But Different … But Still The Same

While the divergence between human and agentic experiences is increasing, the methods by which we, as SEOs, would optimize for them are not too dissimilar. Leverage both within your strategy, just as you did when mobile gained traction.


Featured Image: Vallabh Soni/Shutterstock

Measuring When AI Assistants And Search Engines Disagree via @sejournal, @DuaneForrester

Before you get started, it’s important to heed this warning: There is math ahead! If doing math and learning equations makes your head swim, or makes you want to sit down and eat a whole cake, prepare yourself (or grab a cake). But if you like math, if you enjoy equations, and you really do believe that k=N (you sadist!), oh, this article is going to thrill you as we explore hybrid search in a bit more depth.

(Image Credit: Duane Forrester)

For years (decades), SEO lived inside a single feedback loop. We optimized, ranked, and tracked. Everything made sense because Google gave us the scoreboard. (I’m oversimplifying, but you get the point.)

Now, AI assistants sit above that layer. They summarize, cite, and answer questions before a click ever happens. Your content can be surfaced, paraphrased, or ignored, and none of it shows in analytics.

That doesn’t make SEO obsolete. It means a new kind of visibility now runs parallel to it. This article offers ideas for measuring that visibility without code, special access, or a developer, and for staying grounded in what we actually know.

Why This Matters

Search engines still drive almost all measurable traffic. Google alone handles almost 4 billion searches per day. By comparison, Perplexity’s reported total annual query volume is roughly 10 billion.

So yes, assistants are still small by comparison. But they’re shaping how information gets interpreted. You can already see it when ChatGPT Search or Perplexity answers a question and links to its sources. Those citations reveal which content blocks (chunks) and domains the models currently trust.

The challenge is that marketers have no native dashboard to show how often that happens. Google recently added AI Mode performance data into Search Console. According to Google’s documentation, AI Mode impressions, clicks, and positions are now included in the overall “Web” search type.

That inclusion matters, but it’s blended in. There’s currently no way to isolate AI Mode traffic. The data is there, just folded into the larger bucket. No percentage split. No trend line. Not yet.

Until that visibility improves, I’m suggesting we can use a proxy test to understand where assistants and search agree and where they diverge.

Two Retrieval Systems, Two Ways To Be Found

Traditional search engines use lexical retrieval, where they match words and phrases directly. The dominant algorithm, BM25, has powered solutions like Elasticsearch and similar systems for years, and it’s still in use in today’s common search engines.

AI assistants rely on semantic retrieval. Instead of exact words, they map meaning through embeddings, the mathematical fingerprints of text. This lets them find conceptually related passages even when the exact words differ.

Each system makes different mistakes. Lexical retrieval misses synonyms. Semantic retrieval can connect unrelated ideas. But when combined, they produce better results.

Inside most hybrid retrieval systems, the two methods are fused using a rule called Reciprocal Rank Fusion (RRF). You don’t have to be able to run it, but understanding the concept helps you interpret what you’ll measure later.

RRF In Plain English

Hybrid retrieval merges multiple ranked lists into one balanced list. The math behind that fusion is RRF.

The formula is simple: the score equals one divided by the sum of k and the rank, written as 1 ÷ (k + rank). If an item appears in several lists, you add those scores together.

Here, “rank” means the item’s position in that list, starting with 1 as the top. “k” is a constant that smooths the difference between top and mid-ranked items. Most systems typically use something near 60, but each may tune it differently.

It’s worth remembering that a vector model doesn’t rank results by counting word matches. It measures how close each document’s embedding is to the query’s embedding in multi-dimensional space. The system then sorts those similarity scores from highest to lowest, effectively creating a ranked list. It looks like a search engine ranking, but it’s driven by distance math, not term frequency.

(Image Credit: Duane Forrester)

Let’s make it tangible with small numbers and two ranked lists. One from BM25 (keyword relevance) and one from a vector model (semantic relevance). We’ll use k = 10 for clarity.

Document A is ranked number 1 in BM25 and number 3 in the vector list.
From BM25: 1 ÷ (10 + 1) = 1 ÷ 11 = 0.0909.
From the vector list: 1 ÷ (10 + 3) = 1 ÷ 13 = 0.0769.
Add them together: 0.0909 + 0.0769 = 0.1678.

Document B is ranked number 2 in BM25 and number 1 in the vector list.
From BM25: 1 ÷ (10 + 2) = 1 ÷ 12 = 0.0833.
From the vector list: 1 ÷ (10 + 1) = 1 ÷ 11 = 0.0909.
Add them: 0.0833 + 0.0909 = 0.1742.

Document C is ranked number 3 in BM25 and number 2 in the vector list.
From BM25: 1 ÷ (10 + 3) = 1 ÷ 13 = 0.0769.
From the vector list: 1 ÷ (10 + 2) = 1 ÷ 12 = 0.0833.
Add them: 0.0769 + 0.0833 = 0.1602.

Document B wins here as it ranks high in both lists. If you raise k to 60, the differences shrink, producing a smoother, less top-heavy blend.

This example is purely illustrative. Every platform adjusts parameters differently, and no public documentation confirms which k values any engine uses. Think of it as an analogy for how multiple signals get averaged together.
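If you want to experiment with the fusion yourself, here is a minimal Python sketch of the same calculation, using the numbers from the walkthrough. It’s illustrative only; production systems implement RRF inside their own stacks and tune k differently.

```python
# A small sketch of Reciprocal Rank Fusion (RRF) using the illustrative
# numbers above: two ranked lists (BM25 and vector) fused with k = 10.
def rrf_scores(ranked_lists, k=10):
    """Fuse ranked lists into one score per document: sum of 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

bm25_ranking = ["A", "B", "C"]    # Document A is #1 in BM25, B is #2, C is #3
vector_ranking = ["B", "C", "A"]  # Document B is #1 in the vector list, etc.

fused = rrf_scores([bm25_ranking, vector_ranking], k=10)
for doc, score in sorted(fused.items(), key=lambda item: item[1], reverse=True):
    print(doc, round(score, 4))
# Prints B (≈0.1742), then A (≈0.1678), then C (≈0.1603) – the same ordering
# as the walkthrough above, which rounded each term before adding.
```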

Where This Math Actually Lives

You’ll never need to code it yourself as RRF is already part of modern search stacks. Here are examples of this type of system from their foundational providers. If you read through all of these, you’ll have a deeper understanding of how platforms like Perplexity do what they do:

All of them follow the same basic process: Retrieve with BM25, retrieve with vectors, score with RRF, and merge. The math above explains the concept, not the literal formula inside every product.

Observing Hybrid Retrieval In The Wild

Marketers can’t see those internal lists, but we can observe how systems behave at the surface. The trick is comparing what Google ranks with what an assistant cites, then measuring overlap, novelty, and consistency. This external math is a heuristic, a proxy for visibility. It’s not the same math the platforms calculate internally.

Step 1. Gather The Data

Pick 10 queries that matter to your business.

For each query:

  1. Run it in Google Search and copy the top 10 organic URLs.
  2. Run it in an assistant that shows citations, such as Perplexity or ChatGPT Search, and copy every cited URL or domain.

Now you have two lists per query: Google Top 10 and Assistant Citations.

(Be aware that not every assistant shows full citations, and not every query triggers them. Some assistants may summarize without listing sources at all. When that happens, skip that query as it simply can’t be measured this way.)

Step 2. Count Three Things

  1. Intersection (I): how many URLs or domains appear in both lists.
  2. Novelty (N): how many assistant citations do not appear in Google’s top 10.
    If the assistant has six citations and three overlap, N = 6 − 3 = 3.
  3. Frequency (F): how often each domain appears across all 10 queries.

Step 3. Turn Counts Into Quick Metrics

For each query set:

Shared Visibility Rate (SVR) = I ÷ 10.
This measures how much of Google’s top 10 also appears in the assistant’s citations.

Unique Assistant Visibility Rate (UAVR) = N ÷ total assistant citations for that query.
This shows how much new material the assistant introduces.

Repeat Citation Count (RCC) = F for a given domain ÷ number of queries.
This reflects how consistently that domain is cited across different answers.

Example:

Google top 10 = 10 URLs. Assistant citations = 6. Three overlap.
I = 3, N = 3, F (for example.com) = 4 (appears in four assistant answers).
SVR = 3 ÷ 10 = 0.30.
UAVR = 3 ÷ 6 = 0.50.
RCC = 4 ÷ 10 = 0.40.

You now have a numeric snapshot of how closely assistants mirror or diverge from search.
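To make the arithmetic easy to reproduce, here is a minimal Python sketch of the Step 2 counts and Step 3 metrics for a single query, using the example figures above. The URLs and the example.com frequency are placeholders.

```python
# A rough sketch of the Step 2 counts and Step 3 metrics for one query.
# The URLs below are placeholders standing in for real results.
google_top10 = {f"https://site{i}.com/page" for i in range(1, 11)}
assistant_citations = {
    "https://site1.com/page", "https://site2.com/page", "https://site3.com/page",  # 3 overlap
    "https://example.com/a", "https://example.com/b", "https://other.com/c",       # 3 new
}

intersection = google_top10 & assistant_citations   # I
novelty = assistant_citations - google_top10        # N

svr = len(intersection) / 10                         # Shared Visibility Rate
uavr = len(novelty) / len(assistant_citations)       # Unique Assistant Visibility Rate

# Repeat Citation Count for one domain: the number of your 10 query runs
# whose assistant answer cited that domain (4 in the example above).
queries_citing_example_com = 4
rcc = queries_citing_example_com / 10

print(f"SVR={svr:.2f}, UAVR={uavr:.2f}, RCC={rcc:.2f}")  # SVR=0.30, UAVR=0.50, RCC=0.40
```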

Step 4. Interpret

These scores are not industry benchmarks by any means; they’re simply suggested starting points. Feel free to adjust them as you see fit:

  • High SVR (> 0.6) means your content aligns with both systems. Lexical and semantic relevance are in sync.
  • Moderate SVR (0.3 – 0.6) with high RCC suggests your pages are semantically trusted but need clearer markup or stronger linking.
  • Low SVR (< 0.3) with high UAVR shows assistants trust other sources. That often signals structure or clarity issues.
  • High RCC for competitors indicates the model repeatedly cites their domains, so it’s worth studying for schema or content design cues.
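If you want to operationalize those starting points, a small helper like the one below can turn the three metrics into a quick diagnostic note. It’s a hedged sketch: the cut-offs for “high” RCC and UAVR are assumptions you should tune against your own data.

```python
# A hedged sketch that maps the suggested SVR/RCC/UAVR starting points above
# to a quick diagnostic label. Adjust the cut-offs to suit your own data.
def interpret(svr, uavr, rcc):
    notes = []
    if svr > 0.6:
        notes.append("High SVR: lexical and semantic relevance are in sync.")
    elif svr >= 0.3:
        if rcc >= 0.4:   # assumed cut-off for "high" RCC; tune as needed
            notes.append("Moderate SVR with high RCC: semantically trusted, "
                         "but consider clearer markup or stronger linking.")
        else:
            notes.append("Moderate SVR: partial overlap between search and assistants.")
    else:
        if uavr >= 0.5:  # assumed cut-off for "high" UAVR; tune as needed
            notes.append("Low SVR with high UAVR: assistants trust other sources; "
                         "check structure and clarity.")
        else:
            notes.append("Low SVR: little overlap with assistant citations.")
    return notes

print(interpret(svr=0.30, uavr=0.50, rcc=0.40))
```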

Step 5. Act

If SVR is low, improve headings, clarity, and crawlability. If RCC is low for your brand, standardize author fields, schema, and timestamps. If UAVR is high, track those new domains as they may already hold semantic trust in your niche.

(This approach won’t always work exactly as outlined. Some assistants limit the number of citations or vary them regionally. Results can differ by geography and query type. Treat it as an observational exercise, not a rigid framework.)

Why This Math Is Important

This math gives marketers a way to quantify agreement and disagreement between two retrieval systems. It’s diagnostic math, not ranking math. It doesn’t tell you why the assistant chose a source; it tells you that it did, and how consistently.

That pattern is the visible edge of the invisible hybrid logic operating behind the scenes. Think of it like watching the weather by looking at tree movement. You’re not simulating the atmosphere, just reading its effects.

On-Page Work That Helps Hybrid Retrieval

Once you see how overlap and novelty play out, the next step is tightening structure and clarity.

  • Write in short claim-and-evidence blocks of 200-300 words.
  • Use clear headings, bullets, and stable anchors so BM25 can find exact terms.
  • Add structured data (FAQ, HowTo, Product, TechArticle) so vectors and assistants understand context.
  • Keep canonical URLs stable and timestamp content updates.
  • Publish canonical PDF versions for high-trust topics; assistants often cite fixed, verifiable formats first.

These steps support both crawlers and LLMs as they share the language of structure.
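To make the structured data point from the list above concrete, here is a minimal sketch of a schema.org FAQPage block, built as a Python dictionary and serialized to JSON-LD. The question and answer text are placeholders; the JSON output would be embedded in a script tag of type application/ld+json on the page.

```python
import json

# A minimal sketch of FAQ structured data (schema.org FAQPage), one of the
# markup types mentioned above. The question and answer text are placeholders.
faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is hybrid retrieval?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Hybrid retrieval combines lexical (BM25) and semantic "
                        "(embedding-based) search, then fuses the results.",
            },
        }
    ],
}

print(json.dumps(faq_schema, indent=2))
```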

Reporting And Executive Framing

Executives don’t care about BM25 or embeddings nearly as much as they care about visibility and trust.

Your new metrics (SVR, UAVR, and RCC) can help translate the abstract into something measurable: how much of your existing SEO presence carries into AI discovery, and where competitors are cited instead.

Pair those findings with Search Console’s AI Mode performance totals, but remember: You can’t currently separate AI Mode data from regular web clicks, so treat any AI-specific estimate as directional, not definitive. Also worth noting that there may still be regional limits on data availability.

These limits don’t make the math less useful, however. They help keep expectations realistic while giving you a concrete way to talk about AI-driven visibility with leadership.

Summing Up

The gap between search and assistants isn’t a wall. It’s more of a signal difference. Search engines rank pages after the answer is known. Assistants retrieve chunks before the answer exists.

The math in this article is an idea of how to observe that transition without developer tools. It’s not the platform’s math; it’s a marketer’s proxy that helps make the invisible visible.

In the end, the fundamentals stay the same. You still optimize for clarity, structure, and authority.

Now you can measure how that authority travels between ranking systems and retrieval systems, and do it with realistic expectations.

That visibility, counted and contextualized, is how modern SEO stays anchored in reality.

More Resources:


This post was originally published on Duane Forrester Decodes


Featured Image: Roman Samborskyi/Shutterstock

Time We Actually Start To Measure Relevancy When We Talk About “Relevant Traffic” via @sejournal, @TaylorDanRW

Every SEO strategy claims to drive “relevant traffic.” It is one of the industry’s most overused phrases, and one of the least examined. We celebrate growth in organic sessions and point to conversions as proof that our efforts are working.

Yet the metric we often use to prove “relevance” – last-click revenue or leads – tells us nothing about why those visits mattered, or how they contributed to the user’s journey.

If we want to mature SEO measurement, we need to redefine what relevance means and start measuring it directly, rather than inferring it from transactional outcomes.

With AI disrupting the user journey, and with little data or visibility from these platforms and from Google’s latest Search additions (AI Mode and AI Overviews), now is the perfect time to redefine what SEO success looks like for the new, modern Search era. For me, this starts with defining “relevant traffic.”

The Illusion Of Relevance

In most performance reports, “relevant traffic” is shorthand for “traffic that converts.”

But this definition is structurally flawed. Conversion metrics reward the final interaction, not the fit between user intent and content. They measure commercial efficiency, not contextual alignment.

A visitor could land on a blog post, spend five minutes reading, bookmark it, and return two weeks later via paid search to convert. In most attribution models, that organic session adds no measurable value to SEO. Yet that same session might have been the most relevant interaction in the entire funnel – the moment the brand aligned with the user’s need.

In Universal Analytics, we had some insight into this because we could view assisted conversion paths; in Google Analytics 4, conversion path reports are only available in the Advertising section.

Even when we did have visibility into conversion paths, we didn’t always account for the touchpoints Organic contributed to conversions that were last-click attributed to other channels.

When we define relevance only through monetary endpoints, we constrain SEO to a transactional role and undervalue its strategic contribution: shaping how users discover, interpret, and trust a brand.

The Problem With Last-Click Thinking

Last-click attribution still dominates SEO reporting, even as marketers acknowledge its limitations.

It persists not because it is accurate, but because it is easy. It allows for simple narratives: “Organic drove X in revenue this month.” But simplicity comes at the cost of understanding.

User journeys are no longer linear. Search is firmly establishing itself as multimodal, a shift that has been underway for the past decade and is being further enabled by improvements in hardware and AI.

Search is iterative, fragmented, and increasingly mediated by AI summarization and recommendation layers. A single decision may involve dozens of micro-moments, queries that refine, pivot, or explore tangents. Measuring “relevant traffic” through the lens of last-click attribution is like judging a novel by its final paragraph.

The more we compress SEO’s role into the conversion event, the more we disconnect it from how users actually experience relevance: as a sequence of signals that build familiarity, context, and trust.

What Relevance Really Measures

Actual relevance exists at the intersection of three dimensions: intent alignment, experience quality, and journey contribution.

1. Intent Alignment

  • Does the content match what the user sought to understand or achieve?
  • Are we solving the user’s actual problem, not just matching their keywords?
  • Relevance begins when the user’s context meets the brand’s competence.

2. Experience Quality

  • How well does the content facilitate progress, not just consumption?
  • Do users explore related content, complete micro-interactions, or return later?
  • Engagement depth, scroll behavior, and path continuation are not vanity metrics; they are proxies for satisfaction.

3. Journey Contribution

  • What role does the interaction play in the broader decision arc?
  • Did it inform, influence, or reassure, even if it did not close?
  • Assisted conversions, repeat session value, and brand recall metrics can capture this more effectively than revenue alone.

These dimensions demand a shift from output metrics (traffic, conversions) to outcome metrics (user progress, decision confidence, and informational completeness).

In other words, from “how much” to “how well.”

Measuring Relevance Beyond The Click

If we accept that relevance is not synonymous with revenue, then new measurement frameworks are needed. These might include:

  • Experience fit indices: Using behavioral data (scroll depth, dwell time, secondary navigation) to quantify whether users engage as expected given the intent type.
    Example: informational queries that lead to exploration and bookmarking score high on relevance, even if they do not convert immediately.
  • Query progression analysis: Tracking whether users continue refining their query after visiting your page. If they stop searching or pivot to branded terms, that is evidence of resolved intent.
  • Session contribution mapping: Modeling the cumulative influence of organic visits across multiple sessions and touchpoints. Tools like GA4’s data-driven attribution can be extended to show assist depth rather than last-touch value.
  • Experience-level segmentation: Grouping traffic by user purpose (for example, research, comparison, decision) and benchmarking engagement outcomes against expected behaviors for that intent.

These models do not replace commercial key performance indicators (KPIs); they contextualize them. They help organizations distinguish between traffic that sells and traffic that shapes future sales.

This isn’t to say that SEO activities shouldn’t be tied to commercial KPIs, but the role of SEO has evolved in the wider web ecosystem, and our understanding of value should also evolve with it.
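Purely as an illustration of the “experience fit indices” idea above, here is a hypothetical scoring sketch. The intent profiles, weights, and thresholds are invented for the example, not an industry standard, and any real index would need to be calibrated against your own analytics data.

```python
# A purely illustrative sketch of an "experience fit index": score a session
# against expected behavior for its intent type. The intent profiles, weights,
# and thresholds are hypothetical placeholders.
EXPECTED = {
    "informational": {"min_dwell_seconds": 60, "min_scroll_depth": 0.5},
    "comparison":    {"min_dwell_seconds": 90, "min_scroll_depth": 0.6},
    "decision":      {"min_dwell_seconds": 30, "min_scroll_depth": 0.3},
}

def experience_fit(intent, dwell_seconds, scroll_depth, continued_journey):
    """Return a 0-1 score for how well a session matches expectations for its intent."""
    expected = EXPECTED[intent]
    dwell_fit = min(dwell_seconds / expected["min_dwell_seconds"], 1.0)
    scroll_fit = min(scroll_depth / expected["min_scroll_depth"], 1.0)
    journey_fit = 1.0 if continued_journey else 0.0  # e.g., explored related content or returned later
    return round(0.4 * dwell_fit + 0.3 * scroll_fit + 0.3 * journey_fit, 2)

print(experience_fit("informational", dwell_seconds=150, scroll_depth=0.8, continued_journey=True))  # 1.0
```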

Why This Matters Now

AI-driven search interfaces, from Google’s AI Overviews to ChatGPT and Perplexity, are forcing marketers to confront a new reality – relevance is being interpreted algorithmically.

Users are no longer exposed to 10 blue links and maybe some static SERP features, but to synthesized, conversational results. In this environment, content must not only rank; it must earn inclusion through semantic and experiential alignment.

This makes relevance an operational imperative. Brands that measure relevance effectively will understand how users perceive and progress through discovery in both traditional and AI-mediated ecosystems. Those who continue to equate relevance with conversion will misallocate resources toward transactional content at the expense of influence and visibility.

The next generation of SEO measurement should ask:

Does this content help the user make a better decision, faster? Not just, Did it make us money?

From Performance Marketing To Performance Understanding

The shift from measuring revenue to measuring relevance parallels the broader evolution of marketing itself, from performance marketing to performance understanding.

For years, the goal has been attribution: assigning value to touchpoints. But attribution without understanding is accounting, not insight.

Measuring relevance reintroduces meaning into the equation. It bridges brand and performance, showing not just what worked, but why it mattered.

This mindset reframes SEO as an experience design function, not merely a traffic acquisition channel. It also creates a more sustainable way to defend SEO investment by proving how organic experiences improve user outcomes and brand perception, not just immediate sales.

Redefining “Relevant Traffic” For The Next Era Of Search

It is time to retire the phrase “relevant traffic” as a catch-all justification for SEO success. Relevance cannot be declared; it must be demonstrated through evidence of user progress and alignment.

A modern SEO report should read less like a sales ledger and more like an experience diagnostic:

  • What intents did we serve best?
  • Which content formats drive confidence?
  • Where does our relevance break down?

Only then can we claim, with integrity, that our traffic is genuinely relevant.

Final Thought

Relevance is not measured at the checkout page. It is measured the moment a user feels understood.

Until we start measuring that, “relevant traffic” remains a slogan, not a strategy.

More Resources:


Featured Image: Master1305/Shutterstock

Google’s New BlockRank Democratizes Advanced Semantic Search via @sejournal, @martinibuster

A new research paper from Google DeepMind proposes an AI search ranking algorithm called BlockRank that works so well it puts advanced semantic search ranking within reach of individuals and organizations. The researchers conclude that it “can democratize access to powerful information discovery tools.”

In-Context Ranking (ICR)

The research paper describes the breakthrough of using In-Context Ranking (ICR), a way to rank web pages using a large language model’s contextual understanding abilities.

It prompts the model with:

  1. Instructions for the task (for example, “rank these web pages”)
  2. Candidate documents (the pages to rank)
  3. And the search query.
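To picture what that looks like, here is a hypothetical Python sketch of how such a prompt might be assembled. The layout and wording are assumptions for illustration, not the paper’s actual prompt format.

```python
# A hypothetical sketch of an In-Context Ranking (ICR) prompt: task
# instructions, candidate documents, and the search query, all placed in one
# context window. The exact format used in the paper may differ.
documents = {
    "doc_1": "Passage text for the first candidate page ...",
    "doc_2": "Passage text for the second candidate page ...",
    "doc_3": "Passage text for the third candidate page ...",
}
query = "example user question ..."

doc_blocks = "\n\n".join(f"[{doc_id}]\n{text}" for doc_id, text in documents.items())
prompt = (
    "Rank the following web pages by how relevant they are to the query. "
    "Answer with the document identifiers in order, most relevant first.\n\n"
    f"{doc_blocks}\n\n"
    f"Query: {query}"
)
print(prompt)
```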

ICR is a relatively new approach first explored by researchers from Google DeepMind and Google Research in 2024 (Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? PDF). That earlier study showed that ICR could match the performance of retrieval systems built specifically for search.

But that improvement came with a downside: it requires escalating computing power as the number of pages to be ranked increases.

When a large language model (LLM) compares multiple documents to decide which are most relevant to a query, it has to “pay attention” to every word in every document and how each word relates to all others. This attention process gets much slower as more documents are added because the work grows roughly quadratically with the total length of the input.

The new research solves that efficiency problem, which is why the paper is titled Scalable In-context Ranking with Generative Models: it shows how to scale In-Context Ranking (ICR) with what they call BlockRank.

How BlockRank Was Developed

The researchers examined how the model actually uses attention during In-Context Retrieval and found two patterns:

  • Inter-document block sparsity:
    The researchers found that when the model reads a group of documents, it tends to focus mainly on each document separately instead of comparing them all to each other. They call this “block sparsity,” meaning there’s little direct comparison between different documents. Building on that insight, they changed how the model reads the input so that it reviews each document on its own but still compares all of them against the question being asked. This keeps the part that matters, matching the documents to the query, while skipping the unnecessary document-to-document comparisons. The result is a system that runs much faster without losing accuracy.
  • Query-document block relevance:
    When the LLM reads the query, it doesn’t treat every word in that question as equally important. Some parts of the question, like specific keywords or punctuation that signal intent, help the model decide which document deserves more attention. The researchers found that the model’s internal attention patterns, particularly how certain words in the query focus on specific documents, often align with which documents are relevant. This behavior, which they call “query-document block relevance,” became something the researchers could train the model to use more effectively.

The researchers identified these two attention patterns and then designed a new approach informed by what they learned. The first pattern, inter-document block sparsity, revealed that the model was wasting computation by comparing documents to each other when that information wasn’t useful. The second pattern, query-document block relevance, showed that certain parts of a question already point toward the right document.

Based on these insights, they redesigned how the model handles attention and how it is trained. The result is BlockRank, a more efficient form of In-Context Retrieval that cuts unnecessary comparisons and teaches the model to focus on what truly signals relevance.
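For intuition only, here is a toy numpy sketch of that kind of block-structured attention mask. It is not the paper’s implementation; the block sizes and exact masking rules are assumptions based on the description above.

```python
import numpy as np

# An illustrative toy sketch (not the paper's code) of structured attention in
# the spirit of BlockRank: tokens attend within their own document block and
# to the query block, while direct document-to-document attention is dropped.
doc_lengths = [4, 3, 5]   # token counts for three candidate documents (toy numbers)
query_length = 2          # token count for the query block, placed last

blocks = doc_lengths + [query_length]
offsets = np.cumsum([0] + blocks)               # block boundaries in the flattened input
total = int(offsets[-1])
q_start, q_end = int(offsets[-2]), int(offsets[-1])

mask = np.zeros((total, total), dtype=bool)
for start, end in zip(offsets[:-1], offsets[1:]):
    mask[start:end, start:end] = True           # attend within the same block
    mask[start:end, q_start:q_end] = True       # every block attends to the query
    mask[q_start:q_end, start:end] = True       # the query attends to every document, so it can score each one

print(mask.astype(int))
```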

Benchmarking Accuracy Of BlockRank

The researchers tested BlockRank for how well it ranks documents on three major benchmarks:

  • BEIR
    A collection of many different search and question-answering tasks used to test how well a system can find and rank relevant information across a wide range of topics.
  • MS MARCO
    A large dataset of real Bing search queries and passages, used to measure how accurately a system can rank passages that best answer a user’s question.
  • Natural Questions (NQ)
    A benchmark built from real Google search questions, designed to test whether a system can identify and rank the passages from Wikipedia that directly answer those questions.

They used a 7-billion-parameter Mistral LLM and compared BlockRank to other strong ranking models, including FIRST, RankZephyr, RankVicuna, and a fully fine-tuned Mistral baseline.

BlockRank performed as well as or better than those systems on all three benchmarks, matching the results on MS MARCO and Natural Questions and doing slightly better on BEIR.

The researchers explained the results:

“Experiments on MSMarco and NQ show BlockRank (Mistral-7B) matches or surpasses standard fine-tuning effectiveness while being significantly more efficient at inference and training. This offers a scalable and effective approach for LLM-based ICR.”

They also acknowledged that they didn’t test multiple LLMs and that these results are specific to Mistral 7B.

Is BlockRank Used By Google?

The research paper says nothing about BlockRank being used in a live environment, so it’s purely conjecture to say that it might be. It’s also natural to try to identify where BlockRank fits into AI Mode or AI Overviews, but the descriptions of how AI Mode’s FastSearch and RankEmbed work are vastly different from what BlockRank does. So it’s unlikely that BlockRank is related to FastSearch or RankEmbed.

Why BlockRank Is A Breakthrough

What the research paper does say is that this is a breakthrough technology that puts an advanced ranking system within reach of individuals and organizations that wouldn’t normally have access to this kind of high-quality ranking technology.

The researchers explain:

“The BlockRank methodology, by enhancing the efficiency and scalability of In-context Retrieval (ICR) in Large Language Models (LLMs), makes advanced semantic retrieval more computationally tractable and can democratize access to powerful information discovery tools. This could accelerate research, improve educational outcomes by providing more relevant information quickly, and empower individuals and organizations with better decision-making capabilities.

Furthermore, the increased efficiency directly translates to reduced energy consumption for retrieval-intensive LLM applications, contributing to more environmentally sustainable AI development and deployment.

By enabling effective ICR on potentially smaller or more optimized models, BlockRank could also broaden the reach of these technologies in resource-constrained environments.”

SEOs and publishers are free to form their own opinions about whether this could be used by Google. I don’t think there’s evidence of that, but it would be interesting to ask a Googler about it.

Google appears to be in the process of making BlockRank available on GitHub, but no code has been published there yet.

Read about BlockRank here:
Scalable In-context Ranking with Generative Models

Featured Image by Shutterstock/Nithid