Rethink Your Product Detail Pages

Conversion is the primary job of ecommerce product pages. Ranking in search engines has always been a close second. Until now.

It’s near cliché in 2026 to note that search and product discovery are changing. AI Overviews, AI Mode, various answer solutions, AI chat interfaces, and emerging shopping agents are remaking how consumers find and buy, from luxury items to everyday goods.

Annotated blueprint diagram of an ecommerce product detail page illustrating 13 UX and conversion best practices, with numbered callouts for trust signals, clear navigation, visual focus, friction reduction, social proof, benefit-driven titles, transparent pricing, variant selectors, and prominent calls to action.

Conversion is the primary aim of a product detail page. But it should also attract traffic via traditional rankings and generative AI visibility.

Information Source

In this new environment, product detail pages must be “AI consumable” to provide answers and model products as structured entities.

Hence today’s product detail pages should be:

  • Rankable,
  • Extractable,
  • Understandable as an entity.

Each aligns with familiar practices. Search engine optimization supports ranking. Answer engine optimization supports extraction. Generative engine optimization supports how AI systems understand and use data.

And a single product page must address all three.

Content Focus

In preparing this article, I used AI to review product detail pages from Amazon, Walmart, Target, L.L.Bean, a collection of direct-to-consumer brands, and several smaller ecommerce sites. The focus was on how the content of these pages addresses ranking, extracting, and understanding — not structured data markup, but content alone.

The AI provided a subjective score for each category of retailer.

| Segment | Example Sources | Rankable | Extractable | Understandable as Entity |
| --- | --- | --- | --- | --- |
| Marketplaces | Amazon | Very High | Medium | Very High |
| Large Retailers | Walmart, Target | High | Medium–High | High |
| Specialty Retail | L.L.Bean | Medium | High | Medium–High |
| D2C (Structured) | AG1, Beekman 1802 | Low–Medium | High | Medium |
| D2C (Hybrid) | Casper, Allbirds | Medium | Medium | Medium |
| D2C (Aesthetic) | Vuori, Glossier | Low | Low | Low–Medium |
| Small Merchants | Mixed Shopify stores | Low | Low–Medium | Low–Medium |

Rankable

Traditional search still drives visibility.

Almost without exception, the product detail pages passed a basic search-optimization content audit. But large retailers did better, unsurprisingly.

Marketplaces and enterprise retailers such as Amazon, Walmart, and Target tend to use expansive titles, dense attributes, and strong internal links. The pages match many queries, not just one.

Amazon’s product pages include:

  • Titles,
  • Bullet points (“About this item”),
  • Product descriptions,
  • Specifications,
  • Frequently asked questions,
  • Reviews (often thousands of words).

In some cases, the composite product information reaches 10,000 words (mostly shopper reviews), although the average is around 2,000.

Several D2C brands favor clean names and brand-consistent language. The approach improves readability, but likely limits organic reach.

Smaller merchants’ product pages resemble those of D2C brands and could benefit from mimicking Amazon by adding more information.

Extractable

Answers determine what gets used.

To be “extractable,” a product page needs to explain itself directly. What is the product? What does it do? Who is it for? The answers to those questions should be concise and easy to isolate. Discrete sections, labeled features, and question-and-answer formats help.

Many of the product pages reviewed underperform in this area. The exception was the large retail marketplaces, which often contain extensive answer information.

Here again, even small retailers could benefit from adding an FAQ section.
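For illustration, a question-and-answer block on a product page might look like the sketch below. The product, questions, and answers are hypothetical; the point is the format, with each question labeled and answered in a sentence or two that can be isolated and reused.

```html
<!-- Hypothetical FAQ block; the product and answers are invented for illustration. -->
<section id="faq">
  <h2>Frequently Asked Questions</h2>

  <h3>What is the TrailLite 40L backpack?</h3>
  <p>A 1.1 kg, 40-liter internal-frame hiking backpack designed for two- to three-day trips.</p>

  <h3>Who is it for?</h3>
  <p>Hikers who want a lightweight multi-day pack and carry loads under 12 kg.</p>

  <h3>Is it waterproof?</h3>
  <p>The fabric is water-resistant, and a rain cover is included for sustained rain.</p>
</section>
```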

Understandable

Data determines visibility.

Search engines and AI systems increasingly treat products as entities or objects with attributes such as brand, category, price, specifications, and relationships to other products.

While a product entity is certainly communicated through structured data, content also plays a role.

To be understandable as an entity, a product page’s content should define attributes (name, variants, specifications) clearly and consistently.

Product pages from large retailers, especially marketplaces, consistently describe products with clear attributes, normalized naming, and consistent variant handling. This allows products to appear in shopping results, comparison features, and structured listings.

3 Layers Combined

Combined, the three layers should drive traffic from traditional search and generative AI channels.

  • A rankable page is discoverable.
  • Extractable content facilitates answers.
  • Easily understood products can appear consistently across multiple systems.

My AI-driven site review identified patterns related to these layers and their individual goals. But it also revealed a gap.

Marketplaces excel at providing product information. The difference is pronounced and should lead all merchants, large and small, to ensure their product content addresses SEO, AEO, and GEO.

In 2026, you need all three.

Google May Expand Unsupported Robots.txt Rules List via @sejournal, @MattGSouthern

Google may expand the list of unsupported robots.txt rules in its documentation based on analysis of real-world robots.txt data collected through HTTP Archive.

Gary Illyes and Martin Splitt described the project on the latest episode of Search Off the Record. The work started after a community member submitted a pull request to Google’s robots.txt repository proposing two new tags be added to the unsupported list.

Illyes explained why the team broadened the scope beyond the two tags in the PR:

“We tried to not do things arbitrarily, but rather collect data.”

Rather than add only the two tags proposed, the team decided to look at the top 10 or 15 most-used unsupported rules. Illyes said the goal was “a decent starting point, a decent baseline” for documenting the most common unsupported tags in the wild.

How The Research Worked

The team used HTTP Archive to study what rules websites use in their robots.txt files. HTTP Archive runs monthly crawls across millions of URLs using WebPageTest and stores the results in Google BigQuery.

The first attempt hit a wall. The team “quickly figured out that no one is actually requesting robots.txt files” during the default crawl, meaning the HTTP Archive datasets don’t typically include robots.txt content.

After consulting with Barry Pollard and the HTTP Archive community, the team wrote a custom JavaScript parser that extracts robots.txt rules line by line. The custom metric was merged before the February crawl, and the resulting data is now available in the custom_metrics dataset in BigQuery.

What The Data Shows

The parser extracted every line that matched a field-colon-value pattern. Illyes described the resulting distribution:

“After allow and disallow and user agent, the drop is extremely drastic.”

Beyond those three fields, rule usage falls into a long tail of less common directives, plus junk data from broken files that return HTML instead of plain text.

Google currently supports four fields in robots.txt: user-agent, allow, disallow, and sitemap. The documentation says other fields “aren’t supported” without listing which unsupported fields are most common in the wild.
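To make the extraction concrete, here is a minimal sketch of that field-colon-value matching, written in TypeScript rather than reproducing the team’s actual custom metric. The regular expression, the supported-field list, and the tallying logic are illustrative assumptions, not the HTTP Archive implementation.

```typescript
// Minimal sketch of field-colon-value extraction from a robots.txt body.
// Not the HTTP Archive custom metric; the regex and tallying are illustrative.

const SUPPORTED_FIELDS = new Set(["user-agent", "allow", "disallow", "sitemap"]);

function tallyRobotsFields(robotsTxt: string): Map<string, number> {
  const counts = new Map<string, number>();

  for (const rawLine of robotsTxt.split("\n")) {
    // Strip comments and whitespace before matching.
    const line = rawLine.split("#")[0].trim();

    // Keep only lines shaped like "field: value"; blank lines and junk
    // (such as HTML returned by broken files) fall through and are ignored.
    const match = line.match(/^([A-Za-z][A-Za-z0-9_-]*)\s*:\s*(.*)$/);
    if (!match) continue;

    const field = match[1].toLowerCase();
    counts.set(field, (counts.get(field) ?? 0) + 1);
  }

  return counts;
}

// Example: surface the most-used fields outside Google's supported set.
const counts = tallyRobotsFields("User-agent: *\nDisallow: /tmp/\nCrawl-delay: 10\n");
const unsupported = [...counts.entries()]
  .filter(([field]) => !SUPPORTED_FIELDS.has(field))
  .sort((a, b) => b[1] - a[1]);
console.log(unsupported); // [["crawl-delay", 1]]
```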

Google has clarified that unsupported fields are ignored. The current project extends that work by identifying specific rules Google plans to document.

The top 10 to 15 most-used rules beyond the four supported fields are expected to be added to Google’s unsupported rules list. Illyes did not name specific rules that would be included.

Typo Tolerance May Expand

Illyes said the analysis also surfaced common misspellings of the disallow rule:

“I’m probably going to expand the typos that we accept.”

His phrasing implies the parser already accepts some misspellings. Illyes didn’t commit to a timeline or name specific typos.

Why This Matters

Search Console already surfaces some unrecognized robots.txt tags. If Google documents more unsupported directives, that could make its public documentation more closely reflect the unrecognized tags people already see surfaced in Search Console.

Looking Ahead

The planned update would affect Google’s public documentation and how disallow typos are handled. Anyone maintaining a robots.txt file with rules beyond user-agent, allow, disallow, and sitemap should audit for directives that have never worked for Google.
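As a hypothetical illustration of that audit, the robots.txt below mixes the four supported fields with directives Google ignores. Crawl-delay and noindex are common examples of rules that have never been part of Google’s supported set, although other crawlers may still honor them.

```
# Hypothetical robots.txt for illustration only.

User-agent: *          # supported
Disallow: /checkout/   # supported
Allow: /checkout/help  # supported

Crawl-delay: 10        # ignored by Google; some other crawlers honor it
Noindex: /drafts/      # ignored by Google; use a noindex meta tag or header instead

Sitemap: https://www.example.com/sitemap.xml   # supported
```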

The HTTP Archive data is publicly queryable on BigQuery for anyone who wants to examine the distribution directly.


Featured Image: Screenshot from YouTube.com/GoogleSearchCentral, April 2026.

Google Adds View-Through Conversion Optimization To Demand Gen via @sejournal, @MattGSouthern

Google announced two updates to Demand Gen ahead of Google Marketing Live.

View-through conversion (VTC) optimization is now available for Demand Gen campaigns in Google Ads. This setting lets campaigns optimize toward view-through conversions on YouTube.

Google is also expanding Commerce Media Suite to support Demand Gen inventory in Google Ads. This adds Google Ads to existing Commerce Media Suite support in Display & Video 360 and Search Ads 360.

What’s New

VTC Optimization

When enabled, VTC optimization lets Demand Gen campaigns optimize toward view-through conversions on YouTube. A view-through conversion happens when a user sees an ad, doesn’t click, but later converts.

Commerce Media Suite

With the Google Ads expansion, advertisers can use retailers’ first-party catalog and conversion data to reach shoppers. Inventory covers YouTube, Discover, and Gmail.

The Performance Claim

In the announcement, Google cited Fospha’s Demand Gen and YouTube Playbook, a third-party vendor report. Fospha attributes an 18% higher share of new-customer conversions to Demand Gen versus the paid media average. Coverage spans 127 retail brands across fashion, cosmetics, and consumer goods from 2024 to 2025.

Fospha is a marketing attribution vendor with a commercial interest in measurement across advertising platforms. Google didn’t publish its own performance data alongside the announcement.

Why This Matters

VTC optimization brings Demand Gen closer to the capabilities advertisers already use on other ad platforms. For teams running Demand Gen alongside video campaigns on those platforms, the optimization setup no longer has to differ by channel.

The Commerce Media Suite expansion gives Google Ads advertisers access to retailer first-party catalog and conversion data, matching the support already available in Display & Video 360 and Search Ads 360.

Since last year, Google has added Demand Gen optimization levers, including in-store sales optimization and shoppable CTV. VTC optimization and Commerce Media Suite support continue that pattern.

Looking Ahead

This announcement lands ahead of Google Marketing Live, where Google says more Demand Gen solutions will follow.

OpenAI’s Crawler Docs Now List OAI-AdsBot For ChatGPT Ads via @sejournal, @MattGSouthern

OpenAI’s public crawler documentation now lists OAI-AdsBot, a bot that may visit pages submitted as ChatGPT ads to check policy compliance and help determine ad relevance.

The entry sits alongside OAI-SearchBot, GPTBot, and ChatGPT-User on OpenAI’s crawler docs page, bringing the documented bot count to four.

OpenAI states that OAI-AdsBot only visits pages submitted as ads and that the data it collects isn’t used to train its generative AI foundation models.

What The Bot Does

Per OpenAI’s docs, OAI-AdsBot may visit an ad’s landing page after the ad gets submitted. The bot checks whether the page complies with OpenAI’s ad policies. It may also use content from the landing page to help decide when to show the ad to ChatGPT users.

The bot identifies itself with the user-agent string Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-AdsBot/1.0; +https://openai.com/adsbot.

OAI-SearchBot and GPTBot are both at version 1.3, per OpenAI’s docs. OAI-AdsBot itself only visits pages submitted as ad landing pages, not the wider web.

What The Bot Doesn’t Do

Data collected by OAI-AdsBot isn’t used to train generative AI foundation models. That keeps OAI-AdsBot out of GPTBot’s territory, which handles training data collection.

It also keeps OAI-AdsBot separate from OpenAI’s other bots. OAI-SearchBot surfaces content in ChatGPT search, while ChatGPT-User fetches pages during user-initiated browsing, and OAI-AdsBot is limited to ad validation.

OAI-SearchBot and GPTBot can be controlled independently through robots.txt. ChatGPT-User is user-initiated, and the company notes that robots.txt rules may not apply to it. The OAI-AdsBot entry doesn’t say how the bot treats robots.txt.
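For the two bots that can be controlled this way, the rules look like ordinary robots.txt groups. The sketch below uses illustrative paths and deliberately omits OAI-AdsBot, since its documentation does not yet say whether it honors robots.txt.

```
# Illustrative robots.txt rules for OpenAI's documented crawlers; paths are hypothetical.

# Let ChatGPT search surface public content.
User-agent: OAI-SearchBot
Allow: /

# Keep training-data collection away from the site.
User-agent: GPTBot
Disallow: /
```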

No Public IP List Yet

OpenAI publishes IP range files for its three earlier bots at openai.com/searchbot.json, openai.com/gptbot.json, and openai.com/chatgpt-user.json. At the time of publication, no equivalent openai.com/adsbot.json file appears in OpenAI’s docs.

Without a published list, verifying a real OAI-AdsBot visit becomes harder. User-agent strings can be spoofed, and the IP lists give you a way to cross-check for the other three OpenAI bots. For OAI-AdsBot, that cross-check isn’t available.

Why This Matters

OAI-AdsBot has two audiences. Advertisers buying placements on ChatGPT need the bot to reach their landing pages; otherwise, the ad may not validate. Anyone tracking AI bot activity in server logs gets a new user-agent to watch, one tied to paid inventory rather than search or training.

Aggressive bot protection through Cloudflare, Akamai, or similar tools may block OAI-AdsBot before it reaches the page. That could create validation friction for advertisers who use strict bot-mitigation tools.

Looking Ahead

ChatGPT’s ad program has moved fast since OpenAI started testing ads on Feb. 9. As access opens up to more advertisers, OAI-AdsBot traffic will start showing up in more server logs. Watch for an eventual IP range file at openai.com/adsbot.json if OpenAI chooses to publish one. For now, the user-agent string is what you have to work with.


Featured Image: Blossom Stock Studio/Shutterstock

Why your product is your most important SEO asset

For a long time, we defined SEO success by rankings and traffic. If you reached the top of the search results and brought people to your site, you did your job. That approach worked when discovery was linear, and search engines were the primary gatekeepers. But modern search behavior does not stop at discovery. Users want clarity, reassurance, and confidence before they make decisions. With so many options to choose from, users want to understand what a product does, how it compares to alternatives, and whether it fits their needs.

There is a shift in SEO, one that pushes closer to product thinking and long-term value creation. Search engines reward content and experiences that help users make informed decisions, not just pages that match keywords. That means SEO can no longer exist solely in the acquisition channel. SEO must support the entire journey, from first touch to post-purchase experience.


Key takeaways

  • SEO now focuses on user clarity and informed decision-making rather than just rankings and traffic.
  • Businesses should adopt an approach that integrates product understanding and user intent into keyword research.
  • Technical SEO remains crucial; a well-structured site improves visibility for both users and AI systems.
  • Product content, including descriptions and FAQs, serves as a powerful SEO asset that should be optimized.
  • Schema markup is essential for AI systems to accurately interpret product information, enhancing visibility and recommendations.

Technical SEO has always been product thinking

Technical SEO has always mattered, and it’s been tied to product quality, or at least product page quality. Site speed, internal linking, structured content, and clear navigation all shape how users experience a product online.

A fast, well-structured site helps users and AI platforms better understand your products. That means better visibility in search engines and AI recommendations alike. Good SEO looks at the system as a whole, prioritizes changes based on impact, and focuses on removing friction, which are the same principles that guide good product decisions.

Think like a product marketer, not just an SEO

Ranking for keywords does not automatically mean you are reaching the right audience or communicating the right value. Product marketers spend time understanding who the product is for, what problem it solves, and why someone should choose it over alternatives. SEO benefits enormously from that same approach.  

Keyword research is not just a targeting exercise. It reveals how people describe their problems, what they care about, and what information they need before making a decision. Applying those insights to product descriptions, category pages, and supporting content pulls SEO closer to real user intent. 

This is how SEO moves beyond traffic and starts contributing to the full customer journey: awareness, consideration, conversion, and, just as importantly, retention.  

Your product is your most underrated SEO asset

Many SEO strategies still treat content as something separate from the product. Blogs live in one place while product pages are left to focus purely on conversion.  

But products are content. Product names, descriptions, specifications, FAQs, reviews, and even post-purchase information all reflect the real information users are looking for. This content often holds far more SEO value than a generic blog post. Still, most brands do not optimize it with the same level of care.

When product pages are clear, well-structured, and written in the language customers actually use, they become powerful discovery assets.

AI is changing how products are discovered and bought

Users are turning to AI platforms to ask for recommendations, evaluate options, and understand differences between products.  

ChatGPT now supports direct purchases through integrations with platforms like Shopify, using OpenAI’s Agentic Commerce Protocol. That means users can discover and buy products directly within an AI conversation without ever visiting a product page on a website.  

For businesses, this changes what visibility looks like. SEO is no longer just about ranking in search results. SEO is about making sure your products are understandable, trustworthy, and accessible to AI systems that act as intermediaries.  

And the scope of that is broader than it first appears. Google’s Universal Commerce Protocol (UCP) extends AI-mediated commerce well beyond the checkout, covering the full lifecycle from product discovery through to order management, post-purchase support, and loyalty. That means the journey SEO needs to support has grown significantly. It is not just about being found and bought; it is about being the kind of brand an AI agent would confidently recommend, follow up with, and return to. Read more about ACP and UCP and what they mean for SEOs.

Why schema matters more than ever

If AI systems are going to recommend and sell products, they need structured information to rely on. Schema provides that structure. It tells search engines and AI platforms what a product is, how much it costs, whether it is available, how it is reviewed, and how it fits into a broader catalog.  

Without structured data, products become harder for machines to interpret and surface. With it, they become eligible for richer visibility across search engines, LLMs, and emerging shopping experiences.  

This goes beyond the basics. Pricing, availability, reviews, FAQs, shipping details, and even compatibility information all contribute to how well an AI agent can evaluate and surface your products. Third-party reviews on platforms like Trustpilot also play a role. Agents use external signals to validate brand credibility before making a recommendation. If that structured data is incomplete or inconsistent, your products risk being entirely invisible to agent-mediated discovery. 
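As a minimal, hypothetical sketch of what that structure can look like, the JSON-LD below describes an invented product using schema.org’s Product, Offer, and AggregateRating types. A real implementation would extend it with FAQs, shipping details, and review markup as described above.

```html
<!-- Hypothetical product markup; all values are invented for illustration. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "TrailLite 40L Hiking Backpack",
  "brand": { "@type": "Brand", "name": "ExampleOutdoors" },
  "description": "A 1.1 kg, 40-liter internal-frame backpack for two- to three-day hikes.",
  "sku": "TL40-GRN",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "213"
  },
  "offers": {
    "@type": "Offer",
    "url": "https://www.example.com/products/traillite-40l",
    "price": "129.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>
```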

Conclusion

The rules of SEO have not been torn up but extended. Product thinking, structured data, clear content, and technical rigor have always mattered. What has changed is the audience you are optimizing for. Alongside the human visitor, you now have AI agents evaluating, recommending, and, in some cases, completing purchases on a user’s behalf. The businesses that will thrive are those that make their products easy to understand, easy to trust, and easy to surface, whether a person or a machine is doing the searching. 

The Real Reason Your SEO Team Hasn’t Made The AI Transition Yet via @sejournal, @DuaneForrester

This series has spent five articles mapping what the AI search transition requires of your team, your content, your technical infrastructure, and your strategic framing. This piece addresses the question those five articles don’t answer: How do you actually make the organizational shift happen?

Most teams won’t fail here because they lack vision. The failure mode is execution, specifically the gap between knowing change is necessary and building the structure that makes it real.

The Transition Problem Is A People Problem, Not A Technology Problem

Only about 30% of enterprise SEO teams have restructured roles and responsibilities as a result of AI implementation. That means roughly 70% of teams who understand the shift intellectually haven’t made a structural move yet. The tools exist. The research is available. The urgency is visible in the data. And most teams are still running the same org chart they had three years ago.

This isn’t a strategic failure. It’s a change management failure, and it has a predictable shape. Three stall patterns show up consistently.

Analysis paralysis is the team that has attended every conference session, read every report, and built a compelling internal case, but can’t commit to a starting point because the landscape keeps shifting. The logic feels defensible: Why restructure when the platform behavior might change next quarter? The answer is that waiting for stability in an unstable environment isn’t patience. It’s avoidance dressed up as diligence.

Pilot purgatory is more widespread than most leaders want to admit. A survey of 200 U.S. marketing leaders found that 82% of teams using AI for campaigns are still operating in pilot or experimental mode, with 61% using AI only at the individual level rather than building it into collaborative team workflows. The pilot never fails cleanly; it just never graduates to production.

Reorg fatigue is the subtlest of the three. Teams that have been through digital transformation cycles carry scar tissue. They’ve watched priority initiatives get announced, resourced, and quietly abandoned when the next priority arrived. When a VP announces a pivot to AI visibility, the team’s first internal question often isn’t how to do it; it’s how long until this one goes away, too. Credibility for this transition requires demonstrating that it’s structurally different from the previous three, which means visible commitment in budget, headcount, and KPI design, not just slide decks.

The Resistance Map

Not all resistance is the same, and treating it as a uniform problem produces uniform failure. Four distinct patterns appear in SEO and marketing teams, each requiring a different response.

Seniority-based resistance sounds like: I’ve been doing this for 15 years, and I know what works. This is often the hardest pattern to address because it’s partly legitimate. Senior practitioners have real pattern recognition that junior team members lack, and they’ve watched enough vendor-driven hype cycles to be appropriately skeptical of any new essential framework. The correct response isn’t to dismiss the experience; it’s to reframe the transition as an addition to what they know, not a replacement of it. As established in the context moat piece earlier in this series, the fundamentals of relevance and trust don’t disappear in an AI search environment. They compound. Senior practitioners who make that conceptual bridge become accelerants, not obstacles.

Skills-based anxiety is a different problem entirely. This person isn’t resisting because they distrust the framework; they’re resisting because they don’t know how to operate inside it. The language of vector indexes, structured data expansion, and retrieval architecture is genuinely foreign to someone who built their career on keyword clustering and link building. A useful diagnostic lens here comes from the ADKAR model, a change management framework developed by Prosci that identifies five sequential conditions an individual needs to reach for change to stick: Awareness, Desire, Knowledge, Ability, and Reinforcement. Skills-based anxiety is almost always a Knowledge or Ability gap, not a motivation problem. Treating it as motivation resistance wastes time and confirms the team member’s fear that leadership doesn’t understand what they’re actually being asked to do.

Political resistance is structural, not personal. If AI visibility expands SEO scope to include retrieval architecture, machine-facing content design, and cross-functional data coordination, someone’s budget conversation changes. Marketing ops, IT, and content teams all have a plausible claim on parts of that expanded scope. This resistance rarely surfaces as direct opposition; it shows up as slow approvals, ambiguous priorities, and repeated requests to align with stakeholders before anything moves. The response requires making budget and ownership decisions explicitly, not hoping that clarity emerges from collaboration.

Legitimate skepticism deserves its own category because it’s the resistance pattern most leaders mishandle. When someone asks to see the revenue connection, that isn’t obstruction; it’s the right question. The answer needs to be honest, which means acknowledging that the measurement infrastructure for AI visibility is still developing. Trying to manufacture certainty in response to legitimate skepticism destroys credibility faster than admitting the gap. Acknowledging where the data is incomplete while demonstrating directional progress is more durable.

Running Both Operations At Once

Most teams can’t switch from traditional SEO to AI visibility operations in a single reorg cycle, and the honest answer is that most won’t need to. The practical reality is a period of parallel operation, where traditional work continues while AI visibility capabilities are built alongside it, and for the majority of organizations, that parallel period won’t resolve into a clean new structure. It will simply become how the team operates. The most common near-term pattern is already visible: The existing SEO gets handed AEO responsibilities alongside their current work, budgets don’t expand to match the expanded scope, and the team figures it out. That state will persist for years in most organizations, and in many it will persist indefinitely. New dedicated roles will emerge at larger organizations and in more competitive verticals, but that’s the exception rather than the rule.

Ultimately, the right allocation isn’t a fixed ratio dropped in from outside your organization; it’s a function of where your current traffic and business value are coming from, and how fast that’s shifting. What research on enterprise AI adoption does confirm is a consistent structural principle: Organizations that successfully scale AI spend the majority of their transition effort on people and process, not on the technology layer itself. That inversion, most attention on tools and least on people, is the primary driver of the pilot purgatory pattern described above. Your capacity allocation decisions need to reflect that. Building a new AI visibility capability on inadequate team development produces a capability that exists on paper and stalls in practice.

Two operational principles matter during the parallel period. First, not all traditional SEO activities need equal intensity to maintain. Technical hygiene, crawl accessibility, and core structured data work protect your existing position and directly support AI retrieval; they aren’t legacy activities to deprioritize. High-volume tactical content production, by contrast, is where capacity can be reallocated toward AI-era work without meaningful risk to current performance. Second, the AI visibility workstream needs dedicated ownership, not shared bandwidth. Work that lives in everyone’s job description at the margin of their other responsibilities doesn’t graduate from pilot mode. Someone needs to own the new work as a primary accountability.

Sequencing The Role Transitions

Not all roles change at the same time, and trying to restructure everything simultaneously is how reorg fatigue gets manufactured. A phased sequence reduces disruption while building the internal momentum that carries later phases.

Phase one starts with content strategists, because the conceptual bridge is shortest. The move from “what does my audience search for” to “what context does a retrieval model need to surface my content accurately” is an extension of existing thinking, not a departure from it. As covered in the roles series, this is the capability layer with the most upskilling potential and the least new-hire dependency. Start here, build early wins, and let the internal success story carry credibility into subsequent phases.

Phase two moves to technical SEOs, who face a more demanding knowledge transition. Vector index hygiene, structured data expansion beyond standard schema implementations, and crawl accessibility for AI bots require genuine new technical literacy, and not every existing practitioner will choose to develop it. This is where the upskill-versus-hire question starts to get real, and more on that in the next section. The technical SEO role isn’t disappearing, but its scope is expanding in directions that require deliberate investment.

Phase three introduces roles that may not yet exist on your team: an AI visibility analyst responsible for monitoring retrieval inclusion and brand representation, and someone focused on machine-facing content architecture. These may start as partial responsibilities before they justify dedicated headcount, but they need to exist as named functions with owners before the measurement conversation in phase four can work.

Phase four restructures reporting lines and performance metrics to reflect the new operating model. Teams held accountable to AI visibility outcomes, while their performance reviews are built entirely around traditional organic traffic metrics, produce the behavior you’d expect: compliance theater. This phase shouldn’t wait until phase three is complete; it should be designed in phase one and communicated clearly so the team understands what the finish line looks like from the start.

The Training Investment Decision

Whether to upskill existing team members or hire new ones is often framed as a budget decision. It’s actually a knowledge gap assessment.

If the gap is conceptual, covering how retrieval works, how AI models use structured data, how community signals feed into model training as discussed in the community signals piece, invest in training. These are learnable frameworks, and experienced practitioners who understand the underlying logic of traditional SEO have strong transfer potential. Analysis of more than 10,000 SEO job postings shows a 21% year-over-year increase in AI-related skill requirements, which reflects real employer demand but also signals that the market expects existing practitioners to develop these capabilities, not that companies are replacing their teams wholesale.

If the gap is technical execution, building APIs, working directly with embedding architectures, constructing systems that require software engineering background, the calculus shifts toward hiring or contracting. This is specialized enough that the training timeline to bring an existing practitioner to production competency may exceed the cost and speed of hiring someone who already has it.

A practical diagnostic for each capability gap: ask whether a competent practitioner with your team’s existing background could reach working proficiency in 90 days with focused investment. If yes, train. If the honest answer is longer, or if the gap requires a completely different mental model of how software systems work, consider hiring. The important discipline here is answering honestly rather than answering in the direction of what’s cheaper.

Measuring The Transition Itself

The transition needs its own measurement framework, separate from the visibility metrics the transition is designed to improve. Without it, leadership has no way to distinguish between a team that is genuinely progressing and a team that is performing progress.

Leading indicators tell you whether the structural shift is actually happening: team fluency with retrieval concepts verified through practical exercises rather than self-reporting, the number of AI visibility experiments in active testing rather than sitting in a backlog, and cross-functional collaboration frequency between SEO, content, and technical teams on AI-era work.

Lagging indicators connect to the outcomes the transition is meant to produce: Brand citation share in AI-generated responses, retrieval inclusion rates across major platforms, and the accuracy of brand representation when your content is surfaced. The framework for approaching these metrics was laid out in the GenAI KPIs piece, and the methodology there applies directly to the lagging indicators here.

The honest acknowledgment is that standardized measurement infrastructure for AI visibility is still developing. The industry hasn’t produced the equivalent of what organic search has in terms of agreed-upon tracking methodology. That isn’t a reason to defer the transition; it’s a reason to document your own methodology consistently from the start, so you’re building a proprietary baseline as standards eventually emerge. Companies that begin measuring now, even imperfectly, will have comparative data that teams starting eighteen months from now won’t be able to reconstruct.

A 90-day scorecard for the transition itself should include: at least one role with formal AI visibility responsibilities assigned, a named owner for the dual operating model, at least two active retrieval experiments generating learning data, and a completed skills gap assessment for every team member against the phase three role definitions. None of those are visibility metrics. They’re execution metrics, and execution is where most transitions fail.

Who Wins?

The organizations that navigate this transition successfully won’t be the ones with the clearest vision of what AI search requires. They’ll be the ones that converted that vision into structure: named owners, phased timelines, honest skills assessments, and measurement that tracks the work before it tracks the outcomes. Vision is table stakes, and every team reading this already has it. The ones that pull ahead will be the ones that open Mondays with a plan.



This post was originally published on Duane Forrester Decodes.


Featured Image: GaudiLab/Shutterstock; Paulo Bobita/Search Engine Journal

Why Google Has Changed & Who’s Really Paying for It

Money, obviously. But it’s deeper than that.

Google’s market share has broadly held firm in the wake of everything AI. By held firm, I mean its share price has gone through the roof, and its AI offering is growing ever stronger.

Google's stock price in the last 5 years
Happy, happy shareholders. Sad, sad people. (Image Credit: Harry Clarkson-Bennett)

But I don’t think all is as rosy as it seems.

Google’s search product isn’t addictive – as much as they’re trying to change that. Nobody hangs out there except saddos like us. And audiences – particularly younger ones – have options.

They’re turning away from more traditional methods of information retrieval, and that’s a big problem. Even for Google.

Google audience share over the last 36 months using Similarweb data
Google’s worldwide audience share by age group (Image Credit: Harry Clarkson-Bennett)

Even the search engine giant isn’t immune.

Older audiences – those already ingrained in the system – are taking up a larger percentage of their audience. The younger ones have more exciting and addictive options, and best believe they’re using them to find stuff.

Engagement data to Google.com broken down by age group
Worldwide Google engagement data broken down by age group (Image Credit: Harry Clarkson-Bennett)

Across every engagement metric, 18-24-year-olds have deteriorated faster than 65+ users over the same period: shorter visit durations, fewer pages per visit, and a worse bounce rate.

Evolution for Google and the wider web is a necessity.

It is interesting to note, though, that the 18-24-year-old audience share has suffered only a small decline according to Similarweb data. The real losses were in the 25-34 cohort.

TL;DR

  1. The publishing industry and Google have more in common than perhaps either of us cares to admit.
  2. The changes Google has made are a very deliberate effort to engage with – and retain – younger audiences. Audiences who behave differently.
  3. Engagement data on news websites (pages per visit, bounce rate, and time on site) declines with audience age. Exactly the same is true of Google.
  4. AI Mode is Google’s attempt to create a “sticky” product. One aimed at younger audiences.

What’s Changed?

Well, the obvious:

Just look at the SERP for almost any term, particularly middle-of-the-funnel comparison ones.

Google SERP for 'best carpet cleaners'
You can’t move for video, which I sort of hate (Image Credit: Harry Clarkson-Bennett)

What people apparently want is not very publisher, or legacy-search-friendly. What they want is video.

Particularly the youth.

Right now, it’s plausible that children spend almost four hours per day watching video on YouTube and TikTok. Four hours. That same group spends just four minutes on publisher websites.

The younger you are, the more time you spend watching, the less you spend reading. So the obvious counter (from a company that primarily organizes written content) is to saturate the market with video content.

Obviously, it’s very helpful if you own the market.

And this doesn’t just affect organic search. Adverts are more expensive to run because AIOs have destroyed the entire search ecosystem’s click-through rates. So for almost all businesses, customer acquisition is more expensive.

You could say that’s Google’s way of paying for AIOs – a far more expensive SERP to generate  due to the massive computational power and energy needed to run large language models (LLMs).

But I am not going to insinuate anything of the sort. It would be incomprehensible to me that the guys who own the entire ad and search market would make the ad side of the business more expensive to run to pay for their search experiments.

Wait a minute…

Why Now?

I think this is a direct response to two things:

  1. The 2023 Code Red Google sent out in response to OpenAI.
  2. Younger audiences shifting information retrieval methods.

One is obvious.

OpenAI forced Google to move quicker than they would’ve liked. Hence, all the absolute trash in AI Overviews in the beginning. Well, and sort of now. It smacked of a product that hadn’t gone through the required amount of rigorous testing.

Two is more nuanced.

Google website traffic by age group
The youngest demographic spends less time on search (Image Credit: Harry Clarkson-Bennett)

This data correlates almost perfectly with the Similarweb data I pulled. In isolation, this may not be a problem. Could be as simple as saying younger audiences will grow into it.

But I don’t think that argument works. We see it in news and publishing. We are living through it, and we’re watching the decline in real time.

Younger audiences have the highest screen time on record (globally, 7 hours 22 minutes), but are spending less and less time reading. More of that time goes to far more visually engaging, stimulating, and addictive technologies.

Based on screen time alone, younger audiences should spend the most time on Google. But they don’t. I’m sure that is blatantly obvious to the Googlers.

Proportion who say they prefer watching or listening to the news by age group
Reuters – Understanding Younger Audiences (Image Credit: Harry Clarkson-Bennett)

While content consumption is at an all-time high, the way a person consumes content is not conducive to more traditional publishing practices.

Just 4 minutes a day on news websites for younger audiences vs. 18 minutes for the over 55s. A 350% increase.

The same principle is true of more traditional search.

At the risk of sounding a bit too AI-y, this is a really seismic shift. Ironically, not one driven by AI. Not entirely. One driven by a combination of big tech’s insatiable appetite for money, a lack of trust in more traditional brands, and the rise of the creator ecosystem.

And AI, obviously.

As someone in the comments said, Google is Unc. Maybe a little like news websites. Their ability to attract younger audiences has diminished.

Audience share by age group based on 6 top UK publishers - anonymised Similarweb data
Similarweb publisher data – last 24 months (using six major UK publishers) (Image Credit: Harry Clarkson-Bennett)

I think we can clearly correlate the changes Google has made to the reduction in the younger audience share for publishers. A generation less inclined to click.

One could argue that the traffic losses so many seem to have suffered are almost exclusively from younger audiences. I certainly am.

Audiences more likely to adopt new technologies – particularly flashy ones.

There Are Clear Parallels Between News And Search

Google has gotten richer, as has the AI bubble. All that money has to come from somewhere.

It’s everyone else who struggles.

These changes are designed to counter a younger generation’s shift toward people and ultra-engaging platforms that encourage passive or more incidental methods of information retrieval.

Since 2015, interest in news has declined – more significantly (43%) in 18-24-year-olds than in any other age group. And just 64% of 18-24-year-olds consume news on a daily basis, compared with 87% of people 55 and over.

Proportion very or extremely interested in news
Reuters – Understanding Younger Audiences (Image Credit: Harry Clarkson-Bennett)

Historically, news has been sought out.

Either you browsed a news website (a real paper if you felt fancy) or you searched for it. But the discovery layer changed, and search – the engine that powered the volume-driven publishing model for two decades – is responding.

Responding to younger audiences’ shifting consumption habits. Just like publishers and websites will have to.

Proportion that say social media is their main form of news over time
Reuters – Digital News Report 2025 (Image Credit: Harry Clarkson-Bennett)

Passive consumption is just the norm now with younger audiences. This is why 44% of 18-24-year-olds see social media as the main source of news, compared to just 15% of 55+.

They expect you to just appear. Algorithmic consumption has reduced the need, want, and desire to actively seek something out. If what you serve isn’t delivered directly to their feed, you don’t exist.

Combine this with diminishing trust in more traditional brands, zero-click searches, and the rise of the creator, and you can see why publishers and Google are having to change.

For years, there have been alternatives to Google when it comes to accessing and retrieving information: Instagram, Amazon, YouTube, et al.

Really, this is, or has been, Search Everywhere Optimization. It has been around for a decade. It is also, IMO, why reframing SEO as GEO or some other BS because of LLMs is so moronic.

Dave Jorgenson on TikTok
Views for The Washington Post’s YouTube channel dropped by 85% from its peak in April (54 million views) to 8.2 million views in September 2025, two months after Jorgenson’s exit. (Image Credit: Harry Clarkson-Bennett)

And now the individual has become the competition. The creator economy – soon to be worth $480 billion – has produced a new class of competitor: individuals with direct audience relationships, authentic voices, and none of the structural cost of a legacy newsroom.

51% of 18-24-year-olds pay attention to creators and personalities, compared to 39% who pay attention to traditional media and journalists – a 12-percentage-point inversion.

And this is a problem for Google, too. People used to rely on Google’s ability to organize information to satisfy all of their needs. Now, usage is so heavily navigational that it’s hard to know how much “new” stuff people really use it for.

Outside of news, at least, ironically.

Will This Work?

If it’s anything like news publishers, their primary concern is to continually generate new and engaged audiences with habitual products. AI Mode could absolutely be that product. Discover is their version of a social network. They are, in their own way, engaging products.

Although the low-intent nature of Discover makes the advertising rubbish, and means Google doesn’t really care about it. Sad, but true.

Like Google, the engagement data for publishers tells a pretty bleak story.

Engagement data from Similarweb based on 6 top UK publishers
Similarweb publisher data (using six major UK publishers) (Image Credit: Harry Clarkson-Bennett)

If we isolate this to the youngest and oldest audience, it’s pretty clear what is going on.

Pages per visit, bounce rate and time on site by old and young audiences - based on 6 top publishers
(Image Credit: Harry Clarkson-Bennett)

Younger audiences:

  • Are far less engaged with the traditional news offering than older audiences.
  • Use these (and any) websites differently.

There’s no denying that younger audiences have more diverse and engaging options. This means they use websites like news publishers differently. To fact-check. To confirm something isn’t just spurious BS. To scan and skim.

The same is true of Google. Less of a discovery journey. More one of fact-checking and navigational searching.

Now, I’m not insinuating that older audiences get stuck with adverts and can’t use a menu. That can’t account for an extra 14 minutes of time spent on news websites.

But having watched my mother with a computer, it’s not impossible.

So, What’s The Answer?

To lean into what the new generation likes. Adapt and evolve.

Recommendations slide from the FT Strategies x WAN IFRA News Creator Project
Exec summary from the WAN-IFRA x FT Strategies News Creator Project (Image Credit: Harry Clarkson-Bennett)

The same is true for search (internally and externally) and publishers. If you work for Google, it makes complete sense you would try to expand your video presence in the SERP and prioritize “quality” UGC.

The quality part is lacking as most of the internet – as we’re finding out – is a stinking pile of garbage.

But notoriously, the tide is tricky to swim against.

For publishers, it means working with creators, leveraging their audiences and ability to deliver things quickly. Differently. And creators can benefit from the trust associated with proper news organizations.

Is it that unreasonable to think Google should do the same?

Instead of abusing their position, they could start by giving people an idea of the impact of AIOs and AI Mode. I’m not a financial guru, but I reckon Google has enough money to build and foster creator and publisher programs that are not one-sided, ones that bring genuine value to people and the wider information retrieval ecosystem.

In this scenario, everyone benefits. When AI companies refuse to pay for publisher content, everyone loses.

  • LLMs lose because they have less unique, human-created, quality content to train on.
  • Publishers lose because they are forced to suppress their visibility and don’t get any money.
  • Users lose because the end output isn’t as good.

Model collapse is on the horizon. AI learning on AI falsehoods. A repetitive cycle of garbage. Joyous.

Lily Ray's AI Slop Loop
Lily Ray called it the AI Slop Loop, which has a nice, albeit bleak ring to it (Image Credit: Harry Clarkson-Bennett)

These companies should invest in the ecosystems that built them. Particularly Google.

For publishers:

  1. Build owned channels. Get away from relying on big tech.
  2. Create brilliant, unique journalism.
  3. Supplement it with habit-forming products – puzzles being the obvious example.
  4. Build and sponsor audio and video programs that reach your intended audience.
  5. Implement channel-specific strategies.

Even the New York Times doesn’t rely solely on subscriptions from written content. Not by a long shot. It isn’t enough.

Inside The New York Times Business Model: How Bundling Saved Journalism
They’re as diverse and resilient as any publisher (Image Credit: Harry Clarkson-Bennett)

Final Thoughts

Unfortunately, I think the recent spate of job losses in the publishing industry is just the beginning. Bauer, the BBC, The Washington Post. It’s not UK or SEO-specific. 100,000 roles are becoming 70,000 ones. Teams are shrinking. And there are real-world ramifications.

We are not in a good moment. Some of this can be attributed to AI. But I think more of it is due to longer-term economic difficulties, audiences switching off from traditional news, and things like the Site Reputation Abuse update destroying much-needed revenue lines overnight.

It is hard to make these businesses profitable. Google doesn’t have that problem. But they’re not immune to changing behaviors and becoming yesterday’s news either.

Should you be enough of a psychopath, you can follow the job cuts via this updated Press Gazette article.





Featured Image: Roman Samborskyi/Shutterstock

Why Great Content Is No Longer Enough & What Beats It In AI Search via @sejournal, @TaylorDanRW

The assumption has been that producing something more detailed, more original, and more useful would naturally lead to stronger results, since that approach worked in a search ecosystem where discovery (and success) depended on rankings, clicks, and users actively choosing what to read.

That ecosystem rewarded the most compelling, scannable, or comprehensive option on the page, which made craftsmanship feel like the primary lever for success.

It is no longer the ecosystem we are working in, and continuing to apply that same logic without adjusting is exactly where many teams are starting to fall behind. We’ve seen this with the gamification of listicles already, and how large language models (and Google) are having to “patch” exploits as they’re found.

AI has not reduced the importance of content, but it has shifted where value is created and how that value is realized, which now revolves around who gets surfaced, cited, and reused within systems that sit between users and the web.

Content quality still matters, but it is no longer the deciding factor, and treating it as such creates a blind spot that is becoming increasingly difficult to ignore.

The Shift From Authorship To Retrieval

In traditional search, authorship carried clear weight because you created a page, earned visibility through rankings, and relied on users to click through and engage directly with what you had produced.

Success was closely tied to ownership and placement within a list of results, which made the relationship between effort and outcome feel transactional, and easily reportable to stakeholders.

Authorship still matters, and it still influences whether content is trusted, referenced, and reused, but its role has shifted toward how it supports retrieval rather than how it drives direct consumption.

Content now needs to function not only as a complete piece for human readers but also as a collection of ideas that can be extracted and reused across different contexts. This creates pressure on structure, clarity, and alignment with recognizable entities, since an author is no longer just a name attached to a page but an entity that exists across a broader ecosystem of signals, references, and mentions.

When those connections are strong, authorship reinforces retrieval and increases the likelihood that content will be selected and reused. When they are weak or absent, even high-quality content can struggle to gain traction.

AI systems don’t ignore authorship, but the way that we’ve thought about Google and authorship vectors is adapting. LLMs compress it by relying on signals of credibility and consistency, then expressing that trust through what they retrieve and include in generated responses.

This changes the unit of competition from pages to fragments and shifts the focus from ownership to accessibility, while still anchoring value in who created the content and how that creator is understood elsewhere. Strong writing and clear expertise improve the chances of being retrieved, but they do not guarantee it, which means success depends on combining credible authorship with high retrievability.

Does Being Cited Matter More Than Being Read?

For the past two decades, content strategies have been built around generating clicks, with teams refining headlines, descriptions, and formats to encourage users to visit their pages and engage directly with their work.

The visit itself served as the primary measure of success, which made traffic a reliable proxy for impact. In AI-driven experiences, that step is often removed because answers are formed within the interface before a user considers visiting a website, which fundamentally changes what visibility looks like.

Being read becomes less important than being cited, since citations now act as the mechanism through which influence is established. When content is consistently used to construct answers, it shapes user decisions even without a measurable visit, which makes its impact harder to track but no less significant.

Content that is not used in this way becomes effectively invisible, regardless of how much effort was invested in creating it.

This shift disrupts the feedback loop that marketers have relied on for years, since traffic is no longer a reliable indicator of presence or influence, even though many teams continue to optimize for it.

Distribution Wins

Challenging the idea that better work leads to better outcomes is uncomfortable because it runs counter to a belief that has been widely accepted for a long time. The ability to write excellent content still plays a role, but it is no longer the primary driver of success, and overinvesting in it while neglecting other factors is becoming a strategic risk (depending on how strong your brand and distribution mechanisms are).

Distribution has taken on a more important role, although it needs to be understood in a broader sense than traditional concepts like social reach or link building. In an AI-driven search ecosystem, distribution refers to how information exists across a network of sources that inform and validate what systems retrieve and use.

This includes being referenced across multiple trusted platforms, appearing in formats that are easy for machines to interpret, reinforcing consistent narratives about your brand, and showing up in places where systems look for confirmation.

The goal is to create alignment between what you publish and how systems evaluate credibility, relevance, and usefulness. It is entirely possible to produce an exceptional piece of content and still underperform if it exists in isolation, while a network of average content that is widely distributed and consistently reinforced can outperform it.

Content Needs To Do More Than ‘Be Read’

Great content that is not surfaced has no meaningful impact, which highlights a shift that many teams are still coming to terms with.

Quality continues to matter because weak content cannot sustain visibility over time, but the threshold for what qualifies as good enough is lower than many assume, especially when compared to the level of effort being invested.

Once that threshold is met, positioning becomes the factor that determines whether content is retrieved, cited, and embedded into answers or ignored entirely.

This reflects a broader change in how outcomes are determined, since effort no longer has a clear or direct relationship with results.

Alignment with systems on the platforms where content exists now plays a larger role, which requires a different way of thinking about strategy.

What This Means In Practice

A strategy that focuses only on improving content quality addresses only part of the challenge and leaves a significant opportunity untapped, particularly as AI continues to shape more of the user journey.

It becomes essential to consider how easily content can be extracted and reused, where ideas are reinforced outside of owned platforms, whether structure supports both human understanding and machine interpretation, and how consistently narratives appear across the broader ecosystem.

This shift also requires rethinking how success is measured, since influence can increase without a corresponding rise in traffic, which can feel uncomfortable for teams that are used to clear attribution models.

The goal is not to abandon quality but to recognize that it is no longer sufficient on its own, and that positioning needs to be treated as a core component of strategy.


The Facts About Google Click Signals, Rankings, And SEO

Clicks as a ranking-related signal have been a subject of debate for over twenty years, although nowadays most SEOs understand that clicks are not a direct ranking factor. The simple truth is that clicks are raw data, and that raw data is, perhaps surprisingly, processed much like human rater scores.

Clicks Are A Raw Signal

The DOJ Antitrust memorandum opinion from September 2025 mentions clicks as a “raw signal” that Google uses. It also categorizes content and search queries as raw signals. This is important because a raw signal is the lowest-level data point, one that is processed into higher-level ranking signals or used to train a model like RankEmbed and its successor, RankEmbedBERT.

Those are considered raw signals because they are:

  • Directly observed,
  • Not yet interpreted or used as training data.

The DOJ document quotes professor James Allan, who gave expert testimony on behalf of Google:

“Signals range in complexity. There are “raw” signals, like the number of clicks, the content of a web page, and the terms within a query.

…These signals can be created with simple methods, such as counting occurrences (e.g., how many times a web page was clicked in response to a particular query). Id. at 2859:3–2860:21 (Allan) (discussing Navboost signal).”

He then contrasts the raw signals with how they are processed:

“At the other end of the spectrum are innovative deep-learning models, which are machine-learning models that discern complex patterns in large datasets.

Deep models find and exploit patterns in vast data sets. They add unique capabilities at high cost.”

Professor Allan explains that “top-level signals” are used to produce the “final” scores for a web page, including popularity and quality.

Raw Signals Are Data To Be Further Processed

Navboost is mentioned several times in the September 2025 antitrust document as popularity data. It’s not mentioned in the context of clicks having a ranking effect on individual sites.

It’s referred to as a way to measure popularity and intent:

“…popularity as measured by user intent and feedback systems including Navboost/Glue…”

And elsewhere, in the context of explaining why some of the Navboost data is privileged:

“Under the proposed remedy, Google must make available to Qualified Competitors …the following datasets:

1. User-side Data used to build, create, or operate the GLUE statistical model(s);

2. User-side Data used to train, build, or operate the RankEmbed model(s); and

3. The User-side Data used as training data for GenAI Models used in Search or any GenAI Product that can be used to access Search.

Google uses the first two datasets to build search signals and the third to train and refine the models underlying AI Overviews and (arguably) the Gemini app.”

Clicks, like human rater scores, are just a raw signal used further up the algorithm chain: to train AI models to better match web pages to queries, or to generate a quality or relevance signal that a ranking engine or rank modifier engine then adds to the rest of the ranking signals.
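To make the idea of a “raw signal” concrete, here is a minimal sketch in Python. It is not Google’s implementation; the log records and field names are hypothetical. It only illustrates the “counting occurrences” step Professor Allan describes, where individual clicks are tallied per query-URL pair before any higher-level system touches them.

from collections import Counter

# Raw signal: one record per observed click. Directly observed, not yet interpreted.
click_log = [
    {"query": "hiking boots", "url": "https://example.com/a"},
    {"query": "hiking boots", "url": "https://example.com/a"},
    {"query": "hiking boots", "url": "https://example.com/b"},
]

# Simple processing: count occurrences for each query-URL pair.
pair_counts = Counter((record["query"], record["url"]) for record in click_log)

for (query, url), count in pair_counts.items():
    print(query, url, count)

# Downstream systems would consume aggregates like these, not individual clicks:
# as training data for models such as RankEmbedBERT, or as input to a relevance
# signal handed off to a ranking engine.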

70 Days Of Search Logs

The DOJ document makes reference to using 70 days of search logs. But that’s just eleven words in a larger context.

Here is the part that is frequently quoted:

“70 days of search logs plus scores generated by human raters”

I get it, it’s simple and direct. But there is more context to it:

“RankEmbed and its later iteration RankEmbedBERT are ranking models that rely on two main sources of data: [Redacted]% of 70 days of search logs plus scores generated by human raters and used by Google to measure the quality of organic search results.”

The 70 days of search logs are not click data used directly for ranking in Google Search, AI Mode, or Gemini. They are aggregate data that is further processed to train specialized AI models like RankEmbedBERT, which in turn rank web pages based on natural language analysis.

That part of the DOJ document does not claim that Google is directly using click data for ranking search results. It’s data, like the human rater data, that’s used by other systems for training data or to be further processed.

What Is Google’s RankEmbed?

RankEmbed is a natural language approach to identifying relevant documents and ranking them.

The same DOJ document explains:

“The RankEmbed model itself is an AI-based, deep-learning system that has strong natural-language understanding. This allows the model to more efficiently identify the best documents to retrieve, even if a query lacks certain terms.”

It’s trained on less data than previous models. The data partially consists of query terms and web page pairs:

“…RankEmbed is trained on 1/100th of the data used to train earlier ranking models yet provides higher quality search results.

…Among the underlying training data is information about the query, including the salient terms that Google has derived from the query, and the resultant web pages.”

That’s training data used to teach a model how query terms relate to web pages.

The same document explains:

“The data underlying RankEmbed models is a combination of click-and-query data and scoring of web pages by human raters.”

It’s crystal clear that in the context of this specific passage, it’s describing the use of click data (and human rater data) to train AI models, not to directly influence rankings.
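As a rough illustration only, training data that combines click-and-query data with human rater scores might be assembled as pairs of query terms and pages, each carrying an aggregated click measure and a rater score. The record structure, field names, and numbers below are assumptions made for this sketch, not anything described in the DOJ document.

from dataclasses import dataclass

@dataclass
class TrainingExample:
    query_terms: list      # salient terms derived from the query
    page_url: str          # the resultant web page
    click_signal: float    # aggregated click behavior for the pair (hypothetical scale)
    rater_score: float     # human rater quality score (hypothetical scale)

examples = [
    TrainingExample(["hiking", "boots"], "https://example.com/a", 0.82, 0.9),
    TrainingExample(["hiking", "boots"], "https://example.com/b", 0.31, 0.6),
]

# A deep model like RankEmbed would be trained on examples like these to learn
# how query terms relate to pages; the clicks themselves never act as a direct
# ranking factor at query time.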

What About Google’s Click Ranking Patent?

Back in 2006, Google filed a patent related to clicks called “Modifying search result ranking based on implicit user feedback.” The invention is about the mathematical formula for creating a “measure of relevance” out of the aggregated raw data of clicks (plural).

The patent distinguishes between the creation of the signal and the act of ranking itself. The “measure of relevance” is output to a ranking engine, which then can add it to existing ranking scores to rank search results for new searches.

Here’s what the patent describes:

“A ranking sub-system can include a rank modifier engine that uses implicit user feedback to cause re-ranking of search results in order to improve the final ranking presented to a user of an information retrieval system.

User selections of search results (click data) can be tracked and transformed into a click fraction that can be used to re-rank future search results.”

That “click fraction” is a measure of relevance. The invention described in the patent isn’t about tracking the click; it’s about the mathematical measure (the click fraction) that results from combining all those individual clicks together. That includes the Short Click, Medium Click, Long Click, and the Last Click.

Technically, it’s called the LCIC (Long Click divided by Clicks) Fraction. It’s “clicks” plural because it’s making decisions based on the sums of many clicks (aggregate), not the individual click.

That click fraction is an aggregate because:

  • Summation:
    The “first number” used for ranking is the sum of all those individual weighted clicks for a specific query-document pair.
  • Normalization:
    It takes that sum and divides it by the total count of all clicks (the “second number”).
  • Statistical Smoothing:
    The system applies “smoothing factors” to this aggregate number to ensure that a single click on a “rare” query doesn’t unfairly skew the results, especially for spammers.

That 2006 patent describes its weighting formula like this:

“A base LCC click fraction can be defined as:

LCC_BASE = #WC(Q,D) / (#C(Q,D) + S0)

where #WC(Q,D) is the sum of weighted clicks for a query-URL…pair, #C(Q,D) is the total number of clicks (ordinal count, not weighted) for the query-URL pair, and S0 is a smoothing factor.”

That formula describes summing and dividing the data from many users to create a single score for a document. The “query-URL” pair is a “bucket” of data that stores the click behavior of every user who ever typed that specific query and clicked that specific search result. The smoothing factor is the anti-spam part that includes not counting single clicks on rare search queries.
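Here is a short worked example of that base click fraction in Python. The click counts and smoothing value are made-up numbers, used only to show how summation, normalization, and smoothing interact; the real system obviously involves far more than this.

def lcc_base(weighted_clicks, total_clicks, smoothing=10.0):
    # #WC(Q,D) / (#C(Q,D) + S0): sum of weighted clicks over the smoothed total.
    return sum(weighted_clicks) / (total_clicks + smoothing)

# A query-URL pair with lots of click data: one more click barely moves the score.
popular_pair = lcc_base(weighted_clicks=[1.0] * 400, total_clicks=500)

# A rare query with a single click: smoothing keeps that lone click from dominating.
rare_pair = lcc_base(weighted_clicks=[1.0], total_clicks=1)

print(round(popular_pair, 3))  # ~0.784
print(round(rare_pair, 3))     # ~0.091

The smoothing term S0 in the denominator is doing the anti-spam work the patent describes: a single click on a rare query cannot, by itself, produce a strong relevance score.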

Even back in 2006, clicks were just raw data, transformed further up the chain across multiple stages of aggregation into a statistical measure of relevance before ever reaching the ranking stage. In this patent, the clicks themselves are not ranking factors that directly determine whether a site ranks. They were used in aggregate as a measure of relevance, which in turn was fed into another engine for ranking.

By the time the information reaches the ranking engine, the raw data has been transformed from individual user actions into an aggregate measure of relevance.

  • The relationship between clicks and rankings is not as simple as “clicks drive search rankings.”
  • Clicks are just raw data.
  • Clicks are used to train AI systems like RankEmbedBERT.
  • Clicks do not directly influence search results. They have always been raw data: the starting point for systems that use the data in aggregate to create a signal that is then mixed into Google’s ranking decision-making systems.
  • So yes, like human rater data, raw click data is processed to create a signal or to train AI systems.

Read the DOJ memorandum in PDF form here.

Read about four research papers on CTR.

Read the 2006 Google patent, Modifying search result ranking based on implicit user feedback.


3 things Michelle Kim is into right now

Isegye Idol

If you thought K-pop was weird, virtual idols (humans who perform as anime-style digital characters via motion capture) will blow your mind. My favorite is a girl group called Isegye Idol, created by Woowakgood, a Korean VTuber (a streamer who likewise performs as a digital persona). Isegye Idol’s six members are anonymous, which seems to let them deploy a rare breed of honesty and humor. They play games (League of Legends, Go, Minecraft), chitchat, and perform kitschy music that’s somewhere between anime soundtrack and video-game score. It’s very DIY, and very intimate. And the group’s wild popularity speaks to the mood of Gen Z South Koreans, famously lonely and culturally adrift: struggling to find work, giving up on dating, trying to find friendships online. Isegye Idol shows what a magical online universe people can build when reality stops working for them.

Mr. Nobody Against Putin

Pavel Talankin didn’t have the easiest life as a schoolteacher in the copper-smelting town of Karabash, Russia; UNESCO once called it the most toxic place on Earth. But video he shot, partially in secret, makes it clear he loved it: the smokestacks, the cold, the ice mustache he’d get walking around outside, and, most of all, his bright-eyed students. That makes it all the more painful when a distant, grinding war and state propaganda change the town. An antiwar progressive with a democracy flag in his classroom, Talankin had to deal with a new patriotic curriculum, mandatory parades, visits from mercenaries, and the loss of the creative space he’d built with his students. Talankin’s footage tells his story in this Oscar-winning documentary from director David Borenstein, and what struck me most is how strange it is being an adult around kids. We shape them in profound ways we might not even recognize.

Repertoire by James Acaster

I am the kind of person who will pay $150 to watch a comedian in a smelly theater in San Francisco that charges $20 for a can of water, because I am crazy enough to hope that standup will not die. In February, I saw the British comedian James Acaster perform live … and it was a mediocre show. But Repertoire, his 2018 miniseries on Netflix, is gold. Shot shortly after Acaster went through a breakup, the four-part show features him portraying, among other characters, a cop who goes undercover as a standup comedian, forgets who he is, and gets divorced. And then things get weird. “What if every relationship you’ve ever been in,” Acaster asks, “is somebody slowly figuring out they didn’t like you as much as they hoped they would?” If the best comedy comes from paying attention to the hellhole that you’re in, I wish Acaster many more pitfalls.