Google Won’t Act On Spam Reports If They Contain Personal Information via @sejournal, @martinibuster

Google updated their spam reporting documentation to make it clearer that spam reports are not wholly confidential and that it’s possible for personally identifiable information to be shared with the sites receiving a manual action.

Change In Response To Feedback

Google’s changelog noted that the spam reporting form was being updated based on feedback about personal information in spam reports being shared with spammy sites that receive a manual action (formerly known as a penalty).

The update contains a new notice that spam reports containing personal information will not be processed.

The changelog noted:

“Clarifying when and why we may take manual action based on spam reports
What: Further clarified when and why we may take manual action based on spam reports.
Why: To address feedback we received about the change on using spam reports to take manual action.”

Google removed the following from their documentation:

“If we issue a manual action, we send whatever you write in the submission report verbatim to the site owner to help them understand the context of the manual action. We don’t include any other identifying information when we notify the site owner; as long as you avoid including personal information in the open text field, the report remains anonymous.”

The above wording was replaced with the following:

“Don’t include any personally identifying information in your submission. To comply with regulations, we must send the submission text to the site owner to help them understand the context of a manual action, if one is issued.

Because of this, we won’t process your submission if we determine it contains personally identifying information to protect privacy. Not including such information fully ensures your information is safe and prevents your submission from being discarded.”

Action Moving Forward

It’s good that Google won’t proceed with a manual action if the report contains personal information. In practice, that means if you’re submitting spam reports to Google, don’t include your site, business name, personal name, or anything else you don’t want the affected spammer to know.

Read the updated documentation here:

Report spam, phishing, or malware

Learn more about Google’s spam reporting tool: Google Just Made It Easy For SEOs To Kick Out Spammy Sites

Featured Image by Shutterstock/andre_dechapelle

The Real Reason Your SEO Team Hasn’t Made The AI Transition Yet via @sejournal, @DuaneForrester

This series has spent five articles mapping what the AI search transition requires of your team, your content, your technical infrastructure, and your strategic framing. This piece addresses the question those five articles don’t answer: How do you actually make the organizational shift happen?

Most teams won’t fail here because they lack vision. The failure mode is execution, specifically the gap between knowing change is necessary and building the structure that makes it real.

The Transition Problem Is A People Problem, Not A Technology Problem

Only about 30% of enterprise SEO teams have restructured roles and responsibilities as a result of AI implementation. That means roughly 70% of teams who understand the shift intellectually haven’t made a structural move yet. The tools exist. The research is available. The urgency is visible in the data. And most teams are still running the same org chart they had three years ago.

This isn’t a strategic failure. It’s a change management failure, and it has a predictable shape. Three stall patterns show up consistently.

Analysis paralysis is the team that has attended every conference session, read every report, and built a compelling internal case, but can’t commit to a starting point because the landscape keeps shifting. The logic feels defensible: Why restructure when the platform behavior might change next quarter? The answer is that waiting for stability in an unstable environment isn’t patience. It’s avoidance dressed up as diligence.

Pilot purgatory is more widespread than most leaders want to admit. A survey of 200 U.S. marketing leaders found that 82% of teams using AI for campaigns are still operating in pilot or experimental mode, with 61% using AI only at the individual level rather than building it into collaborative team workflows. The pilot never fails cleanly; it just never graduates to production.

Reorg fatigue is the subtlest of the three. Teams that have been through digital transformation cycles carry scar tissue. They’ve watched priority initiatives get announced, resourced, and quietly abandoned when the next priority arrived. When a VP announces a pivot to AI visibility, the team’s first internal question often isn’t how to do it; it’s how long until this one goes away, too. Credibility for this transition requires demonstrating that it’s structurally different from the previous three, which means visible commitment in budget, headcount, and KPI design, not just slide decks.

The Resistance Map

Not all resistance is the same, and treating it as a uniform problem produces uniform failure. Four distinct patterns appear in SEO and marketing teams, each requiring a different response.

Seniority-based resistance sounds like: I’ve been doing this for 15 years, and I know what works. This is often the hardest pattern to address because it’s partly legitimate. Senior practitioners have real pattern recognition that junior team members lack, and they’ve watched enough vendor-driven hype cycles to be appropriately skeptical of any new essential framework. The correct response isn’t to dismiss the experience; it’s to reframe the transition as an addition to what they know, not a replacement of it. As established in the context moat piece earlier in this series, the fundamentals of relevance and trust don’t disappear in an AI search environment. They compound. Senior practitioners who make that conceptual bridge become accelerants, not obstacles.

Skills-based anxiety is a different problem entirely. This person isn’t resisting because they distrust the framework; they’re resisting because they don’t know how to operate inside it. The language of vector indexes, structured data expansion, and retrieval architecture is genuinely foreign to someone who built their career on keyword clustering and link building. A useful diagnostic lens here comes from the ADKAR model, a change management framework developed by Prosci that identifies five sequential conditions an individual needs to reach for change to stick: Awareness, Desire, Knowledge, Ability, and Reinforcement. Skills-based anxiety is almost always a Knowledge or Ability gap, not a motivation problem. Treating it as motivation resistance wastes time and confirms the team member’s fear that leadership doesn’t understand what they’re actually being asked to do.

Political resistance is structural, not personal. If AI visibility expands SEO scope to include retrieval architecture, machine-facing content design, and cross-functional data coordination, someone’s budget conversation changes. Marketing ops, IT, and content teams all have a plausible claim on parts of that expanded scope. This resistance rarely surfaces as direct opposition; it shows up as slow approvals, ambiguous priorities, and repeated requests to align with stakeholders before anything moves. The response requires making budget and ownership decisions explicitly, not hoping that clarity emerges from collaboration.

Legitimate skepticism deserves its own category because it’s the resistance pattern most leaders mishandle. When someone asks to see the revenue connection, that isn’t obstruction; it’s the right question. The answer needs to be honest, which means acknowledging that the measurement infrastructure for AI visibility is still developing. Trying to manufacture certainty in response to legitimate skepticism destroys credibility faster than admitting the gap. Acknowledging where the data is incomplete while demonstrating directional progress is more durable.

Running Both Operations At Once

Most teams can’t switch from traditional SEO to AI visibility operations in a single reorg cycle, and the honest answer is that most won’t need to. The practical reality is a period of parallel operation, where traditional work continues while AI visibility capabilities are built alongside it, and for the majority of organizations, that parallel period won’t resolve into a clean new structure. It will simply become how the team operates. The most common near-term pattern is already visible: The existing SEO gets handed AEO responsibilities alongside their current work, budgets don’t expand to match the expanded scope, and the team figures it out. That state will persist for years in most organizations, and in many it will persist indefinitely. New dedicated roles will emerge at larger organizations and in more competitive verticals, but that’s the exception rather than the rule.

Ultimately, the right allocation isn’t a fixed ratio dropped in from outside your organization; it’s a function of where your current traffic and business value are coming from, and how fast that’s shifting. What research on enterprise AI adoption does confirm is a consistent structural principle: Organizations that successfully scale AI spend the majority of their transition effort on people and process, not on the technology layer itself. That inversion, most attention on tools and least on people, is the primary driver of the pilot purgatory pattern described above. Your capacity allocation decisions need to reflect that. Building a new AI visibility capability on inadequate team development produces a capability that exists on paper and stalls in practice.

Two operational principles matter during the parallel period. First, not all traditional SEO activities need equal intensity to maintain. Technical hygiene, crawl accessibility, and core structured data work protect your existing position and directly support AI retrieval; they aren’t legacy activities to deprioritize. High-volume tactical content production, by contrast, is where capacity can be reallocated toward AI-era work without meaningful risk to current performance. Second, the AI visibility workstream needs dedicated ownership, not shared bandwidth. Work that lives in everyone’s job description at the margin of their other responsibilities doesn’t graduate from pilot mode. Someone needs to own the new work as a primary accountability.

Sequencing The Role Transitions

Not all roles change at the same time, and trying to restructure everything simultaneously is how reorg fatigue gets manufactured. A phased sequence reduces disruption while building the internal momentum that carries later phases.

Phase one starts with content strategists, because the conceptual bridge is shortest. The move from “what does my audience search for” to “what context does a retrieval model need to surface my content accurately” is an extension of existing thinking, not a departure from it. As covered in the roles series, this is the capability layer with the most upskilling potential and the least new-hire dependency. Start here, build early wins, and let the internal success story carry credibility into subsequent phases.

Phase two moves to technical SEOs, who face a more demanding knowledge transition. Vector index hygiene, structured data expansion beyond standard schema implementations, and crawl accessibility for AI bots require genuine new technical literacy, and not every existing practitioner will choose to develop it. This is where the upskill-versus-hire question starts to get real, and more on that in the next section. The technical SEO role isn’t disappearing, but its scope is expanding in directions that require deliberate investment.

Phase three introduces roles that may not yet exist on your team: an AI visibility analyst responsible for monitoring retrieval inclusion and brand representation, and someone focused on machine-facing content architecture. These may start as partial responsibilities before they justify dedicated headcount, but they need to exist as named functions with owners before the measurement conversation in phase four can work.

Phase four restructures reporting lines and performance metrics to reflect the new operating model. Teams held accountable to AI visibility outcomes, while their performance reviews are built entirely around traditional organic traffic metrics, produce the behavior you’d expect: compliance theater. This phase shouldn’t wait until phase three is complete; it should be designed in phase one and communicated clearly so the team understands what the finish line looks like from the start.

The Training Investment Decision

Whether to upskill existing team members or hire new ones is often framed as a budget decision. It’s actually a knowledge gap assessment.

If the gap is conceptual, covering how retrieval works, how AI models use structured data, how community signals feed into model training as discussed in the community signals piece, invest in training. These are learnable frameworks, and experienced practitioners who understand the underlying logic of traditional SEO have strong transfer potential. Analysis of more than 10,000 SEO job postings shows a 21% year-over-year increase in AI-related skill requirements, which reflects real employer demand but also signals that the market expects existing practitioners to develop these capabilities, not that companies are replacing their teams wholesale.

If the gap is technical execution, building APIs, working directly with embedding architectures, constructing systems that require software engineering background, the calculus shifts toward hiring or contracting. This is specialized enough that the training timeline to bring an existing practitioner to production competency may exceed the cost and speed of hiring someone who already has it.

A practical diagnostic for each capability gap: ask whether a competent practitioner with your team’s existing background could reach working proficiency in 90 days with focused investment. If yes, train. If the honest answer is longer, or if the gap requires a completely different mental model of how software systems work, consider hiring. The important discipline here is answering honestly rather than answering in the direction of what’s cheaper.

Measuring The Transition Itself

The transition needs its own measurement framework, separate from the visibility metrics the transition is designed to improve. Without it, leadership has no way to distinguish between a team that is genuinely progressing and a team that is performing progress.

Leading indicators tell you whether the structural shift is actually happening: team fluency with retrieval concepts verified through practical exercises rather than self-reporting, the number of AI visibility experiments in active testing rather than sitting in a backlog, and cross-functional collaboration frequency between SEO, content, and technical teams on AI-era work.

Lagging indicators connect to the outcomes the transition is meant to produce: Brand citation share in AI-generated responses, retrieval inclusion rates across major platforms, and the accuracy of brand representation when your content is surfaced. The framework for approaching these metrics was laid out in the GenAI KPIs piece, and the methodology there applies directly to the lagging indicators here.

The honest acknowledgment is that standardized measurement infrastructure for AI visibility is still developing. The industry hasn’t produced the equivalent of what organic search has in terms of agreed-upon tracking methodology. That isn’t a reason to defer the transition; it’s a reason to document your own methodology consistently from the start, so you’re building a proprietary baseline as standards eventually emerge. Companies that begin measuring now, even imperfectly, will have comparative data that teams starting eighteen months from now won’t be able to reconstruct.

A 90-day scorecard for the transition itself should include: at least one role with formal AI visibility responsibilities assigned, a named owner for the dual operating model, at least two active retrieval experiments generating learning data, and a completed skills gap assessment for every team member against the phase three role definitions. None of those are visibility metrics. They’re execution metrics, and execution is where most transitions fail.

Who Wins?

The organizations that navigate this transition successfully won’t be the ones with the clearest vision of what AI search requires. They’ll be the ones that converted that vision into structure: named owners, phased timelines, honest skills assessments, and measurement that tracks the work before it tracks the outcomes. Vision is table stakes, and every team reading this already has it. The ones that pull ahead will be the ones that open Mondays with a plan.


This post was originally published on Duane Forrester Decodes.


Featured Image: GaudiLab/Shutterstock; Paulo Bobita/Search Engine Journal

Why Google Has Changed & Who’s Really Paying for It

Money, obviously. But it’s deeper than that.

Google’s market share has broadly held firm in the wake of everything AI. By held firm, I mean its share price has gone through the roof, and its AI offering is growing ever stronger.

Google's stock price in the last 5 years
Happy, happy shareholders. Sad, sad people. (Image Credit: Harry Clarkson-Bennett)

But I don’t think all is as rosy as it seems.

Google’s search product isn’t addictive – as much as they’re trying to change that. Nobody hangs out there except saddos like us. And audiences – particularly younger ones – have options.

They’re turning away from more traditional methods of information retrieval, and that’s a big problem. Even for Google.

Google audience share over the last 36 months using Similarweb data
Google’s worldwide audience share by age group (Image Credit: Harry Clarkson-Bennett)

Even the search engine giant isn’t immune.

Older audiences – those already ingrained in the system – are taking up a larger percentage of their audience. The younger ones have more exciting and addictive options, and best believe they’re using them to find stuff.

Engagement data to Google.com broken down by age group
Worldwide Google engagement data broken down by age group (Image Credit: Harry Clarkson-Bennett)

Across every engagement metric, 18-24-year-olds have deteriorated faster than 65+ users over the same period: shorter visit duration, fewer pages per visit, and a worse bounce rate.

Evolution for Google and the wider web is a necessity.

It’s interesting to note, though, that the 18-24-year-old audience share has only suffered a small decline according to Similarweb data. The real losses were in the 25-34 cohort.

TL;DR

  1. The publishing industry and Google have more in common than perhaps either side cares to admit.
  2. The changes Google has made are a very deliberate effort to engage with – and retain – younger audiences. Audiences who behave differently.
  3. Engagement data on news websites (pages per visit, bounce rate, and time on site) declines with audience age. Exactly the same is true of Google.
  4. AI Mode is Google’s attempt to create a “sticky” product. One aimed at younger audiences.

What’s Changed?

Well, the obvious:

Just look at the SERP for almost any term, particularly middle-of-the-funnel comparison ones.

Google SERP for 'best carpet cleaners'
You can’t move for video, which I sort of hate (Image Credit: Harry Clarkson-Bennett)

What people apparently want is not very publisher, or legacy-search-friendly. What they want is video.

Particularly the youth.

Right now, it’s plausible that children spend almost four hours per day watching video on YouTube and TikTok. Four hours. That same group spends just four minutes on publisher websites.

The younger you are, the more time you spend watching, the less you spend reading. So the obvious counter (from a company that primarily organizes written content) is to saturate the market with video content.

Obviously, it’s very helpful if you own the market.

And this doesn’t just affect organic search. Adverts are more expensive to run because AIOs have destroyed the entire search ecosystem’s click-through rates. So for almost all businesses, customer acquisition is more expensive.

You could say that’s Google’s way of paying for AIOs – a far more expensive SERP to generate due to the massive computational power and energy needed to run large language models (LLMs).

But I am not going to insinuate anything of the sort. It would be incomprehensible to me that the guys who own the entire ad and search market would make the ad side of the business more expensive to run to pay for their search experiments.

Wait a minute…

Why Now?

I think this is a direct response to two things:

  1. The 2023 Code Red Google sent out in response to OpenAI.
  2. Younger audiences shifting information retrieval methods.

One is obvious.

OpenAI forced Google to move quicker than they would’ve liked. Hence, all the absolute trash in AI Overviews in the beginning. Well, and sort of now. It smacked of a product that hadn’t gone through the required amount of rigorous testing.

Two is more nuanced.

Google website traffic by age group
The youngest demographic spends less time on search (Image Credit: Harry Clarkson-Bennett)

This data correlates almost perfectly with the Similarweb data I pulled. In isolation, this may not be a problem. Could be as simple as saying younger audiences will grow into it.

But I don’t think that argument works. We see it in news and publishing. We are living through it, and we’re watching the decline in real time.

Younger audiences have the highest screen time on record (globally, 7 hours 22 minutes), but they are spending less and less of it reading. More of it goes to far more visually engaging, stimulating, and addictive technologies.

Based on screen time alone, younger audiences should spend the most time on Google. But they don’t. I’m sure that is blatantly obvious to the Googlers.

Proportion who say they prefer watching or listening to the news by age group
Reuters – Understanding Younger Audiences (Image Credit: Harry Clarkson-Bennett)

While content consumption is at an all-time high, the way a person consumes content is not conducive to more traditional publishing practices.

Just 4 minutes a day on news websites for younger audiences vs. 18 minutes for the over 55s. A 350% increase.

The same principle is true of more traditional search.

At the risk of sounding a bit too AI-y, this is a really seismic shift. Ironically, not one driven by AI. Not entirely. One driven by a combination of big tech’s insatiable appetite for money, a lack of trust in more traditional brands, and the rise of the creator ecosystem.

And AI, obviously.

As someone in the comments said, Google is Unc. Maybe a little like news websites. Their ability to attract younger audiences has diminished.

Audience share by age group based on 6 top UK publishers - anonymised Similarweb data
Similarweb publisher data – last 24 months (using six major UK publishers) (Image Credit: Harry Clarkson-Bennett)

I think we can clearly correlate the changes Google has made to the reduction in the younger audience share for publishers. A generation less inclined to click.

One could argue that the traffic losses so many seem to have suffered are almost exclusively from younger audiences. I certainly am.

Audiences more likely to adopt new technologies – particularly flashy ones.

There Are Clear Parallels Between News And Search

Google has gotten richer, as has the AI bubble. All that money has to come from somewhere.

It’s everyone else who struggles.

These changes are designed to counter a younger generation’s shift toward people and ultra-engaging platforms that encourage passive or more incidental methods of information retrieval.

Since 2015, interest in news has declined – more significantly (43%) in 18-24-year-olds than in any other age group. And just 64% of 18-24-year-olds consume news on a daily basis, compared with 87% of people 55 and over.

Proportion very or extremely interested in news
Reuters – Understanding Younger Audiences (Image Credit: Harry Clarkson-Bennett)

Historically, news has been sought out.

Either you browsed a news website (a real paper if you felt fancy) or you searched for it. But the discovery layer changed, and search – the engine that powered the volume-driven publishing model for two decades – is responding.

Responding to younger audiences’ shifting consumption habits. Just like publishers and websites will have to.

Proportion that say social media is their main form of news over time
Reuters – Digital News Report 2025 (Image Credit: Harry Clarkson-Bennett)

Passive consumption is just the norm now with younger audiences. This is why 44% of 18-24-year-olds see social media as the main source of news, compared to just 15% of 55+.

They expect you to just appear. Algorithmic consumption has reduced the need, want, and desire to actively seek something out. If what you serve isn’t delivered directly to their feed, you don’t exist.

Combine this with diminishing trust in more traditional brands, zero-click searches, and the rise of the creator, and you can see why publishers and Google are having to change.

For years, there have been alternatives to Google when it comes to accessing and retrieving information – Instagram, Amazon, YouTube, et al.

Really, this is, or has been, Search Everywhere Optimization. It has been around for a decade. It is also, IMO, why reframing SEO as GEO or some other BS because of LLMs is so moronic.

Dave Jorgenson on TikTok
Views for The Washington Post’s YouTube channel dropped by 85% from its peak in April (54 million views) to 8.2 million views in September 2025, two months after Jorgenson’s exit. (Image Credit: Harry Clarkson-Bennett)

And now the individual has become the competition. The creator economy – soon to be worth $480 billion – has produced a new class of competitor: individuals with direct audience relationships, authentic voices, and none of the structural cost of a legacy newsroom.

51% of 18-24-year-olds pay attention to creators and personalities, compared to 39% who pay attention to traditional media and journalists – a 12-percentage-point inversion.

And this is a problem for Google, too. People used to rely on Google’s ability to organize information to satisfy all of their needs. Now, usage is so heavily navigational that it’s hard to know how much “new” stuff people really use it for.

Outside of news, at least, ironically.

Will This Work?

If it’s anything like news publishers, their primary concern is to continually generate new and engaged audiences with habitual products. AI Mode could absolutely be that product. Discover is their version of a social network. They are, in their own way, engaging products.

Although the low-intent nature of Discover makes the advertising rubbish, which means Google doesn’t really care about it. Sad, but true.

Like Google, the engagement data for publishers tells a pretty bleak story.

Engagement data from Similarweb based on 6 top UK publishers
Similarweb publisher data (using six major UK publishers) (Image Credit: Harry Clarkson-Bennett)

If we isolate this to the youngest and oldest audience, it’s pretty clear what is going on.

Pages per visit, bounce rate and time on site by old and young audiences - based on 6 top publishers
(Image Credit: Harry Clarkson-Bennett)

Younger audiences:

  • Are far less engaged with the traditional news offering than older audiences.
  • Use these (and any) websites differently.

There’s no denying that younger audiences have more diverse and engaging options. This means they use websites like news publishers differently. To fact-check. To confirm something isn’t just spurious BS. To scan and skim.

The same is true of Google. Less of a discovery journey. More one of fact-checking and navigational searching.

Now, I’m not insinuating that older audiences get stuck with adverts and can’t use a menu. That can’t account for an extra 14 minutes of time spent on news websites.

But having watched my mother with a computer, it’s not impossible.

So, What’s The Answer?

To lean into what the new generation likes. Adapt and evolve.

Recommendations slide from the FT Strategies x WAN IFRA News Creator Project
Exec summary from WAN-IFRA x the FT Strategies News Creator Project (Image Credit: Harry Clarkson-Bennett)

The same is true for search (internally and externally) and publishers. If you work for Google, it makes complete sense you would try to expand your video presence in the SERP and prioritize “quality” UGC.

The quality part is lacking as most of the internet – as we’re finding out – is a stinking pile of garbage.

But notoriously, the tide is tricky to swim against.

For publishers, it means working with creators, leveraging their audiences and ability to deliver things quickly. Differently. And creators can benefit from the trust associated with proper news organizations.

Is it that unreasonable to think Google should do the same?

Instead of abusing their position, they could start by giving people an idea of the impact of AIOs and AI Mode. I’m not a financial guru, but I reckon Google has enough money to build and foster creator and publisher programs that are not one-sided – programs that bring genuine value to people and the wider information retrieval ecosystem.

In this scenario, everyone benefits. When AI companies refuse to pay for publisher content, everyone loses.

  • LLMs lose because they have less unique, human-created, quality content to train on.
  • Publishers lose because they are forced to suppress their visibility and don’t get any money.
  • Users lose because the end output isn’t as good.

Model collapse is on the horizon. AI learning on AI falsehoods. A repetitive cycle of garbage. Joyous.

Lily Ray's AI Slop Loop
Lily Ray called it the AI Slop Loop, which has a nice, albeit bleak ring to it (Image Credit: Harry Clarkson-Bennett)

These companies should invest in the ecosystems that built them. Particularly Google.

For publishers:

  1. Build owned channels. Get away from relying on big tech.
  2. Create brilliant, unique journalism.
  3. Supplement it with habit-forming products – puzzles being the obvious example.
  4. Build and sponsor audio and video programs that reach your intended audience.
  5. Implement channel-specific strategies.

Even the New York Times doesn’t rely solely on subscriptions from written content. Not by a long shot. It isn’t enough.

Inside The New York Times Business Model: How Bundling Saved Journalism
They’re as diverse and resilient as any publisher (Image Credit: Harry Clarkson-Bennett)

Final Thoughts

Unfortunately, I think the recent spate of job losses in the publishing industry is just the beginning. Bauer, the BBC, The Washington Post. It’s not UK or SEO-specific. 100,000 roles are becoming 70,000 ones. Teams are shrinking. And there are real-world ramifications.

We are not in a good moment. Some of this can be attributed to AI. But I think more of it is due to longer-term economic difficulties, audiences switching off from traditional news, and things like the Site Reputation Abuse update destroying much-needed revenue lines overnight.

It is hard to make these businesses profitable. Google doesn’t have that problem. But they’re not immune to changing behaviors and becoming yesterday’s news either.

Should you be enough of a psychopath, you can follow the job cuts via this updated Press Gazette article.


Read Leadership In SEO. Subscribe now.


Featured Image: Roman Samborskyi/Shutterstock

The Facts About Google Click Signals, Rankings, And SEO via @sejournal, @martinibuster

Clicks as a ranking-related signal have been a subject of debate for over twenty years, although nowadays most SEOs understand that clicks are not a direct ranking factor. The simple truth about clicks is that they are raw data and, surprisingly, processed with some similarity to human rater scores.

Clicks Are A Raw Signal

The DOJ Antitrust memorandum opinion from September 2025 mentions clicks as a “raw signal” that Google uses. It also categorizes content and search queries as raw signals. This is important because a raw signal is the lowest-level data point, one that is processed into higher-level ranking signals or used to train a model like RankEmbed and its successor, RankEmbedBERT.

Those are considered raw signals because they are:

  • Directly observed
  • But not yet interpreted or used for training data

The DOJ document quotes professor James Allan, who gave expert testimony on behalf of Google:

“Signals range in complexity. There are “raw” signals, like the number of clicks, the content of a web page, and the terms within a query.

…These signals can be created with simple methods, such as counting occurrences (e.g., how many times a web page was clicked in response to a particular query). Id. at 2859:3–2860:21 (Allan) (discussing Navboost signal)”

He then contrasts the raw signals with how they are processed:

“At the other end of the spectrum are innovative deep-learning models, which are machine-learning models that discern complex patterns in large datasets.

Deep models find and exploit patterns in vast data sets. They add unique capabilities at high cost.”

Professor Allan explains that “top-level signals” are used to produce the “final” scores for a web page, including popularity and quality.

Raw Signals Are Data To Be Further Processed

Navboost is mentioned several times in the September 2025 antitrust document as popularity data. It’s not mentioned in the context of clicks having a ranking effect on individual sites.

It’s referred to as a way to measure popularity and intent:

“…popularity as measured by user intent and feedback systems including Navboost/Glue…”

And elsewhere:

“They are ‘popularity as measured by user intent and feedback systems including Navboost/Glue’…”

In the context of explaining why some of the Navboost data is privileged:

“Under the proposed remedy, Google must make available to Qualified Competitors …the following datasets:

1. User-side Data used to build, create, or operate the GLUE statistical model(s);

2. User-side Data used to train, build, or operate the RankEmbed model(s); and

3. The User-side Data used as training data for GenAI Models used in Search or any GenAI Product that can be used to access Search.

Google uses the first two datasets to build search signals and the third to train and refine the models underlying AI Overviews and (arguably) the Gemini app.”

Clicks, like human rater scores, are just a raw signal that is used further up the algorithm chain to train AI models to better match web pages to queries, or to generate a quality or relevance signal that is then added to the rest of the ranking signals by a ranking engine or a rank modifier engine.

70 Days Of Search Logs

The DOJ document makes reference to using 70 days of search logs. But that’s just eleven words in a larger context.

Here is the part that is frequently quoted:

“70 days of search logs plus scores generated by human raters”

I get it, it’s simple and direct. But there is more context to it:

“RankEmbed and its later iteration RankEmbedBERT are ranking models that rely on two main sources of data: [Redacted]% of 70 days of search logs plus scores generated by human raters and used by Google to measure the quality of organic search results.”

The 70 days of search logs are not click data used for ranking purposes in Google, AI Mode, or Gemini. It’s data in aggregate that is further processed in order to train specialized AI models like RankEmbedBERT that in turn rank web pages based on natural language analysis.

That part of the DOJ document does not claim that Google is directly using click data for ranking search results. It’s data, like the human rater data, that’s used by other systems for training data or to be further processed.

What Is Google’s RankEmbed?

RankEmbed is a natural language approach to identifying relevant documents and ranking them.

The same DOJ document explains:

“The RankEmbed model itself is an AI-based, deep-learning system that has strong natural-language understanding. This allows the model to more efficiently identify the best documents to retrieve, even if a query lacks certain terms.”

It’s trained on less data than previous models. The data partially consists of query terms and web page pairs:

“…RankEmbed is trained on 1/100th of the data used to train earlier ranking models yet provides higher quality search results.

…Among the underlying training data is information about the query, including the salient terms that Google has derived from the query, and the resultant web pages.”

That’s training data for training a model to recognize how query terms are relevant to web pages.

The same document explains:

“The data underlying RankEmbed models is a combination of click-and-query data and scoring of web pages by human raters.”

It’s crystal clear that in the context of this specific passage, it’s describing the use of click data (and human rater data) to train AI models, not to directly influence rankings.

What About Google’s Click Ranking Patent?

Way back in 2006, Google filed a patent related to clicks called Modifying search result ranking based on implicit user feedback. The invention is about the mathematical formula for creating a “measure of relevance” out of the aggregated raw data of clicks (plural).

The patent distinguishes between the creation of the signal and the act of ranking itself. The “measure of relevance” is output to a ranking engine, which then can add it to existing ranking scores to rank search results for new searches.

Here’s what the patent describes:

“A ranking Sub-system can include a rank modifier engine that uses implicit user feedback to cause re-ranking of search results in order to improve the final ranking presented to a user of an information retrieval system.

User selections of search results (click data) can be tracked and transformed into a click fraction that can be used to re-rank future search results.”

That “click fraction” is a measure of relevance. The invention described in the patent isn’t about tracking the click; it’s about the mathematical measure (the click fraction) that results from combining all those individual clicks together. That includes the Short Click, Medium Click, Long Click, and the Last Click.

Technically, it’s called the LCIC (Long Click divided by Clicks) Fraction. It’s “clicks” plural because it’s making decisions based on the sums of many clicks (aggregate), not the individual click.

That click fraction is an aggregate because:

  • Summation:
    The “first number” used for ranking is the sum of all those individual weighted clicks for a specific query-document pair.
  • Normalization:
    It takes that sum and divides it by the total count of all clicks (the “second number”).
  • Statistical Smoothing:
    The system applies “smoothing factors” to this aggregate number to ensure that a single click on a “rare” query doesn’t unfairly skew the results, especially for spammers.

That 2006 patent describes its weighting formula like this:

“A base LCC click fraction can be defined as:

LCC_BASE = [#WC(Q,D)] / [#C(Q,D) + S0]

where #WC(Q,D) is the sum of weighted clicks for a query URL…pair, #C(Q,D) is the total number of clicks (ordinal count, not weighted) for the query-URL pair, and S0 is a smoothing factor.”

That formula describes summing and dividing the data from many users to create a single score for a document. The “query-URL” pair is a “bucket” of data that stores the click behavior of every user who ever typed that specific query and clicked that specific search result. The smoothing factor is the anti-spam part that includes not counting single clicks on rare search queries.
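
To make that aggregation concrete, here is a minimal Python sketch of a base click fraction along the lines the patent describes. The click weights, smoothing value, and names are illustrative assumptions of mine, not anything Google has published; the point is only that many individual clicks collapse into a single score per query-URL pair before any ranking system sees them.

```python
# Toy illustration of the aggregation the patent describes, not Google's code.
# The click weights, smoothing value, and function name are my own assumptions.

CLICK_WEIGHTS = {"short": 0.0, "medium": 0.5, "long": 1.0}  # hypothetical weights

def click_fraction(clicks, smoothing=10.0):
    """LCC_BASE = sum of weighted clicks / (total clicks + smoothing factor)."""
    weighted_sum = sum(CLICK_WEIGHTS[c] for c in clicks)  # plays the role of #WC(Q,D)
    total_clicks = len(clicks)                            # plays the role of #C(Q,D)
    return weighted_sum / (total_clicks + smoothing)      # smoothing (S0) dampens rare queries

# One query-URL "bucket": every recorded click for this query and this result.
bucket = ["long", "long", "short", "medium", "long", "short"]
print(round(click_fraction(bucket), 3))  # one aggregate relevance score, not a per-click signal
```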

Even way back in 2006, clicks were just raw data, transformed further up the chain, across multiple stages of aggregation, into a statistical measure of relevance before they ever reached the ranking stage. In this patent, the clicks themselves are not ranking factors that directly influence whether a site is ranked or not. They were used in aggregate as a measure of relevance, which in turn was fed into another engine for ranking.

By the time the information reaches the ranking engine, the raw data has been transformed from individual user actions into an aggregate measure of relevance.

  • Thinking about clicks in relation to ranking is not as simple as “clicks drive search rankings.”
  • Clicks are just raw data.
  • Clicks are used to train AI systems like RankEmbedBERT.
  • Clicks are not directly influencing search results. They have always been raw data, the starting point for systems that use the data in aggregate to create a signal that is then mixed into ranking decision making systems at Google.
  • So yes, like human rater data, raw data is processed to create a signal or to train AI systems.

Read the DOJ memorandum in PDF form here.

Read about four research papers about CTR.

Read the 2006 Google patent, Modifying search result ranking based on implicit user feedback.

Featured Image by Shutterstock/Carkhe

AI Search Is Eating Itself & The SEO Industry Is The Source

Last September, Lily Ray asked Perplexity for the latest news on SEO and AI search. It told her, confidently, about the “September 2025 ‘Perspective’ Core Algorithm Update” – a Google update that, as she then wrote at length in “The AI Slop Loop,” didn’t exist. Google hasn’t named core updates in years. “Perspectives” was already a SERP feature. If a real update had rolled out while she was in Austria, her inbox would have told her before Perplexity did.

She checked the citations. Both pointed at AI-generated posts on SEO agency blogs: sites that had run a content pipeline, hallucinated an update, and published it as reporting. Perplexity read the slop, treated it as source material, and served it back to her as news.

In February, the BBC’s Thomas Germain spent 20 minutes writing a blog post on his personal site. Its title: “The best tech journalists at eating hot dogs.” It ranked him first, invented a 2026 South Dakota International Hot Dog Championship that had never happened, and cited precisely nothing. Within 24 hours, both Google’s AI Overviews and ChatGPT were passing his fabrication along to anyone who asked. Claude didn’t bite. Google and OpenAI did.

Everyone who has looked has seen it.

I’ve Argued About The Ouroboros Before. I Had The Timeline Wrong

The prevailing framing for this problem has been model collapse. You train a model on web text, the web fills up with AI output, the next model trains on a corpus increasingly made of its own exhaust, and eventually the distribution flattens into mush. Innovation comes from exceptions, and probabilistic systems that converge toward the mean attenuate exceptions by design. I’ve used the phrase digital ouroboros for this.

That framing assumes training cycles. It assumes time. It assumes that contamination moves at the speed of model release.

It doesn’t. What Lily documented, what Germain documented, what the New York Times then went and quantified – none of that is training-side. The models involved were not retrained between the hallucination appearing on a blog and being served as citation-backed fact. The contamination moved at the speed of a crawl. The ouroboros isn’t taking generations to eat itself. It’s eating itself at query time, every time someone asks one of these systems a question.

The pipe everyone has been watching is not the pipe that is breaking.

The Distinction That Matters

Model collapse is a training-corpus problem. Synthetic content seeps into the pre-training data, the next generation of model inherits it, capability degrades. Researchers have been warning about this for two years. They’re right. They’re also describing something slow enough that everyone can nod gravely and keep shipping.

Retrieval contamination is faster and already here. RAG systems – Perplexity, Google AI Overviews, ChatGPT with search – do not generate answers purely from parametric memory. They fetch documents from the live web, stuff them into context, and generate a response conditioned on what they retrieved. If the retriever surfaces a hallucinated SEO post, the answer inherits the hallucination. No retraining required.
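
As a minimal sketch of that mechanism (a toy pipeline, not Perplexity’s or Google’s implementation; the index contents, retriever, and generate() stand-in are all hypothetical), whatever the retriever pulls back is pasted straight into the model’s context, so a fabricated post reaches the answer without any retraining step:

```python
# Toy retrieval-augmented generation loop (illustrative only).

def retrieve(query, index, k=2):
    """Naive keyword retriever: return the k documents sharing the most terms with the query."""
    terms = set(query.lower().split())
    ranked = sorted(index.items(),
                    key=lambda item: len(terms & set(item[1].lower().split())),
                    reverse=True)
    return [text for _, text in ranked[:k]]

def generate(query, context):
    """Stand-in for the LLM call: the answer is conditioned on whatever was retrieved."""
    return "Answer using only these sources:\n" + "\n".join(context) + "\n\nQ: " + query

# If a hallucinated agency post is in the index, it becomes "context" at query time.
web_index = {
    "agency-blog": "The September 2025 Perspective Core Algorithm Update shook rankings",
    "reference-doc": "Google has not publicly named core updates in years",
}
print(generate("latest Google core update", retrieve("Google core update", web_index)))
```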

The academic literature on this is clear. PoisonedRAG (Zou et al., 2024) showed that injecting a small number of crafted passages into a retrieval corpus was sufficient to control the output of a RAG system on targeted queries. BadRAG (Xue et al., 2024) demonstrated the same class of attack using semantic backdoors. Both papers treat this as an adversarial problem: what happens when an attacker deliberately poisons the corpus.

What Germain and Lily accidentally proved is that the adversarial model is the normal operating model. You don’t need a crafted adversarial passage. You need a blog post. The open web is the corpus, and anyone with a domain can write to it.

The Oumi analysis commissioned by the New York Times put numbers on what this costs. Across 4,326 SimpleQA tests, Google’s AI Overviews answered correctly 85% of the time on Gemini 2, 91% on Gemini 3. At Google’s scale – more than five trillion searches a year – a 9% error rate still translates to tens of millions of wrong answers every hour. But the more revealing figure is this: on Gemini 3, 56% of the correct answers were ungrounded, up from 37% on Gemini 2. The upgrade improved surface accuracy and made the citations worse. When the model got something right, more than half the time, the source it pointed to didn’t support the claim.
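
A back-of-envelope check on that hourly figure (my own arithmetic, assuming searches are spread evenly across the year and ignoring that only a fraction of searches actually trigger an AI Overview, so read it as an upper bound):

```python
# Rough scale check on "tens of millions of wrong answers every hour" (upper-bound estimate).
searches_per_year = 5e12      # "more than five trillion searches a year"
error_rate = 0.09             # 9% wrong on the SimpleQA tests (Gemini 3)
hours_per_year = 365 * 24

searches_per_hour = searches_per_year / hours_per_year   # ~571 million
wrong_per_hour = searches_per_hour * error_rate          # ~51 million
print(f"~{wrong_per_hour / 1e6:.0f} million potentially wrong answers per hour")
```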

The retrieval layer is not a filter. It is the infection vector.

Who’s Seeding The Corpus

The industry that has most enthusiastically produced it – and then most enthusiastically written about the consequences of consuming it – is the SEO industry. I’ve written before about content scaling being just content spinning with better grammar, and about the AI visibility tool complex that builds dashboards from the output of non-deterministic systems. This is the same loop, one layer deeper. An SEO agency runs an AI content pipeline because AI Overviews have cut their clients’ traffic. The pipeline publishes speculative “winners and losers” posts during a core update that’s still rolling out, citing nothing. Another agency’s pipeline picks those up as sources. The output floods into the retrieval index. AI Overviews cites one of them. The original agency then writes a case study about how AI Overviews are “surfacing” their content.

An Ahrefs study of over 26,000 ChatGPT source URLs found that “best X” listicles accounted for nearly 44% of all cited page types, including cases where brands rank themselves first against their competitors. Harpreet Chatha told the BBC you can publish “the best waterproof shoes for 2026,” put yourself first, and be cited in AI Overviews and ChatGPT within days. Lily, during the actual March 2026 core update, found AI-generated articles claiming to list winners and losers while the update was still rolling out; articles that opened with filler and listed brands without a single real citation.

The practitioners scaling AI content are also the ones most directly harmed when AI search systems cite that content as fact. Nobody forced this. The industry built the pipeline, fed it, and complained about what came out the other end. Not adversarial poisoning. Just the industry polluting its own water supply and then hiring consultants to test it.

The Tier That Matters

The Oumi study is about AI Overviews, which is free by design. Google AI Overviews reportedly reached over two billion monthly active users by mid-2025. ChatGPT has around 900 million weekly active users, of which roughly 50 million pay. Meaning about 94% of the people interacting with OpenAI’s product are on the free tier.

The paid tiers are better. Per OpenAI’s own launch claims, cited in Lily’s piece, GPT-5.4 is 33% less likely to produce false individual claims than GPT-5.2. The free-tier GPT-5.3 is also improved over its predecessor (26.8% fewer hallucinations with web search, 19.7% fewer without), but it’s still meaningfully less reliable than the paywalled version. Gemini 3, which made AI Overviews more accurate on surface tests, also made the ungrounded rate worse. Better answer, weaker citation.

Nobody seems to mind. The reliable version of the product is paywalled. The version most of the planet gets – including the version at the top of Google Search – can be manipulated by 20 minutes of work on a personal website. Intelligence is the marketing category. What two billion users actually receive is a confident summarization of whatever the crawler happened to find.

Grokipedia As The Terminal State

The accidents of the retrieval layer are one thing. Grokipedia is the version where accident is no longer a useful word.

Elon Musk’s xAI launched Grokipedia on Oct. 27, 2025, with 885,279 articles, all generated or rewritten by Grok. Some of them were lifted from Wikipedia wholesale, with a disclaimer at the bottom acknowledging the CC-BY-SA license; a license Wikipedia maintains precisely because a community of human editors writes and verifies the content. Others were rewritten from scratch. PolitiFact found Grokipedia citing sources, including Instagram reels, that Wikipedia’s own policies rule out as “generally unacceptable.” Grokipedia’s entry on Canadian singer Feist said her father died in May 2021, citing a 2017 Vice article about Canadian indie rock that made no mention of the death. And her father was still alive when that article was written. The Nobel Prize in Physics entry added an uncited sentence claiming physics is traditionally the first prize awarded at the ceremony, which isn’t true.

Musk said the goal is to “research the rest of the internet, whatever is publicly available, and correct the Wikipedia article.” The rest of the internet now includes the synthetic content produced by every AI content pipeline pointed at it. An AI system reading the open web, rewriting Wikipedia based on what it finds, and presenting the result as a reference work is the retrieval-contamination problem with the feedback loop made explicit and shipped as a product.

By mid-February 2026, Grokipedia had lost most of its Google visibility. Wikipedia outranks Grokipedia for searches about Grokipedia itself.

“This human-created knowledge is what AI companies rely on to generate content; even Grokipedia needs Wikipedia to exist.” – The Wikimedia Foundation

The synthetic encyclopedia is subsidized by the human one. When the subsidy stops, the thing depending on it stops making sense.

Wikipedia is not beyond criticism. Its edit wars, ideological gatekeeping, and systemic gaps in who gets to shape articles are well-documented and real. But the response to a flawed human editorial process is not to remove the humans entirely and call the result an improvement. I’ve written before about the accountability vacuum that opens when you replace human judgment with API calls. Wikipedia’s problems are the problems of a messy, contested, accountable system. Grokipedia’s problems are the problems of a system with no accountability at all.

The Citation Layer Is Decoupling From Authorship

I wrote recently about Reddit selling “Authentic Human Conversation™” to AI companies while the platform’s own moderators report that they can no longer tell which comments are human. The Oumi study found that of 5,380 sources cited by AI Overviews, Facebook and Reddit were the second and fourth most common. The citation layer of the most-used answer engine in the world is substantially built on two platforms that cannot verify the human origin of their own content.

Human creators are pulling out of the open web because the traffic bargain has collapsed. Answer engines are citing content whose authorship cannot be verified, or was never human to begin with. The citation is still there. The thing being cited is not what it used to be.

The ouroboros framing was right. The timeline wasn’t. Retrieval collapse doesn’t wait for the next training run. It needs an indexable URL and a retrieval system willing to trust it.

The systems are willing. And more than half the time they get an answer right, they can’t point to a source that supports what they just told you.


This post was originally published on The Inference.


Featured Image: Anton Vierietin/Shutterstock

Does AI Actually Reward Quality Content?

For well over a decade, SEOs and marketers have debated the importance of high-quality, original content. After just about every major update, the message from Google was clear: If you want to rank, cut it out with the derivative listicles and other quick-churn assets that are big on keywords and light on substance.

More recently, our current understanding of how LLMs select which sources to cite in responses has SEOs and content marketers championing high-quality, original, and in-depth content with renewed fervor. If you want AI to identify your content as the best source with which to answer a user’s query, logically, it must be among the best online content available on the topic.

While that’s all great in theory, I’m sure many of you reading this have experienced that crushing disappointment of publishing something you poured real effort into, only for it to sink like a stone with barely a ripple. Somehow, your magnum opus languishes on page 4 of the relevant search results, outranked by content that, in your humble opinion, isn’t that remarkable.

Can we really call something high quality if it doesn’t achieve the strategic outcome that led us to create it?

Even when our content succeeds, there’s still the nagging worry that we might perhaps be investing too much time and money trying to achieve content perfection. Did that white paper really need to be 10 pages? Or would a simpler, five-page version have done just as well?

Might it be possible to achieve the same results with a little less quality? How do we find the sweet spot? In short, what’s the minimum viable product?

I’m not going to pretend to have the answer. And that’s because the question isn’t clear on what we mean by quality content.

A Question Of Quality

I’m as guilty as anyone of writing about the need for high-quality content as if it’s obvious what it is and how to achieve it without any further explanation. It’s a form of industry shorthand that has become increasingly meaningless through overuse.

Ask 10 CMOs, SEOs, and content marketers to define what they mean by high-quality content, and you’ll probably get 15 different answers.

Is “quality” determined by thought leadership and subject matter expertise? Or can a few average thoughts be elevated to high quality with skilled writing, a strong layout, and some clever design work?

Is “depth” characterized by longer word counts and more detailed research? Or is it really about demonstrating a superior understanding of a topic by exploring more nuanced or highfalutin’ ideas? Never mind the graphs, can you somehow weave in some Ancient Greek philosophy to get the point across?

And how much originality adds up to “original”? If you reference someone else’s work, are you somehow detracting from your own originality score?

While I can’t confidently give you a single, unambiguous definition of what high quality is, I can tell you what it isn’t: While it may be important, high-quality content is no silver bullet.

Just because your content is meticulously researched and extremely well executed doesn’t mean it’s somehow entitled to high rankings.

Does Original Content Actually Perform Better?

I tasked my team with conducting some qualitative research to answer the question: Does original content perform better than repurposed, unoriginal content, in both traditional search and AI-generated responses?

Of course, the internet is a big place (who knew?). So, for the purposes of this study, we restricted the definition of “search” to Google’s search results and to citations within AI platforms Gemini, ChatGPT, and Perplexity.

Similarly, because you’ve got to compare apples with apples, the team focused on popular search queries in the B2B SaaS and professional services space; mid-funnel, informational queries like “marketing automation tools” and “email deliverability tools.”

The team then identified and analyzed the top-ranking URLs for each query before assigning each one a score from 0 to 3 in five different categories.

  • Primary contribution.
  • Structural novelty.
  • Interpretive depth.
  • Source dependence.
  • Contextual insight.

With a maximum total score of 15, each page was then classified as follows:

  • 12-15: Group A (Original).
  • 7-11: Borderline (Excluded).
  • 0-6: Group B (Repurposed).
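
To make the scoring concrete, here is a minimal sketch, in Python, of how each page’s five category scores (0-3 each) roll up into a group label. The category names and thresholds come from the lists above; the function name, field names, and example values are purely illustrative.

```python
# Illustrative sketch of the originality scoring roll-up described above.
# Category names and thresholds come from the article; the example data is made up.

CATEGORIES = [
    "primary_contribution",
    "structural_novelty",
    "interpretive_depth",
    "source_dependence",
    "contextual_insight",
]

def classify(scores: dict[str, int]) -> str:
    """Sum the five 0-3 category scores and map the total (max 15) to a group."""
    total = sum(scores[c] for c in CATEGORIES)
    if total >= 12:
        return "Group A (Original)"
    if total >= 7:
        return "Borderline (Excluded)"
    return "Group B (Repurposed)"

example = {
    "primary_contribution": 3,
    "structural_novelty": 2,
    "interpretive_depth": 3,
    "source_dependence": 2,
    "contextual_insight": 3,
}
print(classify(example))  # total 13 -> "Group A (Original)"
```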

When the data came back, it appeared at first glance that URLs with higher originality scores (Group A) do tend to rank more consistently in Google and appear more frequently in AI responses than repurposed or derivative content (Group B).

However, before all the content marketers scream “I told you so” at anyone in earshot, you might want to read this next bit first.

Data analysts are notoriously skeptical of knee-jerk first glance conclusions (again, who knew?). The team crunched the data further, using data sciency techniques involving far more Greek letters than I’m used to seeing. They concluded that, while the correlation exists, it’s weak. Strong performance in one part of the dataset doesn’t reliably predict strong performance elsewhere in the dataset. The relationship simply isn’t consistent enough to say with any confidence that highly original content performs better every time.
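
If you want to run the same sanity check on your own data, a weak correlation like this is straightforward to test. The sketch below assumes a CSV with one row per URL, an `originality_score` column (0-15) and a `google_rank` column; those column names, and the choice of Spearman’s rank correlation, are my assumptions rather than the team’s exact method.

```python
# Minimal sketch: test how strongly originality scores track ranking positions.
# Assumes a CSV with columns "originality_score" (0-15) and "google_rank" (1 = top).
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("originality_study.csv")

# Spearman is rank-based, so a negative value means higher originality
# tends to go with better (lower-numbered) ranking positions.
rho, p_value = spearmanr(df["originality_score"], df["google_rank"])
print(f"Spearman rho: {rho:.2f} (p = {p_value:.3f})")

# An |rho| well below ~0.3, with an unimpressive p-value, is the kind of
# "weak, inconsistent" relationship the study describes.
```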

Even so, while the correlation may be weak, it doesn’t appear to be entirely random. Looking at the overall averages, stripped of extreme cases that might skew the results, we did detect a pattern.

For example, original content appeared to perform better in relation to queries requiring interpretation or judgment, such as “benefits of marketing automation” or “email marketing best practices.” But that relationship virtually disappeared for more straightforward requests for information like “what is marketing automation.”

This makes sense. When the answer is factual, being original matters less than being accurate. When the answer requires perspective or judgment, originality becomes more valuable.

So, where does that leave us? We can’t confidently prove that original content always outperforms repurposed content. On the other hand, we can rule out the idea that originality has no impact at all. Therefore, what we can say is that original insight helps in some contexts, for some query types. It just isn’t a guaranteed lever you can pull for predictable results.

When Mediocre Content Has The Edge

Back in the 2010s, the API industry was booming. And that meant lots of content being published on every aspect of how APIs function. At the very least, a software company would need to publish detailed documentation for each of its APIs, from technical specifications and structures to implementation guides and walkthroughs.

This created a problem for one of our clients, a small startup of 10 people: How could they compete for visibility in search, let alone attract positive attention, when the entire conversation around APIs appeared to be dominated by industry giants? The competitors already had massive online footprints, larger content budgets, established domain authority, and significantly more comprehensive resources. How could we ever outrank them?

Conventional wisdom might have seen us attempt to fight quantity with quality by creating the best possible online resource on the topic of APIs. If we could publish content that goes far deeper and offers more value than the competition, we might gradually earn trust and authority through original, detailed research and thought leadership.

With enough budget and a long-term commitment, you could definitely build a strategy around such an approach. Except, of course, we would have needed both quality and quantity to have any chance of overtaking their competitors.

Trying to compete for visibility in every relevant subtopic and keyword on the subject of APIs would mean fighting on way too many fronts at once. How could we find an original angle on a topic that’s already well served online? How could we talk about APIs in a way that would differentiate their software from everyone else’s?

Short answer: We couldn’t. So, we flipped the problem. What if, instead of being last to join the race for the most relevant keywords today, we could be first out of the blocks in the race for whichever keyword might become relevant tomorrow?

I sent out a survey to the relevant audience, asking a bunch of typical users what search terms they would use in certain scenarios. The results revealed a plethora of short- and long-tail keywords, but when we looked for any common themes, two words stood out. One was “API,” naturally. The other was “design.”

“API design” hadn’t cropped up in our initial keyword research as a potential opportunity. But as the search volume for “API design” was practically zero, that’s hardly surprising. Yet we now had clear evidence that, as the industry matured, so too would the search terms people used.

And because very few currently search for “API design,” none of the competitors appeared to be targeting the keyword or publishing content on the topic at all.

This was our window of opportunity. Never mind original content: We had an original keyword, an entire topic niche, to ourselves.

However, we also knew the value of that keyword would evaporate overnight if one or more competitors got there before us.

Forget spending six months developing an award-winning whitepaper series. We didn’t need perfection – with all the time, expense, and effort that entails – because we were staring at the SEO equivalent of an open goal.

In just a few days, we threw together a simple landing page focused on API design. It wasn’t exceptional. At only about 1,500 words, it wasn’t comprehensive. As content goes, it was pretty mediocre. But that’s all it took.

About 12 months later, just as predicted, the search volume materialized. Our single modest page continued to outrank every major competitor, even when they started chasing that new search volume with their own landing pages and content hubs.

Within two years, the keyword “API design” was worth approximately £200 per click. But our client didn’t need to pay for clicks. In effect, we won the space before anyone else even realized there was a space worth winning.

Perfection Is The Enemy Of Good

Striving to achieve the best possible iteration of your content, endlessly refining and polishing and second-guessing every detail, can get in the way of just getting it out there. Sometimes, good enough really is good enough.

I’m not arguing that we should stop striving for excellence in our content. As I hope our little study demonstrated, there are situations where well-researched, original content can give you an advantage. And, of course, success doesn’t end with rankings, citations, and clicks. Once they land on your content, you still want visitors to be wowed, persuaded, and motivated into action.

But like so many things in life, success depends on timing at least as much as it does on quality or originality. In a way, that’s what originality is all about: not necessarily being best, but being first.

The API design landing page didn’t succeed because it was mediocre. It succeeded because it got there first. Quality mattered, but not in the way most content strategies define it.

This matters even more in AI search. LLMs can curate ideas and summarize information, but they can’t have original thoughts, provide firsthand experiences, or offer up fresh perspectives (as of now). While there are no guarantees, as our limited research shows, in AI at least, being the original source has influence.

Start asking what your content can say that hasn’t already been said, and then say it before someone else does.

Featured Image: ImageFlow/Shutterstock

The forgotten funnel: how brands can nurture post-conversion

Most SEO strategies are built with one goal: getting people through the door. That usually means driving traffic to the website, ranking for high-volume keywords, and bringing in new users. But what happens after someone signs up or makes a purchase? That part of the funnel often gets ignored. SEO doesn’t stop at acquisition. It can and should be used to support retention, improve onboarding or post-purchase experience, and make your product or offering easier to understand. So let’s break down the opportunity in post-conversion content, why it matters for SEO, and how to identify and optimize it effectively.

Key takeaways

  • A lot of SEO strategies overlook post-conversion content, even though this type of content is great for an improved user experience.
  • Post-conversion content can include help docs, knowledge bases or product guides serving as long-tail SEO assets.
  • Engaged users generate positive signals, aiding in SEO through branded searches and reduced churn.
  • Identify post-conversion content by analyzing support tickets, customer interactions, and internal search queries.
  • Creating valuable guides and linking related content boosts retention and makes SEO efforts more effective.

Most brands stop too early

SEO strategies (understandably) love to focus on the top of the funnel: traffic, rankings, and new users. However, conversion isn’t the finish line. After someone signs up or makes a purchase, they’re still searching. They’re still learning, and they’re still deciding if they want to stick with you.  

This is where SEO can step in to support:  

  • Onboarding flows or post-purchase journeys  
  • Help docs
  • Community content
  • Knowledge bases

All of these are searchable, indexable, and incredibly useful. Not just for users, but for long-term organic growth.

The opportunity in post-purchase content 

Once someone starts using your product or receives their purchase, they often turn to Google (or your internal search) for answers about setup, usage, sizing, care, troubleshooting, or returns, depending on your business and industry. This is where content such as help centers, knowledge bases, product explainers, FAQs, or how-to guides comes into play. If they’re structured well, optimized for real user queries, and regularly updated, they become long-tail SEO machines.  

Another overlooked asset is community forums or customer reviews/Q&A sections. Real user questions and real answers lead to long-tail keywords and user-generated content that basically maintains itself.  

SEO benefits of retaining users and reducing churn

Retention isn’t just a product or support goal, but an SEO goal too. Engaged users generate more branded searches, click through internal content more often, share links, leave reviews, and make repeat purchases, creating positive engagement signals.

Reducing churn means people stay in your ecosystem longer, giving your website content more opportunities to show up, get linked, and build authority.

How to identify high-value post-conversion content 

This part isn’t guesswork; you already have the answers. The key is to tap into the real questions and friction points your users experience after they convert. Here’s how to do it: 

1. Support tickets

Look at the most common questions that indicate that something is not working or that users don’t understand something. If the same issue keeps popping up, that’s a signal you need better documentation or that your current documentation is not easy to find.  

How to use it:
Turn top support issues into searchable help documents, step-by-step tutorials, or even short videos embedded in your knowledge base or product pages.  
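
As a rough illustration of mining that ticket data, the sketch below pulls the most frequent terms out of a help desk export. The CSV layout and the `subject` column are assumptions; most help desk tools can export something similar.

```python
# Sketch: find the most frequent support ticket themes worth documenting.
# Assumes a help desk export with a free-text "subject" column.
import re
from collections import Counter

import pandas as pd

tickets = pd.read_csv("support_tickets.csv")

STOPWORDS = {"the", "a", "an", "to", "in", "of", "my", "is", "not", "how", "i"}

counts = Counter()
for subject in tickets["subject"].dropna():
    words = re.findall(r"[a-z']+", subject.lower())
    counts.update(w for w in words if w not in STOPWORDS)

# The top recurring terms point at the help docs or tutorials to create first.
for term, n in counts.most_common(15):
    print(f"{term}: {n} tickets")
```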

2. Customer interactions

Your customer-facing teams hear things you won’t get from tickets. They will understand why certain products, features, or steps in the buying journey cause confusion. 

How to use it:  
Create content that supports onboarding or post-purchase usage, expands on underused products or features, or clarifies key steps in getting value from what was purchased. Pull the direct language customers use when describing their problems and turn it to your advantage; they’ll likely use the same language to search for a solution.  

3. Internal search queries

Your internal site or knowledge base search is one of the best indicators of intent. What users search for after logging in or visiting your site tells you exactly what they are struggling with.

Pro-tip: If you have a WordPress website, you can read our guide on how to optimize your internal search.

How to use it:  
Identify top queries that return poor results or no results. Create or improve content that answers those questions. Optimize titles, headers, and metadata so the right article appears first. 
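
Here’s a minimal sketch of that triage, assuming you can export your internal search log with a query, a results count, and a click count per search (the file and column names are hypothetical):

```python
# Sketch: flag internal search queries that return poor or no results.
# Assumes an export with "query", "results_count", and "clicks" columns.
import pandas as pd

searches = pd.read_csv("internal_search_log.csv")

summary = (
    searches.groupby("query")
    .agg(searches=("query", "size"),
         avg_results=("results_count", "mean"),
         total_clicks=("clicks", "sum"))
    .reset_index()
)

# No results, or results that nobody clicks, both signal a content gap.
gaps = summary[(summary["avg_results"] == 0) | (summary["total_clicks"] == 0)]
print(gaps.sort_values("searches", ascending=False).head(20))
```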

4. Feature usage or product engagement data

Low usage doesn’t always mean low interest; it might mean unclear setup, poor discoverability, or hidden value.  

How to use it:  
Look at features or products with low adoption but high impact. Interview users who use them and reverse-engineer what made it work for them. Then build content that guides others to the same outcome.  

Types of high-value content to create

  • Feature walkthroughs or product usage guides: clear, step-by-step guides and how-tos with screenshots or GIFs.
  • Setup checklists: especially for more complex products.
  • Integration or compatibility guides.
  • Advanced use case tutorials.
  • Other explainers and tactful guides for common mistakes.

These pieces not only improve user experience but also target long-tail search queries, reduce support load, and strengthen retention. 

Below are examples of great post-conversion content:

Microsoft combines training hubs, such as the Educator Center, with help content and community resources to support users throughout their post-purchase journey.
This example comes from Nike’s website, which mainly focuses on product care and styling tips to help customers use and maintain their products.

Internal linking strategies that keep users engaged 

Post-conversion content shouldn’t live in isolation. It should be linked, surfaced, and reused across your entire ecosystem.  

Ways to keep users moving:  

  • Link between related help documents 
  • Add “next steps” CTAs to knowledge base articles 
  • Include product education content in lifecycle emails
  • Use breadcrumbs, related content widgets and in-context links

Done right, this turns your post-conversion content into an internal SEO web that improves engagement and makes users more confident in using your products.  
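
One quick way to check whether that web actually exists is to compare your known help center URLs against the internal links found in a site crawl. Below is a rough sketch, assuming a crawler export of source/target link pairs and a list of help center URLs; the file and column names are placeholders.

```python
# Sketch: find help center articles with no incoming internal links.
# Assumes a crawler export of internal links with "source" and "target" columns,
# plus a list of known help center URLs (e.g., from the sitemap).
import pandas as pd

links = pd.read_csv("internal_links.csv")          # columns: source, target
help_urls = set(pd.read_csv("help_center_urls.csv")["url"])

linked_targets = set(links["target"])
orphans = sorted(help_urls - linked_targets)

print(f"{len(orphans)} help articles have no internal links pointing at them:")
for url in orphans[:20]:
    print(" ", url)
```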

Why supporting existing users is good SEO and good business 

If your SEO strategy only focuses on acquisition, you’re leaving money (and traffic) on the table. Post-conversion content helps users get more value from your products, reduces friction, and builds long-term loyalty, all while creating indexable, intent-driven pages that search engines can surface at key moments.  

Want to take action? Start by auditing your post-conversion content. Map out the key moments after signup or purchase, and ensure users receive support at each step. Surface help docs, feature guides, and tutorials where they are needed most and connect them with clear, intentional internal links.  

SEO isn’t just about discovery. It’s about usability. It’s about confidence. It’s about making sure your users stay, not just show up. If you want to build long-term, defensible growth, that’s where you should be focusing. 

WooCommerce Stores Can Now Sell Products Via YouTube Videos via @sejournal, @martinibuster

Google and WooCommerce announced today that the Google for WooCommerce extension now enables merchants to sell products directly through YouTube. The update connects WooCommerce stores to YouTube channels, enabling them to tap into 2.7 billion shoppers.

Merchants can tag products in videos and Shorts, where they appear as shoppable cards during playback and in a dedicated shopping tab on the channel.

  • The cards are pulled from the merchant’s existing product catalog
  • They stay synced automatically through Google Merchant Center
  • The same data is reused across YouTube, Shopping, and ads

Connect WooCommerce Stores To YouTube Shoppers

WooCommerce is an open source eCommerce platform built on WordPress that helps merchants manage products, payments, and orders. Google supports online selling through tools such as Merchant Center and Google Ads, which make product data available across search results, shopping listings, and ads. The Google for WooCommerce extension connects these systems so merchants can manage product data in one place and use it across Google channels.

The update adds YouTube Shopping as a direct sales channel for WooCommerce stores. Merchants can link their store to a YouTube channel and tag products from their catalog in videos and Shorts. Tagged products appear as clickable items while the video plays and remain visible in a shopping tab on the channel.

A product feed syncs automatically with Google Merchant Center, including titles, descriptions, prices, and inventory levels. This same data feeds Google Shopping listings and ad campaigns, so merchants do not need to update each channel separately and can keep product information consistent across search, ads, and video.

Performance Max campaigns use this same Merchant Center feed to generate ads in formats such as video thumbnails, display ads, and text headlines. Google runs experiments in real time and adjusts spend based on conversion trends, while merchants set budgets and return-on-ad-spend goals. While YouTube Shopping enables product tagging within videos, Performance Max handles automated ad creative that can run across YouTube and other Google channels using the same underlying data.

The extension also supports Performance Max campaigns for businesses that sell services, such as bookings or appointments, which do not require a product catalog. These campaigns focus on actions like form submissions, phone calls, or scheduling, expanding the tool beyond physical product sales.

Takeaways

YouTube now serves two roles for WooCommerce merchants:

  1. A place where products are discovered:
    YouTube is the world’s second-largest search engine and the largest platform for researching products via video. It enables merchants to reach an audience of 2.7 billion shoppers.
  2. And a place where those products can be purchased immediately:
    YouTube Shopping is now a direct sales channel for WooCommerce stores. Merchants can tag products in videos and Shorts so they appear as shoppable cards while viewers are watching.

For merchants, this means they can create videos about their products that can directly lead to sales. In terms of SEO, videos are content that can rank across multiple search surfaces, and now they can lead to sales too.

Featured Image by Shutterstock/So happy 59

The Ghost Citation Problem via @sejournal, @Kevin_Indig

Boost your skills with Growth Memo’s weekly expert insights. Subscribe for free!

When an AI answers a question using your content, it usually cites you with a source link. What it doesn’t do, 62% of the time, is say your name. The link is there. The brand mention is not. This is what I like to call a ghost citation: the AI using your content doesn’t mention you in the answer.

This week, I’m sharing:

  • Why being cited and being mentioned are two different outcomes that require different strategies.
  • Which LLMs name brands vs. which treat them as anonymous source material.
  • The query format and content type that produce 30x more brand mentions.

A note from Kevin: I’m a big fan of HubSpot’s Marketing Against the Grain. I had Kieran, one of the co-hosts, on my Tech Bound podcast back in 2023. Now, they launched a newsletter with smart experiments, fresh perspectives, and practical lessons on what’s working right now. So, I thought I would give a friendly shoutout: Check it out.

This analysis draws on 3,981 domains across 115 prompts, 14 countries, and four AI search engines (ChatGPT, Google AI Overviews, Gemini, AI Mode), using data from the Semrush AI Toolkit. Every appearance is tagged as “cited” (source link present) and/or “mentioned” (brand name appears in the answer text). The gap between those two states is the ghost citation problem.
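
To make those two states concrete, here’s a small sketch of how the headline rates could be computed from appearance-level data like this, with one row per domain appearance and 0/1 flags for cited and mentioned. The file and column names are my assumptions, not the Semrush AI Toolkit’s actual export schema.

```python
# Sketch: compute citation, mention, and ghost citation rates from
# appearance-level data. Column names are illustrative, not the toolkit's schema.
import pandas as pd

df = pd.read_csv("appearances.csv")  # columns: domain, engine, cited, mentioned (0/1)
df[["cited", "mentioned"]] = df[["cited", "mentioned"]].astype(bool)

total = len(df)
cited = df["cited"].mean() * 100
mentioned = df["mentioned"].mean() * 100
both = (df["cited"] & df["mentioned"]).mean() * 100
ghost = (df["cited"] & ~df["mentioned"]).mean() * 100  # cited, but never named

print(f"Cited: {cited:.1f}%  Mentioned: {mentioned:.1f}%  "
      f"Both: {both:.1f}%  Ghost citations: {ghost:.1f}% of {total} appearances")

# A per-engine breakdown shows how differently each LLM behaves.
print(df.groupby("engine")[["cited", "mentioned"]].mean() * 100)
```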

1. 62% Of Your Brand’s LLM Citations Are Functionally Invisible

Most brands assume being cited means being seen. The data says otherwise.

Image Credit: Kevin Indig

74.9% of domains were cited, and 38.3% mentioned. 61.7% of citations are ghost citations: the domain gets a source link but zero name recognition in the answer text.

Only 13.2% of appearances convert into both a citation and a mention. Not a single domain went uncited and unmentioned: 74.9% cited plus 38.3% mentioned, minus the 13.2% overlap, accounts for every appearance.

2. Every LLM Shows A Different Behavior

The four AI engines treat citations and mentions in fundamentally different ways:

  • Gemini names brands in 83.7% of appearances, but only generates a citation link 21.4% of the time. It operates more like a conversationalist drawing on brand knowledge.
  • ChatGPT is the opposite: It cites 87.0% of the time but mentions brands in only 20.7% of answers, functioning more like an academic paper with footnotes.
  • Google AI Overviews (AIOs) sit in the middle but lean toward citation.
  • Google’s AI Mode offers about 17% more brand mentions than ChatGPT in its outputs, but also functions closer to an academic paper than its Gemini sibling.

For brands, this means Gemini visibility and ChatGPT visibility are not the same thing. (This data set showed clear evidence that there wasn’t much overlap between ChatGPT citations/mentions and Gemini citations/mentions for the same prompts.) Optimizing for one does not help with the other. There is no single “AI visibility metric.” There are at least four different behavioral systems running in parallel.

Image Credit: Kevin Indig

3. Strong Brands Get Named In The Text

A clear pattern emerges among domains appearing three or more times: Content aggregators and academic sources are cited repeatedly but almost never mentioned.

  • Medium.com was cited 16 times for the same prompts across three different engines and named zero times.
  • Wikipedia.org was cited 27 times and mentioned in only two answers, both times for the same conversational query (“What is the most dangerous creature in the world?”).
  • Wired.com, sciencedirect.com, harvard.edu: same pattern.

Consumer brands with strong public identity get mentioned in the output at near 100%. The AI doesn’t feel the need to cite. Instead, it mentions consumer brands outright. It knows the data about the brands came from somewhere, but doesn’t feel the need to explicitly say so to users. For publishers whose value proposition is information authority, this is a structural problem.

*Mention rate above 100% means the brand is named in the answer text even when not cited as a source link – the engine references the brand by name without linking to it. To read the values in this data set that exceed 100%: a brand cited 10 times and mentioned 10 times sits at a 100% mention rate, while a brand mentioned 12 times but cited only 10 times comes out at 120%.

Image Credit: Kevin Indig

4. LLMs Disagree On The Same Brand 22% Of The Time

454 prompt+domain combinations were tested across multiple engines. In 22% of those outputs (100 total), LLMs disagreed on whether to mention the brand:

  • Instagram.com was mentioned by ChatGPT and Gemini but only cited (not named) by Google.
  • Facebook.com was mentioned by Gemini in 3 out of 3 appearances.
  • Google AI cited Facebook 9 out of 9 times, but named it in only 1.

Image Credit: Kevin Indig

The same brand, the same query, but different engines and different outcomes. This matters for measurement: A brand can appear “visible” in one engine’s data while being completely anonymous in another. Aggregate AI visibility metrics mask this divergence.

5. In-Text Brand Mention Rates Vary By Geography

Controlling for the LLM, country-level differences in mention rates are meaningful:

  • India and Sweden show the highest mention rates (50%), suggesting more conversational or brand-forward query patterns in those markets.
  • Italy, Brazil, and the Netherlands show the lowest mention rates (18-22%), with very high citation rates (82-94%).
  • The UK and Canada are mid-range but above the global average.

*Note: the dataset uses localized prompts confirmed by Semrush, so language is not a confound.

Image Credit: Kevin Indig

Being Cited And Being Named Are Not The Same, And Require A Different Approach

From this analysis, four takeaways stood out to me the most for brands and their content strategies:

1. Being cited means an AI is drawing on your content. Being mentioned means it is naming you. We don’t yet know enough about the implications of mentions and citations, but we can say for sure that there’s a system that decides when you’re cited vs. mentioned.

2. Your strategy must be LLM-specific. A Gemini-first strategy is different from a ChatGPT-first strategy. Any AI visibility report that aggregates across LLMs is misleading.

3. Comparative content gets brands named. Informational content feeds the machine anonymously. If the goal is brand mentions, not just citations, focus your content strategy toward evaluation, comparison, and recommendation.

4. Prompt format matters. Brands should map not just which topics they want to appear in, but specifically which phrasing patterns produce mentions vs. ghost citations. Short conversational queries and long structured queries behave like different products.

Methodology

Data source: Semrush AI Toolkit: 3,981 domain appearances across 115 prompts, 14 countries, and four AI search engines (ChatGPT, Google AI Overviews, Gemini, AI Mode).

Every row in the dataset represents a domain that appeared in an AI answer. Each appearance is tagged as “cited” (the domain appears as a source link) and/or “mentioned” (the brand name appears in the answer text). The gap between those two states is what this analysis calls a ghost citation: the AI used your content but did not say your name.


Featured Image: Roman Samborskyi/Shutterstock; Paulo Bobita/Search Engine Journal

What’s The Biggest Technical SEO Blind Spot From Over-Relying On Tools? – Ask An SEO via @sejournal, @HelenPollitt1

We are fortunate to have a wide range of SEO tools available, designed to help us understand how our websites might be crawled, indexed, used, and ranked. They often have a similar interface of bold charts, color-coded alerts, and a score that sums up the “health” of your website, which is perfect for those of us high-achievers who love to be graded.

But these tools can be a curse as well as a blessing, so today’s question is a really important one:

“What’s the biggest technical SEO blind spot caused by SEOs over-relying on tools instead of raw data?”

It’s the false sense of completeness. The belief that the tool is showing you the full picture, when in reality, you’re only seeing a representative model of it.

Everything else (mis-prioritization, conflicting insights, and misguided fixes) flows from that single issue.

Why Technical SEO Tools “Feel Complete” But Aren’t

Technical SEO programs are a critical part of an SEO’s toolkit. They provide insight into how a website is functioning as well as how it may be perceived by users and search bots.

A Snapshot In Time Of The State Of Your Website

With a lot of the tools currently on the market, you are presented with a snapshot of the website at the point you set the crawler or report to run. This is helpful for spot-checking issues and fixes. It can be highly beneficial in spotting technical issues that could cause problems in the future, before they have made an impact.

However, they don’t necessarily show how issues have developed over time, or what might be the root cause.

Prioritized List Of Issues

The tools often help to cut through the noise of data by providing prioritized lists of issues. They may even give you a checklist of items to address. This can be very helpful for marketers who haven’t got much experience in SEO and need a hand knowing where to start.

All of these give the illusion that the tool is showing a complete picture of how a search engine perceives your site. But it’s far from accurate.

What’s Missing From Technical SEO Tools

Every tool is constrained in some way. They apply their own crawl limits, assumptions about site structure, prioritization algorithms, and data sampling or aggregation.

Even when tools integrate with each other, they are still stitching together partial views.

By contrast, raw data shows what actually happened, not what could happen or what a tool infers.

In technical SEO, raw data can include:

  • Server log files showing which URLs search engine bots actually request.
  • Google Search Console performance and crawl stats data.
  • Analytics data reflecting real user behavior.
  • CrUX field data measuring real-world page experience.

Without these, you are often diagnosing a simulation of your site and not the real thing.

Joined Up Data

These tools will often only report on data from their own crawl findings. Sometimes it is possible to link tools together, so your crawler can ingest information from Google Search Console, or your keyword tracking tool uses information from Google Analytics. However, they are largely independent of each other.

This means you may well be missing critical information about your website by only looking at one or two of the tools. For a holistic understanding of a website’s potential or actual performance, multiple data sets may be needed.

For example, looking at a crawling tool will not necessarily give you clarity over how the website is currently being crawled by the search engines, just how it potentially could be crawled. For more accurate crawl data, you would need to look at the server log files.
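
As a rough illustration of what that looks like in practice, the sketch below pulls Googlebot requests out of a standard combined-format access log and counts hits per URL. Real setups vary: CDN log exports, different log formats, and verifying Googlebot via reverse DNS are all refinements this skips.

```python
# Sketch: count Googlebot hits per URL from a combined-format access log.
# Note: this matches on the user-agent string only; production analysis should
# also verify Googlebot via reverse DNS to exclude spoofed crawlers.
import re
from collections import Counter

LINE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        m = LINE.search(line)
        if m and "Googlebot" in m.group("ua"):
            hits[m.group("path")] += 1

for path, count in hits.most_common(25):
    print(f"{count:6d}  {path}")
```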

Non-Comparable Metrics

The reverse of this issue is that using too many of these tools in parallel can lead to confusing perspectives on what is going well or not with the website. What do you do if the tools provide conflicting priorities? Or the number of issues doesn’t match up?

Looking at the data through the lens of the tool means there can be an extra layer added to the data that makes it not comparable. For example, sampling could be occurring, or a different prioritization algorithm used. This might result in two tools giving conflicting results or recommendations.

Some Tools Give Simulations Rather Than Actual Data

The other potential pitfall is that, sometimes, the data provided through these reports is simulated rather than actual data. Simulated “lab” data is not the same as actual bot or user data. This can lead to false assumptions and incorrect conclusions being drawn.

In this context, “simulated” doesn’t mean the data is fabricated. It means the tool is recreating conditions to estimate how a page might behave, rather than measuring what actually did happen.

A common example of lab vs. real data is found in speed tests. Tools like Lighthouse simulate page load performance under controlled conditions.

For example, a Lighthouse mobile test runs under throttled network conditions simulating a slow 4G connection. That lab result might show an LCP of 4.5s. But CrUX field data, reflecting real users across all their devices and connections, might show a 75th percentile LCP of 2.8s, because many of your actual visitors are on faster connections.

The lab result is helpful for debugging, but it doesn’t reflect the distribution of real user experiences in real-world scenarios.
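
If you want to compare your own lab numbers against field data, the Chrome UX Report exposes that field data through a public API. The sketch below is an illustration only: the endpoint, parameters, and response shape are written from memory and should be checked against the current CrUX API documentation, and you would need your own API key.

```python
# Sketch: pull the field (real-user) p75 LCP for an origin from the CrUX API.
# The endpoint, payload, and response shape here are my best recollection of the
# public Chrome UX Report API; verify against the current documentation.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: create a key in Google Cloud Console
ENDPOINT = f"https://chromeuxreport.googleapis.com/v1/records:queryRecord?key={API_KEY}"

payload = {
    "origin": "https://www.example.com",
    "formFactor": "PHONE",
    "metrics": ["largest_contentful_paint"],
}

resp = requests.post(ENDPOINT, json=payload, timeout=30)
resp.raise_for_status()
record = resp.json()["record"]

# p75 is reported in milliseconds; compare it against your Lighthouse lab result.
p75_ms = record["metrics"]["largest_contentful_paint"]["percentiles"]["p75"]
print(f"Field p75 LCP: {int(p75_ms) / 1000:.1f}s")
```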

Why This Is Important

Understanding the difference between the false sense of completeness shown through tools, and the actual experience of users and bots through raw data can be critical.

As an example, a crawler could flag 200 pages with missing meta descriptions. It suggests you address these missing meta descriptions as a matter of urgency.

Looking at server logs reveals something different. Googlebot only crawls 50 of those pages. The remaining 150 are effectively undiscovered due to poor internal linking. GSC data shows impressions are concentrated on a small subset of the URLs.

If you follow the tool, you spend time writing 200 meta descriptions.

If you follow the raw data, you fix internal linking, thereby unlocking crawlability for 150 pages that currently don’t have visibility in the search engines at all.
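
Here’s a minimal sketch of that cross-check: join the crawler’s flagged URLs against Googlebot hits from the logs and impressions from GSC, so the fix list reflects what bots and users actually see. The file and column names are placeholders for whatever exports you have.

```python
# Sketch: re-prioritize a crawler's "missing meta description" list using
# raw data. File and column names are placeholders for your own exports.
import pandas as pd

flagged = pd.read_csv("crawler_missing_meta.csv")   # column: url
log_hits = pd.read_csv("googlebot_hits.csv")        # columns: url, googlebot_hits
gsc = pd.read_csv("gsc_performance.csv")            # columns: url, impressions

merged = (
    flagged.merge(log_hits, on="url", how="left")
           .merge(gsc, on="url", how="left")
           .fillna({"googlebot_hits": 0, "impressions": 0})
)

# Pages Googlebot never fetched are an internal-linking problem,
# not a meta description problem.
never_crawled = merged[merged["googlebot_hits"] == 0]
worth_rewriting = merged[(merged["googlebot_hits"] > 0) & (merged["impressions"] > 0)]

print(f"{len(never_crawled)} flagged pages Googlebot never crawled (fix linking first)")
print(f"{len(worth_rewriting)} flagged pages actually being crawled and shown")
```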

The Risk Of This Completeness Blind Spot

The “completeness” blind spot that comes from over-relying on technical tools has a lot of knock-on effects. Key aspects get overlooked, and as a result, time and effort are misdirected.

Losing Your Industry Context

Tools often make recommendations without the context of your industry or organization. When SEOs rely too much on the tools and not the data, they may not apply the additional contextual overlay that is important for a high-performing technical SEO strategy.

Optimizing For The Tool, Not Users

When following the recommendations of a tool rather than looking at the raw data itself, there can be a tendency to optimize for the “green tick” of the tool, and not what’s best for users. For example, any tool that provides a scoring system for technical health can lead SEOs to make changes to the site purely so the score goes up, even if it is actually detrimental to users or their search visibility.

Ignoring The Best Way Forward By Following The Tool

For complex situations that take a nuanced approach, there is a risk that overly relying on tools rather than the raw data can lead to SEOs ignoring the complexity of a situation in favor of following the tools’ recommendations. Think of times when you have needed to ignore a tool’s alerts or recommendations because following them would lead to pages on your site being indexed that shouldn’t be, or pages being crawlable that you would rather weren’t. Without the overall context of your strategy for the site, tools cannot possibly know when a “noindex” is good or bad. Therefore, they tend to report in a very black-and-white manner, which can go against what is best for your site.

Final Thought

Overall, there is a very real risk that by accessing all of your technical SEO data only through tools, you may well be nudged towards taking actions that, at best, are not beneficial for your overall SEO goals or, at worst, actively harm your site.

Featured Image: Paulo Bobita/Search Engine Journal