Why Is Organic Traffic Down? Here’s How To Segment The Data via @sejournal, @torylynne

For an SEO, few things stoke panic like seeing a considerable decline in organic traffic. People are going to expect answers if they don’t already.

Getting to those answers isn’t always straightforward or simple, because SEO is neither of those things.

The success of an SEO investigation hinges on the ability to dig into the data, identify where exactly the performance decline is happening, and connect the dots to why it’s happening.

It’s a little bit like an actual investigation: Before you can catch the culprit or understand the motive, you have to gather evidence. In an SEO investigation, that’s a matter of segmenting data.

In this article, I’ll share some different ways to slice and dice performance data for valuable evidence that can help further your investigation.

Using Data To Confirm There’s An SEO Issue

Just because organic traffic is down doesn’t inherently mean that it’s an SEO problem.

So, before we dissect data to narrow down problem areas, the first thing we need to do is determine whether there’s actually an SEO issue at play.

After all, it could be something else altogether, in which case we’d be wasting resources chasing a problem that doesn’t exist.

Is This A Tracking Issue?

In many cases, what looks like a big traffic drop is just an issue with tracking on the site.

To determine whether tracking is functioning correctly, there are a couple of things we need to look for in the data.

The first is consistent drops across channels.

Zoom out of organic search and see what’s happening in other sources and channels.

If you’re seeing meaningful drops across email, paid, etc., that are consistent with organic search, then it’s more than likely that tracking isn’t working correctly.

The other thing we’re looking for here is inconsistencies between internal data and Google Search Console.

Of course, there’s always a bit of inconsistency between first-party data and GSC-reported organic traffic. But if those differences are significantly more pronounced for the time period in question, that hints at a tracking problem.

Is This A Brand Issue?

Organic search traffic from Google falls into two primary camps:

  • Brand traffic: Traffic driven by user queries that include the brand name.
  • Non-brand traffic: Traffic driven by brand-agnostic user queries.

Non-brand traffic is directly affected by SEO work, whereas brand traffic is mostly impacted by the work that happens in other channels.

When a user includes the brand in their search, they’re already brand-aware. They’re a return user or they’ve encountered the brand through marketing efforts in channels like PR, paid social, etc.

When marketing efforts in other channels are scaled back, the brand reaches fewer users. Since fewer people see the brand, fewer people search for it.

Or, if customers sour on the brand, there are fewer people using search to come back to the site.

Either way, it’s not an SEO problem. But in order to confirm that, we need to filter the data down.

Go to Performance in Google Search Console and exclude any queries that include your brand. Then compare the data against a previous period – usually YoY if you need to account for seasonality. Do the same for queries that don’t include the brand name.

If non-brand traffic has stayed consistent, while brand traffic has dropped, then this is a brand issue.

filtering queries using regex in Google Search Console
Screenshot from Google Search Console, November 2025

Tip: Account for users misspelling your brand name by filtering queries using fragments. For example, at Gray Dot Co, we get a lot of brand searches for things like “Gray Company” and “Grey Dot Company.” By using the simple regex “gray|grey,” I can catch brand search activity that would otherwise fall through the cracks.
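
If you’d rather do the brand/non-brand split on an exported query list, here’s a minimal sketch in Python. The file name and column names are assumptions about the shape of your GSC export; swap the pattern for your own brand fragments.

```python
import re

import pandas as pd

# Hypothetical GSC Performance export; column names are assumptions.
df = pd.read_csv("gsc_queries.csv")  # expects columns "query" and "clicks"

# Brand pattern with common misspellings, e.g., "gray|grey" for Gray Dot Co.
brand_pattern = re.compile(r"gray|grey", re.IGNORECASE)

df["segment"] = df["query"].apply(
    lambda q: "brand" if brand_pattern.search(str(q)) else "non-brand"
)

# Total clicks per segment; run it for both periods to see which side dropped.
print(df.groupby("segment")["clicks"].sum())
```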

Is It Seasonal Demand?

The most obvious example of seasonal demand is holiday shopping on ecommerce sites.

Think about something like jewelry. Most people don’t buy jewelry every day; they buy it for special occasions. We can confirm that seasonality by looking at Google Trends.

Zooming out to the past five years of interest in “jewelry,” it clearly peaks in November and December.

Google Trends graph for interest in jewelry over the past five years
Screenshot from Google Trends, November 2025

As a site that sells jewelry, of course, traffic in Q1 is going to be down from Q4.

I use a pretty extreme example here to make my point, but in reality, seasonality is widespread and often more subtle. It impacts businesses where you might not expect much seasonality at all.

The best way to understand its impact is to look at organic search data year-over-year. Do the peaks and valleys follow the same patterns?

If so, then we need to compare data YoY to get a true sense of whether there’s a potential SEO problem.

Is It Industry Demand?

SEOs need to keep tabs on not just what’s happening internally, but also what’s going on externally. A big piece of that is checking the pulse of organic demand for the topics and products that are central to the brand.

Products fall out of vogue, technologies become obsolete, and consumer behavior changes – that’s just the reality of business. When there are fewer potential customers in the landscape, there are fewer clicks to win, and fewer sessions to drive.

Take cameras, for instance. As the cameras on our phones got more sophisticated, digital cameras became less popular. And as they became less popular, searches for cameras dwindled.

Now, they’re making a comeback with younger generations. More people searching, more traffic to win.

npr article headline why gen z loves the digital compact cameras that millennials used to covet
Screenshot from npr.com, November 2025

You can see all of this at play in the search landscape by turning to Google Trends: the downtrend in interest caused by advances in technology, AND the uptrend boosted by shifts in societal trends.

Google Trends graph showing search interest in cameras since 2004
Screenshot from Google Trends, November 2025

When there are drops in industry, product, or topic demand within the landscape, we need to ask ourselves whether the brand’s organic traffic loss is proportional to the overall loss in demand.

Is Paid Search Cannibalizing Organic Search?

Even if a URL on the site ranks well in organic results, ads are still higher on the SERP. So, if a site is running an ad for the same query it already ranks for, then the ad is going to get more clicks by nature.

When businesses give their PPC budgets a boost, there’s potential for this to happen across multiple, key SERPs.

Let’s say a site drives a significant chunk of its organic traffic from four or five product landing pages. If the brand introduces ads to those SERPs, clicks that used to go to the organic result start going to the ad.

That can have a significant impact on organic traffic numbers. But search users are still getting to the same URLs using the same queries.

To confirm, pull sessions by landing pages from both sources. Then, compare the data from before the paid search changes to the period following the change.

If major landing pages consistently show a positive delta that cancels out the negative delta in organic search, you’re not losing organic traffic; you’re lending it.

YoY comparison of sessions by landing page for paid search and organic search in GA4
Screenshot from Google Analytics, November 2025
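
If you export those GA4 numbers, a quick sketch like the one below can surface the pages where paid gains offset organic losses. The file name and column names are assumptions; adjust them to match your export.

```python
import pandas as pd

# Hypothetical GA4 export of sessions by landing page; column names are assumptions.
# Expects: landing_page, organic_before, organic_after, paid_before, paid_after.
df = pd.read_csv("sessions_by_landing_page.csv")

df["organic_delta"] = df["organic_after"] - df["organic_before"]
df["paid_delta"] = df["paid_after"] - df["paid_before"]
df["net_change"] = df["organic_delta"] + df["paid_delta"]

# Pages where the paid gain roughly cancels the organic loss point to
# cannibalization rather than a true traffic problem.
print(
    df.sort_values("organic_delta")[
        ["landing_page", "organic_delta", "paid_delta", "net_change"]
    ].head(10)
)
```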

Segmenting Data To Find SEO Issues

Once we have confirmation that the organic traffic declines point to an SEO issue, we can start zooming in.

Segmenting data in different ways helps pinpoint problem areas and find patterns. Only then can we trace those issues to the cause and craft a strategy for recovery.

URL

Most SEOs are going to filter their organic traffic down by URL. It lets us see which pages are struggling and analyze those pages for potential improvements.

It also helps find patterns across pages that make it easier to isolate the cause of more widespread issues. For example, if the site is losing traffic across its product listing pages, it could signal that there’s a problem with the template for that page.

But segmenting by URL also helps us answer a very important question when we pair it with conversion data.

Do We Really Care About This Traffic?

Clicks are only helpful if they help drive business-valuable user interactions like conversions or ad views. For some sites, like online publications, traffic is valuable in and of itself because users coming to the site are going to see ads. The site still makes money.

But for brands looking to drive conversions, it could just be empty traffic if it’s not helping drive that primary key performance indicator (KPI).

A top-of-funnel blog post might drive a lot of traffic because it ranks for very high-volume keywords. If that same blog post is a top traffic-driving organic landing page, a slip in rankings means a considerable organic traffic drop.

But the users entering those high-volume keywords might not be very qualified potential customers.

Looking at conversions by landing page can help brands understand whether the traffic loss is ultimately hurting the bottom line.

The best way to understand is to turn to attribution.

First-touch attribution quantifies an organic landing page’s value in terms of the conversions it helps drive down the line. For most businesses, someone isn’t likely to convert the first time they visit the site. They usually come back and purchase.

Whereas, last-touch attribution shows the organic landing pages that people come to when they’re ready to make a purchase. Both are valuable!

Query

Filtering performance by query can help understand which terms or topic areas to focus improvements on. That’s not new news.

Sometimes, it’s as easy as doing a period-over-period comparison in GSC, ordering by clicks lost, and looking for obvious patterns, i.e., are the queries with the most decline just subtle variants of one another?

If there aren’t obvious patterns and the queries in decline are more widespread, that’s where topic clustering can come into the mix.

Topic Clustering With AI

Using AI for topic clustering helps quickly identify any potential relationships between queries that are seeing performance dips.

Go to GSC and filter performance by query, looking for any YoY declines in clicks and average position.

YoY comparison in Google Search Console for clicks and average position by query
Screenshot from Google Search Console, November 2025

Then export this list of queries and use your favorite ML script to group the keywords into topic clusters.
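
There’s no single “right” script for this, but here’s a minimal sketch using sentence embeddings and k-means. The model choice, cluster count, and file name are all assumptions to tune for your own data.

```python
import pandas as pd
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
from sklearn.cluster import KMeans

# Hypothetical export of declining queries; the column name is an assumption.
queries = pd.read_csv("declining_queries.csv")["query"].dropna().tolist()

# Embed queries so semantically similar terms land close together.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(queries)

# Group into topic clusters; tune n_clusters for your dataset size.
kmeans = KMeans(n_clusters=8, random_state=42, n_init=10)
labels = kmeans.fit_predict(embeddings)

clusters = pd.DataFrame({"query": queries, "cluster": labels})
for cluster_id, group in clusters.groupby("cluster"):
    print(cluster_id, group["query"].head(5).tolist())
```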

The resulting list of semantic groupings can provide an idea of topics where a site’s authority is slipping in search.

In turn, it helps narrow the area of focus for content improvements and other optimizations to potentially build authority for the topics or products in question.

Identifying User Intent

When users search using specific terms, the type of content they’re looking for – and their objective – differs based on the query. These user expectations can be broken out into four different high-level categories:

  • Informational (top of funnel): Users are looking for answers to questions, explanations, or general knowledge about topics, products, concepts, or events.
  • Commercial (middle of funnel): Users are interested in comparing products, reading reviews, and gathering information before making a purchase decision.
  • Transactional (bottom of funnel): Users are looking to perform a specific action, such as making a purchase, signing up for a service, or downloading a file.
  • Navigational: Brand-familiar users are using the search engine as a shortcut to find a specific website or webpage.

By leveraging user intent, we identify user objectives for which the site or pages on the site are falling short. It gives us a lens into performance decline, making it easier to identify possible causes from the perspective of user experience.

If the majority of queries losing clicks and positionality are informational, it could signal shortcomings in the site’s blog content. If the queries are consistently commercial, it might call for an investigation into how the site approaches product detail and/or listing pages.

GSC doesn’t provide user intent in its reporting, so this is where a third-party SEO tool can come into play. If you have position tracking set up and GSC connected, you can use the tool’s rankings report to identify queries in decline and their user intent.

If not, you can still get the data you need by using a mix of GSC and a tool like Ahrefs.

Device

This view of performance data is pretty simple, but it’s equally easy to overlook!

When the large majority of performance declines are attributed to ONLY desktop or mobile, device data helps identify potential tech or UX issues within the mobile or desktop experience.

The important thing to remember is that any declines need to be considered proportionally. Take the metrics for the site below…

YoY comparison in Google Search Console of clicks by device type
Screenshot from Google Search Console, November 2025

At first glance, the data makes it look like there might be an issue with the desktop experience. But we need to look at things in terms of percentages.

Desktop: (1 – 648/1545) x 100 ≈ 58% decline

Mobile: (1 – 149/316) x 100 ≈ 53% decline

While desktop shows a much larger decline in terms of click count, the percentage of decline YoY is fairly similar across both desktop and mobile. So we’re probably not looking for anything device-specific in this scenario.

Search Appearance

Rich results and SERP features are an opportunity to stand out on the SERP and drive more traffic through enhanced results. Using the search appearance filter in Google Search Console, you can see traffic from different types of rich results and SERP features:

  • Forums.
  • AMP Top Story (AMP page + Article markup).
  • Education Q&A.
  • FAQ.
  • Job Listing.
  • Job Details.
  • Merchant Listing.
  • Product Snippet.
  • Q&A.
  • Review Snippet.
  • Recipe Gallery.
  • Video.

This is the full list of possible features with rich results (courtesy of SchemaApp), though you’ll only see filters for search appearances where the domain is currently positioned.

In most cases, Google is able to generate these types of results because there is structured data on pages. The notable exceptions are Q&A, translated results, and video.

So when there are significant traffic drops coming from a specific type of search appearance, it signals that there’s potentially a problem with the structured data that enables that search feature.

YoY comparison in Google Search Console for search appearance
Screenshot from Google Search Console, November 2025

You can investigate structured data issues in the Enhancements reports in GSC. The exception is product snippets, which nest under the Shopping menu. Either way, the reports only show up in your left-hand nav if Google is aware of relevant data on the site.

For example, the product snippets report shows why some snippets are invalid, as well as ways to potentially improve valid results.

Product snippets report in Google Search Console
Screenshot from Google Search Console, November 2025

This context is valuable as you begin to investigate the technical causes of traffic drops from specific search features. In this case, it’s clear that Google is able to crawl and utilize product schema on most pages – but there are some opportunities to improve that schema with additional data.
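
As a point of reference, here is a hedged, made-up illustration of a Product JSON-LD object that includes the kinds of optional fields (aggregate rating, availability) these reports often flag as missing. Every value below is a placeholder; your markup should reflect real product data.

```python
import json

# Illustrative only: a Product JSON-LD object with optional fields that
# snippet reports commonly flag. All values are made-up placeholders.
product_schema = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "sku": "EX-123",
    "image": "https://www.example.com/images/example-widget.jpg",
    "offers": {
        "@type": "Offer",
        "price": "49.99",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "87",
    },
}

print(json.dumps(product_schema, indent=2))
```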

Featured Snippets

When featured snippets originally came on the scene, they were a major change to the SERP structure that resulted in a serious hit to traditional organic results.

Today, AI Overviews are doing the same. In fact, research from Seer shows that CTR has dropped 61% for queries that now include an AI overview (21% of searches). And that impact is outsized for informational queries.

In cases where rankings have remained relatively static, but traffic is dropping, there’s good reason to investigate whether this type of SERP change is a driver of loss.

While Google Search Console doesn’t report on featured snippets, People Also Ask (PAA) questions, or AI Overviews, third-party tools do.

In the third-party tool Semrush, you can use the Domain Overview report to check for featured snippet and AI Overview availability across keywords where the site ranks.

filtering to keyword with available AI overviews in the Semrush Domain Overview report
Screenshot from Semrush, November 2025

Do the keywords where you’re losing traffic have AI overviews? If you’re not cited, it’s time to start thinking about how you’re going to win that placement.

Search Type

Search type is another way to filter GSC data when you’re seeing traffic declines despite healthy, consistent rankings.

After all, web search is just one prong of Google Search. Think about it: How often do you use Google Image search? At least in my case, that’s fairly often.

Filter performance data by each search type (web, image, video, and news) to understand which one(s) are having the biggest impact on performance decline. Then use that insight to start connecting the dots to the cause.

filtering to Google image search performance in Google Search Console
Screenshot from Google Search Console, November 2025

Images are a great example. One simple line in the robots.txt can block Google from crawling a subfolder that hosts multitudes of images. As those images disappear from image search results, any clicks from those results disappear in tandem.

We don’t know to look for this issue until we slice the data accordingly!
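
If you want to check this without waiting on a crawler, Python’s standard library can test a URL against your live robots.txt. The domain and image path below are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Standard-library check of whether an image URL is blocked for Google's
# image crawler. The domain and path are placeholders.
parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

image_url = "https://www.example.com/assets/images/product-photo.jpg"
for agent in ("Googlebot-Image", "Googlebot"):
    allowed = parser.can_fetch(agent, image_url)
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```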

Geography

If the business operates physically in specific cities and states, then it likely already has geo-specific performance tracking set up through a tool.

But domains for online-only businesses shouldn’t dismiss geographic data – even at the city/state level! Declines are still a trigger to check geo-specific performance data.

Country

Just because the brand only sells and operates in one country doesn’t mean that’s where all the domain’s traffic is coming from. Drilling down by country in GSC allows you to see whether declines are coming from the country the brand is focused on or, potentially, another country altogether.

performance by country in Google Search Console
Screenshot from Google Search Console, November 2025

If it’s another country, it’s time to decide whether that matters. If the site is a publisher, it probably cares more about that traffic than an ecommerce brand that’s more focused on purchases in its country of operation.

Localization

When tools only report positionality at the country level, ranking shifts in specific markets fly under the radar. It certainly happens, and major markets can have a major traffic impact!

Tools like BrightLocal, Whitespark, and Semrush let you analyze SERP rankings one level deeper than GSC, providing data down to the city.

You can check for ranking discrepancies across cities by spot-checking a small sample of the keywords with the greatest declines in clicks.

If I’m an SEO at the University of Phoenix, which is an online university, I’m probably pretty excited about ranking #1 in the United States for “online business degree.”

top five serp results for online business degree in the United States
Screenshot from Semrush, November 2025

But if I drill down further, I might be a little distraught to find that the domain isn’t in the top five SERP results for users in Denver, CO…

top five serp results for online business degree in Denver, Colorado
Screenshot from Semrush, November 2025

…or Raleigh, North Carolina.

top five serp results for online business degree in Raleigh, North Carolina
Screenshot from Semrush, November 2025

Catch Issues Faster By Leveraging AI For Data Analysis

Data segmentation is an important piece of any traffic drop investigation, because humans can see patterns in data that bots don’t.

However, the opposite is true too. With anomaly detection tooling, you get the best of both worlds.

When combined with monitoring and alert notifications, anomaly detection makes it possible to find and fix issues faster. Plus, it enables you to find data patterns in any after-the-impact investigations.

All of this helps ensure that your analysis is comprehensive, and might even point out gaps for further investigation.
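
If you want a feel for how anomaly detection works before setting up full tooling, here is a minimal sketch that flags days falling far outside a trailing 28-day baseline. The file name, column names, window, and threshold are all assumptions to tune.

```python
import pandas as pd

# Hypothetical daily organic clicks export; file and column names are assumptions.
df = pd.read_csv("daily_organic_clicks.csv", parse_dates=["date"]).sort_values("date")

# Compare each day against a trailing 28-day baseline (excluding the day itself).
baseline = df["clicks"].shift(1).rolling(window=28)
df["z_score"] = (df["clicks"] - baseline.mean()) / baseline.std()

# Flag days more than three standard deviations from the recent pattern.
anomalies = df[df["z_score"].abs() > 3]
print(anomalies[["date", "clicks", "z_score"]])
```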

This Colab tool from Sam Torres can help get your site set up!

Congrats, You’re Close To Closing This Case

As Sherlock Holmes would say about an investigation, “It is a capital mistake to theorize before one has data.” With the right data in hand, the culprits start to reveal themselves.

Data segmentation empowers SEOs to uncover leads that point to possible causes. By narrowing it down based on the evidence, we ensure more accuracy, less work, faster answers, and quicker recovery.

And while leadership might not love a traffic drop, they’re sure to love that.

Featured Image: Vanz Studio/Shutterstock

Inside ChatGPT’s Confidential Report Visibility Metrics [Part 1] via @sejournal, @VincentTerrasi

A few weeks ago, I was given access to review a confidential OpenAI partner-facing report, the kind of dataset typically made available to a small group of publishers.

For the first time, from the report, we have access to detailed visibility metrics from inside ChatGPT, the kind of data that only a select few OpenAI site partners have ever seen.

This isn’t a dramatic “leak,” but rather an unusual insight into the inner workings of the platform, which will influence the future of SEO and AI-driven publishing over the next decade.

The consequences of this dataset far outweigh any single controversy: AI visibility is skyrocketing, but AI-driven traffic is evaporating.

This is the clearest signal yet that we are leaving the era of “search engines” and entering the era of “decision engines,” where AI agents surface, interpret, and synthesize information without necessarily directing users back to the source.

This forces every publisher, SEO professional, brand, and content strategist to fundamentally reconsider what online visibility really means.

1. What The Report Data Shows: Visibility Without Traffic

The report dataset covers a full month of visibility data for a large media publisher. With surprising granularity, it breaks down how often a URL is displayed inside ChatGPT, where it appears inside the UI, how often users click on it, how many conversations it impacts, and the surface-level click-through rate (CTR) across different UI placements.

URL Display And User Interaction In ChatGPT

Image from author, November 2025

The dataset’s top-performing URL recorded 185,000 distinct conversation impressions, meaning it was shown in that many separate ChatGPT sessions.

Of these impressions, 3,800 were click events, yielding a conversation-level CTR of 2%. However, when counting multiple appearances within conversations, the numbers increase to 518,000 total impressions and 4,400 total clicks, reducing the overall CTR to 0.80%.

This is an impressive level of exposure. However, it is not an impressive level of traffic.

Most other URLs performed dramatically worse:

  • 0.5% CTR (considered “good” in this context).
  • 0.1% CTR (typical).
  • 0.01% CTR (common).
  • 0% CTR (extremely common, especially for niche content).

This is not a one-off anomaly; it’s consistent across the entire dataset and matches external studies, including server log analyses by independent SEOs showing sub-1% CTR from ChatGPT sources.

We have experienced this phenomenon before, but never on this scale. Google’s zero-click era was the precursor. ChatGPT is the acceleration. However, there is a crucial difference: Google’s featured snippets were designed to provide quick answers while still encouraging users to click through for more information. In contrast, ChatGPT’s responses are designed to fully satisfy the user’s intent, rendering clicks unnecessary rather than merely optional.

2. The Surface-Level Paradox: Where OpenAI Shows The Most, Users Click The Least

The report breaks down every interaction into UI “surfaces,” revealing one of the most counterintuitive dynamics in modern search behavior. The response block, where LLMs place 95%+ of their content, generates massive impression volume, often 100 times more than other surfaces. However, CTR hovers between 0.01% and 1.6%, and curiously, the lower the CTR, the better the quality of the answer.

LLM Content Placement And CTR Relationship

Image from author, November 2025

This is the new equivalent of “Position Zero,” except now it’s not just zero-click; it’s zero-intent-to-click. The psychology is different from that of Google. When ChatGPT provides a comprehensive answer, users interpret clicking as expressing doubt about the AI’s accuracy, indicating the need for further information that the AI cannot provide, or engaging in academic verification (a relatively rare occurrence). The AI has already solved their problem.

The sidebar tells a different story. This small area has far fewer impressions, but a consistently strong CTR ranging from 6% to 10% in the dataset. This is higher than Google’s organic positions 4 through 10. Users who click here are often exploring related content rather than verifying the main answer. The sidebar represents discovery mode rather than verification mode. Users trust the main answer, but are curious about related information.

Citations at the bottom of responses exhibit similar behavior, achieving a CTR of between 6% and 11% when they appear. However, they are only displayed when ChatGPT explicitly cites sources. These attract academically minded users and fact-checkers. Interestingly, the presence of citations does not increase the CTR of the main answer; it may actually decrease it by providing verification without requiring a click.

Search results are rarely triggered and usually only appear when ChatGPT determines that real-time data is needed. They occasionally show CTR spikes of 2.5% to 4%. However, the sample size is currently too small to be significant for most publishers, although these clicks represent the highest intent when they occur.

The paradox is clear: The more frequently OpenAI displays your content, the fewer clicks it generates. The less frequently it displays your content, the higher the CTR. This overturns 25 years of SEO logic. In traditional search, high visibility correlates with high traffic. In AI-native search, however, high visibility often correlates with information extraction rather than user referral.

“ChatGPT’s ‘main answer’ is a visibility engine, not a traffic engine.”

3. Why CTR Is Collapsing: ChatGPT Is An Endpoint, Not A Gateway

The comments and reactions on LinkedIn threads analyzing this data were strikingly consistent and insightful. Users don’t click because ChatGPT solves their problem for them. Unlike Google, where the answer is a link, ChatGPT provides the answer directly.

This means:

  • Satisfied users don’t click (they got what they needed).
  • Curious users sometimes click (they want to explore deeper).
  • Skeptical users rarely click (they either trust the AI or distrust the entire process).
  • Very few users feel the need to leave the interface.

As one senior SEO commented:

“Traffic stopped being the metric to optimize for. We’re now optimizing for trust transfer.”

Another analyst wrote:

“If ChatGPT cites my brand as the authority, I’ve already won the user’s trust before they even visit my site. The click is just a formality.”

This represents a fundamental shift in how humans consume information. In the pre-AI era, the pattern was: “I need to find the answer” → click → read → evaluate → decide. In the AI era, however, it has become: “I need an answer” → “receive” → “trust” → “act”, with no click required. AI becomes the trusted intermediary. The source becomes the silent authority.

Shift In Information Consumption

Image from author, November 2025

This marks the beginning of what some are calling “Inception SEO”: optimizing for the answer itself, rather than for click-throughs. The goal is no longer to be findable. The goal is to be the source that the AI trusts and quotes.

4. Authority Over Keywords: The New Logic Of AI Retrieval

Traditional SEO relies on indexation and keyword matching. LLMs, however, operate on entirely different principles. They rely on internal model knowledge wherever possible, drawing on trained data acquired through crawls, licensed content, and partnerships. They only fetch external data when the model determines that its internal knowledge is insufficient, outdated, or unverified.

When selecting sources, LLMs prioritize domain authority and trust signals, content clarity and structure, entity recognition and knowledge graph alignment, historical accuracy and factual consistency, and recency for time-sensitive queries. They then decide whether to cite at all based on query type and confidence level.

This leads to a profound shift:

  • Entity strength becomes more important than keyword coverage.
  • Brand authority outweighs traditional link building.
  • Consistency and structured content matter more than content volume.
  • Model trust becomes the single most important ranking factor.
  • Factual accuracy over long periods builds cumulative advantage.

“You’re no longer competing in an index. You’re competing in the model’s confidence graph.”

This has radical implications. The old SEO logic was “Rank for 1,000 keywords → Get traffic from 1,000 search queries.” The new AI logic is “Become the authoritative entity for 10 topics → Become the default source for 10,000 AI-generated answers.”

In this new landscape, a single, highly authoritative domain has the potential to dominate AI citations across an entire topic cluster. “Long-tail SEO” may become less relevant as AI synthesizes answers rather than matching specific keywords. Topic authority becomes more valuable than keyword authority. Being cited once by ChatGPT can influence millions of downstream answers.

5. The New KPIs: “Share Of Model” And In-Answer Influence

As CTR is declining, brands must embrace metrics that reflect AI-native visibility. The first of these is “share of model presence,” which is how often your brand, entity, or URLs appear in AI-generated answers, regardless of whether they are clicked on or not. This is analogous to “share of voice” in traditional advertising, but instead of measuring presence in paid media, it measures presence in the AI’s reasoning process.

LLM Decision Hierarchy

Image from author, November 2025

How to measure (see the sketch after this list):

  • Track branded mentions in AI responses across major platforms (ChatGPT, Claude, Perplexity, Google AI Overviews).
  • Monitor entity recognition in AI-generated content.
  • Analyze citation frequency in AI responses for your topic area.
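
As a rough, manual starting point, the sketch below estimates “share of model presence” from a sample of AI answers you have collected yourself for a fixed prompt set. The example answers and entity list are purely illustrative.

```python
import re
from collections import Counter

# Hypothetical inputs: AI answers collected for a fixed prompt set, plus the
# brand and competitor names to track. Both lists are purely illustrative.
answers = [
    "According to Gray Dot Co, start by segmenting your Search Console data...",
    "Tools like Semrush and Ahrefs can show whether a query triggers an AI Overview...",
]
entities = ["Gray Dot Co", "Semrush", "Ahrefs"]

mentions = Counter()
for answer in answers:
    for entity in entities:
        if re.search(re.escape(entity), answer, re.IGNORECASE):
            mentions[entity] += 1

# Share of sampled answers that mention each entity.
for entity in entities:
    share = mentions[entity] / len(answers) * 100
    print(f"{entity}: mentioned in {share:.0f}% of sampled answers")
```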

LLMs are increasingly producing authoritative statements, such as “According to Publisher X…,” “Experts at Brand Y recommend…,” and “As noted by Industry Leader Z…”

This is the new “brand recall,” except it happens at machine speed and on a massive scale, influencing millions of users without them ever visiting your website. Being directly recommended by an AI is more powerful than ranking No. 1 on Google, as the AI’s endorsement carries algorithmic authority. Users don’t see competing sources; the recommendation is contextualized within their specific query, and it occurs at the exact moment of decision-making.

Then, there’s contextual presence: being part of the reasoning chain even when not explicitly cited. This is the “dark matter” of AI visibility. Your content may inform the AI’s answer without being directly attributed, yet still shape how millions of users understand a topic. When a user asks about the best practices for managing a remote team, for example, the AI might synthesize insights from 50 sources, but only cite three of them explicitly. However, the other 47 sources still influenced the reasoning process. Your authority on this topic has now shaped the answer that millions of users will see.

High-intent queries are another crucial metric. Narrow, bottom-of-funnel prompts still convert, showing a click-through rate (CTR) of between 2.6% and 4%. Such queries usually involve product comparisons, specific instructions requiring visual aids, recent news or events, technical or regulatory specifications requiring primary sources, or academic research requiring citation verification. The strategic implication is clear: Don’t abandon click optimization entirely. Instead, identify the 10-20% of queries where clicks still matter and optimize aggressively for those.

Finally, LLMs judge authority based on what might be called “surrounding ecosystem presence” and cross-platform consistency. This means internal consistency across all your pages; schema and structured data that machines can easily parse; knowledge graph alignment through presence in Wikidata, Wikipedia, and industry databases; cross-domain entity coherence, where authoritative third parties reference you consistently; and temporal consistency, where your authority persists over time.

This holistic entity SEO approach optimizes your entire digital presence as a coherent, trustworthy entity, not individual pages. Traditional SEO metrics cannot capture this shift. Publishers will require new dashboards to track AI citations and mentions, new tools to measure “model share” across LLM platforms, new attribution methodologies in a post-click world, and new frameworks to measure influence without direct traffic.

6. Why We Need An “AI Search Console”

Many SEOs immediately saw the same thing in the dataset:

“This looks like the early blueprint for an OpenAI Search Console.”

Right now, publishers cannot:

  • See how many impressions they receive in ChatGPT.
  • Measure their inclusion rate across different query types.
  • Understand how often their brand is cited vs. merely referenced.
  • Identify which UI surfaces they appear in most frequently.
  • Correlate ChatGPT visibility with downstream revenue or brand metrics.
  • Track entity-level impact across the knowledge graph.
  • Measure how often LLMs fetch real-time data from them.
  • Understand why they were selected (or not selected) for specific queries.
  • Compare their visibility to competitors.

Google had “Not Provided,” hiding keyword data. AI platforms may give us “Not Even Observable,” hiding the entire decision-making process. This creates several problems. For publishers, it’s impossible to optimize what you can’t measure; there’s no accountability for AI platforms, and asymmetric information advantages emerge. For the ecosystem, it reduces innovation in content strategy, concentrates power in AI platform providers, and makes it harder to identify and correct AI bias or errors.

Based on this leaked dataset and industry needs, an ideal “AI Search Console” would provide core metrics like impression volume by URL, entity, and topic, surface-level breakdowns, click-through rates, and engagement metrics, conversation-level analytics showing unique sessions, and time-series data showing trends. It would show attribution and sourcing details: how often you’re explicitly cited versus implicitly used, which competitors appear alongside you, query categories where you’re most visible, and confidence scores indicating how much the AI trusts your content.

Diagnostic tools would explain why specific URLs were selected or rejected, what content quality signals the AI detected, your entity recognition status, knowledge graph connectivity, and structured data validation. Optimization recommendations would identify gaps in your entity footprint, content areas where authority is weak, opportunities to improve AI visibility, and competitive intelligence.
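
To make that concrete, here is an entirely hypothetical sketch of what a single per-URL record in such a console might look like. None of these fields exist in any current OpenAI product; the names are invented for illustration.

```python
from dataclasses import dataclass

# Entirely hypothetical: this only sketches the kind of per-URL record the
# metrics described above might produce in an "AI Search Console" export.
@dataclass
class AIVisibilityRecord:
    url: str
    surface: str                 # e.g., "response_block", "sidebar", "citation"
    conversation_impressions: int
    total_impressions: int
    clicks: int
    explicit_citations: int
    implicit_uses: int           # inferred contribution, not attributed

    @property
    def ctr(self) -> float:
        if not self.total_impressions:
            return 0.0
        return self.clicks / self.total_impressions
```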

OpenAI and other AI platforms will eventually need to provide this data for several reasons. Regulatory pressure from the EU AI Act and similar regulations may require algorithmic transparency. Media partnerships will demand visibility metrics as part of licensing deals. Economic sustainability requires feedback loops for a healthy content ecosystem. And competitive advantage means the first platform to offer comprehensive analytics will attract publisher partnerships.

The dataset we’re analyzing may represent the prototype for what will eventually become standard infrastructure.

AI Search Console

Image from author, November 2025

7. Industry Impact: Media, Monetization, And Regulation

The comments raised significant concerns and opportunities for the media sector. The contrast between Google’s and OpenAI’s economic models is stark. Google contributes to media financing through neighbouring rights payments in the EU and other jurisdictions. It still sends meaningful traffic, albeit declining, and has established economic relationships with publishers. Google also participates in advertising ecosystems that fund content creation.

By contrast, OpenAI and similar AI platforms currently only pay select media partners under private agreements, send almost no traffic with a CTR of less than 1%, extract maximum value from content while providing minimal compensation, and create no advertising ecosystem for publishers.

AI Overviews already reduce organic CTR. ChatGPT takes this trend to its logical conclusion by eliminating almost all traffic. This will force a complete restructuring of business models and raise urgent questions: Should AI platforms pay neighbouring rights like search engines do? Will governments impose compensatory frameworks for content use? Will publishers negotiate direct partnerships with LLM providers? Will new licensing ecosystems emerge for training data, inference, and citation? How should content that is viewed but not clicked on be valued?

Several potential economic models are emerging. One model is citation-based compensation, where platforms pay based on how often content is cited or used. This is similar to music streaming royalties, though transparent metrics are required.

Under licensing agreements, publishers would license content directly to AI platforms, with tiered pricing based on authority and freshness. This is already happening with major outlets such as the Associated Press, Axel Springer, and the Financial Times. Hybrid attribution models would combine citation frequency, impressions, and click-throughs, weighted by query value and user intent, in order to create standardized compensation frameworks.

Regulatory mandates could see governments requiring AI platforms to share revenue with content creators, based on precedents in neighbouring rights law. This could potentially include mandatory arbitration mechanisms.

This would be the biggest shift in digital media economics since Google Ads. Platforms that solve this problem fairly will build sustainable ecosystems. Those that do not will face regulatory intervention and publisher revolts.

8. What Publishers And Brands Must Do Now

Based on the data and expert reactions, an emerging playbook is taking shape. Firstly, publishers must prioritize inclusion over clicks. The real goal is to be part of the solution, not to generate a spike in traffic. This involves creating comprehensive, authoritative content that AI can synthesize, prioritizing clarity and factual accuracy over tricks to boost engagement, structuring content so that key facts can be easily extracted, and establishing topic authority rather than chasing individual keywords.

Strengthening your entity footprint is equally critical. Every brand, author, product, and concept must be machine-readable and consistent. Publishers should ensure their entity exists on Wikidata and Wikipedia, maintain consistent NAP (name, address, phone number) details across all properties, implement comprehensive schema markup, create and maintain knowledge graph entries, build structured product catalogues, and establish clear entity relationships, linking companies to people, products, and topics.

Building trust signals for retrieval is important because LLMs prioritize high-authority, clearly structured, low-ambiguity content. These trust signals include:

  • Authorship transparency, with clear author bios, credentials, and expertise.
  • Editorial standards, covering fact-checking, corrections policies, and sourcing.
  • Domain authority, built through age, backlink profile, and industry recognition.
  • Structured data, via schema implementation and rich snippets.
  • Factual consistency, maintaining accuracy over time without contradictions.
  • Expert verification, through third-party endorsements and citations.

Publishers should not abandon click optimization entirely. Instead, they should target bottom-funnel prompts that still demonstrate a measurable click-through rate (CTR) of between 2% and 4%, because AI responses alone can’t fully satisfy those queries.

Examples of high-CTR queries:

  • “How to configure [specific technical setup]” (requires visuals or code).
  • “Compare [Product A] vs [Product B] specs” (requires tables, detailed comparisons).
  • “Latest news on [breaking event]” (requires recency).
  • “Where to buy [specific product]” (transactional intent).
  • “[Company] careers” (requires job portal access).

Strategy: Identify the 10–20% of your topic space where AI cannot fully satisfy user intent, and optimize those pages for clicks.

In terms of content, it is important to lead with the most important information, use clear and definitive language, cite primary sources, avoid ambiguity and hedging unless accuracy requires it, and create content that remains accurate over long timeframes.

Perhaps the most important shift is mental: Stop thinking in terms of traffic and start thinking in terms of influence. Value has shifted from visits to the reasoning process itself. New success metrics should track how often you are cited by AI, the percentage of AI responses in your field that mention you, how your “share of model” compares with that of your competitors, whether you are building cumulative authority that persists across model updates, and whether AI recognizes you as the definitive source for your core topics.

The strategic focus shifts from “drive 1 million monthly visitors” to “influence 10 million AI-mediated decisions.”

Publishers must also diversify their revenue streams so that they are not dependent on traffic-based monetization. Alternative models include building direct relationships with audiences through email lists, newsletters, and memberships; offering premium content via paywalls, subscriptions, and exclusive access; integrating commerce through affiliate programmes, product sales, and services; forming B2B partnerships to offer white-label content, API access, and data licensing; and negotiating deals with AI platforms for direct compensation for content use.

Publishers that control the relationship with their audience rather than depending on intermediary platforms will thrive.

The Super-Predator Paradox

A fundamental truth about artificial intelligence is often overlooked: these systems do not generate content independently; they rely entirely on the accumulated work of millions of human creators, including journalism, research, technical documentation, and creative writing, which form the foundation upon which every model is built. This dependency is the reason why OpenAI has been pursuing licensing deals with major publishers so aggressively. It is not an act of corporate philanthropy, but an existential necessity. A language model that is only trained on historical data becomes increasingly disconnected from the current reality with each passing day. It is unable to detect breaking news or update its understanding through pure inference. It is also unable to invent ground truth from computational power alone.

This creates what I call the “super-predator paradox”: If OpenAI succeeds in completely disrupting traditional web traffic, causing publishers to collapse and the flow of new, high-quality content to slow to a trickle, the model’s training data will become increasingly stale. Its understanding of current events will degrade, and users will begin to notice that the responses feel outdated and disconnected from reality. In effect, the super-predator will have devoured its ecosystem and will now find itself starving in a content desert of its own creation.

The paradox is inescapable and suggests two very different possible futures. In one, OpenAI continues to treat publishers as obstacles rather than partners. This would lead to the collapse of the content ecosystem and the AI systems that depend on it. In the other, OpenAI shares value with publishers through sustainable compensation models, attribution systems, and partnerships. This would ensure that creators can continue their work.

The difference between these futures is not primarily technological; the tools to build sustainable, creator-compensating AI systems largely exist today. Rather, it is a matter of strategic vision and willingness to recognize that, if artificial intelligence is to become the universal interface for human knowledge, it must sustain the world from which it learns rather than cannibalize it for short-term gain. The next decade will be defined not by who builds the most powerful model, but by who builds the most sustainable one: by who solves the super-predator paradox before it becomes an extinction event for both the content ecosystem and the AI systems that cannot survive without it.

Note: All data and stats cited above are from the OpenAI partner report, unless otherwise indicated.

Featured Image: Nadya_Art/Shutterstock

Google CEO Sundar Pichai Says Information Ecosystem Is Richer Than AI via @sejournal, @martinibuster

In a recent interview with the BBC, Sundar Pichai emphasized that AI is not a standalone source of information. He affirmed that AI works together with Search and that each has its own uses. Pichai also said that AI is not a replacement for search, the information ecosystem, or actual subject matter experts.

A number of tweets and articles mischaracterized Pichai’s remarks, including a BBC News social media post summarizing the interview with the line, “Don’t blindly trust what AI tells you.”

Tweet By BBC News

That phrasing misleadingly suggests that Pichai said not to trust AI. But that’s not what he meant. His full answer emphasized that AI is not a standalone source of information and that the information ecosystem is greater than AI alone.

AI Makes Mistakes, That’s Why There’s Grounding

Sundar Pichai had just finished describing how AI will, in a few years’ time, usher in new opportunities and create new kinds of jobs based on what humans can do with AI. He used the example of envisioning a feature-length movie.

In response to that statement, the interviewer challenged Pichai with a question about the fallibility of AI, saying that what Pichai described is built on the assumption that AI works.

Pichai’s statement was broadly about how people will use AI in a few years’ time. The interviewer’s question was narrowly focused on the accuracy and truth of AI. The conversation between the interviewer and Pichai contained this dynamic, where the interviewer kept narrowing the focus to AI in isolation and Pichai kept broadening the focus to the wider information ecosystem within which AI exists.

The interviewer keeps pressing Pichai with variations of the same narrow question:

  • Is AI reliable?
  • Doesn’t AI make information less reliable?
  • Shouldn’t Google be held responsible because this model was invented there?

Pichai repeatedly answers by placing AI within a wider context:

  • AI is not the only system people use.
  • Search and other grounded sources remain essential.
  • Journalism, doctors, teachers, and other experts matter.
  • The information ecosystem is larger than AI.

The interviewer kept zooming in to look at the AI “tree,” and Pichai responded by zooming out to explain AI within the context of the information ecosystem “forest.” This is the key to understanding what Pichai means by his answers.

In response to Pichai’s statements of how AI will transform society in the coming years, the interviewer asked about the truthfulness of AI today:

“So all of the hopes, the hype, the valuations, the social benefit of this transformation you’ve just described, you’ve built on a central assumption that the technology functions, that it works.

Let me propose one simple test of Gemini, which is your booming ChatGPT kind of competitor. Is it accurate always? Does it tell the truth?”

Pichai explained that generative AI is not a source of truth, it’s simply making a statistical prediction of how to respond. In that context he said that Google Search is what grounds AI in facts and truth. Grounding is a system for anchoring generative AI with real-world facts instead of relying on its training data.

Pichai responded:

“Look, we are working hard from a scientific standpoint to ground it in real world information. And there are areas, part of what we’ve done with Gemini is we’ve brought the power of Google Search. So it uses Google Search as a tool to try and answer, to give answers more accurately. But there are moments, these AI models fundamentally have a technology by which they’re predicting what’s next, and they are prone to errors.”

Use Tools For What They’re Good At

The next part of Pichai’s answer underlines the fact that AI and Search are tools that people use for different purposes. The point he is making is that AI is not a standalone technology that has replaced Search. He said to use each tool for “what they’re good at.”

Pichai explained:

“Today, I think, we take pride in the amount of work we put in to give as accurate information as possible. But the current state-of-the-art AI technology is prone to some errors.

This is why people also use Google Search, and we have other products which are more grounded in providing accurate information, right? But the same tools are helpful if you want to creatively write something.

So you have to learn to use these tools for what they’re good at and not blindly trust everything they say.”

Not One Standalone System: The Information Ecosystem Matters

The interviewer echoed Pichai’s statement about not blindly trusting AI, then challenged him again on reliability.

The interviewer asked:

“OK, don’t blindly trust.

But let me suggest to you that you have a special responsibility because this whole model, type of model, transformer model, the T in ChatGPT, was invented here under you. And you know that it’s a probability. And I just wonder if you accept the end result of all this fantastic investment is the information is less reliable?”

Pichai returned to his first answer, that AI is not all that there is, that AI is just one source of information from a great many sources, including from actual human experts. The interviewer was trying to pin Pichai down to talking about generative AI and Pichai was answering by saying that it’s not just AI.

Pichai explained:

“I think if you only construct systems standalone, and you only rely on that, that would be true.

Which is why I think we have to make the information ecosystem… has to be much richer than just having AI technology being the sole product in it.

…Truth matters. Journalism matters. All of the surrounding things we have today matters, right?

So if you’re a student, you’re talking to your teacher.

If as a consumer, you’re going to a doctor, you want to trust your doctor.

Yeah, all of that matters.”

Pichai’s point is that AI exists within a larger world of tools, human knowledge, and expertise, not as a replacement for it. His emphasis on teachers, doctors, and journalism shows that human expertise remains a high standard for truth and accuracy. Pichai declined to answer questions in a way that treated AI as the sole system for answers. Instead, he kept emphasizing that AI is only one part of where we get information.

This is why Pichai’s answer cannot be reduced to a click-baity line like “Don’t blindly trust what AI tells you, says Google’s Sundar Pichai.” The deeper message is about how he, and by extension, Google, views AI as one tool out of many.

Watch the interview at about the 10 minute mark:

Featured image: Screenshot

Complete Crawler List For AI User-Agents [Dec 2025] via @sejournal, @vahandev

AI visibility plays a crucial role for SEOs, and this starts with controlling AI crawlers. If AI crawlers can’t access your pages, you’re invisible to AI discovery engines.

On the flip side, unmonitored AI crawlers can overwhelm servers with excessive requests, causing crashes and unexpected hosting bills.

User-agent strings are essential for controlling which AI crawlers can access your website, but official documentation is often outdated, incomplete, or missing entirely. So, we curated a verified list of AI crawlers from our actual server logs as a useful reference.

Every user-agent is validated against official IP lists when available, ensuring accuracy. We will maintain and update this list to catch new crawlers and changes to existing ones.

The Complete Verified AI Crawler List (December 2025)

Each entry below lists the crawler’s purpose, the crawl rate we observed on SEJ (pages/hour), whether a verified IP list is available, a sample robots.txt block, and the complete user-agent string.

GPTBot
  • Purpose: AI training data collection for GPT models (ChatGPT, GPT-4o).
  • Crawl rate of SEJ (pages/hour): 100
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: GPTBot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)

ChatGPT-User
  • Purpose: AI agent for real-time web browsing when users interact with ChatGPT.
  • Crawl rate of SEJ (pages/hour): 2400
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: ChatGPT-User
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

OAI-SearchBot
  • Purpose: AI search indexing for ChatGPT search features (not for training).
  • Crawl rate of SEJ (pages/hour): 150
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: OAI-SearchBot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot

ClaudeBot
  • Purpose: AI training data collection for Claude models.
  • Crawl rate of SEJ (pages/hour): 500
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: ClaudeBot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

Claude-User
  • Purpose: AI agent for real-time web access when Claude users browse.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: Claude-User
    Disallow: /sample-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-User/1.0; +Claude-User@anthropic.com)

Claude-SearchBot
  • Purpose: AI search indexing for Claude search capabilities.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: Claude-SearchBot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +https://www.anthropic.com)

Google-CloudVertexBot
  • Purpose: AI agent for Vertex AI Agent Builder (site owners’ request only).
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: Google-CloudVertexBot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Google-CloudVertexBot; +https://cloud.google.com/enterprise-search)

Google-Extended
  • Purpose: Token controlling AI training usage of Googlebot-crawled content.
  • Robots.txt disallow:
    User-agent: Google-Extended
    Allow: /
    Disallow: /private-folder

Gemini-Deep-Research
  • Purpose: AI research agent for Google Gemini’s Deep Research feature.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: Gemini-Deep-Research
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research; +https://gemini.google/overview/deep-research/) Chrome/135.0.0.0 Safari/537.36

Google
  • Purpose: Gemini’s chat when a user asks to open a webpage.
  • Crawl rate of SEJ (pages/hour): <10
  • Complete user agent: Google

Bingbot
  • Purpose: Powers Bing Search and Bing Chat (Copilot) AI answers.
  • Crawl rate of SEJ (pages/hour): 1300
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: BingBot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36

Applebot-Extended
  • Purpose: Doesn’t crawl but controls how Apple uses Applebot data.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: Applebot-Extended
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)

PerplexityBot
  • Purpose: AI search indexing for Perplexity’s answer engine.
  • Crawl rate of SEJ (pages/hour): 150
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: PerplexityBot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

Perplexity-User
  • Purpose: AI agent for real-time browsing when Perplexity users request information.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: Perplexity-User
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)

Meta-ExternalAgent
  • Purpose: AI training data collection for Meta’s LLMs (Llama, etc.).
  • Crawl rate of SEJ (pages/hour): 1100
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: meta-externalagent
    Allow: /
    Disallow: /private-folder
  • Complete user agent: meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

Meta-WebIndexer
  • Purpose: Used to improve Meta AI search.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: Meta-WebIndexer
    Allow: /
    Disallow: /private-folder
  • Complete user agent: meta-webindexer/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

Bytespider
  • Purpose: AI training data for ByteDance’s LLMs for products like TikTok.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: Bytespider
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/)

Amazonbot
  • Purpose: AI training for Alexa and other Amazon AI services.
  • Crawl rate of SEJ (pages/hour): 1050
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: Amazonbot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36

DuckAssistBot
  • Purpose: AI search indexing for DuckDuckGo search engine.
  • Crawl rate of SEJ (pages/hour): 20
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: DuckAssistBot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html)

MistralAI-User
  • Purpose: Mistral’s real-time citation fetcher for “Le Chat” assistant.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: MistralAI-User
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MistralAI-User/1.0; +https://docs.mistral.ai/robots)

Webz.io
  • Purpose: Data extraction and web scraping used by other AI training companies. Formerly known as Omgili.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: webzio
    Allow: /
    Disallow: /private-folder
  • Complete user agent: webzio (+https://webz.io/bot.html)

Diffbot
  • Purpose: Data extraction and web scraping used by companies all over the world.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: Diffbot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)

ICC-Crawler
  • Purpose: AI and machine learning data collection.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Not available
  • Robots.txt disallow:
    User-agent: ICC-Crawler
    Allow: /
    Disallow: /private-folder
  • Complete user agent: ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html)

CCBot
  • Purpose: Open-source web archive used as training data by multiple AI companies.
  • Crawl rate of SEJ (pages/hour): <10
  • Verified IP list: Official IP List
  • Robots.txt disallow:
    User-agent: CCBot
    Allow: /
    Disallow: /private-folder
  • Complete user agent: CCBot/2.0 (https://commoncrawl.org/faq/)

The user-agent strings above have all been verified against Search Engine Journal server logs.

Popular AI Agent Crawlers With Unidentifiable User Agent

We’ve found that the following didn’t identify themselves:

  • you.com.
  • ChatGPT’s agent Operator.
  • Bing’s Copilot chat.
  • Grok.
  • DeepSeek.

There is no way to track or block these crawlers other than by identifying their explicit IPs.

We set up a trap page (e.g., /specific-page-for-you-com/) and used the on-page chat to prompt you.com to visit it, allowing us to locate the corresponding visit record and IP address in our server logs. Below is the screenshot:

Screenshot by author, December 2025
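If you'd rather not eyeball the raw file, a short script can pull the matching records for you. The sketch below is a minimal example, assuming a standard Apache/Nginx combined log format and the hypothetical trap path from the example above; adjust both for your own setup.

import re

TRAP_PATH = "/specific-page-for-you-com/"  # hypothetical trap page from the example above
LOG_FILE = "/var/log/apache2/access.log"   # adjust to your server's log location

# Combined log format: client IP is the first field, the request line is in quotes.
line_pattern = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+) ([^" ]+)')

with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = line_pattern.match(line)
        if match and match.group(4).startswith(TRAP_PATH.rstrip("/")):
            ip, timestamp, method, path = match.groups()
            print(f"{timestamp}  {ip}  {method} {path}")

Each printed line gives you the timestamp, the visiting IP, and the request, which is all you need to tie the visit back to the tool you prompted.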

What About Agentic AI Browsers?

Unfortunately, AI browsers such as Comet or ChatGPT’s Atlas don’t differentiate themselves in the user agent string, so you can’t identify them in server logs; their visits simply blend in with normal users’ traffic.

ChatGPT’s Atlas browser user agent string from server log records (Screenshot by author, December 2025)

This is disappointing for SEOs because tracking agentic browser visits to a website is important from a reporting point of view.

How To Check What’s Crawling Your Server

Depending on your hosting service, your provider may offer a user interface (UI) that makes it easy to access and review server logs.

If your hosting doesn’t offer this, you can download the server log files (usually located at /var/log/apache2/access.log on Linux-based servers) via FTP or ask your server support to send them to you.

Once you have the log file, you can view and analyze it in Google Sheets (if the file is in CSV format) or Screaming Frog’s Log File Analyser, or, if the file is under 100 MB, you can try analyzing it with Gemini.
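For a quick first pass before reaching for a dedicated log analyzer, a few lines of Python can tally how often each AI crawler appears in the file. This is a rough sketch, assuming the standard access log location above and a partial set of user-agent tokens from the list earlier in this article; extend both as needed.

import collections

LOG_FILE = "/var/log/apache2/access.log"  # adjust to your hosting setup

# User-agent tokens from the crawler list above (extend as needed).
AI_BOTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
           "Bytespider", "Amazonbot", "meta-externalagent", "CCBot", "bingbot"]

counts = collections.Counter()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in AI_BOTS:
            if bot.lower() in line.lower():
                counts[bot] += 1
                break  # count each request only once

for bot, hits in counts.most_common():
    print(f"{bot}: {hits} requests")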

How To Verify Legitimate Vs. Fake Bots

Fake crawlers can spoof legitimate user agents to bypass restrictions and scrape content aggressively. For example, anyone can impersonate ClaudeBot from a laptop and initiate a crawl request from the terminal. In your server log, it will look as if ClaudeBot is crawling your site:

curl -A 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)' https://example.com

Verification helps save server bandwidth and prevents your content from being harvested without permission. The most reliable verification method you can apply is checking the request IP.

Check each request’s IP against the officially declared IP lists referenced above. If it matches, you can allow the request; otherwise, block it.
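As a sketch of what that check looks like in code, the snippet below uses Python’s standard ipaddress module to test a request IP against a set of published CIDR ranges. The ranges shown here are documentation placeholders; swap in the real blocks from the official IP list of the bot you’re verifying.

import ipaddress

# Placeholder ranges; replace with the CIDR blocks from the bot's official IP list.
OFFICIAL_RANGES = [ipaddress.ip_network(cidr) for cidr in ["192.0.2.0/24", "198.51.100.0/24"]]

def is_verified_bot_ip(request_ip: str) -> bool:
    """Return True if the request IP falls inside one of the official ranges."""
    ip = ipaddress.ip_address(request_ip)
    return any(ip in network for network in OFFICIAL_RANGES)

# Example: allow a request claiming to be ClaudeBot only if its IP checks out.
print(is_verified_bot_ip("192.0.2.15"))   # True with the placeholder ranges
print(is_verified_bot_ip("203.0.113.9"))  # False -> treat as an impersonator and block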

Various types of firewalls can help with this by allowlisting verified IPs, which lets legitimate bot requests pass through while every other request that impersonates an AI crawler in its user agent string is blocked.

For example, in WordPress, you can use the free Wordfence plugin to allowlist legitimate IPs from the official lists above and add custom blocking rules.

The allowlist rule takes priority, letting legitimate crawlers pass through while blocking any impersonation request that comes from a different IP.

However, note that IP addresses can be spoofed too; when both the bot’s user agent and its IP are spoofed, you won’t be able to block it this way.

Conclusion: Stay In Control Of AI Crawlers For Reliable AI Visibility

AI crawlers are now part of our web ecosystem, and the bots listed here represent the major AI platforms currently indexing the web, although this list is likely to grow.

Check your server logs regularly to see what’s actually hitting your site, and make sure you don’t inadvertently block AI crawlers if visibility in AI search engines is important for your business. If you don’t want AI crawlers to access your content, block them via robots.txt using the user-agent name.

We’ll keep this list updated as new crawlers emerge and existing ones change, so bookmark this URL or revisit the article regularly to keep your AI crawler list up to date.

More Resources:


Featured Image: BestForBest/Shutterstock

SEO Pulse: Google Updates Console, Maps & AI Mode Flow via @sejournal, @MattGSouthern

Google packed a lot into this week, with Search Console picking up AI-powered configuration, Maps loosening its real-name rule for reviews, and a new test nudging more people from AI Overviews into AI Mode.

Here’s what that means for you.

Google Search Console Tests AI-Powered Report Configuration

Google introduced an experimental AI feature in Search Console that lets you describe the report you want and have the tool build it for you.

The feature, announced in a Google blog post, lives inside the Search results Performance report. You can type something like “compare clicks from UK versus France,” and the system will set filters, comparisons, and metrics to match what it thinks you mean.

For now, the feature is limited to Search results data, while Discover, News, and video reports still work the way they always have. Google says it’s starting with “a limited set of websites” and will expand access based on feedback.

The update is about configuration, not new metrics. It can help you set up a table, but it will not change how you sort or export data, and it does not add separate reporting for AI Overviews or AI Mode.

Why SEOs Should Pay Attention

If you spend a lot of time rebuilding the same types of reports, this can save you some setup time. It’s easier to describe a comparison in one sentence than to remember which checkboxes and filters you used last month.

The tradeoff is that you still need to confirm what the AI actually did. When a view comes from a written request instead of a manual series of clicks, it’s easy for a small misinterpretation to slip through and show up in a deck or a client email.

This is not a replacement for understanding how your reports are put together. It also does nothing to answer a bigger question for SEO professionals about how much traffic is coming from Google’s AI surfaces.

What SEO Professionals Are Saying

On LinkedIn, independent SEO consultant Brodie Clark summed up the launch with:

“Whoa, Google Search Console just rolled out another gem: a new AI-powered configuration to analyse your search traffic. The new feature is designed to reduce the effort it takes for you to select, filter, and compare your data.”

He then walks through how it can apply filters, set comparisons, and pick metrics for common tasks.

Under the official Search Central post, one commenter joked about the gap between configuration and data:

“GSC: ‘Describe the dataview you want to see’ Me: ‘Show me how much traffic I receive from AI overviews and AI mode’ :’)”

The overall mood is that this is a genuine quality-of-life improvement, but many SEO professionals would still rather get first-class reporting for AI Overviews and AI Mode than another way to slice existing Search results data.

Read our full coverage: Google Adds AI-Powered Configuration To Search Console

Google Maps Reviews No Longer Require Real Names

Google Maps now lets people leave reviews under a custom display name and profile picture instead of their real Google Account name. The change rolled out globally and is documented in recent Google Maps updates.

You set this up in the Contributions section of your profile. Once you choose a display name and avatar, that identity appears on new reviews and can be applied to older ones if you edit them, while Google still ties everything back to a real account with a full activity history.

The change is more than cosmetic because review identity shapes how people interpret trust and intent when they scan a local business profile.

Why SEOs Should Pay Attention

Reviews remain one of the strongest local ranking signals, based on Whitespark’s Local Search Ranking Factors survey. When names turn into nicknames, it shifts how business owners and customers read that feedback.

For local businesses, it becomes harder to recognize reviewers at a glance, review audits feel more manual because names are less useful, and owners may feel they have less visibility into who is talking about them, even though Google still sees the underlying accounts.

If you manage local clients, you will likely spend time explaining that this doesn’t make reviews truly anonymous, and that review solicitation and response strategies still matter.

What Local SEO Professionals Are Saying

In a LinkedIn post, Darren Shaw, founder of Whitespark, tried to calm some of the panic:

“Hot take: Everyone is freaking out that anonymous Google reviews will cause a surge in fake review spam, but I don’t think so.”

He points out that anyone determined to leave fake reviews can already create throwaway accounts, and that:

“Anonymous display names ≠ anonymous accounts”

Google still sees device data, behavior patterns, and full contribution history. In his view, the bigger story is that this change lowers the barrier for honest feedback in “embarrassed consumer” categories like criminal defense, rehab, and therapy, where people do not want their real names in search results.

The comments add useful nuance. Curtis Boyd expects “an increase in both 5 star reviews for ‘embarrassed consumer industries’ and correspondingly – 1 star reviews, across all industries as google makes it easier to hide identity.”

Taken together, the thread suggests you should watch for changes in review volume and rating mix, especially in sensitive verticals, without assuming this update alone will cause a sudden spike in spam.

Read our full coverage: Google Maps Lets Users Post Reviews Using Nicknames

Google Tests Seamless AI Overviews To AI Mode Transition

Google is testing a new mobile flow that sends people straight from AI Overviews into AI Mode when they tap “Show more,” based on a post from Robby Stein, VP of Product for Google Search.

In the examples Google has shown, you see an AI Overview at the top of the results page. When you expand it, an “Ask anything” bar appears at the bottom, and typing into that bar opens AI Mode with your original query pulled into a chat thread.

The test is limited to mobile and to countries where AI Mode is already available, and Google hasn’t said how long it will run or when it might roll out more broadly.

Why SEOs Should Pay Attention

This test blurs the line between AI Overviews as a SERP feature and AI Mode as a separate product. If it sticks, someone who sees your content cited in an Overview has a clear path to keep asking follow-up questions inside AI Mode instead of scrolling down to organic results.

On mobile, where this is running first, the effect is stronger because screen space is tight. A prominent “Ask anything” bar at the bottom of the screen gives people an obvious option that doesn’t involve hunting for blue links underneath ads, shopping units, and other features.

If your pages show up in AI Overviews today, it’s worth watching mobile traffic and AI-related impressions so you have before-and-after data if this behavior expands.

What SEO Professionals Are Saying

In a widely shared LinkedIn post, Lily Ray, VP of SEO Strategy & Research at Amsive, wrote:

“Google announced today that they’ll be testing a new way for users to click directly into AI Mode via AI Overviews.”

She notes that many people will likely expect “Show more” to lead back to traditional results, not into a chat interface, and ties the test to the broader state of the results page, arguing that ads and new sponsored treatments are making it harder to find organic listings.

Ray’s most pointed line is:

“Compared to the current chaotic state of Google’s search results, AI Mode feels frictionless.”

Her view is that Google is making traditional search more cluttered while giving AI Mode a cleaner, easier experience.

Other SEO professionals in the comments give concrete examples. One notes that “the well hidden sponsored ads have gotten completely out of control lately,” describing a number one organic result that sits below “5–6 sponsored ads.” Another says they have “been working with SEO since 2007” and only recently had to pause before clicking on a result because they were not sure whether it was organic or an ad.

There’s also frustration with AI Mode’s limits. One commenter describes how the context window “just suddenly refreshes and forgets everything after about 10 prompts/turns,” which makes longer research sessions difficult even as the entry point gets smoother.

Overall, the thread reads as a warning that AI Mode may feel cleaner but also keeps people on Google, and that this test is one more step in nudging searchers toward that experience.

Read our full coverage: Google Connects AI Overviews To AI Mode On Mobile

Theme Of The Week: Google Tightens Its Grip On The Journey

All three updates are pulling in the same direction: More of the search journey happens inside Google’s own interfaces.

Search Console’s AI configuration keeps you in the Performance report longer by taking some of the work out of report setup. Maps nicknames make it easier for people to speak freely, but on a platform where Google defines how identity is presented. The AI Overviews to AI Mode test turns follow-up questions into a chat that runs on Google’s terms rather than yours.

There are real usability wins in all of this, but also fewer clear moments where a searcher is nudged off Google and onto your site.



Featured Image: Pixel-Shot/Shutterstock

The New Structure Of AI Era SEO via @sejournal, @DuaneForrester

People keep asking me what it takes to show up in AI answers. They ask in conference hallways, in LinkedIn messages, on calls, and during workshops. The questions always sound different, but the intent is the same. People want to know how much of their existing SEO work still applies. They want to know what they need to learn next and how to avoid falling behind. Mostly, they want clarity (hence my new book!). The ground beneath this industry feels like it moved overnight, and everyone is trying to figure out if the skills they built over the last twenty years still matter.

They do. But not in the same proportions they used to. And not for the same reasons.

When I explain how GenAI systems choose content, I see the same reaction every time. First, relief that the fundamentals still matter. Then a flicker of concern when they realize how much of the work they treated as optional is now mandatory. And finally, a mix of curiosity and discomfort when they hear about the new layer of work that simply did not exist even five years ago. That last moment is where the fear of missing out turns into motivation. The learning curve is not as steep as people imagine. The only real risk is assuming future visibility will follow yesterday’s rules.

That is why this three-layer model helps. It gives structure to a messy change. It shows what carries over, what needs more focus, and what is entirely new. And it lets you make smart choices about where to spend your time next. As always, feel free to disagree with me, or support my ideas. I’m OK with either. I’m simply trying to share what I understand, and if others believe things to be different, that’s entirely OK.

This first set contains the work every experienced SEO already knows. None of it is new. What has changed is the cost of getting it wrong. LLM systems depend heavily on clear access, clear language, and stable topical relevance. If you already focus on this work, you are in a good starting position.

You already write to match user intent. That skill transfers directly into the GenAI world. The difference is that LLMs evaluate meaning, not keywords. They ask whether a chunk of content answers the user’s intent with clarity. They no longer care about keyword coverage or clever phrasing. If your content solves the problem the user brings to the model, the system trusts it. If it drifts off topic or mixes multiple ideas in the same chunk/block, it gets bypassed.

Featured snippets prepared the industry for this. You learned to lead with the answer and support it with context. LLMs treat the opening sentences of a chunk as a kind of confidence score. If the model can see the answer in the first two or three sentences, it is far more likely to use that block. If the answer is buried under a soft introduction, you lose visibility. This is not stylistic preference. It is about risk. The model wants to minimize uncertainty. Direct answers lower that uncertainty.

This is another long-standing skill that becomes more important. If the crawler cannot fetch your content cleanly, the LLM cannot rely on it. You can write brilliant content and structure it perfectly, and none of it matters if the system cannot get to it. Clean HTML, sensible page structure, reachable URLs, and a clear robots.txt file are still foundational. Now they also affect the quality of your vector index and how often your content appears in AI answers.

Updating fast-moving topics matters more today. When a model collects information, it wants the most stable and reliable view of the topic. If your content is accurate but stale, the system will often prefer a fresher chunk from a competitor. This becomes critical in categories like regulations, pricing, health, finance, and emerging technology. When the topic moves, your updates need to move with it.

This has always been at the heart of SEO. Now it becomes even more important. LLMs look for patterns of expertise. They prefer sources that have shown depth across a subject instead of one-off coverage. When the model attempts to solve a problem, it selects blocks from sources that consistently appear authoritative on that topic. This is why thin content strategies collapse in the GenAI world. You need depth, not coverage for the sake of coverage.

This second group contains tasks that existed in old SEO but were rarely done with discipline. Teams touched them lightly but did not treat them as critical. In the GenAI era, these now carry real weight. They do more than polish content. They directly affect chunk retrieval, embedding quality, and citation rates.

Scanning used to matter because people skim pages. Now chunk boundaries matter because models retrieve blocks, not pages. The ideal block is a tight 100 to 300 words that covers one idea with no drift. If you pack multiple ideas into one block, retrieval suffers. If you create long, meandering paragraphs, the embedding loses focus. The best performing chunks are compact, structured, and clear.
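To make the idea concrete, here is a minimal sketch of how you might audit your own pages for chunk size, splitting copy on paragraph breaks and flagging blocks that fall outside the rough 100-to-300-word window. The thresholds, the splitting rule, and the page.txt filename are assumptions to adapt, not a standard.

def audit_chunks(page_text: str, min_words: int = 100, max_words: int = 300):
    """Split copy on blank lines and flag blocks outside the target word range."""
    blocks = [b.strip() for b in page_text.split("\n\n") if b.strip()]
    report = []
    for i, block in enumerate(blocks, start=1):
        words = len(block.split())
        if words < min_words:
            note = "too thin: consider merging with a neighbor"
        elif words > max_words:
            note = "too long: likely mixes ideas, consider splitting"
        else:
            note = "within target range"
        report.append((i, words, note))
    return report

sample = open("page.txt", encoding="utf-8").read()  # your exported page copy
for block_no, words, note in audit_chunks(sample):
    print(f"Block {block_no}: {words} words, {note}")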

This used to be a style preference. You choose how to name your product or brand and try to stay consistent. In the GenAI era, entity clarity becomes a technical factor. Embedding models create numeric patterns based on how your entities appear in context. If your naming drifts, the embeddings drift. That reduces retrieval accuracy and lowers your chances of being used by the model. A stable naming pattern makes your content easier to match.

Teams used to sprinkle stats into content to seem authoritative. That is not enough anymore. LLMs need safe, specific facts they can quote without risk. They look for numbers, steps, definitions, and crisp explanations. When your content contains stable facts that are easy to lift, your chances of being cited go up. When your content is vague or opinion-heavy, you become less usable.

Links still matter, but the source of the mention matters more. LLMs weigh training data heavily. If your brand appears in places known for strong standards, the model builds trust around your entity. If you appear mainly on weak domains, that trust does not form. This is not classic link equity. This is reputation equity inside a model’s training memory.

Clear writing always helped search engines understand intent. In the GenAI era, it helps the model align your content with a user’s question. Clever marketing language makes embeddings less accurate. Simple, precise language improves retrieval consistency. Your goal is not to entertain the model. Your goal is to be unambiguous.

This final group contains work the industry never had to think about before. These tasks did not exist at scale. They are now some of the largest contributors to visibility. Most teams are not doing this work yet. This is the real gap between brands that appear in AI answers and brands that disappear.

The LLM does not rank pages. It ranks chunks. Every chunk competes with every other chunk on the same topic. If your chunk boundaries are weak or your block covers too many ideas, you lose. If the block is tight, relevant, and structured, your chances of being selected rise. This is the foundation of GenAI visibility. Retrieval determines everything that follows.

Your content eventually becomes vectors. Structure, clarity, and consistency shape how those vectors look. Clean paragraphs create clean embeddings. Mixed concepts create noisy embeddings. When your embeddings are noisy, they lose queries by a small margin and never appear. When your embeddings are clean, they align more often and rise in retrieval. This is invisible work, but it defines success in the GenAI world.
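You can get a rough feel for this effect with a crude stand-in for embeddings. The sketch below uses TF-IDF vectors (far simpler than a neural embedding model, but directionally similar) to compare a focused block and a mixed block against the same query; the made-up texts and scores are illustrative only, and the mixed block’s similarity is typically diluted.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

query = "how should I structure content chunks for AI retrieval"

focused_block = (
    "Keep each content chunk to a single idea of roughly 100 to 300 words. "
    "Lead with the answer, keep the structure clean, and avoid drifting into "
    "unrelated topics so retrieval systems can match the chunk to a query."
)
mixed_block = (
    "Keep each content chunk to a single idea. Also, our team just got back from "
    "a conference, we have new office hours, the cafeteria menu changed, and "
    "don't forget our podcast episode about holiday travel and gift ideas."
)

# Fit one vocabulary across all three texts, then compare each block to the query.
vectors = TfidfVectorizer().fit_transform([query, focused_block, mixed_block])
print("focused:", cosine_similarity(vectors[0], vectors[1])[0][0])
print("mixed:  ", cosine_similarity(vectors[0], vectors[2])[0][0])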

Simple formatting choices change what the model trusts. Headings, labels, definitions, steps, and examples act as retrieval cues. They help the system map your content to a user’s need. They also reduce risk, because predictable structure is easier to understand. When you supply clean signals, the model uses your content more often.

LLMs evaluate trust differently than Google or Bing. They look for author information, credentials, certifications, citations, provenance, and stable sourcing. They prefer content that reduces liability. If you give the model clear trust markers, it can use your content with confidence. If trust is weak or absent, your content becomes background noise.

Models need structure to interpret relationships between ideas. Numbered steps, definitions, transitions, and section boundaries improve retrieval and lower confusion. When your content follows predictable patterns, the system can use it more safely. This is especially important in advisory content, technical content, and any topic with legal or financial risk.

The shift to GenAI is not a reset. It is a reshaping. People are still searching for help, ideas, products, answers, and reassurance. They are just doing it through systems that evaluate content differently. You can stay visible in that world, but only if you stop expecting yesterday’s playbook to produce the same results. When you understand how retrieval works, how chunks are handled, and how meaning gets modeled, the fog lifts. The work becomes clear again.

Most teams are not there yet. They are still optimizing pages while AI systems are evaluating chunks. They are still thinking in keywords while models compare meaning. They are still polishing copy while the model scans for trust signals and structured clarity. When you understand all three layers, you stop guessing at what matters. You start shaping content the way the system actually reads it.

This is not busywork. It is strategic groundwork for the next decade of discovery. The brands that adapt early will gain an advantage that compounds over time. AI does not reward the loudest voice. It rewards the clearest one. If you build for that future now, your content will keep showing up in the places your customers look next.


My new book, “The Machine Layer: How to Stay Visible and Trusted in the Age of AI Search,” is now on sale at Amazon.com. It’s the guide I wish existed when I started noticing that the old playbook (rankings, traffic, click-through rates) was quietly becoming less predictive of actual business outcomes. The shift isn’t abstract. When AI systems decide which content gets retrieved, cited, and trusted, they’re also deciding which expertise stays visible and which fades into irrelevance. The book covers the technical architecture driving these decisions (tokenization, chunking, vector embeddings, retrieval-augmented generation) and translates it into frameworks you can actually use. It’s built for practitioners whose roles are evolving, executives trying to make sense of changing metrics, and anyone who’s felt that uncomfortable gap opening between what used to work and what works now.

The Machine Layer
Image Credit: Duane Forrester

More Resources:


This post was originally published on Duane Forrester Decodes.


Featured Image: Master1305/Shutterstock

How CMOs Should Prioritize SEO Budgets In 2026 Q1 And H1 via @sejournal, @TaylorDanRW

Search evolved quickly throughout 2025 as AI systems became a primary route for information discovery, which, in turn, reduced the consistency and predictability of traditional organic traffic for many brands.

As blue‑link visibility tightened and click‑through rates became more erratic, CMOs found themselves under growing pressure to justify marketing spend while still demonstrating momentum. This shift required marketing leaders to think more seriously about resilience across their owned channels. It is no longer viable to rely solely on rankings.

Brands need stable visibility across AI surfaces, stronger and more coherent content operations, and cleaner technical foundations that support both users and AI systems.

Q1 and H1 2026 are the periods in which these priorities need to be funded and executed.

Principles For 2026 SEO Budgeting In Q1/H1

A well‑structured SEO budget for early 2026 is built on a clear set of principles that guide both stability and experimentation.

Protect A Baseline Allocation For Core SEO

This includes technical health, site performance, information architecture, and the ongoing maintenance of content. These activities underpin every marketing channel, and cutting them introduces unnecessary risk at a time when discovery patterns are shifting.

Create A Separate Experimental Pot For AI Discovery

As AI Overviews and other generative engines influence how users encounter brands, it becomes important to ring‑fence investment for testing answer‑led content, entity development, evolving schema patterns, and AI measurement frameworks. Without a dedicated pot, these activities either stall or compete with essential work.

Invest In Measurement That Explains Real User Behavior

Because AI visibility remains immature and uneven, analytics must capture how users move through journeys, where AI systems mention the brand, and which content shapes those outcomes.

This level of insight strengthens the CMO’s ability to defend and adjust budgets later in the year.

Where To Put Money In Q1

Q1 is the moment to stabilize the foundation while preparing for new patterns in discovery. The work done here shapes the results achieved in H1.

Technical Foundations

Begin with site health. Improve performance, resolve crawl barriers, modernize internal linking, and strengthen information architecture. AI systems and LLMs rely heavily on clean and consistent signals, so a strong technical environment supports every subsequent content, GEO, and measurement initiative.

Entity‑Rich, Question‑Led Content

Users are now expressing broader and more layered questions, and AI engines reward content that defines concepts clearly, addresses common questions in detail, and builds meaningful topical depth. Invest in structured content programmes aligned to real customer problems and journeys, placing emphasis on clarity, usefulness, and authority rather than chasing volume for its own sake.

Early GEO Experimentation

There is considerable overlap between SEO and LLM inclusion because both rely on strong technical foundations, consistent entity signals, and helpful content that is easy for systems to interpret. LLM discovery should be seen as an extension of SEO rather than a standalone discipline, since most of the work that strengthens SEO also strengthens LLM inclusion by improving clarity, coherence, and relevance.

Certain sectors are beginning to experience new nuances. One example is Agentic Commerce Protocol (ACP), which is influencing how AI systems understand products, evaluate them, and, in some cases, transact with them.

Whether we refer to this area as GEO, AEO, or LLMO, the principle is the same – brands are now optimising for multiple platforms and an expanding set of discovery engines, each with its own interpretation of signals.

Q1 is the right time to assess how your brand appears across these systems. Review answer hubs, evaluate your entity relationships, and examine how structured signals are interpreted. This initial experimentation will inform where budget should be expanded in H1.

H1 View: Scaling What Works

H1 is when early insights from Q1 begin to mature into scalable programmes.

Rolling Winning Experiments Into BAU

When early LLM discovery or structured content initiatives show clear signs of traction, they should be incorporated into business‑as‑usual SEO. Formalizing these practices allows them to grow consistently without requiring new budget conversations every quarter.

Cutting Low‑ROI Tools And Reinvesting In People And Process

Many organizations overspend on tools that fail to deliver meaningful value.

H1 provides the opportunity to review tool usage, identify duplication, and retire underused platforms. Redirecting that spend towards people, content quality, and operational improvements generally produces far stronger outcomes. The AI race that pretty much all tool providers have entered will begin to die down, and those that drive clear value will begin to emerge from the noise.

Adjusting Budget Mix As Data Emerges

By the latter part of H1, the business should have clearer evidence of where visibility is shifting and which activities genuinely influence discovery and engagement. Budgets should then be adjusted to support what is working, maintain core SEO activity, expand successful content areas, and reduce investment in experiments that have not produced results.

CMO Questions Before Sign‑Off

As CMOs review their SEO budgets for 2026, the final stage of sign‑off should be shaped by a balanced view of both offensive and defensive tactics, ensuring the organization invests in movement as well as momentum.

Defensive tactics protect what the brand has already earned: stability in rankings, continuity of technical performance, dependable content structures, and the preservation of existing visibility across both search and AI‑driven experiences.

Offensive tactics, on the other hand, are designed to create new points of visibility, unlock new categories of demand, and strengthen the brand’s presence across emerging discovery engines.

A balanced budget needs to fund both, because without defence the brand becomes fragile, and without offence it becomes invisible.

Movement refers to the activities that help the brand adapt to evolving discovery environments. These include early LLM discovery experiments, entity expansion, and the modernization of content formats.

Momentum represents the compounding effect of sustained investment in core SEO and consistent optimization across key journeys.

CMOs should judge budgets by their ability to generate both: movement that positions the brand for the future, and momentum that sustains growth.

With that in mind, CMOs may wish to ask the following questions before approving any budget:

  • To what extent does this budget balance defensive activity, such as technical stability and content maintenance, with offensive initiatives that expand future visibility?
  • How clearly does the plan demonstrate where movement will come from in early 2026, and how momentum will be protected and strengthened throughout H1?
  • Which elements of the programme directly enhance the brand’s presence across AI surfaces, GEO, and other emerging discovery engines?
  • How effectively does the proposed content strategy support both immediate user needs and longer‑term category growth?
  • How will we track changes in brand visibility across multiple platforms, including traditional search, AI‑driven answers, and sector‑specific discovery systems?
  • What roles do teams, processes, and first‑party data play in sustaining movement and momentum, and are they funded appropriately?
  • What reporting improvements will allow the leadership team to judge the success of both defensive and offensive investments by the end of H1?

More Resources:


Featured Image: N Universe/Shutterstock

5 Reasons To Use The Internet Archive’s New WordPress Plugin via @sejournal, @martinibuster

The Internet Archive, also known as the Wayback Machine, is generally regarded as a place to view old web pages, but its value goes far beyond reviewing old pages. There are five ways that Archive.org can help a website improve its user experience and SEO. The Wayback Machine’s new WordPress plugin makes it easy to benefit from the Internet Archive automatically.

1. Copyright, DMCA, And Business Disputes

The Internet Archive can serve as an independent timestamped record to prove ownership of content or to defend against false claims that someone else wrote the content first. The Internet Archive is an independent non-profit organization and there is no way to fake an entry, which makes it an excellent way to prove who was first to publish disputed content.

2. The Worst Case Scenario Backup

Losing the entire website’s content to hardware failure, ransomware, a vulnerability, or even a datacenter fire is always within the realm of possibility. While it’s a best practice to keep an up-to-date backup stored off the server, unforeseen mistakes can happen.

The Internet Archive does not offer a way to conveniently download website content, but there are services that facilitate it. It used to be a popular technique among spammers to use these services to download the previous content of expired domains and bring it back to the web. I haven’t used any of these services and can’t vouch for them, but if you search around, you’ll be able to find them.

3. Fix Broken Links

Sometimes a URL gets lost in a website redesign, or maybe it was purposely removed, and you find out later that the page is popular and people are linking to it. What do you do?

Something like this happened to me in the past when I changed domains and decided I didn’t need certain pages. A few years later, I discovered that people were still linking to those pages because they were still useful. The Internet Archive made it easy to reproduce the old content on the new domain. It’s one way to recover the PageRank that would otherwise have been lost.

Having old pages archived makes it possible to revive them on the current website. But you can’t do that unless the page was archived, and the new plugin makes sure that happens for every web page.

4. Can Indicate Trustworthiness

This isn’t about search algorithms or LLMs. This is about trust with other sites and site visitors. Spammy sites tend to not be around very long. A documented history on Archive.org can be a form of proof that a site has been around for a long time. A legitimate business can point to X years of archived pages to prove that they are an established business.

5. Identify Link Rot

The Internet Archive Wayback Machine Link Fixer plugin provides an easy way to archive your web pages at Archive.org. When you publish a new page or update an older one, the Wayback Machine WordPress plugin will automatically create a new archived copy.

But one of the useful features of the plugin is that it automatically scans all outbound links and tests them to see if the linked pages still exist. The plugin can automatically update the link to a saved page at the Internet Archive.
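To see the kind of lookup the plugin automates, here is a rough sketch (not the plugin’s own code) that queries the Wayback Machine’s public availability endpoint with the requests library to check whether a linked URL already has an archived snapshot.

import requests

def latest_snapshot(url: str):
    """Return the closest archived snapshot URL for a page, or None if none exists."""
    api = "https://archive.org/wayback/available"
    data = requests.get(api, params={"url": url}, timeout=10).json()
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None

# Example: check one outbound link before deciding whether to swap in an archive URL.
print(latest_snapshot("https://www.searchenginejournal.com/"))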

The official plugin lists these features and benefits:

  • “Automatically scans for outbound links in post content
  • Checks the Wayback Machine for existing archives
  • Creates new snapshots if no archive exists
  • Redirects broken or missing links to archived versions
  • Archives your own posts on updates
  • Works on both new and existing content
  • Helps maintain long-term content reliability and SEO”

I don’t know exactly what they mean by maintaining SEO, but one benefit they don’t mention is that it keeps users happy, and that’s always a plus.

Wayback Machine Is Useful For Competitor Analysis

The Internet Archive makes it so easy to see how a competitor has changed over the years. It’s also a way to catch competitors who are copying or taking “inspiration” from your content when they do their annual content refresh.

The Wayback Machine can let you see what services or products a competitor offered and how they were offered. It can also give a peek into what changed during a redesign, which says something about their competitive priorities.

Takeaways

  • The Internet Archive provides practical benefits for website owners beyond simply viewing old pages.
  • Archived snapshots help address business disputes, lost content, broken links, and long-term site credibility.
  • Competitor history and past site versions become easy to evaluate through Archive.org.
  • The Wayback Machine WordPress plugin automates archiving and helps manage link rot.
  • Using the Archive proactively can improve user experience and support SEO-adjacent needs, even if indirectly.

The examples in this article show that the Internet Archive is useful for SEO, competitor research, improving the user experience, and maintaining trust. The Internet Archive’s new WordPress plugin makes archiving and link-checking easy because it’s completely automatic. Taken together, these strengths make the Archive a useful part of keeping a website reliable, recoverable, and easier for people to use.

The Internet Archive Wayback Machine Link Fixer is a project created by Automattic and the Internet Archive, which means that it’s a high quality and trusted plugin for WordPress.

Download The Internet Archive WordPress Plugin

Check it out at the official WordPress plugin repository: Internet Archive Wayback Machine Link Fixer By Internet Archive

Featured Image by Shutterstock/Red rose 99

Google Year In Search 2025: Gemini, DeepSeek Top Trending Lists via @sejournal, @MattGSouthern

Google released its Year in Search data, revealing the queries that saw the largest spikes in search interest.

AI tools featured prominently in the global list, with Gemini ranking as the top trending search worldwide and DeepSeek also appearing in the top 10.

The annual report tracks searches with the highest sustained traffic spikes in 2025 compared to 2024, rather than total search volume.

AI Tools Lead Global Trending Searches

Gemini topped the global trending searches list, reflecting the growth of Google’s AI assistant throughout 2025.

DeepSeek, the Chinese AI company that drew attention earlier this year, appeared in both the global (#6) and US (#7) trending lists.

The global top 10 trending searches were:

  1. Gemini
  2. India vs England
  3. Charlie Kirk
  4. Club World Cup
  5. India vs Australia
  6. DeepSeek
  7. Asia Cup
  8. Iran
  9. iPhone 17
  10. Pakistan and India

US Trending Searches Show Different Priorities

The US list diverged from global trends, with Charlie Kirk leading and entertainment properties ranking high. KPop Demon Hunters claimed the second spot.

The US top 10 trending searches were:

  1. Charlie Kirk
  2. KPop Demon Hunters
  3. Labubu
  4. iPhone 17
  5. One Big Beautiful Bill Act
  6. Zohran Mamdani
  7. DeepSeek
  8. Government shutdown
  9. FIFA Club World Cup
  10. Tariffs

AI-Generated Content Leads US Trends

A dedicated “Trends” category in the US data showed AI content creation drove search interest throughout 2025.

The top US trends included:

  1. AI action figure
  2. AI Barbie
  3. Holy airball
  4. AI Ghostface
  5. AI Polaroid
  6. Chicken jockey
  7. Bacon avocado
  8. Anxiety dance
  9. Unfortunately, I do love
  10. Ghibli

The Ghibli entry likely reflects the viral AI-generated images mimicking Studio Ghibli’s animation style that circulated on social media platforms.

News & Current Events

News-related trending searches reflected the year’s developments. Globally, the top trending news searches included the LA Fires, Hurricane Melissa, TikTok ban, and the selection of a new pope.

US news trends focused on domestic policy, with the One Big Beautiful Bill Act and tariffs appearing alongside the government shutdown and Los Angeles fires.

Why This Matters

This data shows where user interest spiked throughout 2025. The presence of AI tools at the top of global trends confirms continued growth in AI-related search behavior.

The split between global and US lists also shows regional differences in trending topics. Cricket matches dominated global sports interest while US searches leaned toward entertainment and policy.

Looking Ahead

Google’s Year in Search data is available on the company’s trends site.

Comparing this year’s trending topics against your content calendar can reveal gaps in coverage or opportunities for timely updates to existing content.