Next Steps for AI Shopping

For two decades, search has driven ecommerce. Whether through Google, Amazon, or on‑site, the sequence was simple: a shopper types a query, compares results, and makes a purchase. Search engine marketing, both organic and paid, improved performance.

That model is evolving.

Generative AI “agents” can now help shoppers compare products, prices, and options, often in a single conversational step. Shoppers initiate the interactions; agents then search relevant sites and respond based on the query, past preferences, and purchase history.

What Is Agentic Shopping?

“Agentic” refers to autonomous systems that can act and decide independently. Agentic shopping uses AI agents such as ChatGPT, Gemini, Perplexity, and Claude to guide a user’s buying journey. Think of it as a personal shopping assistant that interprets a request, searches multiple retailers, and surfaces relevant product and price options.

Retailgentic, the Substack publication of ReFiBuy, a soon-to-launch agentic tool provider, recently published its test of Comet, Perplexity’s AI-driven browser:

Shopper: This bed set is pretty expensive, but I like it. Can you find it cheaper?

Comet: I searched for a better price on the Allied Home Down Alternative XLT Dorm Kit (comforter, pillow, and mattress pad, Twin XL) and it is still listed at $84.99 at Target — the standard price for this exact bundled set. I didn’t find this specific 3‑piece kit for less elsewhere.

However, other retail sites (like Kohl’s and Macy’s) offer Twin XL bedding pieces or comforter sets individually.

Instead of hopping from site to site, the shopper gets an answer in one dialog.

Why It Matters

Shoppers are warming to AI shopping, though unevenly by age. A February 2025 New Consumer survey (PDF) of approximately 3,000 U.S. residents found that 64% of Gen Zs (mid-teens to late 20s) and Millennials (late 20s to early 40s) are “very” or “somewhat” comfortable interacting with an AI shopping advisor, versus 40% for Gen Xs (mid-40s to early 60s).

AI platforms are capitalizing:

  • ChatGPT now embeds Shop Pay, Shopify’s hosted checkout and payment tool. Shoppers can discover, evaluate, and purchase goods from Shopify-powered merchants without leaving the chat, turning conversational AI into a sales channel.
  • Perplexity’s agent‑led checkout, in partnership with PayPal, enables purchases, travel bookings, and event ticket sales directly in chat.
  • Structured product feeds in Perplexity can ingest clean, up‑to‑date product data, such as from beauty brand Ulta (powered by Rithum, my employer), for accurate pricing, attributes, and real‑time recommendations.

Next Steps

There’s no definitive AI playbook, but merchants can still prepare.

Audit product data

Universal standards for AI product feeds don’t (yet) exist, but you’re likely in good shape if you already maintain a product feed, such as for Google Shopping. Make sure it includes all key attributes: size, color, material, weight, and use cases.
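The audit described above can be sketched in a few lines. This is a minimal sketch, assuming a Google Shopping-style feed where each item is a dict of attribute fields; the field names and sample data here are illustrative, not a formal feed specification:

```python
# Sketch: audit a product feed for attribute completeness.
# Field names loosely follow Google Shopping feed conventions; adjust to your feed.

REQUIRED = ["id", "title", "description", "link", "image_link", "price"]
RECOMMENDED = ["size", "color", "material", "product_weight"]  # plus use cases in the description

def audit_feed(items):
    """Return a per-item report of missing required and recommended attributes."""
    report = {}
    for item in items:
        missing_required = [f for f in REQUIRED if not item.get(f)]
        missing_recommended = [f for f in RECOMMENDED if not item.get(f)]
        if missing_required or missing_recommended:
            report[item.get("id", "<no id>")] = {
                "required": missing_required,
                "recommended": missing_recommended,
            }
    return report

feed = [
    {"id": "SKU-1", "title": "Canvas Backpack",
     "description": "Fits under an airplane seat",
     "link": "https://example.com/p/1",
     "image_link": "https://example.com/i/1.jpg",
     "price": "79.99 USD", "color": "olive"},
]
print(audit_feed(feed))  # flags the attributes SKU-1 is missing
```

Running an audit like this against your full feed surfaces the items that would give an AI agent incomplete information to work with.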

Track AI visibility

Test how your products appear in genAI platforms. Brands and manufacturers can prompt their name to see how it surfaces. Even better, try prompts that shoppers might use. See how AI ranks or references your products compared with competitors. For example, “Find me the best backpack that fits two days of clothes and fits under an airplane seat” or “List the highest-rated cordless drills from DeWalt under $200.”

Multiple Channels

Widespread use of AI shopping is far from certain.

Adoption varies. Younger shoppers are more comfortable, older shoppers less so.

Accuracy is uneven. AI can show outdated prices, inventory, and product details because many platforms scrape product data, which is prone to errors, rather than ingesting product feeds. In ChatGPT, products unrelated to a query sometimes appear in comparison carousels.

AI shopping agents could become an important revenue channel, but they’re not a replacement for direct customer relationships, traditional search, or advertising. Make your product data AI‑ready while continuing to diversify your sales mix.

Invest in multiple channels, customer engagement, and building a brand that can thrive regardless of how shoppers discover products.

Google Backtracks On Plans For URL Shortener Service via @sejournal, @martinibuster

Google announced that they will continue to support some links created by the deprecated goo.gl URL shortening service, saying that 99% of the shortened URLs receive no traffic. They were previously going to end support entirely, but after receiving feedback, they decided to continue support for a limited group of shortened URLs.

Google URL Shortener

Google announced in 2018 that they were deprecating the Google URL Shortener, no longer accepting new URLs for shortening but continuing to support existing URLs. Seven years later, they noticed that 99% of the shortened links did not receive any traffic at all, so on July 18 of this year, Google announced they would end support for all shortened URLs by August 25, 2025.

After receiving feedback, they changed their plan on August 1 and decided that they would move ahead with ending support for URLs that do not receive traffic, but continue servicing shortened URLs that still receive traffic.

Google’s announcement explained:

“While we previously announced discontinuing support for all goo.gl URLs after August 25, 2025, we’ve adjusted our approach in order to preserve actively used links.

We understand these links are embedded in countless documents, videos, posts and more, and we appreciate the input received.

…If you get a message that states, “This link will no longer work in the near future”, the link won’t work after August 25 and we recommend transitioning to another URL shortener if you haven’t already.

…All other goo.gl links will be preserved and will continue to function as normal.”

If you have a goo.gl redirected link, Google recommends visiting the link to check if it displays a warning message. If it does, move the link to another URL shortener. If it doesn’t display the warning, the link will continue to function.

Featured Image by Shutterstock/fizkes

Google Confirms It Uses Something Similar To MUVERA via @sejournal, @martinibuster

Google’s Gary Illyes answered questions during the recent Search Central Live Deep Dive in Asia about whether or not they use the new Multi‑Vector Retrieval via Fixed‑Dimensional Encodings (MUVERA) retrieval method and also if they’re using Graph Foundation Models.

MUVERA

Google recently announced MUVERA in a blog post and a research paper: a method that improves retrieval by turning complex multi-vector search into fast single-vector search. It compresses sets of token embeddings into fixed-dimensional vectors that closely approximate their original similarity. This lets it use optimized single-vector search methods to quickly find good candidates, then re-rank them using exact multi-vector similarity. Compared to older systems like PLAID, MUVERA is faster, retrieves fewer candidates, and still improves recall, making it a practical solution for large-scale retrieval.

The key points about MUVERA are:

  • MUVERA converts multi-vector sets into fixed vectors using Fixed Dimensional Encodings (FDEs), which are single-vector representations of multi-vector sets.
  • These FDEs (Fixed Dimensional Encodings) match the original multi-vector comparisons closely enough to support accurate retrieval.
  • MUVERA retrieval uses MIPS (Maximum Inner Product Search), an established search technique used in retrieval, making it easier to deploy at scale.
  • Reranking: After using fast single-vector search (MIPS) to quickly narrow down the most likely matches, MUVERA re-ranks them using Chamfer similarity, a more detailed multi-vector comparison method. This final step restores the full accuracy of multi-vector retrieval, so you get both speed and precision.
  • MUVERA is able to find more of the precisely relevant documents with a lower processing time than the state-of-the-art retrieval baseline (PLAID) it was compared to.
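The three-stage pipeline above can be illustrated with a toy sketch in pure Python. The dimensions are invented, and the bucketing below is a crude SimHash stand-in for the actual FDE construction described in the paper; this is an illustration of the idea, not Google’s implementation:

```python
# Toy illustration of the MUVERA pipeline:
# (1) compress each multi-vector set into one fixed-dimensional encoding (FDE),
# (2) retrieve candidates with fast single-vector inner-product search (MIPS),
# (3) re-rank the short list with exact Chamfer similarity.
import random

random.seed(0)
DIM = 8       # token embedding dimension (invented for the sketch)
PLANES = 2    # random hyperplanes -> 2**PLANES buckets
BUCKETS = 2 ** PLANES

# Random hyperplanes assign each token vector to a bucket (a crude SimHash).
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(PLANES)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bucket(v):
    return sum((1 << i) for i, p in enumerate(planes) if dot(v, p) > 0)

def fde(vectors):
    """Fixed Dimensional Encoding: sum token vectors per bucket, concatenated."""
    enc = [0.0] * (BUCKETS * DIM)
    for v in vectors:
        b = bucket(v)
        for j in range(DIM):
            enc[b * DIM + j] += v[j]
    return enc

def chamfer(query, doc):
    """Exact multi-vector similarity: each query token matches its best doc token."""
    return sum(max(dot(q, d) for d in doc) for q in query)

def search(query, docs, k=2):
    q_enc = fde(query)
    # Stage 1: single-vector MIPS over the fixed-dimensional encodings.
    candidates = sorted(range(len(docs)),
                        key=lambda i: dot(q_enc, fde(docs[i])),
                        reverse=True)[:k]
    # Stage 2: re-rank only the short list with exact Chamfer similarity.
    return max(candidates, key=lambda i: chamfer(query, docs[i]))
```

The speedup comes from stage 1: the expensive multi-vector comparison runs only on the handful of candidates the cheap single-vector search surfaces.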

Google Confirms That They Use MUVERA

José Manuel Morgal (LinkedIn profile) put his question to Google’s Gary Illyes, whose response was to jokingly ask what MUVERA was before confirming that Google uses a version of it.

This is how José described the question and answer:

“An article has been published in Google Research about MUVERA and there is an associated paper. Is it currently in production in Search?

His response was to ask me what MUVERA was haha and then he commented that they use something similar to MUVERA but they don’t name it like that.”

Does Google Use Graph Foundation Models (GFMs)?

Google recently published a blog announcement about an AI breakthrough called a Graph Foundation Model.

Google’s Graph Foundation Model (GFM) is a type of AI that learns from relational databases by turning them into graphs, where rows become nodes and the connections between tables become edges.

Unlike older models (machine learning models and graph neural networks (GNNs)) that only work on one dataset, GFMs can handle new databases with different structures and features without retraining on the new data. GFMs use a large AI model to learn how data points relate across tables. This lets GFMs find patterns that regular models miss, and they perform much better in tasks like detecting spam in Google’s scaled systems. GFMs are a big step forward because they bring foundation-model flexibility to complex structured data.
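The rows-become-nodes, table-links-become-edges idea above can be shown with a small sketch; the tables, column names, and data are invented for illustration:

```python
# Sketch of the GFM preprocessing idea described above: every row of a
# relational table becomes a graph node, and every foreign-key link between
# tables becomes an edge. Table and column names here are hypothetical.

customers = [{"id": "c1", "name": "Ana"}, {"id": "c2", "name": "Ben"}]
orders = [
    {"id": "o1", "customer_id": "c1", "total": 40.0},
    {"id": "o2", "customer_id": "c1", "total": 15.0},
    {"id": "o3", "customer_id": "c2", "total": 99.0},
]

def tables_to_graph(tables, foreign_keys):
    """tables: {name: rows}; foreign_keys: [(child_table, fk_column, parent_table)]."""
    nodes = {}
    for table, rows in tables.items():
        for row in rows:
            nodes[(table, row["id"])] = row  # every row is a node
    edges = []
    for child, fk, parent in foreign_keys:
        for row in tables[child]:
            # each foreign-key reference becomes a child -> parent edge
            edges.append(((child, row["id"]), (parent, row[fk])))
    return nodes, edges

nodes, edges = tables_to_graph(
    {"customers": customers, "orders": orders},
    [("orders", "customer_id", "customers")],
)
```

A graph model trained over this structure can use a customer’s connected orders as context, which is exactly the cross-table signal a plain tabular model misses.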

Graph Foundation Models represent a notable achievement because their improvements are not incremental. They are an order-of-magnitude improvement, with performance gains of 3x to 40x in average precision.

José next asked Illyes if Google uses Graph Foundation Models and Gary again jokingly feigned not knowing what José was talking about.

He related the question and answer:

“An article has been published in Google Research about Graph Foundation Models for data, this time there are not paper associated with it. Is it currently in production in Search?

His answer was the same as before, asking me what Graph Foundation Models for data was, and he thought it was not in production. He did not know because there are not associated paper and on the other hand, he commented me that he did not control what is published in Google Research blog.”

Gary expressed his opinion that Graph Foundation Model was not currently used in Search. At this point, that’s the best information we have.

Is GFM Ready For Scaled Deployment?

The official Graph Foundation Model announcement says it was tested in an internal task, spam detection in ads, which strongly suggests that real internal systems and data were used, not just academic benchmarks or simulations.

Here is what Google’s announcement relates:

“Operating at Google scale means processing graphs of billions of nodes and edges where our JAX environment and scalable TPU infrastructure particularly shines. Such data volumes are amenable for training generalist models, so we probed our GFM on several internal classification tasks like spam detection in ads, which involves dozens of large and connected relational tables. Typical tabular baselines, albeit scalable, do not consider connections between rows of different tables, and therefore miss context that might be useful for accurate predictions. Our experiments vividly demonstrate that gap.”

Takeaways

Google’s Gary Illyes confirmed that a form of MUVERA is in use at Google. His answer about GFM is less clear because it was expressed as an opinion: Gary related that he thinks it’s not in production.

Featured Image by Shutterstock/Krakenimages.com

Merging SEO And Content Using Your Knowledge Graph to AI-Proof Content via @sejournal, @marthavanberkel

New AI platforms, powered by generative technologies like Google’s Gemini, Microsoft’s Copilot, Grok, and countless specialized chatbots, are rapidly becoming the front door for digital discovery.

We’ve entered an era of machine-led discovery, where AI systems aggregate, summarize, and contextualize content across multiple platforms.

Users today no longer follow a linear journey from keyword to website. Instead, they engage in conversations and move fluidly between channels and experiences.

These shifts are being driven by new types of digital engagement, including:

  • AI-generated overviews, such as AI Overviews in Google, that pull data from many sources.
  • Conversational search, such as ChatGPT and Gemini, where follow-up questions replace traditional browsing.
  • Social engagement, with platforms like TikTok equipped with their own generative search features, engaging entire generations in interactive journeys of discovery.

The result is a new definition of discoverability and a need to rethink how you manage your brand across these experiences.

It’s not enough to optimize your brand’s website for search engines. You must ensure your website content is machine-consumable and semantically connected to appear in AI-generated results.

This is why forward-thinking organizations are turning to schema markup (structured data) and building content knowledge graphs to manage the data layer that powers both traditional search and emerging AI platforms.

Semantic structured data transforms your content into a machine-readable network of information, enabling your brand to be recognized, connected, and potentially included in AI-driven experiences across channels.

In this article, we’ll explore how SEO and content teams can partner to build a content knowledge graph that fuels discoverability in the age of AI, and why this approach is critical for enterprise brands aiming to future-proof their digital presence.

Why Schema Markup Is Your Strategic Data Layer

You may be asking, “Schema markup – is that not just for rich results (visual changes in SERP)?”

Schema markup is no longer just a technical SEO tactic for achieving rich results; it can also be used to define the content on your website and its relationship to other entities within your brand.

When you apply markup in a connected way, AI and search systems can infer more accurately, resulting in better matching to user queries or prompts.

In May 2025, Google and Microsoft both reiterated that the use of structured data makes your content “machine-readable” and eligible for certain features. [Editor’s note: Although, Gary Illyes recently said to avoid excessive use and that Schema is not a ranking factor.]

Schema markup can be a strategic foundation for creating a data layer that feeds AI systems. While schema markup is a technical SEO approach, it all starts with content.

When You Implement Schema Markup, You’re:

Defining Entities

Schema markup clarifies the “things” your content is about, such as products, services, people, locations, and more.

It provides precise tags that help machines recognize and categorize your content accurately.

Establishing Relationships

Beyond defining individual entities (a.k.a. topics), schema markup describes how those entities connect to each other and to broader topics across the web.

This creates a web of meaning that mirrors how humans understand context and relationships.

Providing Machine-Readable Context

Schema markup helps make your content machine-readable.

It enables search engines and AI tools to confidently identify, interpret, and surface your content in relevant contexts, which can help your brand appear where it is most relevant.

Enterprise SEO and content teams can work together to implement schema markup to create a content knowledge graph, a structured representation of your brand’s expertise, offerings, and topic authority.

When you do this, the data you put into search and AI platforms is ready for large language models (LLMs) to make accurate inferences, which can help with consumer visibility.

What Is A Content Knowledge Graph?

A content knowledge graph organizes your website’s data into a network of interconnected entities and topics, all defined by implementing schema markup based on the Schema.org vocabulary. This graph serves as a digital map of your brand’s expertise and topical authority.

Imagine your website as a library. Without a knowledge graph, AI systems trying to read your site have to sift through thousands of pages, hoping to piece together meaning from scattered words and phrases.

With a content knowledge graph:

  • Entities are defined. Machines are informed precisely who, what, and where you’re talking about.
  • Topics are connected. Machines can better understand and infer how subjects relate. For example, machines can infer that “cardiology” encompasses entities like heart disease, cholesterol, or specific medical procedures.
  • Content becomes query-ready. Your content becomes structured data that AI can reference, cite, and include in responses.

When your content is organized into a knowledge graph, you’re effectively supplying AI platforms with information about your products, services, and expertise.

This becomes a powerful control point for how your brand is represented in AI search experiences.

Rather than leaving it to chance how AI systems interpret your web content, you can help to proactively shape the narrative and ensure machines have the right signals to potentially include your brand in conversations, summaries, and recommendations.

Your organization’s leaders should be aware this is now a strategic issue, not just a technical one.

A content knowledge graph gives you some influence over how your organization’s expertise and authority are recognized and distributed by AI systems, which can impact discoverability, reputation, and competitive advantage in a rapidly evolving digital landscape.

This structure can improve your chances of appearing in AI-generated answers and equips your content and SEO teams with data-driven insights to guide your content strategy and optimization efforts.

How Enterprise SEO And Content Teams Can Build A Content Knowledge Graph

Here’s how enterprise teams can operationalize a content knowledge graph to future-proof discoverability and unify SEO and content strategies:

1. Define What You Want To Be Known For

Enterprise brands should start by identifying their core topical authority areas. Ask:

  • Which topics matter most to our audience and brand?
  • Where do we want to be the recognized authority?
  • What new topics are emerging in our industry that we should own?

These strategic priorities shape the pillars of your content knowledge graph.

2. Use Schema Markup To Define Key Entities

Next, use schema markup to:

  • Identify key entities tied to your priority topics, such as products, services, people, places, or concepts.
  • Connect those entities to each other through Schema.org properties, such as “about,” “mentions,” or “sameAs.”
  • Ensure consistent entity definitions across your entire site so that AI systems can reliably identify and understand entities and their relationships.
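A minimal sketch of connected markup using those properties, emitted as JSON-LD from Python. The organization, page URLs, and Wikidata identifier below are placeholders, not real references:

```python
# Sketch: connected schema markup for a hypothetical service page, emitted as
# JSON-LD. "@id" gives each entity a stable home; "about", "mentions", and
# "sameAs" are the Schema.org properties that link entities together.
import json

markup = {
    "@context": "https://schema.org",
    "@type": "Service",
    "@id": "https://example.com/services/cardiology#service",  # placeholder URL
    "name": "Cardiology Care",
    "provider": {
        "@type": "MedicalOrganization",
        "@id": "https://example.com/#org",
        "name": "Example Health",
        # sameAs points at an external authoritative ID (placeholder here)
        "sameAs": ["https://www.wikidata.org/wiki/Q000000"],
    },
    # "about" names the central topic; "mentions" lists related entities
    "about": {"@type": "MedicalSpecialty", "name": "Cardiology"},
    "mentions": [{"@type": "MedicalCondition", "name": "Heart disease"}],
}

print(json.dumps(markup, indent=2))
```

Nesting the provider under the service, and pointing `sameAs` at an external authority, is what turns isolated tags into the connected graph the article describes.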

This is how your content becomes machine-readable and more likely to be accurately included in AI-driven results and recommendations.

3. Audit Your Existing Content Against Your Content Knowledge Graph

Instead of just tracking keywords, enterprises should audit their content based on entity coverage:

  • Are all priority entities represented on your site?
  • Do you have “entity homes” (pillar pages) that serve as authoritative hubs for those priority entities?
  • Where are there gaps in entity coverage that could limit your presence in search and AI responses?
  • What content opportunities exist to improve coverage of priority entities where these gaps have been identified?

A thorough audit provides a clear roadmap for aligning your content strategy with how machines interpret and surface information, ensuring your brand has the potential to be discoverable in evolving AI-driven search experiences.

4. Create Pillar Pages And Fill Content Gaps

Based on your findings from Step 3, create dedicated pillar pages for high-priority entities where needed. These become the authoritative source that:

  • Defines the entity.
  • Links to supporting content, including case studies, blog posts, or service pages.
  • Signals to search engines and AI systems where to find reliable information about that entity.

Supporting content can then be created to expand on subtopics and related entities that link back to these pillar pages, ensuring comprehensive coverage of topics.

5. Measure Performance By Entity And Topic

Finally, enterprises should track how well their content performs at the entity and topic levels:

  • Which entities drive impressions and clicks in AI-powered search results?
  • Are there emerging entities gaining traction in your industry that you should cover?
  • How does your topical authority compare to competitors?

This data-driven approach enables continuous optimization, helping you to stay visible as AI search evolves.

Why SEO And Content Teams Are The Heroes Of The AI Search Evolution

In this new landscape, where AI generates answers before users ever reach your website, schema markup and content knowledge graphs provide a critical control point.

They enable your brand to signal its authority to machines, support the possibility of accurate inclusion in AI results and overviews, and inform SEO and content investment based on data, not guesswork.

For enterprise organizations, this isn’t just an SEO tactic; it’s a strategic imperative that could protect visibility and brand presence in the new digital ecosystem.

So, the question remains: What does your brand want to be known for?

Your content knowledge graph is the infrastructure that ensures AI systems, and by extension, your future customers, know the answer.


Featured Image: Urbanscape/Shutterstock

2025 Core Web Vitals Challenge: WordPress Versus Everyone via @sejournal, @martinibuster

The Core Web Vitals Technology Report shows the top-ranked content management systems by Core Web Vitals (CWV) for the month of June (July’s statistics aren’t out yet). The breakout star this year is an e-commerce platform, which is notable because shopping sites generally have poor performance due to the heavy JavaScript and image loads necessary to provide shopping features.

This comparison also looks at the Interaction to Next Paint (INP) scores because they don’t mirror the CWV scores. INP measures how quickly a website responds visually after a user interacts with it. The phrase “next paint” refers to the moment the browser visually updates the page in response to a user’s interaction.

A poor INP score can mean that users will be frustrated with the site because it’s perceived as unresponsive. A good INP score correlates with a better user experience because of how quickly the website performs.

Core Web Vitals Technology Report

The HTTP Archive Technology Report combines two public datasets:

  1. Chrome UX Report (CrUX)
  2. HTTP Archive

1. Chrome UX Report (CrUX)
CrUX obtains its data from Chrome users who opt into providing usage statistics reporting as they browse over 8 million websites. This data includes performance on Core Web Vitals metrics and is aggregated into monthly datasets.

2. HTTP Archive
HTTP Archive obtains its data from lab tests by tools like WebPageTest and Lighthouse that analyze how pages are built and whether they follow performance best practices. Together, these datasets show how websites perform and what technologies they use.

The CWV Technology Report combines data from HTTP Archive (which tracks websites through lab-based crawling and testing) and CrUX (which collects real-user performance data from Chrome users), and that’s where the Core Web Vitals performance data of content management systems comes from.

#1 Ranked Core Web Vitals (CWV) Performer

The top-performing content management system is Duda. A remarkable 83.63% of websites on the Duda platform received a good CWV score. Duda has consistently ranked #1, and this month continues that trend.

For Interaction to Next Paint scores, Duda ranks in the second position.

#2 Ranked CWV CMS: Shopify

The next position is occupied by Shopify. 75.22% of Shopify websites received a good CWV score.

This is extraordinary because shopping sites are typically burdened with excessive JavaScript to power features like product filters, sliders, image effects, and other tools that shoppers rely on to make their choices. Shopify, however, appears to have largely solved those issues and is outperforming other platforms, like Wix and WordPress.

In terms of INP, Shopify is ranked #3, at the upper end of the rankings.

#3 Ranked CMS For CWV: Wix

Wix comes in third place, just behind Shopify. 70.76% of Wix websites received a good CWV score. In terms of INP scores, 86.82% of Wix sites received a good INP score. That puts them in fourth place for INP.

#4 Ranked CMS: Squarespace

67.66% of Squarespace sites had a good CWV score, putting them in fourth place for CWV, just a few percentage points behind the No. 3 ranked Wix.

That said, Squarespace ranks No. 1 for INP, with a total of 95.85% of Squarespace sites achieving a good INP score. That’s a big deal because INP is a strong indicator of a good user experience.

#5 Ranked CMS: Drupal

59.07% of sites on the Drupal platform had a good CWV score. That’s more than half of sites, considerably lower than Duda’s 83.63% score but higher than WordPress’s score.

But when it comes to the INP score, Drupal ranks last, with only 85.5% of sites scoring a good INP score.

#6 Ranked CMS: WordPress

Only 43.44% of WordPress sites had a good CWV score. That’s over fifteen percentage points lower than fifth-ranked Drupal. So WordPress isn’t just last in terms of CWV performance; it’s last by a wide margin.

WordPress performance hasn’t been getting better this year either. It started 2025 at 42.58%, then went up a few points in April to 44.93%, then fell back to 43.44%, finishing June at less than one percentage point higher than where it started the year.

WordPress is in fifth place for INP scores, with 85.89% of WordPress sites achieving a good INP score, just 0.39 points above Drupal, which is in last place.

But that’s not the whole story about the WordPress INP scores. WordPress started the year with a score of 86.05% and ended June with a slightly lower score.

INP Rankings By CMS

Here are the rankings for INP, with the percentage of sites exhibiting a good INP score next to the CMS name:

  1. Squarespace 95.85%
  2. Duda 93.35%
  3. Shopify 89.07%
  4. Wix 86.82%
  5. WordPress 85.89%
  6. Drupal 85.5%

As you can see, positions 3–6 are all bunched together in the eighty percent range, with only a 3.57 percentage point difference between the last-placed Drupal and the third-ranked Shopify. So, clearly, all the content management systems deserve a trophy for INP scores. Those are decent scores, especially for Shopify, which earned a second-place ranking for CWV and third place for INP.

Takeaways

  • Duda Is #1
    Duda leads in Core Web Vitals (CWV) performance, with 83.63% of sites scoring well, maintaining its top position.
  • Shopify Is A Strong Performer
    Shopify ranks #2 for CWV, a surprising performance given the complexity of e-commerce platforms, and scores well for INP.
  • Squarespace #1 For User Experience
    Squarespace ranks #1 for INP, with 95.85% of its sites showing good responsiveness, indicating an excellent user experience.
  • WordPress Performance Scores Are Stagnant
    WordPress lags far behind, with only 43.44% of sites passing CWV and no signs of positive momentum.
  • Drupal Also Lags
    Drupal ranks last in INP and fifth in CWV, with over half its sites passing but still underperforming against most competitors.
  • INP Scores Are Generally High Across All CMSs
    Overall INP scores are close among the bottom four platforms, suggesting that INP scores are relatively high across all content management systems.

Find the Looker Studio rankings here (must be logged into a Google account to view).

Featured Image by Shutterstock/Krakenimages.com

Google URL Removal Bug Enabled Attackers To Deindex URLs via @sejournal, @martinibuster

Google recently fixed a bug that enabled anyone to anonymously use an official Google tool to remove any URL from Google search and get away with it. The tool had the potential to devastate competitor rankings by removing their URLs completely from Google’s index. Google had known about the bug since 2023 but until now hadn’t taken action to fix it.

Tool Exploited For Reputation Management

A report by the Freedom of the Press Foundation recounted the case of a tech CEO who had employed numerous tactics to “censor” negative reporting by a journalist, ranging from legal action to identify the reporter’s sources to an “intimidation campaign” via the San Francisco city attorney and a DMCA takedown request.

Through it all, the reporter and the Freedom of the Press Foundation prevailed in court, and the article at the center of the actions remained online until it began getting removed through abuse of Google’s Remove Outdated Content tool. Restoring the web page with Google Search Console was easy, but the abuse continued, leading the publisher to open a discussion on the Google Search Console Help Community.

The person posted a description of what was happening and asked if there was a way to block abuse of the tool. The post alleged that the attacker was choosing a word that was no longer in the original article and using it as the basis for claiming the article was outdated and should be removed from Google’s search index.

This is what the report on Google’s Help Community explained:

“We have a dozen articles that got removed this way. We can measure it by searching Google for the article, using the headline in quotes and with the site name. It shows no results returned.

Then, we go to GSC and find it has been “APPROVED” under outdated content removal. We cancel that request. Moments later, the SAME search brings up an indexed article. This is the 5th time we’ve seen this happen.”

Four Hundred Articles Deindexed

What was happening was an aggressive attack against a website, and Google apparently was unable to do anything to stop the abuse, leaving the user in a very bad position.

In a follow-up post, they explained the devastating effect of the sustained negative SEO attack:

“Every week, dozens of pages are being deindexed and we have to check the GSC every day to see if anything else got removed, and then restore that.

We’ve had over 400 articles deindexed, and all of the articles were still live and on our sites. Someone went in and submitted them through the public removal tool, and they got deindexed.”

Google Promised To Look Into It

They asked if there was a way to block the attacks, and Google’s Danny Sullivan responded:

“Thank you — and again, the pages where you see the removal happening, there’s no blocking mechanism on them.”

Danny responded to a follow-up post, saying that they would look into it:

“The tool is designed to remove links that are no longer live or snippets that are no longer reflecting live content. We’ll look into this further.”

How Google’s Tool Was Exploited

The initial report said that the negative SEO attack was leveraging changed words within the content to file a successful outdated content removal. But it appears that they later discovered that another attack method was being used.

Google’s Outdated Content Removal tool is case-sensitive, which means that if you submit a URL containing an uppercase letter, the crawler will go out to specifically check for the uppercase version, and if the server returns a 404 Not Found error response, Google will remove all versions of the URL.

The Freedom of the Press Foundation writes that the tool is case insensitive, but that’s only half right. The crawler’s check is case sensitive, because the submitted case determines which URL gets fetched; it is the removal step that follows which ignores case.

The victim of the attack could have worked around it by rewriting all requests for uppercase URLs to lowercase and enforcing lowercase URLs across the entire website.
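Such lowercase enforcement is normally a one-line rewrite rule in the web server or CDN. As a hedged illustration only, here is the same logic sketched as a small Python function (the function name and return format are invented for this example; they are not any real server's API):

```python
def lowercase_redirect(path):
    """Return a 301 redirect target when a request path contains
    uppercase characters; otherwise serve the page as-is.

    Sketch only: in production this logic would live in the web
    server or CDN as a rewrite rule, not in application code.
    """
    lowered = path.lower()
    if lowered != path:
        # A permanent redirect collapses every case variant onto one
        # canonical URL, so a crafted uppercase URL can never 404.
        return ("301 Moved Permanently", lowered)
    return ("200 OK", path)

canonical = lowercase_redirect("/Removed-Article")   # redirected
unchanged = lowercase_redirect("/removed-article")   # served normally
```

With this rule in place, a removal request for an uppercase variant would be answered with a redirect to the live lowercase page rather than a 404, defusing the exploit.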

That’s the flaw the attacker exploited: the tool’s check was case sensitive, but somewhere downstream Google’s removal system was case agnostic, which resulted in the correct, live URL being removed.

Here’s how the Freedom of the Press Foundation described it:

“Our article… was vanished from Google search using a novel maneuver that apparently hasn’t been publicly well documented before: a sustained and coordinated abuse of Google’s “Refresh Outdated Content” tool.

This tool is supposed to allow those who are not a site’s owner to request the removal from search results of web pages that are no longer live (returning a “404 error”), or to request an update in search of web pages that display outdated or obsolete information in returned results.

However, a malicious actor could, until recently, disappear a legitimate article by submitting a removal request for a URL that resembled the target article but led to a “404 error.” By altering the capitalization of a URL slug, a malicious actor apparently could take advantage of a case-insensitivity bug in Google’s automated system of content removal.”

Other Sites Affected By This Exploit

Google responded to the Freedom of the Press Foundation and admitted that this exploit did, in fact, affect other sites.

They are quoted as saying the issue only impacted a “tiny fraction of websites” and that the wrongly impacted sites were reinstated.

Google responded by email to note that this bug has been fixed.

Industry Pioneer Reveals Why SEO Isn’t Working & What To Refocus On via @sejournal, @theshelleywalsh

Bill Hunt is a true pioneer in the industry, with more than 25 years of experience working on the websites of some of the largest multinationals. Having built two large digital/search agencies, one of which was acquired by Ogilvy, Bill has now moved into consulting focused on repositioning search to leverage marketing for shareholder growth.

His approach is not myopic, surface-level SEO; as an enterprise specialist, he looks at what users actually want from their online experience and connects the dots between search visibility, user experience, and business value for real results.

Bill is currently writing a series for Search Engine Journal about connecting search visibility to business value, and I spoke to him for IMHO to find out why he thinks SEO is currently not working.

“SEOs are creatures of habit. To succeed now, we need to unlearn and relearn how discovery actually works.”

The Real Problems Aren’t What You Think

I started out by asking Bill why SEO isn’t working, and his key message was not that SEO is broken, but that paralysis, distraction from AI hype, and neglect of the fundamentals are holding it back:

“I think there are three key problems right now. One is paralysis. We see that clients put search on pause, especially organic search, because they just don’t know what to do.

The second is the distraction with all the hype around the AI thing.

I mean, there’s a different acronym every day. So, which do we do? Are we chasing answers? Are we doing LLM index files or whatever craziness comes out?

And then the third is that there’s such a distraction from all this that a lot of the fundamentals aren’t being covered. And I think that’s where the problem is.”

Bill emphasized that the impact varies significantly by business type. Information-based businesses have been significantly affected because AI now directly answers queries that previously drove traffic to their sites. However, many other businesses might not be negatively impacted if they understand what’s actually changed.

Three Fundamental Shifts To Pay Attention To

Bill went on to talk about how three core changes have reshaped search, and understanding them is crucial for adaptation:

  • Intent understanding has evolved: Everything is about what users searched for and what they hope to see.
  • Friction must be removed: Platforms reward the path of least resistance.
  • Monetization is leading the way: It’s not just about being helpful, but also about being profitable.

Bill used an example from his work with Absolut Vodka.

“When I was working with Absolut Vodka, we had a drink site that was really just an awareness driver, and every month we sat down and we looked at Google’s search results and said, ‘If we were Google, what would we be changing around drinks or recipes or things like that?’

And so, by looking at the results, we could see, little by little, [that] somebody [was] looking for yellow cocktails. What should Google present?”

Rather than just optimizing for rankings, his team studied Google’s interface changes and adapted their visual content accordingly.

“We started focusing on the drink, bringing it front and center, amplifying the colors, the ingredients, and more and more people clicked.

We were generating millions and millions of visits because every step that Google was making to create a different user experience, we were trying to accommodate it.”

Bill believes that the idea of intent is still crucial. Considering how users just want to get to an answer, we must think about how they discover information and how we then present information to them.

“I think that’s really it in a nutshell. All of this change has paralyzed us and distracted us, and we need to recenter and refocus.

And that’s really a key part of what this series [at SEJ] is about: How do we refocus? How do we rethink this, both from a strategic point of view, from a shareholder value standpoint, and from a simple workflow standpoint?”

AI Tools Reward Consensus, Not Originality

In a recent LinkedIn post, Bill stated that AI tools don’t reward originality; they reward consensus.

As generative AI becomes embedded into how users explore and consume information, Bill warned against assuming that originality is enough to get discovered.

“AI systems synthesize consensus. If you’re saying something radically different, you won’t show up unless you connect it to what people already know.”

So, I asked Bill if you are creating this original content, how do you teach the systems to see you?

Bill’s advice is that to succeed in AI search environments, businesses need to:

  • Link new ideas to familiar terms.
  • Reflect user language and legacy concepts.
  • Be explicit in bridging the gap between old and new methods.

Otherwise, you risk being invisible to LLMs and answer engines that rely on summarizing well-established viewpoints.

“If you’re stating that you’re radically different, you’re not going to be shown because you’re radically different. So, you have to connect, and this is what I put in that article. You need to connect back to the consensus idea.

If you’re saying you’ve got a new way to cut bread, you have to talk about the old way to cut bread and connect it to a more efficient or easier way to do it.”

Is Your Product Even Discoverable?

The most practical insight from our conversation concerned how people discover your brand or your product.

Historically, keyword research has been focused on connecting to searches that have existing search volume. But, if somebody doesn’t know a product exists to solve a problem, how would they search for it?

“I used to tell companies, if somebody doesn’t know a product exists to solve a problem, how would they search for it?

They would use the problem or symptoms of the problem. If they know a product exists but don’t know you exist, how would they search for it?”

Bill recommended that you run searches for problems related to your product and see if you show up. Search as if you know the solution exists, but not your brand.

And if you don’t surface, ask yourself why not?

“Take the symptoms people have, go into any tool you want, Google, Perplexity, ChatGPT, Gemini, and search and see if you come up.

If you don’t come up, the very next question you should ask is, ‘Why isn’t this product or this company in your result set?’ That’s probably the single most illuminating thing a senior executive can do…

When it tells you that you don’t have the answer, your very next step is, ‘How do we then create the answer, and then how do we get it into these?’”

This kind of query-path analysis is more revealing than traditional keyword research because it aligns with how people actually search, especially in AI environments that interpret broader queries.

Moving Forward: Back To Basics

Despite all the AI disruption, Bill recommends a return to fundamental principles. Companies need to ensure they’re indexable, crawlable, and seen as authorities in their space: the same core elements that have always mattered for search visibility.

“Who got cited? Who was number one? And Larry and Sergey said, ‘Well, if they’re cited most frequently as a source for a question, shouldn’t they be?’”

The key difference is that these fundamentals now operate in an AI-enhanced environment where understanding user intent and creating relevant, engaging content matter more than ever.

And if you want to find answers, ask the tools; they can tell you everything you need to know.

“I would tell everybody to go do that query and do the follow-up saying why aren’t we there? And you’d be surprised how efficient these tools are at telling you what you need to do to close that gap.”

Rather than panicking about AI destroying SEO, organizations should focus on understanding what’s actually changed and adapting their strategies accordingly.

The fundamentals remain solid; they just need to be applied in new ways.

You can watch the full interview with Bill Hunt below:

Don’t miss the new series Bill is currently writing for SEJ about connecting the dots between search visibility, user experience, and business value; it will not only help CMOs but also help search marketers get buy-in from them.

Thank you to Bill Hunt for offering his insights and being my guest on IMHO.

Featured Image: Shelley Walsh/Search Engine Journal

Research Shows Differences In ChatGPT And Google AIO Answers via @sejournal, @martinibuster

New research from enterprise search marketing platform BrightEdge discovered differences in how Google and ChatGPT surface content. These differences matter to digital marketers and content creators because they show how content is recommended by each system. Recognizing the split enables brands to adapt their content strategies to stay relevant across both platforms.

BrightEdge’s findings were surfaced through an analysis of B2B technology, education, healthcare, and finance queries. It’s possible to cautiously extrapolate the findings to other niches where there could be divergences in how Google and ChatGPT respond, but that’s highly speculative, so this article won’t do that.

Core Differences: Task Vs. Information Orientation

BrightEdge’s research discovered that ChatGPT and Google AI Overviews take two different approaches to helping users take action. ChatGPT is more likely to recommend tools and apps, behaving in the role of a guide for making immediate decisions. Google provides informational content that encourages users to read before acting. This difference matters for SEO because it enables content creators and online stores to understand how their content is processed and presented to users of each system.

BrightEdge explains:

“In task-oriented prompts, ChatGPT overwhelmingly suggests tools and apps directly, while Google continues to link to informational content. While Google thrives as a research assistant, ChatGPT acts like a trusted coach for decision making, and that difference shapes which tool users instinctively choose for different needs.”

Divergence On Action-Oriented Queries

ChatGPT and Google tend to show similar kinds of results when users are querying for comparisons, but the results begin to diverge when the user intent implies they want to act. BrightEdge found that prompts about credit card comparisons or learning platforms generated similar kinds of results.

Questions with an action intent, like “how to create a budget” or “learn Python,” lead to different answers. ChatGPT appears to treat action intent prompts as requiring a response with tools, while Google treats them as requiring information.

BrightEdge notes that Healthcare has the highest rate of divergence:

“At 62% divergence, healthcare demonstrates the most significant split between platforms.

  • When prompts pertain to symptoms or medical information, both ChatGPT and Google will mention the CDC and The Mayo Clinic.
  • However, when prompted to help with things like “How to find a doctor,” ChatGPT pushes users towards Zocdoc, while Google points to hospital directories.”

The B2B technology niche has the second-highest level of divergence:

“With 47% divergence, B2B tech shows substantial platform differences.

  • When comparing technology, such as cloud platforms, both suggest AWS and Azure.
  • When asked “How to deploy things (such as specific apps),” ChatGPT relies on tools like Kubernetes and the AWS CLI, while Google offers tutorials and Stack Overflow.”

Education follows closely behind B2B technology:

“At 45% divergence, education follows the same trend.

  • When comparing “Best online learning platforms,” both platforms surface Coursera, EdX, and LinkedIn Learning.
  • When a user’s prompt pertains to learning a skill such as “How to learn Python,” ChatGPT recommends Udemy, whereas Google directs users to user-generated content hubs like GitHub and Medium.”

Finance shows the lowest levels of divergence, at 39%.

BrightEdge concludes that this represents a “fundamental shift” in how AI platforms interpret intent, which means that marketers need to examine the intent behind the search results for each platform and make content strategy decisions based on that research.

Tools Versus Topics

BrightEdge uses the example of the prompt “What are some resources to help plan for retirement?” to show how Google and ChatGPT differ. ChatGPT offers calculators and tools that users can act on, while Google suggests topics for further reading.

Screenshot Of ChatGPT Responding With Financial Tools

There’s a clear difference in the search experience for users. Marketers, SEOs, and publishers should consider how to meet both types of expectations: practical, action-based responses from ChatGPT and informational content from Google.

Takeaways

  • Split In User Intent Interpretation:
    Google interprets queries as requests for information, while ChatGPT tends to interpret many of the same queries as a call for action that’s solved by tools.
  • Platform Roles:
    ChatGPT behaves like a decision-making coach, while Google acts as a research assistant.
  • Domain-Specific Differences:
    Healthcare has the highest divergence (62%), especially in task-based queries like finding a doctor.
    B2B Technology (47%) and Education (45%) also show significant splits in how guidance is delivered.
    Finance shows the least divergence (39%) in how results are presented.
  • Tools vs. Topics:
    ChatGPT recommends actionable resources; Google links to authoritative explainer content.
  • SEO Insight:
    Content strategies must reflect each platform’s interpretation of intent. For example, creating actionable responses for ChatGPT and comprehensive informational content for Google. This may even mean creating and promoting a useful tool that can surface in ChatGPT.

BrightEdge’s research shows that, for some queries, Google and ChatGPT interpret the same user intent in profoundly different ways. While Google treats action-oriented queries as a prompt to deliver informational content, ChatGPT responds by recommending tools and services users can immediately act on. This divergence means marketers and content creators need to understand when ChatGPT is delivering actionable responses so they can build platform-specific content and web experiences.

Read the original research:

Brand Visibility: ChatGPT and Google AI Approaches by Industry

Featured Image by Shutterstock/wenich_mit

How To Win In Generative Engine Optimization (GEO) via @sejournal, @maltelandwehr

This post was sponsored by Peec.ai. The opinions expressed in this article are the sponsor’s own.

The first step of any good GEO campaign is creating something that LLM-driven answer machines actually want to link out to or reference.

GEO Strategy Components

Think of experiences you wouldn’t reasonably expect to find directly in ChatGPT or similar systems:

  • Engaging content like a 3D tour of the Louvre or a virtual reality concert.
  • Live data like prices, flight delays, available hotel rooms, etc. While LLMs can integrate this data via APIs, I see the opportunity to capture some of this traffic for the time being.
  • Topics that require EEAT (experience, expertise, authoritativeness, trustworthiness).

LLMs cannot have first-hand experience. But users want it. LLMs are incentivized to reference sources that provide first-hand experience. That’s just one of the things to keep in mind, but what else?

We need to differentiate between two approaches: influencing foundational models versus influencing LLM answers through grounding. The first is largely out of reach for most creators, while the second offers real opportunities.

Influencing Foundational Models

Foundational models are trained on fixed datasets and can’t learn new information after training. For current models like GPT-4, it is too late – they’ve already been trained.

But this matters for the future: imagine a smart fridge stuck with o4-mini from 2025 that might – hypothetically – favor Coke over Pepsi. That bias could influence purchasing decisions for years!

Optimizing For RAG/Grounding

When LLMs can’t answer from their training data alone, they use retrieval augmented generation (RAG) – pulling in current information to help generate answers. AI Overviews and ChatGPT’s web search work this way.

As SEO professionals, we want three things:

  1. Our content gets selected as a source.
  2. Our content gets quoted most within those sources.
  3. Other selected sources support our desired outcome.

Concrete Steps To Succeed With GEO

Don’t worry, it doesn’t take rocket science to optimize your content and brand mentions for LLMs. Actually, plenty of traditional SEO methods still apply, with a few new SEO tactics you can incorporate into your workflow.

Step 1: Be Crawlable

Sounds simple but it is actually an important first step. If you aim for maximum visibility in LLMs, you need to allow them to crawl your website. There are many different LLM crawlers from OpenAI, Anthropic & Co.

Some of them behave so badly that they can trigger scraping and DDoS preventions. If you are automatically blocking aggressive bots, check in with your IT team and find a way to not block LLMs you care about.

If you use a CDN, like Fastly or Cloudflare, make sure LLM crawlers are not blocked by default settings.
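If you decide to allow these crawlers, you can permit them explicitly in robots.txt. A minimal sketch follows; the user-agent tokens shown (GPTBot for OpenAI, ClaudeBot for Anthropic, PerplexityBot, and Google-Extended for Gemini training) are current as of this writing, but verify them against each vendor's documentation before deploying:

```
# Explicitly allow major LLM crawlers (verify tokens in vendor docs)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Google-Extended controls use of content for Gemini/AI training
User-agent: Google-Extended
Allow: /
```

Note that robots.txt only expresses your policy; separate firewall or CDN bot rules can still block these crawlers, which is why checking default CDN settings matters.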

Step 2: Continue Gaining Traditional Rankings

The most important GEO tactic is as simple as it sounds. Do traditional SEO. Rank well in Google (for Gemini and AI Overviews), Bing (for ChatGPT and Copilot), Brave (for Claude), and Baidu (for DeepSeek).

Step 3: Target the Query Fanout

The current generation of LLMs actually does a little more than simple RAG. They generate multiple queries. This is called query fanout.

For example, when I recently asked ChatGPT “What is the latest Google patent discussed by SEOs?”, it performed two web searches for “latest Google patent discussed by SEOs patent 2025 SEO forum” and “latest Google patent SEOs 2025 discussed”.

Advice: Check the typical query fanouts for your prompts and try to rank for those keywords as well.

Typical fanout patterns I see in ChatGPT are appending the term “forums” when I ask what people are discussing and appending “interview” when I ask questions related to a person. The current year (2025) is often added as well.

Beware: fanout patterns differ between LLMs and can change over time. Patterns we see today may not be relevant anymore in 12 months.

Step 4: Keep Consistency Across Your Brand Mentions

This is something simple everyone should do – both as a person and an enterprise. Make sure you are consistently described online. On X, LinkedIn, your own website, Crunchbase, GitHub – always describe yourself the same way.

If your X and LinkedIn profiles say you are a “GEO consultant for small businesses”, don’t change it to “AIO expert” on GitHub and “LLMO Freelancer” in your press releases.

I have seen people achieve positive results within a few days on ChatGPT and Google AI Overviews by simply having a consistent self description across the web. This also applies to PR coverage – the more and better coverage you can obtain for your brand, the more likely LLMs are to parrot it back to users.

Step 5: Avoid JavaScript

As an SEO, I always ask for as little JavaScript usage as possible. As a GEO, I demand it!

Most LLM crawlers cannot render JavaScript. If your main content is hidden behind JavaScript, you are out.

Step 6: Embrace Social Media & UGC

Unsurprisingly, LLMs seem to rely on Reddit and Wikipedia a lot. Both platforms offer user-generated content on virtually every topic. And thanks to multiple layers of community-driven moderation, a lot of junk and spam is already filtered out.

While both can be gamed, the average reliability of their content is still far better than on the internet as a whole. Both are also regularly updated.

Reddit also provides LLM labs with data on how people discuss topics online, what language they use to describe different concepts, and knowledge of obscure niche topics.

We can reasonably assume that moderated UGC found on platforms like Reddit, Wikipedia, Quora, and Stack Overflow will stay relevant for LLMs.

I do not advocate spamming these platforms. However, if you can influence how you and competitors show up there, you might want to do so.

Step 7: Create For Machine-Readability & Quotability

Write content that LLMs understand and want to cite. No one has figured this one out perfectly yet, but here’s what seems to work:

  • Use declarative and factual language. Instead of writing “We are kinda sure this shoe is good for our customers”, write “96% of buyers have self-reported to be happy with this shoe.”
  • Add schema. It has been debated many times. Recently, Fabrice Canel (Principal Product Manager at Bing) confirmed that schema markup helps LLMs to understand your content.
  • If you want to be quoted in an already existing AI Overview, have content with similar length to what is already there. While you should not just copy the current AI Overview, having high cosine similarity helps. And for the nerds: yes, given normalization, you can of course use the dot product instead of cosine similarity.
  • If you use technical terms in your content, explain them. Ideally in a simple sentence.
  • Add summaries of long text paragraphs, lists of reviews, tables, videos, and other types of difficult-to-cite content formats.
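The cosine-similarity point can be made concrete. Below is a toy sketch that compares a candidate snippet against an existing AI Overview using bag-of-words vectors; this is purely illustrative, since real retrieval systems compare dense embeddings rather than raw term counts:

```python
import math
import re
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two texts.
    Illustrative only: production systems use embeddings."""
    def tokenize(t):
        return re.findall(r"[a-z0-9]+", t.lower())
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

overview = "Running shoes should be replaced every 300 to 500 miles."
candidate = "Replace running shoes every 300-500 miles to avoid injury."
score = round(cosine_similarity(overview, candidate), 2)  # → 0.7
```

A score near 1.0 means the candidate closely mirrors the existing overview in vocabulary and length, which is the property the tip above is pointing at.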

Step 8: Optimize your Content

The original GEO paper: “GEO: Generative Engine Optimization” (arXiv:2311.09735)

If we look at GEO: Generative Engine Optimization (arXiv:2311.09735), What Evidence Do Language Models Find Convincing? (arXiv:2402.11782v1), and similar scientific studies, the answer is clear. It depends!

To be cited for some topics in some LLMs, it helps to:

  • Add unique words.
  • Have pro/cons.
  • Gather user reviews.
  • Quote experts.
  • Include quantitative data and name your sources.
  • Use easy-to-understand language.
  • Write with positive sentiment.
  • Add product text with low perplexity (predictable and well-structured).
  • Include more lists (like this one!).

However, for other combinations of topics and LLMs, these measures can be counterproductive.

Until broadly accepted best practices evolve, the only advice I can give is do what is good for users and run experiments.

Step 9: Stick to the Facts

For over a decade, algorithms have extracted knowledge from text as triples like (Subject, Predicate, Object) — e.g., (Lady Liberty, Location, New York). A text that contradicts known facts may seem untrustworthy. A text that aligns with consensus but adds unique facts is ideal for LLMs and knowledge graphs.

So stick to the established facts. And add unique information.
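The triple idea can be illustrated with a toy consistency check. Everything in this sketch is invented for illustration (the fact table, the function, and its categories); it is not any real extraction pipeline, just a model of how a knowledge graph might vet new text:

```python
# Known consensus facts, keyed by (subject, predicate).
KNOWN_FACTS = {
    ("Statue of Liberty", "location"): "New York",
    ("Eiffel Tower", "location"): "Paris",
}

def check_triples(candidate_triples):
    """Split candidate triples into consensus-consistent,
    contradictory, and novel claims."""
    consistent, contradictions, novel = [], [], []
    for subject, predicate, obj in candidate_triples:
        known = KNOWN_FACTS.get((subject, predicate))
        if known is None:
            novel.append((subject, predicate, obj))          # unique info: good
        elif known == obj:
            consistent.append((subject, predicate, obj))     # matches consensus
        else:
            contradictions.append((subject, predicate, obj)) # reads as untrustworthy
    return consistent, contradictions, novel

ok, bad, new = check_triples([
    ("Statue of Liberty", "location", "New York"),  # consistent
    ("Eiffel Tower", "location", "Berlin"),         # contradiction
    ("Statue of Liberty", "height_m", "93"),        # novel claim
])
```

Content that lands mostly in the first and third buckets, consensus plus unique additions, is the profile the advice above recommends.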

Step 10: Invest in Digital PR

Everything discussed here is not just true for your own website. It is also true for content on other websites. The best way to influence it? Digital PR!

The more and better coverage you can obtain for your brand, the more likely LLMs are to parrot it back to users.

I have even seen cases where advertorials were used as sources!

Concrete GEO Workflows To Try

Before I joined Peec AI, I was a customer. Here is how I used the tool – and how I advise our customers to use it.

Learn Who Your Competitors Are

Just like with traditional SEO, using a good GEO tool will often reveal unexpected competitors. Regularly look at a list of automatically identified competitors. For those who surprise you, check in which prompts they are mentioned. Then check the sources that led to their inclusion. Are you represented properly in these sources? If not, act!

Is a competitor referenced because of their PeerSpot profile but you have zero reviews there? Ask customers for a review.

Was your competitor’s CEO interviewed by a YouTuber? Try to get on that show as well. Or publish your own videos targeting similar keywords.

Is your competitor regularly featured on top 10 lists where you never make it to the top 5? Offer the publisher who created the list an affiliate deal they cannot decline. With the next content update, you’re almost guaranteed to be the new number one.

Understand the Sources

When performing search grounding, LLMs rely on sources.

Typical LLM Sources: Reddit & Wikipedia

Look at the top sources for a large set of relevant prompts. Ignore your own website and your competitors for a second. You might find some of these:

  • A community like Reddit or X. Become part of the community and join the discussion. X is your best bet to influence results on Grok.
  • An influencer-driven website like YouTube or TikTok. Hire influencers to create videos. Make sure to instruct them to target the right keywords.
  • An affiliate publisher. Buy your way to the top with higher commissions.
  • A news and media publisher. Buy an advertorial and/or target them with your PR efforts. In certain cases, you might want to contact their commercial content department.

You can also check out this in-depth guide on how to deal with different kinds of source domains.

Target Query Fanout

Once you have observed which searches are triggered by query fanout for your most relevant prompts, create content to target them.

On your own website. With posts on Medium and LinkedIn. With press releases. Or simply by paying for article placements. If it ranks well in search engines, it has a chance to be cited by LLM-based answer engines.

Position Yourself for AI-Discoverability

Generative Engine Optimization is no longer optional – it’s the new frontline of organic growth. At Peec AI, we’re building the tools to track, influence, and win in this new ecosystem. We currently see clients growing their LLM traffic by 100% every 2 to 3 months, sometimes with up to 20x the conversion rate of typical SEO traffic!

Whether you’re shaping AI answers, monitoring brand mentions, or pushing for source visibility, now is the time to act. The LLMs consumers will trust tomorrow are being trained today.


Image Credits

Featured Image: Image by Peec.ai. Used with permission.

Google Explains The Process Of Indexing The Main Content via @sejournal, @martinibuster

Google’s Gary Illyes discussed the concept of “centerpiece content,” how they go about identifying it, and why soft 404s are the most critical error that gets in the way of indexing content. The context of the discussion was the recent Google Search Central Deep Dive event in Asia, as summarized by Kenichi Suzuki.

Main Body Content

According to Gary Illyes, Google goes to great lengths to identify the main content of a web page. The phrase “main content” will be familiar to those who have read Google’s Search Quality Rater Guidelines. The concept of “main content” is first introduced in Part 1 of the guidelines, in a section that teaches how to identify main content, which is followed by a description of main content quality.

The quality guidelines define main content (aka MC) as:

“Main Content is any part of the page that directly helps the page achieve its purpose. MC can be text, images, videos, page features (e.g., calculators, games), and it can be content created by website users, such as videos, reviews, articles, comments posted by users, etc. Tabs on some pages lead to even more information (e.g., customer reviews) and can sometimes be considered part of the MC.

The MC also includes the title at the top of the page (example). Descriptive MC titles allow users to make informed decisions about what pages to visit. Helpful titles summarize the MC on the page.”

Google’s Illyes referred to main content as the centerpiece content, saying that it is used for “ranking and retrieval.” The content in this section of a web page has greater weight than the content in the footer, header, and navigation areas (including sidebar navigation).

Suzuki summarized what Illyes said:

“Google’s systems heavily prioritize the “main content” (which he also calls the “centerpiece”) of a page for ranking and retrieval. Words and phrases located in this area carry significantly more weight than those in headers, footers, or navigation sidebars. To rank for important terms, you must ensure they are featured prominently within the main body of your page.”

Content Location Analysis To Identify Main Content

This part of Illyes’ presentation is important to get right. Gary Illyes said that Google analyzes the rendered web page to locate the content so that it can assign the appropriate amount of weight to the words located in the main content.

This isn’t about identifying the position of keywords on the page. It’s about identifying where the content sits within the web page.

Here’s what Suzuki transcribed:

“Google performs positional analysis on the rendered page to understand where content is located. It then uses this data to assign an importance score to the words (tokens) on the page. Moving a term from a low-importance area (like a sidebar) to the main content area will directly increase its weight and potential to rank.”

Insight: Semantic HTML is an excellent way to help Google identify the main content and the less important areas. Semantic HTML makes web pages less ambiguous because it uses HTML elements to identify the different areas of a web page, like the top header section, navigational areas, footers, and even to identify advertising and navigational elements that may be embedded within the main content area. This technical SEO process of making a web page less ambiguous is called disambiguation.
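A skeletal example of the semantic elements involved (illustrative markup, not a Google requirement; the element roles follow the HTML standard):

```html
<body>
  <header>Site banner and logo</header>
  <nav>Primary navigation links</nav>
  <main>
    <article>
      <h1>Descriptive Page Title</h1>
      <p>The centerpiece content lives here; terms placed in this
         region carry the most weight.</p>
    </article>
    <aside>Related links: treated as less important than the main content.</aside>
  </main>
  <footer>Boilerplate, legal links, copyright.</footer>
</body>
```

The point is simply that `<main>`, `<nav>`, `<aside>`, and `<footer>` make the page's regions explicit, so positional analysis has less guessing to do.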

Tokenization Is The Foundation Of Google’s Index

Because of the prevalence of AI technologies today, many SEOs are aware of the concept of tokenization. Google also uses tokenization to convert words and phrases into a machine-readable format for indexing. What gets stored in Google’s index isn’t the original HTML; it’s the tokenized representation of the content.
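As a rough illustration of what tokenization means here, consider a toy word-level tokenizer. Google's actual tokenization is not public, and modern systems often use subword units, so treat this strictly as a sketch of the concept:

```python
import re

def tokenize(html_text):
    """Toy tokenizer: strip tags, lowercase, split into word tokens.
    Illustrative only; real indexing pipelines are far more involved."""
    text = re.sub(r"<[^>]+>", " ", html_text)  # drop HTML tags
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = tokenize("<main><h1>Centerpiece Content</h1> matters most.</main>")
# tokens → ['centerpiece', 'content', 'matters', 'most']
# The index stores a representation of tokens like these, not the raw HTML.
```

This is why markup, styling, and boilerplate do not survive into the index in their original form; only the tokenized content does.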

4. Soft 404s Are A Critical Error

This part is important because it frames soft 404s as a critical error. Soft 404s are pages that should return a 404 response but instead return a 200 OK response. This can happen when an SEO or publisher redirects a missing web page to the home page in order to conserve PageRank. Sometimes a missing web page will redirect to an error page that returns a 200 OK response, which is also incorrect.

Many SEOs mistakenly believe that the 404 response code is an error that needs fixing. A 404 needs fixing only if the URL is broken and is supposed to point to a different URL that is live with actual content.

But in the case of a URL for a web page that is gone and is likely never returning because it has not been replaced by other content, a 404 response is the correct one. If the content has been replaced or superseded by another web page, then it’s proper in that case to redirect the old URL to the URL where the replacement content exists.

The point of all this is that, to Google, a soft 404 is a critical error. That means that SEOs who try to fix a non-error event like a 404 response by redirecting the URL to the home page are actually creating a critical error by doing so.

Suzuki noted what Illyes said:

“A page that returns a 200 OK status code but displays an error message or has very thin/empty main content is considered a “soft 404.” Google actively identifies and de-prioritizes these pages as they waste crawl budget and provide a poor user experience. Illyes shared that for years, Google’s own documentation page about soft 404s was flagged as a soft 404 by its own systems and couldn’t be indexed.”
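The logic described above can be sketched as a simple classifier: a 200 OK response paired with thin or error-message content is the soft-404 pattern, while a genuine 404 is the correct, non-error response for gone content. The phrase list and length threshold below are assumptions for illustration, not values Google has published.

```python
# Illustrative signals only; real soft-404 detection is more nuanced.
ERROR_PHRASES = ("page not found", "no longer available", "404")
MIN_CONTENT_CHARS = 200  # assumed thin-content threshold, not a Google number

def looks_like_soft_404(status_code: int, main_content: str) -> bool:
    """Flag pages that say 200 OK but behave like an error page."""
    if status_code != 200:
        return False  # a real 404/410 is the correct response, not an error
    text = main_content.strip().lower()
    if len(text) < MIN_CONTENT_CHARS:
        return True   # thin or empty main content behind a 200 OK
    return any(phrase in text for phrase in ERROR_PHRASES)

print(looks_like_soft_404(200, "Sorry, page not found."))  # True: soft 404
print(looks_like_soft_404(404, "Sorry, page not found."))  # False: correct 404
print(looks_like_soft_404(200, "A" * 500))                 # False: real content
```

Note how redirecting a gone page to the home page would trade a harmless 404 for exactly this 200-with-wrong-content pattern.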

Takeaways

  • Main Content
    Google gives priority to the main content portion of a given web page. Although Gary Illyes didn’t mention it, it may be helpful to use semantic HTML to clearly outline what parts of the page are the main content and which parts are not.
  • Google Tokenizes Content For Indexing
    Google’s use of tokenization enables semantic understanding of queries and content. The importance for SEO is that Google no longer relies heavily on exact-match keywords, which frees publishers and SEOs to focus on writing about topics (not keywords) from the point of view of how they are helpful to users.
  • Soft 404s Are A Critical Error
    Soft 404s are commonly thought of as something to avoid, but they’re not generally understood as a critical error that can negatively impact the crawl budget. This elevates the importance of avoiding soft 404s.

Featured Image by Shutterstock/Krakenimages.com