Google AI Overviews Target Of Legal Complaints In The UK And EU via @sejournal, @martinibuster

The Movement For An Open Web and other organizations filed a legal challenge against Google, alleging harm to UK news publishers. The crux of the legal filing is the allegation that Google’s AI Overviews product is using news content as part of its summaries and for grounding AI answers, but not allowing publishers to opt out of that use without also opting out of appearing in search results.

The Movement For An Open Web (MOW) in the UK published details of a complaint to the UK’s Competition and Markets Authority (CMA):

“Last week, the CMA announced plans to consult on how to make Google search fairer, including providing “more control and transparency for publishers over how their content collected for search is used, including in AI-generated responses.” However, the complaint from Foxglove, the Alliance and MOW warns that news organisations are already being harmed in the UK and action is needed immediately.

In particular, publishers urgently need the ability to opt out of Google’s AI summaries without being removed from search altogether. This is a measure that has already been proposed by other leading regulators, including the US Department of Justice and the South African Competition Commission. Foxglove is warning that without immediate action, the UK – and its news industry – risks being left behind, while other states take steps to protect independent news from Google.

Foxglove is therefore seeking interim measures to prevent Google misusing publisher content pending the outcome of the CMA’s more detailed review.”

Reuters is reporting on an EU antitrust complaint filed in Brussels seeking relief for the same thing:

“Google’s core search engine service is misusing web content for Google’s AI Overviews in Google Search, which have caused, and continue to cause, significant harm to publishers, including news publishers in the form of traffic, readership and revenue loss.”

Publishers And SEOs Critical Of AI Overviews

Google is under increasing criticism from the publisher and SEO communities for sending fewer clicks to websites, although Google itself insists it is sending more traffic than ever. This may be one of those occasions where the phrase “let the judge decide” describes where this is all going, because there are no signs that Google is backing down from its decade-long trend of showing fewer links and more answers.

Featured Image by Shutterstock/nitpicker

SEO Rockstar “Proves” You Don’t Need Meta Descriptions via @sejournal, @martinibuster

An SEO shared on social media that his SEO tests proved that not using a meta description resulted in a lift in traffic. Coincidentally, another well-known SEO published an article that claims that SEO tests misunderstand how Google and the internet actually work and lead to the deprioritization of meaningful changes. Who is right?

SEO Says Pages Without Meta Descriptions Received Ranking Improvement

Mark Williams-Cook posted the results of his SEO test on LinkedIn about using and omitting meta descriptions, concluding that pages lacking a meta description received an average traffic lift of approximately 3%.

Here’s some of what he wrote:

“This will get some people’s backs up, but we don’t recommend writing meta descriptions anymore, and that’s based on data and testing.

We have consistently found a small, usually around 3%, but statistically significant uplift to organic traffic on groups of pages with no meta descriptions vs test groups of pages with meta descriptions via SEOTesting.

I’ve come to the conclusion if you’re writing meta descriptions manually, you’re wasting time. If you’re using AI to do it, you’re probably wasting a small amount of time.”

Williams-Cook asserted that Google rewrites around 80% of meta descriptions and insisted that the best meta descriptions are query dependent, meaning that the ideal meta description would be one that’s custom written for the specific queries the page is ranking for, which is what Google does when the meta description is missing.

He expressed the opinion that omitting the meta description increases the likelihood that Google will step in and inject a query-relevant meta description into the search results which will “outperform” the normal meta description that’s optimized for whatever the page is about.

Although I have reservations about SEO tests in general, his suggestion is intriguing and has the ring of plausibility.

Are SEO Tests Performative Theater?

Coincidentally, Jono Alderson, a technical SEO consultant, published an article last week titled “Stop testing. Start shipping.”, in which he discusses his view of SEO tests, calling the practice “performative theater.”

Alderson writes:

“The idea of SEO testing appeals because it feels scientific. Controlled. Safe…

You tweak one thing, you measure the outcome, you learn, you scale. It works for paid media, so why not here?

Because SEO isn’t a closed system. …It’s architecture, semantics, signals, and systems. And trying to test it like you would test a paid campaign misunderstands how the web – and Google – actually work.

Your site doesn’t exist in a vacuum. Search results are volatile. …Even the weather can influence click-through rates.

Trying to isolate the impact of a single change in that chaos isn’t scientific. It’s theatre.

…A/B testing, as it’s traditionally understood, doesn’t even cleanly work in SEO.

…most SEO A/B testing isn’t remotely scientific. It’s just a best-effort simulation, riddled with assumptions and susceptible to confounding variables. Even the cleanest tests can only hint at causality – and only in narrowly defined environments.”

Jono makes a valid point about the unreliability of tests where the inputs and the outputs are not fully controlled.

Statistical tests are generally done within a closed system where all the data being compared follow the same rules and patterns. But if you compare multiple sets of pages, where some pages target long-tail phrases and others target high-volume queries, then the pages will differ in their potential outcomes. External changes (daily traffic fluctuation, users clicking on the search results) aren’t controllable. As Jono suggested, even the weather can influence click rates.

Although Williams-Cook asserted that he had a control group for testing purposes, it’s extremely difficult to isolate a single variable on live websites because of the uncontrollable external factors Jono points out.

So, even though Williams-Cook asserts that the 3% change he noted is consistent and statistically significant, the unobservable factors within Google’s black-box algorithm that determine the outcome make it difficult to treat that result as a reliable causal finding, in the way one could with a truly controlled and observable statistical test.

If it’s not possible to isolate a single change, it’s very difficult to make reliable claims about what an SEO test actually shows.
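For context, the kind of comparison behind a claim like a “statistically significant 3% uplift” is usually a simple two-sample test on organic sessions for a test group and a control group of pages. Here is a minimal sketch with hypothetical numbers, using Welch’s t-test from SciPy; note that nothing in it controls for the Google-side confounders Jono describes.

```python
# Minimal sketch of a page-group SEO test: hypothetical session counts
# for pages without meta descriptions (test) vs. with them (control).
# A low p-value only says the groups differ; it cannot attribute the
# difference to the meta description change, because ranking-side
# confounders (query mix, SERP features, seasonality) are uncontrolled.
from scipy import stats

test_sessions = [412, 388, 455, 430, 401, 478, 390, 446]     # no meta description
control_sessions = [398, 380, 440, 415, 392, 460, 379, 431]  # with meta description

t_stat, p_value = stats.ttest_ind(test_sessions, control_sessions, equal_var=False)

uplift = (sum(test_sessions) / sum(control_sessions) - 1) * 100
print(f"Observed uplift: {uplift:.1f}%, p-value: {p_value:.3f}")
```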

Focus On Meaningful SEO Improvements

Jono’s article calls out the shortcomings of SEO tests, but the larger point of his essay is that what can be tested and measured tends to get prioritized over the “meaningful” changes that should be made but aren’t, because they can’t be measured. He argues that it’s important to focus on the things that matter in today’s search environment, which are related to content and a better user experience.

And that’s where we circle back to Williams-Cook, because even if A/B SEO tests are “theatre,” as Jono suggests, it doesn’t mean that Williams-Cook’s suggestion is wrong. He may actually be correct that it’s better to omit meta descriptions and let Google write them.

SEO is subjective, which means what’s good for one site might not be a priority for another. So the question remains: is removing all meta descriptions a meaningful change?

Featured Image by Shutterstock/baranq

Cloudflare Sparks SEO Debate With New AI Crawler Payment System via @sejournal, @MattGSouthern

Cloudflare’s new “pay per crawl” initiative has sparked a debate among SEO professionals and digital marketers.

The company has introduced a default AI crawler-blocking system alongside new monetization options for publishers.

This enables publishers to charge AI companies for access, which could impact how web content is consumed and valued in the age of generative search.

Cloudflare’s New Default: Block AI Crawlers

The system, now in private beta, blocks known AI crawlers by default for new Cloudflare domains.

Publishers can choose one of three access settings for each crawler:

  1. Allow – Grant unrestricted access
  2. Charge – Require payment at the configured, domain-wide price
  3. Block – Deny access entirely

      Crawlers that attempt to access blocked content will receive a 402 Payment Required response. Publishers set a flat, sitewide price per request, and Cloudflare handles billing and revenue distribution.
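Cloudflare hasn’t published a full client specification in the announcement, but based on the description above, a crawler’s fetch loop would branch on that 402 status. Here’s a minimal sketch; the pricing header names are assumptions for illustration, not documented fields.

```python
# Minimal sketch of how a crawler might react to Cloudflare's pay-per-crawl
# responses. The "crawl-price" and "crawler-exact-price" header names are
# hypothetical; only the 402 Payment Required status is described above.
import requests

MAX_PRICE_USD = 0.002  # the crawler operator's per-request budget

def fetch(url: str) -> str | None:
    response = requests.get(url, timeout=10)
    if response.status_code == 402:
        quoted_price = float(response.headers.get("crawl-price", "inf"))  # hypothetical header
        if quoted_price > MAX_PRICE_USD:
            return None  # too expensive: skip this publisher
        # Re-request with payment intent declared (header name is an assumption).
        response = requests.get(
            url, headers={"crawler-exact-price": str(quoted_price)}, timeout=10
        )
    response.raise_for_status()
    return response.text
```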

      Cloudflare wrote:

“Imagine asking your favorite deep research program to help you synthesize the latest cancer research or a legal brief, or just help you find the best restaurant in Soho — and then giving that agent a budget to spend to acquire the best and most relevant content.”

      Technical Details & Publisher Adoption

      The system integrates directly with Cloudflare’s bot management tools and works alongside existing WAF rules and robots.txt files. Authentication is handled using Ed25519 key pairs and HTTP message signatures to prevent spoofing.

      Cloudflare says early adopters include major publishers like Condé Nast, Time, The Atlantic, AP, BuzzFeed, Reddit, Pinterest, Quora, and others.

      While the current setup supports only flat pricing, the company plans to explore dynamic and granular pricing models in future iterations.

      SEO Community Shares Concerns

      While Cloudflare’s new controls can be changed manually, several SEO experts are concerned about the impact of making the system opt-out rather than opt-in.

      “This won’t end well,” wrote Duane Forrester, Vice President of Industry Insights at Yext, warning that businesses may struggle to appear in AI-powered answers without realizing crawler access is being blocked unless a fee is paid.

      Lily Ray, Vice President of SEO Strategy and Research at Amsive Digital, noted the change is likely to spark urgent conversations with clients, especially those unaware that their sites might now be invisible to AI crawlers by default.

      Ryan Jones, Senior Vice President of SEO at Razorfish, expressed that most of his client sites actually want AI crawlers to access their content for visibility reasons.

      Some Say It’s a Necessary Reset

      Some in the community welcome the move as a long-overdue rebalancing of content economics.

      “A force is needed to tilt the balance back to where it once was,” said Pedro Dias, Technical SEO Consultant and former member of Google’s Search Quality team. He suggests that the current dynamic favors AI companies at the expense of publishers.

      Ilya Grigorik, Distinguished Engineer and Technical Advisor at Shopify, praised the use of cryptographic authentication, saying it’s “much needed” given how difficult it is to distinguish between legitimate and malicious bots.

      Under the new system, crawlers must authenticate using public key cryptography and declare payment intent via custom HTTP headers.
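The announcement doesn’t spell out the exact signature format, but Ed25519 signing itself is simple to demonstrate. Below is a minimal sketch using the Python cryptography package; the signed string and header names are illustrative assumptions, while Cloudflare’s actual scheme follows the HTTP message signatures approach mentioned above.

```python
# Minimal sketch of Ed25519 request signing for bot authentication.
# The covered fields and header names below are illustrative assumptions,
# not Cloudflare's documented scheme.
import base64
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()  # in practice, a key registered with the service

def sign_request(method: str, path: str, host: str) -> dict:
    covered = f"{method.upper()} {path} host={host}"   # assumed signature base string
    signature = private_key.sign(covered.encode("utf-8"))
    return {
        "Signature": base64.b64encode(signature).decode("ascii"),
        "Signature-Input": covered,  # lets the server rebuild the signed string
    }

headers = sign_request("GET", "/articles/latest", "example-publisher.com")
print(headers)
```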

      Looking Ahead

      Cloudflare’s pay-per-crawl system formalizes a new layer of negotiation over who gets to access web content, and at what cost.

      For SEO pros, this adds complexity: visibility may now depend not just on ranking, but on crawler access settings, payment policies, and bot authentication.

      While some see this as empowering publishers, others warn it could fragment the open web, where content access varies based on infrastructure and paywalls.

      If generative AI becomes a core part of how people search, and the pipes feeding that AI are now toll roads, websites will need to manage visibility across a growing patchwork of systems, policies, and financial models.


      Featured Image: Roman Samborskyi/Shutterstock

      YouTube Adds New Viewer Metrics To Track Audience Loyalty via @sejournal, @MattGSouthern

      YouTube is rolling out a new audience analytics feature that replaces the “returning viewers” metric with more detailed viewer categories.

      The update introduces three viewer types: new, casual, and regular. This is designed to help creators better understand who’s engaging with their content and how often.

      Breaking Down The New Viewer Categories

      YouTube now segments viewers into:

      • New viewers: People watching your content for the first time within the selected time period.
      • Casual viewers: Those who’ve watched between one and five months out of the past year.
      • Regular viewers: Viewers who have returned consistently for six or more months over the past 12 months.
      Screenshot from: YouTube.com/CreatorInsider, July 2025.

In an announcement, YouTube clarifies:

      “These new categories provide a more nuanced understanding of viewer engagement and are not a direct equivalent of the previous returning viewers metric.”

      There are no changes to the definition of new viewers. The new segmentation applies across all video formats, including Shorts, VOD, and livestreams.
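YouTube performs this classification internally, but the thresholds described above are easy to express. Here’s a minimal sketch, assuming you already have a per-viewer count of distinct months watched over the trailing 12 months; that input is an assumption for illustration, since YouTube doesn’t expose it directly.

```python
# Minimal sketch of the new/casual/regular split described above, assuming
# months_watched = number of distinct months (0-12) a viewer watched the
# channel in the past year, and first_watch_in_period flags a brand-new viewer.
def classify_viewer(months_watched: int, first_watch_in_period: bool) -> str:
    if first_watch_in_period:
        return "new"
    if months_watched >= 6:
        return "regular"   # returned in 6+ of the last 12 months
    return "casual"        # watched in 1-5 of the last 12 months

print(classify_viewer(months_watched=7, first_watch_in_period=False))  # "regular"
```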

      What This Means

      The switch to more granular segmentation addresses a long-standing limitation in YouTube’s analytics.

      Previously, creators could only distinguish between new and returning viewers. That was a binary distinction that didn’t capture the full range of audience engagement.

      Now, with casual and regular viewer categories, creators can identify which viewers are sporadically engaged versus those who form a loyal base.

      YouTube cautioned that many channels may see a smaller percentage of regular viewers than expected, stating:

      “Regular viewers is a high bar to reach as it signifies viewers who have consistently returned to watch your content for 6 months or more in the past year.”

      Strategies For Building A Loyal Audience

      YouTube suggests that maintaining a strong base of regular viewers requires consistent publishing and community engagement.

      The platform recommends the following tactics:

      • Use community posts to stay visible between uploads
      • Respond to viewer comments
      • Host live premieres and join live chats
      • Maintain brand consistency across videos

      These strategies reflect broader trends in the creator economy, where sustained engagement is becoming more valuable than viral reach.

      Looking Ahead

      The new segmentation is now rolling out globally on both desktop and mobile, with availability expanding to all creators in the coming weeks.

      For marketers and brands, the added granularity offers a clearer picture of a creator’s influence and audience loyalty.

      As YouTube continues refining its analytics tools, the emphasis is shifting from raw numbers to actionable insights that help creators grow sustainable channels.


      Featured Image: Roman Samborskyi/Shutterstock

      LLM Visibility Tools: Do SEOs Agree On How To Use Them? via @sejournal, @martinibuster

      A discussion on LinkedIn about LLM visibility and the tools for tracking it explored how SEOs are approaching optimization for LLM-based search. The answers provided suggest that tools for LLM-focused SEO are gaining maturity, though there is some disagreement about what exactly should be tracked.

Joe Hall (LinkedIn profile) raised a series of questions on LinkedIn about the usefulness of tools that track LLM visibility. He didn’t explicitly say that the tools lacked utility, but his questions appeared intended to open a conversation.

      He wrote:

      “I don’t understand how these systems that claim to track LLM visibility work. LLM responses are highly subjective to context. They are not static like traditional SERPs are. Even if you could track them, how can you reasonably connect performance to business objectives? How can you do forecasting, or even build a strategy with that data? I understand the value of it from a superficial level, but it doesn’t really seem good for anything other than selling a service to consultants that don’t really know what they are doing.”

Joshua Levenson (LinkedIn profile) answered that today’s SEO tools are out of date, remarking:

      “People are using the old paradigm to measure a new tech.”

      Joe Hall responded with “Bingo!”

      LLM SEO: “Not As Easy As Add This Keyword”

      Lily Ray (LinkedIn profile) responded to say that the entities that LLMs fall back on are a key element to focus on.

      She explained:

      “If you ask an LLM the same question thousands of times per day, you’ll be able to average the entities it mentions in its responses. And then repeat that every day. It’s not perfect but it’s something.”

      Hall asked her how that’s helpful to clients and Lily answered:

      “Well, there are plenty of actionable recommendations that can be gleaned from the data. But that’s obviously the hard part. It’s not as easy as “add this keyword to your title tag.”
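A minimal sketch of the approach Lily Ray describes might look like the following: run the same prompt many times, extract the entities or brands from each response, and average how often each one appears. The ask_llm and extract_entities functions are hypothetical stand-ins for an LLM API call and an entity extractor; they are not part of any specific tool mentioned here.

```python
# Minimal sketch of tracking LLM visibility by averaging entity mentions
# across repeated runs of the same prompt. ask_llm() and extract_entities()
# are hypothetical stand-ins for an LLM API call and an entity extractor.
from collections import Counter

def mention_rates(prompt: str, runs: int, ask_llm, extract_entities) -> dict[str, float]:
    counts: Counter[str] = Counter()
    for _ in range(runs):
        response_text = ask_llm(prompt)
        counts.update(set(extract_entities(response_text)))  # count each entity once per run
    return {entity: n / runs for entity, n in counts.most_common()}

# Hypothetical usage: mention_rates("best project management software", 1000,
# ask_llm, extract_entities) would return each brand's share of responses.
```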

      Tools For LLM SEO

      Dixon Jones (LinkedIn profile) responded with a brief comment to introduce Waikay, which stands for What AI Knows About You. He said that his tool uses entity and topic extraction, and bases its recommendations and actions on gap analysis.

      Ryan Jones (LinkedIn profile) responded to discuss how his product SERPRecon works:

      “There’s 2 ways to do it. one – the way I’m doing it on SERPrecon is to use the APIs to monitor responses to the queries and then like LIly said, extract the entities, topics, etc from it. this is the cheaper/easier way but is easiest to focus on what you care about. The focus isn’t on the exact wording but the topics and themes it keeps mentioning – so you can go optimize for those.

      The other way is to monitor ISP data and see how many real user queries you actually showed up for. This is super expensive.

      Any other method doesn’t make much sense.”

      And in another post followed up with more information:

      “AI doesn’t tell you how it fanned out or what other queries it did. people keep finding clever ways in the network tab of chrome to see it, but they keep changing it just as fast.

      The AI Overview tool in my tool tries to reverse engineer them using the same logic/math as their patents, but it can never be 100%.”

      Then he explained how it helps clients:

      “It helps us in the context of, if I enter 25 queries I want to see who IS showing up there, and what topics they’re mentioning so that I can try to make sure I’m showing up there if I’m not. That’s about it. The people measuring sentiment of the AI responses annoy the hell out of me.”

      Ten Blue Links Were Never Static

Although Hall characterized the “traditional” search results as static, in contrast to LLM-based search results, it must be pointed out that the old search results were in a constant state of change, especially after the Hummingbird update, which enabled Google to add fresh search results when a query required it or when new or updated web pages were introduced to the web. The traditional search results also tended to serve more than one intent, often as many as three, resulting in fluctuations in what ranks.

LLMs also show diversity in their search results, but in the case of AI Overviews, Google shows a few results for the query and then does the “fan-out” thing to anticipate follow-up questions that naturally arise as part of exploring a topic.

      Billy Peery (LinkedIn profile) offered an interesting insight into LLM search results, suggesting that the output exhibits a degree of stability and isn’t as volatile as commonly believed.

He wrote:

      “I guess I disagree with the idea that the SERPs were ever static.

      With LLMs, we’re able to better understand which sources they’re pulling from to answer questions. So, even if the specific words change, the model’s likelihood of pulling from sources and mentioning brands is significantly more static.

      I think the people who are saying that LLMs are too volatile for optimization are too focused on the exact wording, as opposed to the sources and brand mentions.”

      Peery makes an excellent point by noting that some SEOs may be getting hung up on the exact keyword matching (“exact wording”) and that perhaps the more important thing to focus on is whether the LLM is linking to and mentioning specific websites and brands.

      Takeaway

      Awareness of LLM tools for tracking visibility is growing. Marketers are reaching some agreement on what should be tracked and how it benefits clients. While some question the strategic value of these tools, others use them to identify which brands and themes are mentioned, adding that data to their SEO mix.

      Featured Image by Shutterstock/TierneyMJ

      Study: Google AI Mode Shows 91% URL Change Across Repeat Searches via @sejournal, @MattGSouthern

      A new study analyzing 10,000 keywords reveals that Google’s AI Mode delivers inconsistent results.

      The research also shows minimal overlap between AI Mode sources and traditional organic search rankings.

      Published by SE Ranking, the study examines how AI Mode performs in comparison to Google’s AI Overviews and the top 10 organic search results.

      “The average overlap of exact URLs between the three datasets was just 9.2%,” the study notes, illustrating the volatility.

      Highlights From The Study

      AI Mode Frequently Pulls Different Results

      To test consistency, researchers ran the same 10,000 keywords through AI Mode three times on the same day. The results varied most of the time.

      In 21.2% of cases, there were no overlapping URLs at all between the three sets of responses.

      Domain-level consistency was slightly higher, at 14.7%, indicating AI Mode may cite different pages from the same websites.
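The study doesn’t publish its exact formula, but URL-level and domain-level consistency across repeated runs can be measured with straightforward set overlap. Here’s a minimal sketch under the assumption that each run is simply the list of URLs cited for one keyword; it illustrates the concept rather than reproducing SE Ranking’s methodology.

```python
# Minimal sketch of measuring result consistency across repeated AI Mode runs.
# Each run is a list of cited URLs; overlap is the share of URLs common to
# all runs relative to the smallest run (one reasonable definition of many).
from urllib.parse import urlparse

def overlap(runs: list[list[str]]) -> float:
    sets = [set(r) for r in runs]
    common = set.intersection(*sets)
    return len(common) / min(len(s) for s in sets) if all(sets) else 0.0

def domain_overlap(runs: list[list[str]]) -> float:
    return overlap([[urlparse(u).netloc for u in r] for r in runs])

run_a = ["https://site1.com/a", "https://site2.com/b", "https://site3.com/c"]
run_b = ["https://site1.com/a", "https://site2.com/x", "https://site4.com/d"]
run_c = ["https://site1.com/a", "https://site2.com/y", "https://site5.com/e"]

print(overlap([run_a, run_b, run_c]))         # URL-level consistency
print(domain_overlap([run_a, run_b, run_c]))  # domain-level consistency
```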

      Minimal Overlap With Organic Results

      Only 14% of URLs in AI Mode responses matched the top 10 organic search results for the same queries. When looking at domain-level matches, overlap increased to 21.9%.

      In 17.9% of queries, AI Mode provided zero overlap with organic URLs, suggesting its selections could be independent of Google’s ranking algorithms.

      Most Links Come From Trusted Domains

      On average, each AI Mode response contains 12.6 citations.

      The most common format is block links (90.8%), followed by in-text links (8.9%) and AIM SERP-style links (0.3%), which resemble traditional search engine results pages (SERPs).

      Despite the volatility, some domains consistently appeared across all tests. The top-cited sites were:

      1. Indeed (1.8%)
      2. Wikipedia (1.6%)
      3. Reddit (1.5%)
      4. YouTube (1.4%)
      5. NerdWallet (1.2%)

      Google properties were cited most frequently, accounting for 5.7% of all links. These were mostly Google Maps business profiles.

      Differences From AI Overviews

      Comparing AI Mode to AI Overviews, researchers found an average URL overlap of just 10.7%, with domain overlap at 16%.

      This suggests the two systems operate under different logic despite both being AI-driven.

      What This Means For Search Marketers

      The high volatility of AI Mode results presents new challenges and new opportunities.

      Because results can vary even for identical queries, tracking visibility is more complex.

      However, this fluidity also creates more openings for exposure. Unlike traditional search results, where a small set of top-ranking pages often dominate, AI Mode appears to refresh its citations frequently.

      That means publishers with relevant, high-quality content may have a better chance of appearing in AI Mode answers, even if they’re not in the organic top 10.

      To adapt to this environment, SEOs and content creators should consider:

      • Prioritizing domain-wide authority and topical relevance
      • Diversifying content across trusted platforms
      • Optimizing local presence through tools like Google Maps
      • Monitoring evolving inclusion patterns as AI Mode develops

      For more, see the full study from SE Ranking.


      Featured Image: Roman Samborskyi/Shutterstock

      Google’s John Mueller: Core Updates Build On Long-Term Data via @sejournal, @MattGSouthern

      Google Search Advocate John Mueller says core updates rely on longer-term patterns rather than recent site changes or link spam attacks.

      The comment was made during a public discussion on Bluesky, where SEO professionals debated whether a recent wave of spammy backlinks could impact rankings during a core update.

      Mueller’s comment offers timely clarification as Google rolls out its June core update.

      Core Updates Aren’t Influenced By Recent Links

      Asked directly whether recent link spam would be factored into core update evaluations, Mueller said:

      “Off-hand, I can’t think of how these links would play a role with the core updates. It’s possible there’s some interaction that I’m not aware of, but it seems really unlikely to me.

      Also, core updates generally build on longer-term data, so something really recent wouldn’t play a role.”

      For those concerned about negative SEO tactics, Mueller’s statement suggests recent spam links are unlikely to affect how Google evaluates a site during a core update.

      Link Spam & Visibility Concerns

      The conversation began with SEO consultant Martin McGarry, who shared traffic data suggesting spam attacks were impacting sites targeting high-value keywords.

      In a post linking to a recent SEJ article, McGarry wrote:

      “This is traffic up in a high value keyword and the blue line is spammers attacking it… as you can see traffic disappears as clear as day.”

      Mark Williams-Cook responded by referencing earlier commentary from a Google representative at the SEOFOMO event, where it was suggested that in most cases, links were not the root cause of visibility loss, even when the timing seemed suspicious.

      This aligns with a broader theme in recent SEO discussions: it’s often difficult to prove that link-based attacks are directly responsible for ranking drops, especially during major algorithm updates.

      Google’s Position On The Disavow Tool

      As the discussion turned to mitigation strategies, Mueller reminded the community that Google’s disavow tool remains available, though it’s not always necessary.

      Mueller said:

      “You can also use the domain: directive in the disavow file to cover a whole TLD, if you’re +/- certain that there are no good links for your site there.”

      He added that the tool is often misunderstood or overused:

      “It’s a tool that does what it says; almost nobody needs it, but if you think your case is exceptional, feel free.

      Pushing it as a service to everyone says a bit about the SEO though.”

      That final remark drew pushback from McGarry, who clarified that he doesn’t sell cleanup services and only uses the disavow tool in carefully reviewed edge cases.

      Community Calls For More Transparency

      Alan Bleiweiss joined the conversation by calling for Google to share more data about how many domains are already ignored algorithmically:

      “That would be the best way to put site owners at ease, I think. There’s a psychology to all this cat & mouse wording without backing it up with data.”

      His comment reflects a broader sentiment. Many professionals still feel in the dark about how Google handles potentially manipulative or low-quality links at scale.

      What This Means

      Mueller’s comments offer guidance for anyone evaluating ranking changes during a core update:

      • Recent link spam is unlikely to influence a core update.
      • Core updates are based on long-term patterns, not short-term changes.
      • The disavow tool is still available but rarely needed in most cases.
      • Google’s systems may already discount low-quality links automatically.

      If your site has seen changes in visibility since the start of the June core update, these insights suggest looking beyond recent link activity. Instead, focus on broader, long-term signals, such as content quality, site structure, and overall trust.

      Google’s Trust Ranking Patent Shows How User Behavior Is A Signal via @sejournal, @martinibuster

      Google long ago filed a patent for ranking search results by trust. The groundbreaking idea behind the patent is that user behavior can be used as a starting point for developing a ranking signal.

      The big idea behind the patent is that the Internet is full of websites all linking to and commenting about each other. But which sites are trustworthy? Google’s solution is to utilize user behavior to indicate which sites are trusted and then use the linking and content on those sites to reveal more sites that are trustworthy for any given topic.

PageRank is basically the same thing, only it begins and ends with one website linking to another website. The innovation of Google’s trust ranking patent is to put the user at the start of that trust chain, like this:

      User trusts X Websites > X Websites trust Other Sites > This feeds into Google as a ranking signal

The trust originates from the user and flows to trusted sites, which themselves provide anchor text, lists of other sites, and commentary about other sites.

      That, in a nutshell, is what Google’s trust-based ranking algorithm is about.

      The deeper insight is that it reveals Google’s groundbreaking approach to letting users be a signal of what’s trustworthy. You know how Google keeps saying to create websites for users? This is what the trust patent is all about, putting the user in the front seat of the ranking algorithm.

      Google’s Trust And Ranking Patent

The patent was coincidentally filed around the same period that Yahoo and Stanford University published a TrustRank research paper, which focused on identifying spam pages.

      Google’s patent is not about finding spam. It’s focused on doing the opposite, identifying trustworthy web pages that satisfy the user’s intent for a search query.

      How Trust Factors Are Used

The first part of any patent is an Abstract section that offers a very general description of the invention, and that’s what this patent’s Abstract does as well.

      The patent abstract asserts:

      • That trust factors are used to rank web pages.
• The trust factors are generated from “entities” (later described as the users themselves, experts, expert web pages, and forum members) that link to or comment about other web pages.
      • Those trust factors are then used to re-rank web pages.
      • Re-ranking web pages kicks in after the normal ranking algorithm has done its thing with links, etc.

      Here’s what the Abstract says:

      “A search engine system provides search results that are ranked according to a measure of the trust associated with entities that have provided labels for the documents in the search results.

      A search engine receives a query and selects documents relevant to the query.

      The search engine also determines labels associated with selected documents, and the trust ranks of the entities that provided the labels.

      The trust ranks are used to determine trust factors for the respective documents. The trust factors are used to adjust information retrieval scores of the documents. The search results are then ranked based on the adjusted information retrieval scores.”

      As you can see, the Abstract does not say who the “entities” are nor does it say what the labels are yet, but it will.

      Field Of The Invention

The next part is called the Field Of The Invention. It describes the technical domain of the invention (information retrieval) and its focus (trust relationships between users) as applied to ranking web pages.

      Here’s what it says:

      “The present invention relates to search engines, and more specifically to search engines that use information indicative of trust relationship between users to rank search results.”

      Now we move on to the next section, the Background, which describes the problem this invention solves.

      Background Of The Invention

      This section describes why search engines fall short of answering user queries (the problem) and why the invention solves the problem.

      The main problems described are:

      • Search engines are essentially guessing (inference) what the user’s intent is when they only use the search query.
      • Users rely on expert-labeled content from trusted sites (called vertical knowledge sites) to tell them which web pages are trustworthy
      • Explains why the content labeled as relevant or trustworthy is important but ignored by search engines.
      • It’s important to remember that this patent came out before the BERT algorithm and other natural language approaches that are now used to better understand search queries.

      This is how the patent explains it:

      “An inherent problem in the design of search engines is that the relevance of search results to a particular user depends on factors that are highly dependent on the user’s intent in conducting the search—that is why they are conducting the search—as well as the user’s circumstances, the facts pertaining to the user’s information need.

      Thus, given the same query by two different users, a given set of search results can be relevant to one user and irrelevant to another, entirely because of the different intent and information needs.”

      Next it goes on to explain that users trust certain websites that provide information about certain topics:

      “…In part because of the inability of contemporary search engines to consistently find information that satisfies the user’s information need, and not merely the user’s query terms, users frequently turn to websites that offer additional analysis or understanding of content available on the Internet.”

      Websites Are The Entities

      The rest of the Background section names forums, review sites, blogs, and news websites as places that users turn to for their information needs, calling them vertical knowledge sites. Vertical Knowledge sites, it’s explained later, can be any kind of website.

      The patent explains that trust is why users turn to those sites:

      “This degree of trust is valuable to users as a way of evaluating the often bewildering array of information that is available on the Internet.”

      To recap, the “Background” section explains that the trust relationships between users and entities like forums, review sites, and blogs can be used to influence the ranking of search results. As we go deeper into the patent we’ll see that the entities are not limited to the above kinds of sites, they can be any kind of site.

      Patent Summary Section

      This part of the patent is interesting because it brings together all of the concepts into one place, but in a general high-level manner, and throws in some legal paragraphs that explain that the patent can apply to a wider scope than is set out in the patent.

      The Summary section appears to have four sections:

• The first section explains that a search engine ranks web pages that are trusted by entities (like forums, news sites, blogs, etc.) and that the system maintains information about the labels those entities apply to web pages.
      • The second section offers a general description of the work of the entities (like forums, news sites, blogs, etc.).
      • The third offers a general description of how the system works, beginning with the query, the assorted hand waving that goes on at the search engine with regard to the entity labels, and then the search results.
• The fourth part is a legal explanation that the patent is not limited to the descriptions and that the invention applies to a wider scope. This is important because it allows Google to use a non-existent thing, even something as nutty as a “trust button” that a user selects to identify a site as trustworthy, purely as an example. That non-existent “trust button” can then be a stand-in for something else, like navigational queries, Navboost, or anything else that signals that a user trusts a website.

      Here’s a nutshell explanation of how the system works:

• The user visits sites that they trust and clicks a “trust button” that tells the search engine that this is a trusted site.
      • The trusted site “labels” other sites as trusted for certain topics (the label could be a topic like “symptoms”).
      • A user asks a question at a search engine (a query) and uses a label (like “symptoms”).
• The search engine ranks websites in the usual manner, then looks for sites that users trust and checks whether any of those sites have applied labels to other sites.
      • Google ranks those other sites that have had labels assigned to them by the trusted sites.

      Here’s an abbreviated version of the third part of the Summary that gives an idea of the inner workings of the invention:

      “A user provides a query to the system…The system retrieves a set of search results… The system determines which query labels are applicable to which of the search result documents. … determines for each document an overall trust factor to apply… adjusts the …retrieval score… and reranks the results.”

      Here’s that same section in its entirety:

      • “A user provides a query to the system; the query contains at least one query term and optionally includes one or more labels of interest to the user.
      • The system retrieves a set of search results comprising documents that are relevant to the query term(s).
      • The system determines which query labels are applicable to which of the search result documents.
      • The system determines for each document an overall trust factor to apply to the document based on the trust ranks of those entities that provided the labels that match the query labels.
      • Applying the trust factor to the document adjusts the document’s information retrieval score, to provide a trust adjusted information retrieval score.
• The system reranks the search result documents based on the trust adjusted information retrieval scores.”

      The above is a general description of the invention.
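To make that pipeline concrete, here’s a minimal sketch of the rerank step as the summary describes it: take the normal information retrieval score, compute a trust factor from the trust ranks of entities whose labels match the query label, and re-sort on the adjusted score. The way the trust factor is combined with the score below is an assumption for illustration; the patent doesn’t commit to a formula.

```python
# Minimal sketch of trust-adjusted reranking as described in the patent summary.
# trust_ranks maps an entity to its trust rank; labels maps (entity, url) to the
# labels that entity applied to that URL. Multiplying the IR score by the trust
# factor is an illustrative assumption, not the patent's formula.
def rerank(results, query_label, labels, trust_ranks):
    reranked = []
    for url, ir_score in results:
        # Sum trust ranks of entities whose label for this URL matches the query label.
        trust_factor = 1.0 + sum(
            trust_ranks[entity]
            for (entity, labeled_url), entity_labels in labels.items()
            if labeled_url == url and query_label in entity_labels
        )
        reranked.append((url, ir_score * trust_factor))
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)

results = [("yourhealth.com/colon-cancer", 0.72), ("randomblog.example/cancer", 0.80)]
labels = {("expert-site.example", "yourhealth.com/colon-cancer"): {"symptoms"}}
trust_ranks = {"expert-site.example": 0.9}

print(rerank(results, "symptoms", labels, trust_ranks))
```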

The next section, called Detailed Description, deep dives into the details. At this point it’s becoming increasingly evident that the patent is highly nuanced and cannot be reduced to simple advice like: “optimize your site like this to earn trust.”

A large part of the patent hinges on two things: a trust button and a “label:” advanced search query.

Neither the trust button nor the “label:” advanced search query has ever existed. As you’ll see, they are quite probably stand-ins for techniques that Google doesn’t want to reveal explicitly.

      Detailed Description In Four Parts

      The details of this patent are located in four sections within the Detailed Description section of the patent. This patent is not as simple as 99% of SEOs say it is.

      These are the four sections:

      1. System Overview
      2. Obtaining and Storing Trust Information
      3. Obtaining and Storing Label Information
      4. Generated Trust Ranked Search Results

      The System Overview is where the patent deep dives into the specifics. The following is an overview to make it easy to understand.

      System Overview

      1. Explains how the invention (a search engine system) ranks search results based on trust relationships between users and the user-trusted entities who label web content.

2. The patent describes a “trust button” that a user can click to tell Google that they trust a website, either in general or for a specific topic or topics.

      3. The patent says a trust related score is assigned to a website when a user clicks a trust button on a website.

      4. The trust button information is stored in a trust database that’s referred to as #190.

      Here’s what it says about assigning a trust rank score based on the trust button:

      “The trust information provided by the users with respect to others is used to determine a trust rank for each user, which is measure of the overall degree of trust that users have in the particular entity.”

      Trust Rank Button

      The patent refers to the “trust rank” of the user-trusted websites. That trust rank is based on a trust button that a user clicks to indicate that they trust a given website, assigning a trust rank score.

      The patent says:

      “…the user can click on a “trust button” on a web page belonging to the entity, which causes a corresponding record for a trust relationship to be recorded in the trust database 190.

      In general any type of input from the user indicating that such as trust relationship exists can be used.”

      The trust button has never existed and the patent quietly acknowledges this by stating that any type of input can be used to indicate the trust relationship.

      So what is it? I believe that the “trust button” is a stand-in for user behavior metrics in general, and site visitor data in particular. The patent Claims section does not mention trust buttons at all but does mention user visitor data as an indicator of trust.

      Here are several passages that mention site visits as a way to understand if a user trusts a website:

      “The system can also examine web visitation patterns of the user and can infer from the web visitation patterns which entities the user trusts. For example, the system can infer that a particular user trust a particular entity when the user visits the entity’s web page with a certain frequency.”

      The same thing is stated in the Claims section of the patent, it’s the very first claim they make for the invention:

      “A method performed by data processing apparatus, the method comprising:
      determining, based on web visitation patterns of a user, one or more trust relationships indicating that the user trusts one or more entities;”

      It may very well be that site visitation patterns and other user behaviors are what is meant by the “trust button” references.

      Labels Generated By Trusted Sites

The patent defines trusted entities as news sites, blogs, forums, and review sites, but it isn’t limited to those kinds of sites; a trusted entity could be any kind of website.

Trusted websites create references to other sites, and in those references they label the other sites as relevant to a particular topic. That label could be anchor text, but it could also be something else.

      The patent explicitly mentions anchor text only once:

      “In some cases, an entity may simply create a link from its site to a particular item of web content (e.g., a document) and provide a label 107 as the anchor text of the link.”

Although the patent explicitly mentions anchor text only once, there are other passages where anchor text is strongly implied. For example, the patent offers a general description of labels as describing or categorizing the content found on another site:

      “…labels are words, phrases, markers or other indicia that have been associated with certain web content (pages, sites, documents, media, etc.) by others as descriptive or categorical identifiers.”

      Labels And Annotations

      Trusted sites link out to web pages with labels and links. The combination of a label and a link is called an annotation.

      This is how it’s described:

      “An annotation 106 includes a label 107 and a URL pattern associated with the label; the URL pattern can be specific to an individual web page or to any portion of a web site or pages therein.”

      Labels Used In Search Queries

      Users can also search with “labels” in their queries by using a non-existent “label:” advanced search query. Those kinds of queries are then used to match the labels that a website page is associated with.

      This is how it’s explained:

      “For example, a query “cancer label:symptoms” includes the query term “cancel” and a query label “symptoms”, and thus is a request for documents relevant to cancer, and that have been labeled as relating to “symptoms.”

      Labels such as these can be associated with documents from any entity, whether the entity created the document, or is a third party. The entity that has labeled a document has some degree of trust, as further described below.”

      What is that label in the search query? It could simply be certain descriptive keywords, but there aren’t any clues to speculate further than that.

      The patent puts it all together like this:

      “Using the annotation information and trust information from the trust database 190, the search engine 180 determines a trust factor for each document.”

      Takeaway:

A user’s trust is placed in a website. That user-trusted website is not necessarily the one that’s ranked; it’s the website that links to and vouches for another relevant web page. The page that gets ranked can be one the trusted site has labeled as relevant for a specific topic, or it could be a page on the trusted site itself. The purpose of the user signals is to provide a starting point, so to speak, from which to identify trustworthy sites.

      Experts Are Trusted

      Vertical Knowledge Sites, sites that users trust, can host the commentary of experts. The expert could be the publisher of the trusted site as well. Experts are important because links from expert sites are used as part of the ranking process.

Experts are defined by the depth of content they publish on a topic:

      “These and other vertical knowledge sites may also host the analysis and comments of experts or others with knowledge, expertise, or a point of view in particular fields, who again can comment on content found on the Internet.

      For example, a website operated by a digital camera expert and devoted to digital cameras typically includes product reviews, guidance on how to purchase a digital camera, as well as links to camera manufacturer’s sites, new products announcements, technical articles, additional reviews, or other sources of content.

To assist the user, the expert may include comments on the linked content, such as labeling a particular technical article as “expert level,” or a particular review as “negative professional review,” or a new product announcement as ‘new 10MP digital SLR’.”

      Links From Expert Sites

      Links and annotations from user-trusted expert sites are described as sources of trust information:

      “For example, Expert may create an annotation 106 including the label 107 “Professional review” for a review 114 of Canon digital SLR camera on a web site “www.digitalcameraworld.com”, a label 107 of “Jazz music” for a CD 115 on the site “www.jazzworld.com”, a label 107 of “Classic Drama” for the movie 116 “North by Northwest” listed on website “www.movierental.com”, and a label 107 of “Symptoms” for a group of pages describing the symptoms of colon cancer on a website 117 “www.yourhealth.com”.

      Note that labels 107 can also include numerical values (not shown), indicating a rating or degree of significance that the entity attaches to the labeled document.

      Expert’s web site 105 can also include trust information. More specifically, Expert’s web site 105 can include a trust list 109 of entities whom Expert trusts. This list may be in the form of a list of entity names, the URLs of such entities’ web pages, or by other identifying information. Expert’s web site 105 may also include a vanity list 111 listing entities who trust Expert; again this may be in the form of a list of entity names, URLs, or other identifying information.”

      Inferred Trust

The patent describes additional signals that can be used to infer trust. These are more traditional signals, like links, a list of trusted web pages (maybe a resources page?), and a list of sites that trust the website.

      These are the inferred trust signals:

      “(1) links from the user’s web page to web pages belonging to trusted entities;
      (2) a trust list that identifies entities that the user trusts; or
      (3) a vanity list which identifies users who trust the owner of the vanity page.”

      Another kind of trust signal that can be inferred is from identifying sites that a user tends to visit.

      The patent explains:

      “The system can also examine web visitation patterns of the user and can infer from the web visitation patterns which entities the user trusts. For example, the system can infer that a particular user trusts a particular entity when the user visits the entity’s web page with a certain frequency.”
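Here’s a minimal sketch of that inference; the visit threshold and the 0–1 normalization are assumptions for illustration, since the patent only says “a certain frequency.”

```python
# Minimal sketch of inferring trust relationships from visitation patterns.
# visit_log maps a site to one user's visit count over some window; the
# threshold and the 0-1 normalization are illustrative assumptions.
def inferred_trust(visit_log: dict[str, int], min_visits: int = 5) -> dict[str, float]:
    max_visits = max(visit_log.values(), default=0)
    return {
        site: count / max_visits
        for site, count in visit_log.items()
        if count >= min_visits  # visits "with a certain frequency" imply trust
    }

print(inferred_trust({"digitalcameraworld.com": 24, "randomblog.example": 1}))
# {'digitalcameraworld.com': 1.0}
```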

      Takeaway:

      That’s a pretty big signal and I believe that it suggests that promotional activities that encourage potential site visitors to discover a site and then become loyal site visitors can be helpful. For example, that kind of signal can be tracked with branded search queries. It could be that Google is only looking at site visit information but I think that branded queries are an equally trustworthy signal, especially when those queries are accompanied by labels… ding, ding, ding!

The patent also lists some out-there examples of inferred trust, like contact/chat list data. It doesn’t say social media, just contact/chat lists.

      Trust Can Decay or Increase

      Another interesting feature of trust rank is that it can decay or increase over time.

      The patent is straightforward about this part:

      “Note that trust relationships can change. For example, the system can increase (or decrease) the strength of a trust relationship for a trusted entity. The search engine system 100 can also cause the strength of a trust relationship to decay over time if the trust relationship is not affirmed by the user, for example by visiting the entity’s web site and activating the trust button 112.”
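The patent doesn’t give a decay formula, but the behavior it describes, where strength falls off over time unless the user re-affirms the relationship, is naturally modeled as exponential decay. The half-life below is an assumption for illustration.

```python
# Minimal sketch of trust decay: strength decays exponentially with time since
# the user last affirmed the relationship (e.g., by revisiting the site) and is
# bumped back up on each affirmation. The half-life value is an assumption.
import math

HALF_LIFE_DAYS = 90  # illustrative assumption

def decayed_trust(base_strength: float, days_since_affirmed: float) -> float:
    return base_strength * math.exp(-math.log(2) * days_since_affirmed / HALF_LIFE_DAYS)

print(decayed_trust(1.0, 0))    # 1.0  (just affirmed)
print(decayed_trust(1.0, 180))  # 0.25 (two half-lives without affirmation)
```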

      Trust Relationship Editor User Interface

      Directly after the above paragraph is a section about enabling users to edit their trust relationships through a user interface. There has never been such a thing, just like the non-existent trust button.

      This is possibly a stand-in for something else. Could this trusted sites dashboard be Chrome browser bookmarks or sites that are followed in Discover? This is a matter for speculation.

      Here’s what the patent says:

      “The search engine system 100 may also expose a user interface to the trust database 190 by which the user can edit the user trust relationships, including adding or removing trust relationships with selected entities.

      The trust information in the trust database 190 is also periodically updated by crawling of web sites, including sites of entities with trust information (e.g., trust lists, vanity lists); trust ranks are recomputed based on the updated trust information.”

      What Google’s Trust Patent Is About

      Google’s Search Result Ranking Based On Trust patent describes a way of leveraging user-behavior signals to understand which sites are trustworthy. The system then identifies sites that are trusted by the user-trusted sites and uses that information as a ranking signal. There is no actual trust rank metric, but there are ranking signals related to what users trust. Those signals can decay or increase based on factors like whether a user still visits those sites.

The larger takeaway is that this patent is an example of how Google is focused on user signals as a ranking source, so that it can feed those signals back into ranking sites that meet users’ needs. This means that instead of doing things because “this is what Google likes,” it’s better to go even deeper and do things because users like it. That will feed back to Google through these kinds of algorithms that measure user behavior patterns, something we all know Google uses.

      Featured Image by Shutterstock/samsulalam

      Google: Many Top Sites Have Invalid HTML And Still Rank via @sejournal, @MattGSouthern

      A recent discussion on Google’s Search Off the Record podcast challenges long-held assumptions about technical SEO, revealing that most top-ranking websites don’t use valid HTML.

      Despite these imperfections, they continue to rank well in search results.

      Search Advocate John Mueller and Developer Relations Engineer Martin Splitt referenced a study by former Google webmaster Jens Meiert, which found that only one homepage among the top 200 websites passed HTML validation tests.

      Mueller highlighted:

      “0.5% of the top 200 websites have valid HTML on their homepage. One site had valid HTML. That’s it.”

      He described the result as “crazy,” noting that the study surprised even developers who take pride in clean code.

      Mueller added:

      “Search engines have to deal with whatever broken HTML is out there. It doesn’t have to be perfect, it’ll still work.”

      When HTML Errors Matter

      While most HTML issues are tolerated, certain technical elements, such as metadata, must be correctly implemented.

      Splitt said:

      “If something is written in a way that isn’t HTML compliant, then the browser will make assumptions.”

      That usually works fine for visible content, but can fail “catastrophically” when it comes to elements that search engines rely on.

      Mueller said:

      “If [metadata] breaks, then it’s probably not going to do anything in your favor.”
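A small illustration of the asymmetry Splitt and Mueller describe: a lenient parser (here Python’s built-in html.parser, standing in for a forgiving crawler) recovers visible content from broken markup, but a mistyped robots meta tag silently yields nothing for the systems that look for it. This is a toy example, not a reproduction of Google’s parser.

```python
# Toy illustration: lenient HTML parsing recovers visible content from broken
# markup, but a malformed robots meta tag yields no usable directive at all.
from html.parser import HTMLParser

BROKEN_PAGE = """
<html><head>
<meta naem="robots" content="noindex">  <!-- typo in the attribute name -->
<body>
<p>Unclosed paragraph one
<p>Unclosed paragraph two, still perfectly readable
"""

class PageScan(HTMLParser):
    def __init__(self):
        super().__init__()
        self.paragraphs = 0
        self.robots = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p":
            self.paragraphs += 1
        if tag == "meta" and attrs.get("name") == "robots":
            self.robots = attrs.get("content")

scan = PageScan()
scan.feed(BROKEN_PAGE)
print(scan.paragraphs)  # 2 -> broken body markup is still usable
print(scan.robots)      # None -> the misspelled meta tag silently does nothing
```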

      SEO Is Not A Technical Checklist

      Google also challenged the notion that SEO is a box-ticking exercise for developers.

      Mueller said:

      “Sometimes SEO is also not so much about purely technical things that you do, but also kind of a mindset.”

      Splitt said:

      “Am I using the terminology that my potential customers would use? And do I have the answers to the things that they will ask?”

      Naming things appropriately, he said, is one of the most overlooked SEO skills and often more important than technical precision.

      Core Web Vitals and JavaScript

      Two recurring sources of confusion, Core Web Vitals and JavaScript, were also addressed.

      Core Web Vitals

      The podcast hosts reiterated that good Core Web Vitals scores don’t guarantee better rankings.

      Mueller said:

      “Core Web Vitals is not the solution to everything.”

      Mueller added:

      “Developers love scores… it feels like ‘oh I should like maybe go from 85 to 87 and then I will rank first,’ but there’s a lot more involved.”

      JavaScript

      On the topic of JavaScript, Splitt said that while Google can process it, implementation still matters.

      Splitt said:

      “If the content that you care about is showing up in the rendered HTML, you’ll be fine generally speaking.”

      Splitt added:

      “Use JavaScript responsibly and don’t use it for everything.”

      Misuse can still create problems for indexing and rendering, especially if assumptions are made without testing.

      What This Means

      The key takeaway from the podcast is that technical perfection isn’t 100% necessary for SEO success.

      While critical elements like metadata must function correctly, the vast majority of HTML validation errors won’t prevent ranking.

      As a result, developers and marketers should be cautious about overinvesting in code validation at the expense of content quality and search intent alignment.
