Google Patent On Using Contextual Signals Beyond Query Semantics via @sejournal, @martinibuster

A patent recently filed by Google outlines how an AI assistant may use at least five real-world contextual signals, including identifying related intents, to influence answers and generate natural dialog. It’s an example of how AI-assisted search modifies responses to engage users with contextually relevant questions and dialog, expanding beyond keyword-based systems.

The patent describes a system that generates relevant dialog and answers using signals such as environmental context, dialog intent, user data, and conversation history. These factors go beyond using the semantic data in the user’s query and show how AI-assisted search is moving toward more natural, human-like interactions.

In general, the purpose of filing a patent is to obtain legal protection and exclusivity for an invention, and the act of filing doesn’t indicate that Google is actually using it.

The patent uses examples of spoken dialog but it also states the invention is not limited to audio input:

“Notably, during a given dialog session, a user can interact with the automated assistant using various input modalities, including, but not limited to, spoken input, typed input, and/or touch input.”

The patent is named “Using Large Language Model(s) In Generating Automated Assistant Response(s).” It applies to a wide range of AI assistants that receive typed, touch, and spoken inputs.

There are five factors that influence the LLM-modified responses:

  1. Time, Location, And Environmental Context
  2. User-Specific Context
  3. Dialog Intent & Prior Interactions
  4. Inputs (text, touch, and speech)
  5. System & Device Context

The first four factors influence the answers that the automated assistant provides and the fifth one determines whether to turn off the LLM-assisted part and revert to standard AI answers.

Time, Location, And Environmental Context

Three contextual factors (time, location, and environment) provide context that isn’t present in keywords and influence how the AI assistant responds. While these contextual factors, as described in the patent, aren’t strictly related to AI Overviews or AI Mode, they do show how AI-assisted interactions with data can change.

The patent uses the example of a person who tells their assistant they’re going surfing. A standard AI response would be a boilerplate comment to have fun or to enjoy the day. The LLM-assisted response described in the patent would instead use the geographic location and time to generate a comment about the weather, such as the potential for rain. These are called modified assistant outputs.

The patent describes it like this:

“…the assistant outputs included in the set of modified assistant outputs include assistant outputs that do drive the dialog session in manner that further engages the user of the client device in the dialog session by asking contextually relevant questions (e.g., “how long have you been surfing?”), that provide contextually relevant information (e.g., “but if you’re going to Example Beach again, be prepared for some light showers”), and/or that otherwise resonate with the user of the client device within the context of the dialog session.”
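
To make this concrete, here is a minimal sketch of how contextual signals might be assembled into an LLM prompt. The patent doesn’t specify an implementation, so the function names, prompt format, and DialogContext fields below are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class DialogContext:
    """Illustrative contextual signals of the kind the patent describes."""
    location: str          # e.g., derived from device location
    local_time: str        # e.g., "Saturday 7:40 AM"
    weather: str           # e.g., fetched from a weather service
    dialog_history: list   # prior turns in the session

def build_llm_prompt(user_utterance: str, ctx: DialogContext) -> str:
    """Assemble a prompt asking the LLM to modify a standard assistant
    reply using real-world context (hypothetical prompt format)."""
    history = "\n".join(ctx.dialog_history[-3:])  # last few turns only
    return (
        f"User said: {user_utterance}\n"
        f"Location: {ctx.location}\n"
        f"Local time: {ctx.local_time}\n"
        f"Weather: {ctx.weather}\n"
        f"Recent dialog:\n{history}\n"
        "Respond conversationally, referencing the context where relevant "
        "and asking one contextually relevant follow-up question."
    )

# Example: the surfing scenario from the patent
ctx = DialogContext(
    location="Example Beach",
    local_time="Saturday 7:40 AM",
    weather="light showers expected",
    dialog_history=["User: I'm heading out soon."],
)
print(build_llm_prompt("I'm going surfing today", ctx))
# A plain assistant might reply "Have fun!"; an LLM given this prompt could
# instead mention the showers at Example Beach or ask how long the user has surfed.
```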

User-Specific Context

The patent describes multiple user-specific contexts that the LLM may use to generate a modified output:

  • User profile data, such as preferences (like food or types of activity).
  • Software application data (such as apps currently or recently in use).
  • Dialog history of the ongoing and/or previous assistant sessions.

Here’s a snippet that discusses various user-profile-related contextual signals:

“Moreover, the context of the dialog session can be determined based on one or more contextual signals that include, for example, ambient noise detected in an environment of the client device, user profile data, software application data, ….dialog history of the dialog session between the user and the automated assistant, and/or other contextual signals.”

Related Intents

An interesting part of the patent describes how a user’s food preference can be used to determine a related intent to a query.

“For example, …one or more of the LLMs can determine an intent associated with the given assistant query… Further, the one or more of the LLMs can identify, based on the intent associated with the given assistant query, at least one related intent that is related to the intent associated with the given assistant query… Moreover, the one or more of the LLMs can generate the additional assistant query based on the at least one related intent. “

The patent illustrates this with the example of a user saying that they’re hungry. The LLM will then identify related contexts such as what type of cuisine the user enjoys and the intent of eating at a restaurant.

The patent explains:

“In this example, the additional assistant query can correspond to, for example, “what types of cuisine has the user indicated he/she prefers?” (e.g., reflecting a related cuisine type intent associated with the intent of the user indicating he/she would like to eat), “what restaurants nearby are open?” (e.g., reflecting a related restaurant lookup intent associated with the intent of the user indicating he/she would like to eat)… In these implementations, additional assistant output can be determined based on processing the additional assistant query.”
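
Here is a rough sketch of what the related-intent step could look like in code. The intent labels, the mapping, and the stand-in llm_identify_related_intents function are all invented for illustration; the patent doesn’t describe an implementation at this level of detail.

```python
def llm_identify_related_intents(intent: str) -> list[str]:
    """Stand-in for an LLM call that returns intents related to the
    detected intent. The mapping below is purely illustrative."""
    related = {
        "user_wants_to_eat": [
            "preferred_cuisine_lookup",   # "what cuisine does the user prefer?"
            "nearby_restaurant_lookup",   # "what restaurants nearby are open?"
        ],
    }
    return related.get(intent, [])

def build_additional_queries(intent: str) -> list[str]:
    """Turn each related intent into an additional assistant query."""
    templates = {
        "preferred_cuisine_lookup": "What types of cuisine has the user indicated they prefer?",
        "nearby_restaurant_lookup": "What restaurants nearby are open?",
    }
    return [templates[r] for r in llm_identify_related_intents(intent) if r in templates]

print(build_additional_queries("user_wants_to_eat"))
# ['What types of cuisine has the user indicated they prefer?',
#  'What restaurants nearby are open?']
```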

System & Device Context

The system and device context part of the patent is interesting because it enables the assistant to detect when the device is low on battery and, if so, turn off the LLM-modified responses. Other factors include whether the user is walking away from the device, computational costs, and so on.

Takeaways

  • AI Query Responses Use Contextual Signals
    Google’s patent describes how automated assistants can use real-world context to generate more relevant and human-like answers and dialog.
  • Contextual Factors Influence Responses
    These include time/location/environment, user-specific data, dialog history and intent, system/device conditions, and input type (text, speech, or touch).
  • LLM-Modified Responses Enhance Engagement
    Large language models (LLMs) use these contexts to create personalized responses or follow-up questions, like referencing weather or past interactions.
  • Examples Show Practical Impact
    Scenarios like recommending food based on user preferences or commenting on local weather during outdoor plans demonstrate how real-world contexts can influence how AI responds to user queries.

This patent is important because millions of people are increasingly engaging with AI assistants, which makes it relevant to publishers, ecommerce stores, local businesses, and SEOs.

It outlines how Google’s AI-assisted systems can generate personalized, context-aware responses by using real-world signals. This enables assistants to go beyond keyword-based answers and respond with relevant information or follow-up questions, such as suggesting restaurants a user might like or commenting on weather conditions before a planned activity.

Read the patent here:

Using Large Language Model(s) In Generating Automated Assistant response(s).

Featured Image by Shutterstock/Visual Unit

New AI-Assisted Managed WordPress Hosting For Ecommerce via @sejournal, @martinibuster

Bluehost announced two competitively priced managed WordPress ecommerce hosting solutions that make it easy for content creators and ecommerce stores to get online with WordPress and start accepting orders.

Both plans feature AI site migration tools that help users switch web hosting providers, free content delivery networks to speed up web page downloads, AI-assisted site creation tools and NVMe (Non-Volatile Memory Express) solid state storage which provides faster speeds than traditional web hosting storage.

The new plans enable users to sell products with WooCommerce and even offer paid courses online, all within a managed WordPress hosting environment that’s optimized for WordPress websites.

According to Bluehost:

“Bluehost’s eCommerce Essentials equips content creators with an intuitive, all‑in‑one toolkit—complete with AI‑powered site building, seamless payment integrations, paid courses and memberships, social logins, email templates and SEO tools—to effortlessly engage audiences and turn their passion into profit.”

There are two plans, eCommerce Essentials and eCommerce Premium, with the premium version offering more ecommerce features built-in. Both plans are surprisingly affordable considering the many features offered.

Satish Hemachandran, Chief Product Officer at Bluehost commented:

“At Bluehost, we understand the unique needs of today’s content creators and entrepreneurs who are building personal brands or online stores and turning their passion into profit.

With Bluehost WordPress eCommerce hosting plans, creators get a streamlined platform to easily develop personalized commerce experiences. From launching a store to engaging an audience and monetizing content, our purpose-built tools are designed to simplify the process and support long-term growth. Our mission is to empower creators with the right resources to strengthen their brand, increase their income, and succeed in the digital economy.”

Read more about the new ecommerce WordPress hosting here:

WooCommerce Online Stores – The future of online selling is here.

Google Search Console Fails To Report Half Of All Search Queries via @sejournal, @MattGSouthern

New research from ZipTie reveals an issue with Google Search Console.

The study indicates that approximately 50% of search queries driving traffic to websites never appear in GSC reports. This leaves marketers with incomplete data regarding their organic search performance.

The research was conducted by Tomasz Rudzki, co-founder of ZipTie. His tests show that Google Search Console consistently overlooks conversational searches. These are the natural language queries people use when interacting with voice assistants or AI chatbots.

Simple Tests Prove The Data Gap

Rudzki started with a basic experiment on his website.

For several days, he searched Google using the same conversational question from different devices and accounts. These searches directed traffic to his site, which he could verify through other analytics tools.

However, when he checked Google Search Console for these specific queries, he found nothing. “Zero. Nada. Null,” as Rudzki put it.

To confirm this wasn’t isolated to his site, Rudzki asked 10 other SEO professionals to try the same test. All received identical results: their conversational queries were nowhere to be found in GSC data, even though the searches generated real traffic.

Search Volume May Affect Query Reporting

The research suggests that Google Search Console uses a minimum search volume threshold before it begins tracking queries. A search term may need to reach a certain number of searches before it appears in reports.

According to tests conducted by Rudzki’s colleague Jakub Łanda, when queries finally become popular enough to track, historical data from before that point appears to vanish.

Consider how people might search for iPhone information:

  • “What are the pros and cons of the iPhone 16?”
  • “Should I buy the new iPhone or stick with Samsung?”
  • “Compare iPhone 16 with Samsung S25”

Each question may receive only 10-15 searches per month individually. However, these variations combined could represent hundreds of searches about the same topic.

GSC often overlooks these low-volume variations, despite their significant combined impact.

Google Shows AI Answers But Hides the Queries

Here’s the confusing part: Google clearly understands conversational queries. Rudzki analyzed 140,000 questions from People Also Asked data and found that Google shows AI Overviews for 80% of these conversational searches.

Rudzki observed:

“So it seems Google is ready to show the AI answer on conversational queries. Yet, it struggles to report conversational queries in one of the most important tools in SEO’s and marketer’s toolkits.”

Why This Matters

When half of your search data is missing, strategic decisions turn into guesswork.

Content teams create articles based on keyword tools instead of genuine user questions. SEO professionals optimize for visible queries while overlooking valuable conversational searches that often go unreported.

Performance analysis becomes unreliable when pages appear to underperform in GSC but draw significant unreported traffic. Teams also lose the ability to identify emerging trends ahead of their competitors, as new topics only become apparent after they reach high search volumes.

What’s The Solution?

Acknowledge that GSC only shows part of the picture and adjust your strategy accordingly.

Switch from the Query tab to the Pages tab to identify which content drives traffic, regardless of the specific search terms used. Focus on creating comprehensive content that fully answers questions rather than targeting individual keywords.

Supplement GSC data with additional research methods to understand conversational search patterns. Consider how your users interact with an AI assistant, as that’s increasingly how they search.
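
One practical way to lean on page-level data is the Search Console API, which reports clicks and impressions by page regardless of which queries drove them. The sketch below assumes a service account with access to the property and the google-api-python-client and google-auth packages installed; the property URL, key file, and dates are placeholders.

```python
from googleapiclient.discovery import build
from google.oauth2 import service_account

# Assumptions: a service account JSON key that has been added as a user
# to the Search Console property, and the property URL verified there.
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
service = build("searchconsole", "v1", credentials=creds)

# Pull performance data by PAGE, not by query, so traffic from
# unreported conversational queries is still counted somewhere.
response = service.searchanalytics().query(
    siteUrl="https://www.example.com/",
    body={
        "startDate": "2025-05-01",
        "endDate": "2025-05-31",
        "dimensions": ["page"],
        "rowLimit": 100,
    },
).execute()

for row in response.get("rows", []):
    page = row["keys"][0]
    print(f"{page}: {row['clicks']} clicks, {row['impressions']} impressions")
```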

What This Means for the Future

The gap between how people search and the tools that track their searches is widening. Voice search is gaining popularity, with approximately 20% of individuals worldwide using it on a regular basis. AI tools are training users to ask detailed, conversational questions.

Until Google addresses these reporting gaps, successful SEO strategies will require multiple data sources and approaches that account for the invisible half of search traffic, which drives real results yet remains hidden from view.

The complete research and instructions to replicate these tests can be found in ZipTie’s original report.


Featured Image: Roman Samborskyi/Shutterstock

WordPress Takes Steps To Integrate AI via @sejournal, @martinibuster

WordPress announced the formation of an AI Team that will focus on coordinating the development and integration of AI within the WordPress core. The team is to function similarly to the Performance Team, focusing on developing canonical plugins that users can install to test new functionality before a decision is made about whether or how to integrate new functionalities into the WordPress core itself.

The goal for the team is to create a strategic focus, speed the path from testing to deployment, and provide a centralized location for collaborating on ideas and projects.

The team will include two Google employees, Felix Arntz and Pascal Birchler. Arntz is a Senior Software Engineer at Google who contributes to the WordPress core and to other WordPress plugins and has worked as a lead for the Performance Team.

Pascal Birchler, a Developer Relations Engineer and WordPress core committer, recently led a project to integrate the Model Context Protocol (MCP) with WordPress via WP-CLI.

The WordPress announcement called it an important step:

“This is an exciting and important step in WordPress’s evolution. I look forward to seeing what we’ll create together and in the open.”

WordPress First Steps On Path Blazed By Competitors

The formation of an AI team is long overdue, as even the new open-source Drupal CMS, designed to provide an easy-to-use interface for marketers and creators, has AI-powered features built in. Proprietary CMS provider Wix and shopping platform Shopify have both already integrated AI into their users’ workflows.

Read the official WordPress announcement:

Announcing the Formation of the WordPress AI Team

Featured Image by Shutterstock/Hananeko_Studio

WordPress Unpauses Development But Has It Run Out Of Time? via @sejournal, @martinibuster

Automattic announced that it is reversing its four-month pause in WordPress development and will return to focusing on the WordPress core, Gutenberg, and other projects. The pause in contributions came at a critical moment, as competitors outpaced WordPress in ease of use and technological innovation, leaving the platform behind.

Did WordPress Need A Four-Month Pause?

Automattic’s return to normal levels of contributions was initially contingent on WP Engine withdrawing their lawsuit against Automattic and Mullenweg, with the announcement stating:

“We’re excited to return to active contributions to WordPress core, Gutenberg, Playground, Openverse, and WordPress.org when the legal attacks have stopped.”

WP Engine and Automattic are still locked in litigation, so what changed?

Automattic suggests that it has reconsidered its place as the future of content management:

“After pausing our contributions to regroup, rethink, and plan strategically, we’re ready to press play again and return fully to the WordPress project.

…We’ve learned a lot from this pause that we can bring back to the project, including a greater awareness of the many ways WordPress is used and how we can shape the future of the web alongside so many passionate contributors. We’re committed to helping it grow and thrive…”

Automattic’s announcement suggests that they realized moving forward with WordPress is important despite continued litigation.

But did Automattic really need a four-month pause to come to that realization?

Where Did The WordPress Money Go?

And it’s not like Automattic was hurting for money to throw at WordPress. Salesforce Ventures invested $300 million into Automattic in 2019, and an elated Mullenweg wrote that this would enable them to almost double the pace of innovation for WP.com, their enterprise offering WordPress VIP, WooCommerce, Jetpack, and increase resources to WordPress.org and Gutenberg.

Mullenweg wrote:

“For Automattic, the funding will allow us to accelerate our roadmap (perhaps by double) and scale up our existing products—including WordPress.com, WordPress VIP, WooCommerce, Jetpack, and (in a few days when it closes) Tumblr. It will also allow us to increase investing our time and energy into the future of the open source WordPress and Gutenberg.”

In the years immediately following the $300 million investment, updates to WooCommerce increased by between 47.62% and 80.95%, and slightly more in 2024. Jetpack continued at an average release schedule of seven updates per year, although it shot up to 22 updates in 2024. The enterprise-level WordPress VIP premium service may have also benefited (changelog here).

Updates to the WordPress Core remained fairly unchanged according to the official release announcements and the pace of Gutenberg releases also followed a steady pace, with no significant increases.

List of number of WordPress release announcements per year:

  • 2019 – 29 announcements
  • 2020 – 28 announcements
  • 2021 – 26 announcements
  • 2022 – 27 announcements
  • 2023 – 26 announcements
  • 2024 – 30 announcements
  • 2025 – 9 announcements

All the millions of dollars invested in Automattic, along with any other income earned, had no apparent effect on the pace of innovation in the WordPress core.

Survival Of The Fittest CMS

A positive development from Automattic’s pause to rethink is the announcement of a new AI group, modeled after their Performance group. The new team is tasked with coordinating AI initiatives within WordPress’ core development. Like their Performance group, the new AI group was formed after their competitors had outpaced them, so WordPress is once again late in adapting to user needs and the fast pace of technology.

Matt Mullenweg struggled to answer where WordPress would be in five years when asked at the February 2025 WordCamp Asia event. He asked someone from Automattic to join him on stage to answer the question, but that person also couldn’t answer because there was, in fact, no plan or idea beyond a short-term roadmap focused on the immediate future.

Mullenweg explained the lack of a long-term vision as a strategic decision to remain adaptable to the fast pace of technology:

“Outside of Gutenberg, we haven’t had a roadmap that goes six months or a year, or a couple versions, because the world changes in ways you can’t predict.

But being responsive is, I think, really is how organisms survive.

You know, Darwin, said it’s not the fittest of the species that survives. It’s the one that’s most adaptable to change. I think that’s true for software as well.”

That’s a somewhat surprising statement, given that WordPress has a history of being years late to prioritizing website performance and AI integration. Divi, Elementor, Beaver Builder, and other WordPress editing environments had already cracked the code on democratizing web design in 2017 with block-based, point-and-click editors when WordPress began their effort to develop their own block-based editor.

Eight years later, Gutenberg is so difficult for many users that the official Classic Editor plugin has over ten million installations, and advanced web developers prefer other, more advanced web builders.

Takeaways:

  • Automattic’s Strategic Reversal
    Automattic reversed its pause on WordPress contributions despite unresolved litigation with WP Engine, perhaps signaling a change in internal priorities or external pressures.
  • Delayed Response to AI Trends
    A new AI group has been formed within WordPress core development, but this move comes years after competitors embraced AI—suggesting a reactive rather than proactive strategy.
  • Lack of Long-Term Vision
    WordPress leadership admits to having no roadmap beyond the short term, framing adaptability as a strength even as the platform lags in addressing user needs and keeping up with technological trends.
  • Minimal Impact from Major Investments
    Despite receiving hundreds of millions in funding, core WordPress and Gutenberg development showed no significant acceleration, raising questions about where investment actually went.
  • Usability and Competitive Lag
    Gutenberg arguably struggles with usability, as shown by the popularity of the Classic Editor plugin and user preference for third-party builders.
  • WordPress at a Competitive Disadvantage
    WordPress now finds itself needing to catch up in a CMS market that has evolved rapidly in both ease of use and innovation.

The bottom line is that the pace of development for the WordPress core and Gutenberg remained steady after the 2019 investment. Despite the millions of dollars Automattic received from companies like Newfold Digital, along with sponsored contributions and volunteer contributions from individuals, the speed of development and innovation maintained the same follow-the-competitors-from-behind pace.

Automattic’s return to WordPress core development inadvertently calls attention to how far the platform has fallen behind competitors like Wix in usability and innovation, despite major investments and years of community support. For users and developers, this means that WordPress must now work to regain trust by proving it can adapt quickly and deliver the tools that modern site developers, businesses, and content creators actually need.

Automattic has a legitimate dispute with WP Engine, but the way it was approached became a major distraction that resulted in an arguably unnecessary four-month pause to WordPress development. The platform might have been in danger of losing relevance if not for the work of third-party innovators, and it still arguably lags behind competitors.

Google Lens Integration For YouTube Shorts: Search Within Videos via @sejournal, @MattGSouthern

Google has integrated Lens into YouTube Shorts.

Now, you can search for items you see in videos directly from your phone.

How The New Feature Works

Here’s how to use Google Lens in YouTube Shorts:

  • Pause any Short by tapping the screen.
  • Select “Lens” from the top menu.
  • Circle, highlight, or tap anything you want to search.

You can identify objects, translate text, or learn about locations. The results appear right above the video. When you’re finished, just swipe down to continue watching.

Here’s an example of the interface:

Screenshot from: YouTube.com/CreatorInsider, May 2025.

The feature works with products, plants, animals, landmarks, and text. You can even translate captions in real-time. Some searches include AI Overviews that provide more detailed information about what you’re looking for.

Google shared an example in its announcement:

“If you’re watching a short filmed in a location that you want to visit, you can select a landmark to identify it and learn more about the destination’s culture and history.”

See a demonstration in the video below:

Important Limitations

There are some key restrictions. Google Lens won’t work on Shorts with YouTube Shopping affiliate tags or paid product promotions.

The support docs are clear:

“Tagging a product via YouTube Shopping will disable the lens search.”

Search results only show organic content, meaning no ads will appear when you use Lens. Google also states that it doesn’t use facial recognition technology, although the system may display results for famous people when relevant.

The feature is only compatible with mobile devices (iOS and Android). Google says the beta is “starting to roll out to all viewers this week,” though it hasn’t shared specific dates for different regions.

What This Means For Marketers

This update presents several opportunities for content creators and marketers:

  • Visual elements in your Shorts can now boost engagement.
  • Travel and hospitality businesses receive free visibility when their locations feature in videos.
  • Educational creators can benefit as viewers explore the topics presented in their content.

The ban on affiliate content poses a challenge. Creators who rely on YouTube Shopping must carefully consider their monetization strategies. They will need to find a balance between discoverable content and their revenue goals.

Looking Ahead

Google Lens in YouTube Shorts signals a shift in how people interact with video content. You can now search within videos, not just for them.

For marketers, this means visual elements matter more than ever. The objects, locations, and text in your videos are now searchable entry points.

The exclusion of monetized content also sets up an interesting dynamic. Creators must choose between affiliate revenue and visibility in visual search.

Start planning your Shorts with searchable moments in mind. Your viewers are about to become visual searchers.

Google: Database Speed Beats Page Count For Crawl Budget via @sejournal, @MattGSouthern

Google has confirmed that most websites still don’t need to worry about crawl budget unless they have over one million pages. However, there’s a twist.

Google Search Relations team member Gary Illyes revealed on a recent podcast that how quickly your database operates matters more than the number of pages you have.

This update comes five years after Google shared similar guidance on crawl budgets. Despite significant changes in web technology, Google’s advice remains unchanged.

The Million-Page Rule Stays The Same

During the Search Off the Record podcast, Illyes maintained Google’s long-held position when co-host Martin Splitt inquired about crawl budget thresholds.

Illyes stated:

“I would say 1 million is okay probably.”

This implies that sites with fewer than a million pages can stop worrying about their crawl budget.

What’s surprising is that this number has remained unchanged since 2020. The web has grown significantly, with an increase in JavaScript, dynamic content, and more complex websites. Yet, Google’s threshold has remained the same.

Your Database Speed Is What Matters

Here’s the big news: Illyes revealed that slow databases hinder crawling more than having a large number of pages.

Illyes explained:

“If you are making expensive database calls, that’s going to cost the server a lot.”

A site with 500,000 pages but slow database queries might face more crawl issues than a site with 2 million fast-loading static pages.

What does this mean? You need to evaluate your database performance, not just count the number of pages. Sites with dynamic content, complex queries, or real-time data must prioritize speed and performance.
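
A simple starting point is to time your heaviest queries and cache results that don’t change between requests. The sketch below uses SQLite and Python’s standard library as stand-ins for whatever actually powers your site; the table, query, and 100 ms threshold are placeholder assumptions, not Google recommendations.

```python
import sqlite3
import time
from functools import lru_cache

DB_PATH = "site.db"  # placeholder; substitute your real database

def timed_query(sql: str, params: tuple = ()) -> list:
    """Run a query and log how long it takes, so slow calls that
    hold up page rendering (and crawling) are easy to spot."""
    start = time.perf_counter()
    with sqlite3.connect(DB_PATH) as conn:
        rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > 100:  # arbitrary threshold for "expensive"
        print(f"SLOW QUERY ({elapsed_ms:.0f} ms): {sql}")
    return rows

@lru_cache(maxsize=256)
def related_products(category_id: int) -> tuple:
    """Cache a result that rarely changes so repeated page builds
    (including those triggered by crawlers) skip the database."""
    rows = timed_query(
        "SELECT id, title FROM products WHERE category_id = ? LIMIT 20",
        (category_id,),
    )
    return tuple(rows)  # immutable snapshot so cached results can't be mutated
```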

The Real Resource Hog: Indexing, Not Crawling

Illyes shared a sentiment that contradicts what many SEOs believe.

He said:

“It’s not crawling that is eating up the resources, it’s indexing and potentially serving or what you are doing with the data when you are processing that data.”

Consider what this means. If crawling doesn’t consume many resources, then blocking Googlebot may not be helpful. Instead, focus on making your content easier for Google to process after it has been crawled.

How We Got Here

The podcast provided some context about scale. In 1994, the World Wide Web Worm indexed only 110,000 pages, while WebCrawler indexed 2 million. Illyes called these numbers “cute” compared to today.

This helps explain why the one-million-page mark has remained unchanged. What once seemed huge in the early web is now just a medium-sized site. Google’s systems have expanded to manage this without altering the threshold.

Why The Threshold Remains Stable

Google has been striving to reduce its crawling footprint. Illyes revealed why that’s a challenge.

He explained:

“You saved seven bytes from each request that you make and then this new product will add back eight.”

This push-and-pull between efficiency improvements and new features helps explain why the crawl budget threshold remains consistent. While Google’s infrastructure evolves, the basic math regarding when crawl budget matters stays unchanged.

What You Should Do Now

Based on these insights, here’s what you should focus on:

Sites Under 1 Million Pages:
Continue with your current strategy. Prioritize excellent content and user experience. Crawl budget isn’t a concern for you.

Larger Sites:
Enhance database efficiency as your new priority. Review:

  • Query execution time
  • Caching effectiveness
  • Speed of dynamic content generation

All Sites:
Redirect focus from crawl prevention to indexing optimization. Since crawling isn’t the resource issue, assist Google in processing your content more efficiently.

Key Technical Checks:

  • Database query performance
  • Server response times
  • Content delivery optimization
  • Proper caching implementation

Looking Ahead

Google’s consistent crawl budget guidance demonstrates that some SEO fundamentals are indeed fundamental. Most sites don’t need to worry about it.

However, the insight regarding database efficiency shifts the conversation for larger sites. It’s not just about the number of pages you have; it’s about how efficiently you serve them.

For SEO professionals, this means incorporating database performance into your technical SEO audits. For developers, it underscores the significance of query optimization and caching strategies.

Five years from now, the million-page threshold might still exist. But sites that optimize their database performance today will be prepared for whatever comes next.

Listen to the full podcast episode below:


Featured Image: Novikov Aleksey/Shutterstock

Google’s Gary Illyes Warns AI Agents Will Create Web Congestion via @sejournal, @MattGSouthern

A Google engineer has warned that AI agents and automated bots will soon flood the internet with traffic.

Gary Illyes, who works on Google’s Search Relations team, said “everyone and my grandmother is launching a crawler” during a recent podcast.

The warning comes from Google’s latest Search Off the Record podcast episode.

AI Agents Will Strain Websites

During his conversation with fellow Search Relations team member Martin Splitt, Illyes warned that AI agents and “AI shenanigans” will be significant sources of new web traffic.

Illyes said:

“The web is getting congested… It’s not something that the web cannot handle… the web is designed to be able to handle all that traffic even if it’s automatic.”

This surge occurs as businesses deploy AI tools for content creation, competitor research, market analysis, and data gathering. Each tool requires crawling websites to function, and with the rapid growth of AI adoption, this traffic is expected to increase.

How Google’s Crawler System Works

The podcast provides a detailed discussion of Google’s crawling setup. Rather than employing different crawlers for each product, Google has developed one unified system.

Google Search, AdSense, Gmail, and other products utilize the same crawler infrastructure. Each one identifies itself with a different user agent name, but all adhere to the same protocols for robots.txt and server health.

Illyes explained:

“You can fetch with it from the internet but you have to specify your own user agent string.”

This unified approach ensures that all Google crawlers adhere to the same protocols and scale back when websites encounter difficulties.

The Real Resource Hog? It’s Not Crawling

Illyes challenged conventional SEO wisdom with a potentially controversial claim: crawling doesn’t consume significant resources.

Illyes stated:

“It’s not crawling that is eating up the resources, it’s indexing and potentially serving or what you are doing with the data.”

He even joked he would “get yelled at on the internet” for saying this.

This perspective suggests that fetching pages uses minimal resources compared to processing and storing the data. For those concerned about crawl budget, this could change optimization priorities.

From Thousands to Trillions: The Web’s Growth

The Googlers provided historical context. In 1994, the World Wide Web Worm search engine indexed only 110,000 pages, whereas WebCrawler managed to index 2 million. Today, individual websites can exceed millions of pages.

This rapid growth necessitated technological evolution. Crawlers progressed from basic HTTP 1.1 protocols to modern HTTP/2 for faster connections, with HTTP/3 support on the horizon.

Google’s Efficiency Battle

Google spent last year trying to reduce its crawling footprint, acknowledging the burden on site owners. However, new challenges continue to arise.

Illyes explained the dilemma:

“You saved seven bytes from each request that you make and then this new product will add back eight.”

Every efficiency gain is offset by new AI products requiring more data. This is a cycle that shows no signs of stopping.

What Website Owners Should Do

The upcoming traffic surge necessitates action in several areas:

  • Infrastructure: Current hosting may not support the expected load. Assess server capacity, CDN options, and response times before the influx occurs.
  • Access Control: Review robots.txt rules to control which AI crawlers can access your site. Block unnecessary bots while allowing legitimate ones to function properly (see the sketch after this list).
  • Database Performance: Illyes specifically pointed out “expensive database calls” as problematic. Optimize queries and implement caching to alleviate server strain.
  • Monitoring: Differentiate between legitimate crawlers, AI agents, and malicious bots through thorough log analysis and performance tracking.
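
For the access-control point above, Python’s built-in robots.txt parser offers a quick way to sanity-check your rules before deploying them. GPTBot (OpenAI) and CCBot (Common Crawl) are real published crawler tokens, but which bots you block is your decision; the rules shown are only an example.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that blocks two AI crawlers but leaves Googlebot alone.
EXAMPLE_RULES = """
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_RULES.splitlines())

for bot in ("GPTBot", "CCBot", "Googlebot"):
    allowed = parser.can_fetch(bot, "https://www.example.com/blog/post")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
# GPTBot: blocked, CCBot: blocked, Googlebot: allowed
```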

The Path Forward

Illyes pointed to Common Crawl, which crawls once and shares the data publicly, as a potential model for reducing redundant traffic. Similar collaborative solutions may emerge as the web adapts.

While Illyes expressed confidence in the web’s ability to manage increased traffic, the message is clear: AI agents are arriving in massive numbers.

Websites that strengthen their infrastructure now will be better equipped to weather the storm. Those who wait may find themselves overwhelmed when the full force of the wave hits.

Listen to the full podcast episode below:


Featured Image: Collagery/Shutterstock

Google’s Query Fan-Out Patent: Thematic Search via @sejournal, @martinibuster

A patent that Google filed in December 2024 presents a close match to the Query Fan-Out technique that Google’s AI Mode uses. The patent, called Thematic Search, offers an idea of how AI Mode answers are generated and suggests new ways to think about content strategy.

The patent describes a system that organizes search results related to a query into categories it calls themes, and provides a short summary for each theme so that users can understand the answers to their questions without having to click through to all of the different sites.

The patent describes a system for deep research, aimed at questions that are broad or complex. What’s new about the invention is how it automatically identifies themes from the traditional search results and uses an AI to generate an informative summary for each one using both the content and context from within those results.

Thematic Search Engine

Themes are a concept that goes back to the early days of search engines, which is why this patent caught my eye a few months ago and caused me to bookmark it.

Here’s the TL;DR of what it does:

  • The patent references its use within the context of a large language model and a summary generator.
  • It also references a thematic search engine that receives a search query and then passes that along to a search engine.
  • The thematic search engine takes the search engine results and organizes them into themes.
  • The patent describes a system that interfaces with a traditional search engine and uses a large language model for generating summaries of thematically grouped search results.
  • The patent describes that a single query can result in multiple queries that are based on “sub-themes.”

Comparison Of Query Fan-Out And Thematic Search

The system described in the parent mirrors what Google’s documentation says about the Query Fan-Out technique.

Here’s what the patent says about generating additional queries based on sub-themes:

“In some examples, in response to the search query 142-2 being generated, the thematic search engine 120 may generate thematic data 138-2 from at least a portion of the search results 118-2. For example, the thematic search engine 120 may obtain the search results 118-2 and may generate narrower themes 130 (e.g., sub-themes) (e.g., “neighborhood A”, “neighborhood B”, “neighborhood C”) from the responsive documents 126 of the search results 118-2. The search results page 160 may display the sub-themes of theme 130a and/or the thematic search results 119 for the search query 142-2. The process may continue, where selection of a sub-theme of theme 130a may cause the thematic search engine 120 to obtain another set of search results 118 from the search engine 104 and may generate narrower themes 130 (e.g., sub-sub-themes of theme 130a) from the search results 118 and so forth.”

Here’s what Google’s documentation says about the Query Fan-Out Technique:

“It uses a “query fan-out” technique, issuing multiple related searches concurrently across subtopics and multiple data sources and then brings those results together to provide an easy-to-understand response. This approach helps you access more breadth and depth of information than a traditional search on Google.”

The system described in the patent resembles what Google’s documentation says about the Query Fan-Out technique, particularly in how it explores subtopics by generating new queries based on themes.
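
To illustrate the fan-out idea, here is a small sketch that takes the sub-themes generated for an initial query and runs one related search per theme concurrently. The search_engine function is a stand-in for whatever retrieval backend is actually used; nothing here reflects Google’s internal implementation.

```python
import asyncio

async def search_engine(query: str) -> list[str]:
    """Stand-in for a real search backend; returns fake result titles."""
    await asyncio.sleep(0.1)  # simulate network latency
    return [f"Result for '{query}' #{i}" for i in range(1, 4)]

async def fan_out(initial_query: str, sub_themes: list[str]) -> dict[str, list[str]]:
    """Issue one related search per sub-theme concurrently and gather
    the results, roughly mirroring the fan-out described above."""
    queries = [f"{initial_query} {theme}" for theme in sub_themes]
    results = await asyncio.gather(*(search_engine(q) for q in queries))
    return dict(zip(queries, results))

themes = ["neighborhoods", "cost of living", "things to do"]
fanned = asyncio.run(fan_out("moving to Denver", themes))
for query, hits in fanned.items():
    print(query, "->", hits)
```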

Summary Generator

The summary generator is a component of the thematic search system. It’s designed to generate textual summaries for each theme generated from search results.

This is how it works:

  • The summary generator is sometimes implemented as a large language model trained to create original text.
  • The summary generator uses one or more passages from search results grouped under a particular theme.
  • It may also use contextual information from titles, metadata, surrounding related passages to improve summary quality.
  • The summary generator can be triggered when a user submits a search query or when the thematic search engine is initialized.

The patent doesn’t define what ‘initialization’ of the thematic search engine means, maybe because it’s taken for granted that it means the thematic search engine starts up in anticipation of handling a query.
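
As a rough illustration of the grouping-plus-summarization flow described above, the sketch below clusters result passages with TF-IDF and KMeans (scikit-learn) and then uses each cluster’s first passage as a naive stand-in for the LLM-written summary. The real system relies on an LLM and far richer signals, so treat every function here as a simplification.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

passages = [
    "Capitol Hill and LoDo are popular Denver neighborhoods for newcomers.",
    "Rent in Denver averages well above the national median.",
    "Groceries and utilities push the cost of living higher than nearby cities.",
    "Hiking trails and breweries top the list of things to do in Denver.",
    "The Highlands neighborhood is known for walkability and restaurants.",
    "Red Rocks concerts are a classic Denver weekend activity.",
]

# Group passages into themes (fixed at 3 clusters for this demo).
vectors = TfidfVectorizer(stop_words="english").fit_transform(passages)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

themes: dict[int, list[str]] = {}
for passage, label in zip(passages, labels):
    themes.setdefault(label, []).append(passage)

# Naive "summary generator": the first passage of each theme stands in
# for the LLM-written summary described in the patent.
for label, grouped in themes.items():
    print(f"Theme {label}: {grouped[0]}")
    for p in grouped[1:]:
        print(f"  - {p}")
```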

Query Results Are Clustered By Theme Instead Of Traditional Ranking

In some examples shared in the patent, the traditional search results are replaced by grouped themes and generated summaries. Thematic search changes what content is shown and linked to users. For example, a typical query that a publisher or SEO is optimizing for may now be the starting point for a user’s information journey. The thematic search results lead a user down a path of discovering sub-themes of the original query, and the site that ultimately wins the click might not be the one that ranks number one for the initial query but another web page that is relevant to an adjacent query.

The patent describes multiple ways that the thematic search engine can work (I added bullet points to make it easier to understand):

  • “The themes are displayed on a search results page, and, in some examples, the search results (or a portion thereof) are arranged (e.g., organized, sorted) according to the plurality of themes. Displaying a theme may include displaying the phrase of the theme.
  • In some examples, the thematic search engine may rank the themes based on prominence and/or relevance to the search query.
  • The search results page may organize the search results (or a portion thereof) according to the themes (e.g., under the theme of ‘cost of living”, identifying those search results that relate to the theme of ‘cost of living”).
  • The themes and/or search results organized by theme by the thematic search engine may be rendered in the search results page according to a variety of different ways, e.g., lists, user interface (UI) cards or objects, horizontal carousel, vertical carousel, etc.
  • The search results organized by theme may be referred to as thematic search results. In some examples, the themes and/or search results organized by theme are displayed in the search results page along with the search results (e.g., normal search results) from the search engine.
  • In some examples, the themes and/or theme-organized search results are displayed in a portion of the search results page that is separate from the search results obtained by the search engine.”

Content From Multiple Sources Are Combined

The AI-generated summaries are created from multiple websites and grouped under a theme. This makes link attribution, visibility, and traffic difficult to predict.

In the following citation from the patent, the reference to “unstructured data” means content that’s on a web page.

According to the patent:

“For example, the thematic search engine may generate themes from unstructured data by analyzing the content of the responsive documents themselves and may thematically organize the search results according to the themes.

….In response to a search query (“moving to Denver”), a search engine may obtain search results (e.g., responsive documents) responsive to that search query.

The thematic search engine may select a set of responsive documents (e.g., top X number of search results) from the search results obtained by the search engine, and generate a plurality of themes (e.g., “neighborhoods”, “cost of living”, “things to do”, “pros and cons”, etc.) from the content of the responsive documents.

A theme may include a phrase, generated by a language model, that describes a theme included in the responsive documents. In some examples, the thematic search engine may map semantic keywords from each responsive document (e.g., from the search results) and connect the semantic keywords to similar semantic keywords from other responsive documents to generate themes.”

Content From Source Pages Are Linked

The documentation states that the thematic search engine links to the URLs of the source pages. It also states that the thematic search result could include the web page’s title or other metadata. But the part that matters most for SEOs and publishers is attribution: the links.

“…a thematic search result 119 may include a title 146 of the responsive document 126, a passage 145 from the responsive document 126, and a source 144 of the responsive document. The source 144 may be a resource locator (e.g., uniform resource location (URL)) of the responsive document 126.

The passage 145 may be a description (e.g., a snippet obtained from the metadata or content of the responsive document 126). In some examples, the passage 145 includes a portion of the responsive document 126 that mentions the respective theme 130. In some examples, the passage 145 included in the thematic search result 119 is associated with a summary description 166 generated by the language model 128 and included in a cluster group 172.”

User Interaction Influences Presentation

As previously mentioned, the thematic search engine is not a ranked list of documents for a search query. It’s a collection of information across themes that are related to the initial search query. User interaction with those AI-generated summaries influences which sites receive traffic.

Automatically generated sub-themes can present alternative paths on the user’s information journey that begins with the initial search query.

Summarization Uses Publisher Metadata

The summary generator uses document titles, metadata, and surrounding textual content. That means well-structured content may influence how summaries are constructed.

The following is what the patent says, I added bullet points to make it easier to understand:

  • “The summary generator 164 may receive a passage 145 as an input and outputs a summary description 166 for the inputted passage 145.
  • In some examples, the summary generator 164 receives a passage 145 and contextual information as inputs and outputs a summary description 166 for the passage 145.
  • In some examples, the contextual information may include the title of the responsive document 126 and/or metadata associated with the responsive document 126.
  • In some examples, the contextual information may include one or more neighboring passages 145 (e.g., adjacent passages).
  • In some examples, the contextual information may include a summary description 166 for one or more neighboring passages 145 (e.g., adjacent passages).
  • In some examples, the contextual information may include all the other passages 145 on the same responsive document 126. For example, the summary generator may receive a passage 145 and the other passages 145 (e.g., all other passages 145) on the same responsive document 126 (and, in some examples, other contextual information) as inputs and may output a summary description 166 for the passage 145.”

Thematic Search: Implications For Content & SEO

There are two ways that AI Mode can end for a publisher:

  1. Since users may get their answers from theme summaries or dropdowns, zero-click behavior is likely to increase, reducing traffic from traditional links.
  2. Or, it could be that the web page that provides the end of the user’s information journey for a given query is the one that receives the click.

I think this means we really need to rethink the paradigm of ranking for keywords. Consider instead what question a web page answers, then identify the follow-up questions related to that initial query, and either answer them on the same page or create another page that addresses what may be the end of the information journey for a given search query.

You can read the patent here:

Thematic Search (PDF)

Read Google’s Documentation Of AI Mode (PDF)

Google Fixes AI Mode Traffic Attribution Bug via @sejournal, @MattGSouthern

Google has fixed a bug that caused AI Mode search traffic to be reported as “direct traffic” instead of “organic traffic” in Google Analytics.

The problem started last week. Google was adding a special attribute (rel=”noopener noreferrer”) to links in its AI Mode search results. This attribute caused Google Analytics to misattribute the traffic, recording it as direct rather than as coming from Google Search.

Reports from Aleyda Solis, Founder at Orainti, and others in the SEO community confirm the issue is resolved.

Discovery of the Attribution Problem

Maga Sikora, an SEO director specializing in AI search, first identified the issue. She warned other marketers:

“Traffic from Google’s AI Mode is being tagged as direct in GA — not organic, as Google adds a rel=’noopener noreferrer’ to those links. Keep this in mind when reviewing your reports.”

The noreferrer code is typically used for security purposes. However, in this case, it was blocking Google Analytics from tracking the actual source of the traffic.

Google Acknowledges the Bug

John Mueller, Search Advocate at Google, quickly responded. He suggested it was a mistake on Google’s end, stating:

“My assumption is that this will be fixed; it looks like a bug on our side.”

Mueller also explained that Search Console doesn’t currently display AI Mode data, but it will be available soon.

He added:

“We’re updating the documentation to reflect this will be showing soon as part of the AI Mode rollout.”

Rapid Resolution & Current Status

Google fixed the problem within days.

Solis confirmed the fix:

“I don’t see the ‘noreferrer’ in Google’s AI Mode links anymore.”

She’s now seeing AI Mode data in her analytics and is verifying that traffic is correctly labeled as “organic” instead of “direct.”

Impact on SEO Reporting

The bug may have affected your traffic data for several days. If your site received AI Mode traffic during this period, some of your “direct” traffic may have been organic search traffic.

This misclassification could have:

  • Skewed conversion tracking
  • Affected budget decisions
  • Made SEO performance look worse than it was
  • Hidden the true impact of AI Mode on your site

What To Do Now

Here’s your action plan:

  1. Audit recent traffic data – Check for unusual spikes in direct traffic from the past week
  2. Document the issue – Note the affected dates for future reference
  3. Adjust reporting – Consider adding notes to client reports about the temporary bug
  4. Prepare for AI Mode tracking – Start planning how to measure this new traffic source

Google’s prompt response shows it understands the importance of accurate data for marketers.


Featured Image: Tada Images/Shutterstock