Google has released its annual summer travel trends report alongside several AI-powered updates to its travel planning tools.
The announcement reveals shifting travel preferences while introducing enhancements to Search, Maps, Lens, and Gemini functionality.
New AI Search and Planning Features
Google announced five major updates to its travel planning ecosystem.
Expanded AI Overviews
Google has enhanced its AI Overviews in Search to generate travel recommendations for entire countries and regions, not just cities.
You can now request specialized itineraries by entering queries like “create an itinerary for Costa Rica with a focus on nature.”
The feature includes visual elements and the ability to export recommendations to various Google products.
Image Credit: Google
Price Monitoring for Hotels
Following its flight price tracking implementation, Google has extended similar functionality to accommodations.
When browsing google.com/hotels, you can now toggle price tracking to receive alerts when hotel rates decrease for selected dates and destinations.
The system factors in applied filters, including amenity preferences and star ratings.
Image Credit: Google
Screenshot Recognition in Maps
A new Google Maps feature can help organize travel plans by automatically identifying places mentioned in screenshots.
Using Gemini AI capabilities, the system recognizes venues from saved images and allows users to add them to dedicated lists.
The feature is launching first on iOS in English, with Android rollout planned.
Gemini Travel Assistance
Google’s Gemini AI assistant now offers enhanced travel planning support, allowing users to create “Gems” – customized AI assistants for specific travel needs.
Now available at no cost, these specialized assistants can help with destination selection, local recommendations, and trip logistics.
Expanded Lens Capabilities
Google Lens continues evolving, offering enhanced AI-powered information delivery when pointing your camera at landmarks or objects.
The feature is expanding beyond English to include Hindi, Indonesian, Japanese, Korean, Portuguese, and Spanish, complementing its existing translation capabilities.
Image Credit: Google
Travel Search Trends
According to Google’s Flights and Search data analysis, travelers are increasingly drawn to coastal destinations for the Summer of 2025.
Caribbean islands, including Puerto Rico, Curacao, and St. Lucia, are seeing significant search growth, along with other beach destinations like Rio de Janeiro, Maui, and Nantucket.
The data also reveals continued momentum for outdoor adventure travel within the U.S.:
Cities with proximity to nature experiences (Billings, Montana; Juneau, Alaska; and Bangor, Maine) are experiencing higher search volume
“Cabins” has emerged as the top accommodation search for romantic getaways
Family travelers are increasingly searching for “dude ranch” vacations
Weekend getaway searches concentrate on natural destinations, including upstate New York, Joshua Tree National Park, and Sedona.
An unexpected trend in luggage preferences was also noted, with “checked bags” queries now exceeding historically dominant “carry on” searches.
Supporting this shift, space-saving solutions like vacuum bags and compression packing cubes have become top trending travel accessory searches.
Implications for SEO and Travel Content
These updates signal Google’s continued investment in controlling the travel research journey within its own ecosystem.
The expansion of AI-generated itineraries and information potentially reduces the need for users to visit traditional travel content sites during the planning phase.
Travel brands and publishers may need to adapt their SEO and content strategies to account for these changes, focusing more on unique experiences and in-depth content beyond what Google’s AI tools can generate.
The trend data also provides valuable insights for travel-related keyword targeting and content development as summer vacation planning begins for many consumers.
OpenAI has rolled out a new image generation system directly integrated with GPT-4o. This system allows the AI to access its knowledge base and conversation context when creating images.
This integration is said to enable more contextually relevant and accurate visual outputs.
“GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o’s inherent knowledge base and chat context—including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and advancing image generation into a practical tool with precision and power.”
Here’s everything else you need to know.
Technical Capabilities
OpenAI highlights the following capabilities of its new image generation system:
It accurately renders text within images.
It allows users to refine images through conversation while keeping a consistent style.
It supports complex prompts with up to 20 different objects.
It can generate images based on uploaded references.
It creates visuals using information from GPT-4o’s training data.
OpenAI states in its announcement:
“Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build upon images and text in chat context, ensuring consistency throughout. For example, if you’re designing a video game character, the character’s appearance remains coherent across multiple iterations as you refine and experiment.”
Examples
To demonstrate character consistency, here’s an example showing a cat and then that same cat with a hat and monocle.
Screenshot from: openai.com/index/introducing-4o-image-generation/, March 2025.
Here’s a more practical example for marketers, demonstrating text generation: a full restaurant menu generated with a detailed prompt.
Screenshot from: openai.com/index/introducing-4o-image-generation/, March 2025.
There are dozens more examples in OpenAI’s announcement post, many of which contain several prompts and follow-ups.
Limitations
OpenAI admits:
“Our model isn’t perfect. We’re aware of multiple limitations at the moment which we will work to address through model improvements after the initial launch.”
The company notes the following limitations of its new image generation system:
Cropping: GPT-4o sometimes crops long images, like posters, too closely at the bottom.
Hallucinations: This model can create false information, especially with vague prompts.
High Binding Problems: It struggles to accurately depict more than 10 to 20 concepts at once, like a complete periodic table.
Multilingual Text: The model can have issues showing non-Latin characters, leading to errors.
Editing: Requests to edit specific image parts may change other areas or create new mistakes. It also struggles to keep faces consistent in uploaded images.
Information Density: The model has difficulty showing detailed information at small sizes.
Search Implications
This update changes AI image generation from mainly decorative uses to more practical functions in business and communication.
Websites can use AI-generated images but with important considerations.
Google’s guidelines do not prohibit AI-generated visuals, focusing instead on whether content provides value regardless of how it’s produced.
Following these best practices is recommended:
Using C2PA metadata (which GPT-4o adds automatically) to maintain transparency
Adding proper alt text for accessibility and indexing
Ensuring images serve user intent rather than just filling space
Creating unique visuals rather than generic AI templates
Google Search Advocate John Mueller has expressed a negative opinion regarding AI-generated images. While his personal preferences don’t influence Google’s algorithms, they may indicate how others feel about AI images.
Screenshot from: bsky.app/profile/johnmu.com, March 2025.
Note that Google is implementing measures to label AI-generated images in search results.
Availability
The feature is now available to ChatGPT users with Plus, Pro, Team, or Free plans. Access for Enterprise and Edu users will be available soon.
Developers can expect API access in the coming weeks. Because of higher processing needs, image generation takes about one minute on average.
IAB’s latest “State of Data” report reveals that despite recognizing its potential, 70% of agencies, brands, and publishers have yet to integrate AI into their campaigns fully.
Here’s a look at the study, which examines the current use of AI in advertising, the challenges of adoption, and the opportunities for success.
Current State of AI Adoption
A report from the Interactive Advertising Bureau (IAB) surveyed over 500 experts and found that AI use varies across the industry:
30% of companies have implemented AI in their media campaigns.
Agencies (37%) and publishers (34%) are more advanced in using AI compared to brands (19%).
Half of the companies that haven’t adopted AI plan to do so by 2026.
Most organizations (85%) are using general AI tools, while fewer are using custom solutions (45%) or proprietary tools (24%).
One SVP from an undisclosed brand stated in the report:
“We have been slow to fully implement AI into our day-to-day processes. We are wary to go ‘all in’ until it’s become a bit more of a societal norm with a long-standing track record of scalable success.”
AI Perceptions
Companies using AI generally have positive experiences:
82% say AI meets or exceeds their efficiency expectations, saving time and costs.
75% believe AI helps their media campaigns effectively.
73% find AI reliable over time.
AI excels in data-heavy tasks, like audience segmentation and targeting, but struggles with tasks needing human judgment, such as RFP management and campaign setup.
Adoption Barriers
The research found several barriers to adopting AI in media campaigns:
62% said they’re concerned about how complex it is to set up and maintain AI.
62% worry about the risk of data security.
61% noted that their organizations lack AI knowledge.
60% have concerns about how accurate and transparent AI is.
Interestingly, job displacement isn’t seen as a major issue, with only 37% identifying it as a concern.
Buy-Side vs. Sell-Side Challenges
Agencies, brands, and publishers face unique challenges with AI:
Publishers struggle with complex technology (67%) and scattered capabilities (62%).
Brands and publishers (56% each) lack a clear AI vision.
Agencies encounter the most resistance to change from teammates and clients (61%).
Additionally, 51% of brands worry about transparency in how their partners use AI.
Looking Ahead
AI is changing media campaigns, and IAB’s report highlights some important points.
First, many companies are in the early stages of adopting AI, but this is happening faster than before. Companies without clear plans risk falling behind by 2026.
Second, companies need good data and solid governance guidelines to succeed with AI. Organizations should train their teams in best practices and set clear goals.
Standards for transparency, privacy, and reliability are still being developed across the industry. Companies that collaborate to set these standards will be best positioned to handle this change in digital advertising.
The full “State of Data” report is available through IAB.
Danny Sullivan, Google’s Search Liaison, shared insights about AI Overviews, explaining how predictive summaries, grounding links, and the query fan-out technique work together to shape AI-generated search results.
Optimizing For AIO
Danny Sullivan shared insights into how AI Overviews are generated, helping explain why Google may link to websites that don’t match the typical search results. While the links can differ, he emphasized that the fundamentals of search optimization remain unchanged.
This is what Danny Sullivan said, based on my notes:
“The core fundamental things haven’t really changed. If you’re doing things that are making you successful on search, those sorts of things should transfer into some of the things that you see in the generative AI kind of summaries.”
Google Explains Why AIO Results Are Different
One of the main takeaways from this part of Danny’s presentation was his explanation of why Google AIO search results are different. It’s the clearest explanation yet, and every SEO and publisher needs to understand it.
He introduced two concepts worth knowing in order to better understand AIO search results:
Predictive Summaries
Grounding Links
Predictive Summaries
Danny explained why AIO search results show content and links that differ from what the organic search results show, a mismatch that makes it harder to understand how to optimize for that kind of AI search result.
He shared that the reason for that kind of AIO is something called predictive summaries. Predictive summaries answer the search query but also try to predict related variations of what a user will want to see next. This sounds a lot like Google’s Information Gain patent, which describes predicting the next question a searcher may ask after reading the answer to their current one. The Information Gain patent applies specifically to the context of AI search and AI assistants.
Here is what he said, according to my notes:
“One thing I think that people find really confusing sometimes is that they’ll do a query and especially you’ll see …these are the top 10 results, but I don’t see them in the AIO, what’s going on?
And it’s like, yeah, the query in the search box is the same query, but the model that’s going out there to try to understand what to show is kind of an overview, going beyond just the top 10 results. It’s understanding a lot of results and it’s understanding a lot of variations that you might kind of get and so that it’s coming back and it’s trying to provide its predictive summary of what the query is related to.”
Grounding Links
Sullivan also revealed that “grounding links” are another reason why AIO search results are different from the regular organic search results. An AIO search result is a summary of a topic that includes facts about multiple subtopics. The purpose of grounding is to anchor the entire summary to verifiable information from the web ecosystem.
In the context of AIO, grounding is the process of confirming the factual authenticity of the AI summaries so that a searcher can click to read about any subtopic discussed in the answer summary provided by AIO. This is the second reason why the links in AIO show a variety not normally seen in the organic search results.
One way to look at this is that the links are more contextual than the regular ten blue links of the organic search results. These contextual links are also referred to as qualified clicks or qualified links, links that are hyper-specific and more relevant in general than organic search results.
Danny appears to say that the grounding links are created from searches that are related to the initial search query but are not the same. For example, to explain how a conventional automobile runs, you need information about the powertrain, which is made up of a gas combustion engine, a transmission, the axles, and so on. Answering a complex question requires grounding from a wide array of information sources.
According to my notes, this is how Danny Sullivan explained it:
“And then on top of that, it’s then also trying to bring in the grounding links. And those grounding links, because it kind of comes from a broader set aren’t just going to match. The queries are going to be different and the overall set is going to be different.
Which is why it’s a great opportunity for diversity and whatever our query thing is that we say, but that’s why you can see different things that are showing there.”
Don’t Mess Up Your Rankings
Sullivan cautioned about trying to rank for both the organic and the different parts of the AIO summaries, saying that it’s likely to “mess things up” because “it doesn’t really work like that.”
Query Fan-Out Technique
Danny Sullivan also touched on AI Mode, saying that right now it’s not really something to optimize for because it’s still in Google Labs, and it’s very likely to change, or become something different entirely, if it ever graduates from Labs.
But he did say that AI Mode uses something called a query fan-out technique.
He said:
“…one of the things they talk about is like ‘we use an advanced query fan out technique with multiple related queries in it…’ And it’s basically that what I said before.
You issued a query. You try to understand the variations and things that are related. which by the way is not that much different to how search works at the moment even when you didn’t have the AI elements to it. Because when you would issue a query now we try to understand synonyms, we try to understand the meaning of the entire query. If it’s a sentence, we try to match it in all sorts of different ways …because sometimes it just brings you better results.”
Takeaways:
Google Search Liaison, aka Danny Sullivan, encouraged the use of the core SEO fundamentals, saying that they are still relevant for ranking. Danny explained why the links in AI Overviews can sometimes differ significantly from those in the organic search results, introducing three concepts that help you understand AIO search results better.
Three concepts related to AIO search results to understand:
Predictive Summaries
Grounding Links
Query Fan-Out Technique
Google is expanding AI overviews to “thousands more health topics,” per an announcement at the company’s health-focused ‘The Check Up’ event.
The event included developments spanning research, wearable technology, and medical records.
Here’s more about how Google is refining health results in Search.
AI Overviews For Health Queries
Google is showing AI overviews for more health-related queries.
Compared to other types of questions, this topic has had fewer AI overviews. Now, these overviews will be available for more queries and in more languages.
“Now, using AI and our best-in-class quality and ranking systems, we’ve been able to expand these types of overviews to cover thousands more health topics. We’re also expanding to more countries and languages, including Spanish, Portuguese and Japanese, starting on mobile.”
Google notes that health-focused advancements to its Gemini models will improve how information is summarized for health topics.
With these updates, Google claims AI overviews for health queries are “more relevant, comprehensive and continue to meet a high bar for clinical factuality.”
New “What People Suggest” Feature
Google is introducing a new feature for health queries called “What People Suggest.”
It uses AI to organize perspectives from online discussions and to analyze what people with similar health conditions are saying.
For example, someone with arthritis looking for exercise recommendations could use this feature to learn what works for others with the same condition.
See an example below.
Screenshot from: blog.google/technology/health/the-check-up-health-ai-updates-2025/, March 2025.
“What People Suggest” is currently available only on mobile devices in the U.S.
Broader Health AI Initiatives
The search updates were part of a larger set of health technology announcements at The Check Up event. Google also revealed:
Medical Records APIs in Health Connect for managing health data across applications
FDA clearance for Loss of Pulse Detection on Pixel Watch 3
An AI co-scientist built on Gemini 2.0 to help biomedical researchers
TxGemma, a collection of open models for AI-powered drug discovery
Capricorn, an AI tool for pediatric oncology treatment developed with Princess Máxima Center
Looking Ahead
Hallucination remains a problem for AI models. While Gemini may have upgrades that make it more accurate, it will still be wrong at least sometimes.
Google’s inclusion of personal experiences alongside medical websites marks a shift, recognizing people value both clinical information and real-world perspectives.
Health publishers should be aware that this could affect search visibility but may also increase chances of appearing for more queries or the “What People Suggest” section.
But there isn’t much content about examples and success factors of content that drives citations and mentions in AI chatbots.
To get an answer, I analyzed over 7,000 citations across 1,600 URLs to content-heavy sites (think: Integrators) in three AI chatbots (ChatGPT, Perplexity, AI Overviews) in February 2024 with the help of Profound.
My goal is to figure out:
Why some pages are more cited than others, so we can optimize content for AI chatbots.
Whether classic SEO factors matter for AI chatbot visibility, so we can prioritize.
What traps to avoid, so we don’t have to learn the same lessons many times.
If different factors influence mentions and citations, so we can be more targeted in our efforts.
Here are my findings:
The Key To Brand Citation In AI Chatbots: Deep Content
Image Credit: Kevin Indig
🔍 Context: We know that AI chatbots use Retrieval Augmented Generation (RAG) to weigh their answers with results from Google and Bing. However, does that mean classic SEO ranking factors also translate to AI chatbot citations? No.
My correlation analysis shows that none of the classic SEO metrics have strong relationships with citations. LLMs show only light preferences: Perplexity and AIOs weigh word and sentence count higher, while ChatGPT weighs domain rating and Flesch Score.
💡Takeaway: Classic SEO metrics don’t matter nearly as much for AI chatbot mentions and citations. The best thing you can do for content optimization is to aim for depth, comprehensiveness, and readability (how easy the text is to understand).
Broad correlations didn’t reveal enough meat on the bone and left me with too many open questions.
So, I looked at what the most-cited content does differently than the rest. That approach showed much stronger patterns.
Image Credit: Kevin Indig
🔍Context: Because I didn’t get much out of statistical correlations, I wanted to see how the top 10% of most cited content stacks up against the bottom 90%.
The bigger the difference, the more critical the factor for the top 10%. In other words, the multiplier (x-axis on the chart) indicates what factors LLMs reward with citations.
The results:
The two factors that stand out are sentence and word count, followed by the Flesch Score. Metrics related to backlinks and traffic seem to have a negative effect, which doesn’t mean that AI chatbots weigh them negatively but simply that they don’t matter for mentions or citations.
The top 10% of most cited pages across all three LLMs have much less traffic, rank for fewer keywords, and get fewer total backlinks. How does that make sense? It almost looks like being strong in traditional SEO metrics is bad for AI chatbot visibility.
Copilot (not included in the chart) has the starkest inequality, by the way. The top 10% get 17.6x more citations than the bottom 90%. However, the top 10% also rank for 1.7x more keywords in organic search. So, Copilot seems to have stronger preferences than other AI chatbots.
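The top-10% vs. bottom-90% comparison can be sketched in a few lines: rank pages by citations, split at the 90th percentile, and divide the group averages of a metric to get the multiplier. The page data below is entirely made up for illustration:

```python
import math

# Hypothetical example data: each page has a citation count and a word count.
pages = [
    {"citations": 187, "word_count": 10_000},
    {"citations": 72, "word_count": 8_500},
    {"citations": 3, "word_count": 3_900},
    {"citations": 1, "word_count": 2_200},
    {"citations": 0, "word_count": 1_800},
]

# Split at the 90th percentile of citations: top 10% vs. bottom 90%.
ranked = sorted(pages, key=lambda p: p["citations"], reverse=True)
cutoff = max(1, math.ceil(len(ranked) * 0.10))
top, bottom = ranked[:cutoff], ranked[cutoff:]

def mean(values: list) -> float:
    return sum(values) / len(values)

# The multiplier: how many times larger the metric is for the top group.
multiplier = mean([p["word_count"] for p in top]) / mean([p["word_count"] for p in bottom])
print(round(multiplier, 2))
```

The same split can be reused for any metric (sentence count, Flesch Score, backlinks) by swapping the key.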
Splitting the data up by AI Chatbot shows you their unique preferences:
Image Credit: Kevin Indig
💡Takeaway: Content depth (word and sentence count) and readability (Flesch Score) have the biggest impact on citations in AI chatbots.
This is important to understand: Longer content isn’t better because it’s longer, but because it has a higher chance of answering a specific question prompted in an AI chatbot.
Examples:
www.verywellmind.com/best-online-psychiatrists-5119854 has 187 citations, over 10,000 words, and over 1,500 sentences, with a Flesch Score of 55, and is cited 72 times by ChatGPT.
On the other hand, www.onlinetherapy.com/best-online-psychiatrists/ has only three citations, also a low Flesch Score, with 48, but comes “short” with only 3,900 words and 580 sentences.
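The Flesch Reading Ease scores cited in these examples can be approximated with a short script. Note that the syllable counter below is a naive vowel-group heuristic, so scores will differ somewhat from commercial readability tools:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels, minimum one.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

print(round(flesch_reading_ease("The cat sat."), 2))  # short words score high
```

Higher scores mean easier reading; the 55 and 48 above sit in the "fairly difficult" band typical of health content.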
🔍Context: We don’t yet know the value of a brand being mentioned by an AI chatbot.
Early research indicates it’s high, especially when prompts indicate purchase intent.
However, I wanted to get a step closer by understanding what leads to brand mentions in AI chatbots in the first place.
After matching many metrics with AI chatbot visibility, I found one factor that stands out more than anything else: Brand search volume.
The number of AI chatbot mentions and brand search volume have a correlation of .334, which is strong for this field. In other words, the popularity of a brand broadly decides how visible it is in AI chatbots.
Image Credit: Kevin Indig
Popularity is the most significant predictor for ChatGPT, which also sends the most traffic and has the highest usage of all AI chatbots.
When breaking it down by AI chatbot, I found ChatGPT has the highest correlation at .542 (strong), while Perplexity (.196) and Google AIOs (.254) have lower correlations.
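A Pearson correlation like the .334 figure can be computed with nothing beyond the standard library. The search-volume and mention numbers below are invented purely to illustrate the calculation:

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    # Pearson r = covariance(x, y) / (stdev(x) * stdev(y))
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical monthly brand search volumes and AI chatbot mention counts.
search_volume = [1000, 5000, 20000, 80000, 150000]
mentions = [2, 3, 10, 25, 60]
print(round(pearson(search_volume, mentions), 3))
```

On Python 3.10+, `statistics.correlation()` does the same job in one call; the hand-rolled version just makes the formula explicit.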
To be clear, there is a lot of nuance on the prompt and category level. But broadly, a brand’s visibility seems to be severely impacted by how popular it is.
Example of popular brands and their visibility in the health category (Image Credit: Kevin Indig)
However, when brands are mentioned, all AI chatbots prefer popular brands and consistently rank them in the same order.
There is a clear link between the categories of the users’ questions (mental health, skincare, weight loss, hair loss, erectile dysfunction) and brands.
Early data shows that the most visible brands are digital-first and invest heavily in their online presence with content, SEO, reviews, social media, and digital advertising.
💡Takeaway: Popularity is the biggest criterion that decides whether a brand is mentioned in AI chatbots or not. The way consumers connect brands to product categories also matters.
Comparing brand search volume and product category presence with your competitors gives you the best idea of how competitive you are on ChatGPT & Co.
Examples: All models in my analysis cite Healthline most often. Not a single other domain was in the top 10 citations for all four models, showing their distinctly different tastes and how important it is to keep track of many models as opposed to only ChatGPT – if those models also send you traffic.
Image Credit: Kevin Indig
Other well-cited domains across most models:
verywellmind.com
onlinedoctor.com
medicalnewstoday.com
byrdie.com
cnet.com
ncoa.org
Image Credit: Kevin Indig
🔍Context: Not all AI chatbots mention brands with the same frequency. Even though ChatGPT has the highest adoption and sends the most referral traffic to sources, Perplexity mentions the most brands per answer on average.
Prompt structure matters for brand visibility:
The word “best” was a strong trigger for brand mentions in 69.71% of prompts.
Words like “trusted” (5.77%), “source” (2.88%), “recommend” (0.96%), and “reliable” (0.96%) were also associated with an increased likelihood of brand mentions.
Prompts including “recommend” often mention public organizations like the FDA, especially when the prompt includes words like “trusted” or “leading.”
Google AIOs show the highest brand diversity, followed by Perplexity, then ChatGPT.
💡Takeaway: Prompt structure has a meaningful impact on the brands that come up in the answer.
However, we’re not yet able to truly know what prompts users utilize. This is important to keep in mind: All prompts we look at and track are just proxies for what users might be doing.
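Trigger-word shares like the 69.71% figure for "best" can be estimated by scanning a prompt log for prompts that produced brand mentions. The prompt list below is invented sample data, not the study's dataset:

```python
# Share of brand-mentioning prompts that contain a given trigger word.
# The prompts here are hypothetical examples for illustration only.
prompts_with_brand_mentions = [
    "best online therapy platforms",
    "best skincare routine for acne",
    "most trusted source for weight loss advice",
    "best shampoo for hair loss",
    "which telehealth service do you recommend",
]

def trigger_share(prompts: list[str], trigger: str) -> float:
    # Count prompts containing the trigger as a whole word, return a percentage.
    hits = sum(1 for p in prompts if trigger in p.lower().split())
    return 100 * hits / len(prompts)

print(trigger_share(prompts_with_brand_mentions, "best"))       # 60.0
print(trigger_share(prompts_with_brand_mentions, "recommend"))  # 20.0
```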
Image Credit: Kevin Indig
🔍Context: In my research, I encountered several ways brands unintentionally sabotage their AI chatbot visibility.
I surface them here because the prerequisite to being visible in LLMs is, of course, their ability to crawl your site, whether directly or through training data.
For example, Copilot doesn’t cite onlinedoctor.com because it’s not indexed in Bing. I couldn’t find indicators that this was done on purpose, so I assume it’s an accident that could quickly be fixed and rewarded with referral traffic.
On the other hand, ChatGPT 4o doesn’t cite cnet.com, and Perplexity doesn’t cite everydayhealth.com because both sites intentionally block the respective LLM in their robots.txt.
But there are also cases in which AI chatbots reference sites even though they technically shouldn’t.
The most cited domain in Perplexity in my dataset is blocked.goodrx.com. GoodRX blocks users from non-U.S. countries, and it seems it also blocks Perplexity, whether accidentally or intentionally.
Image Credit: Kevin Indig
It’s important to single out Google’s AI Overviews here: There is no opt-out for AIOs, meaning if you want to get organic traffic from Google, you need to allow it to crawl your site, potentially use your content to train its models and surface it in AI Overviews. Chegg recently filed a lawsuit against Google for this.
💡Takeaway: Monitor your site in Google Search Console and Bing Webmaster Tools, especially to confirm that all the URLs you want indexed actually are.
Double-check whether you accidentally block an LLM crawler in your robots.txt or through your CDN.
If you intentionally block LLM crawlers, double-check whether you appear in their answers simply by asking them what they know about your domain.
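One quick way to check whether a robots.txt file blocks a given LLM crawler is Python's built-in robotparser. The user-agent tokens below (GPTBot for OpenAI, PerplexityBot for Perplexity) are the publicly documented crawler names, and the rules are a hypothetical example:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks OpenAI's crawler but allows all others.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
print(parser.can_fetch("PerplexityBot", "https://example.com/article")) # True
```

For a live site, `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` fetches the real file; CDN-level blocks won't show up here and need to be checked separately.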
Summary: 6 Key Learnings
Classic SEO metrics don’t strongly influence AI chatbot citations.
Content depth (higher word and sentence counts) and readability (good Flesch Score) matter more.
Different AI chatbots have distinct preferences – monitoring multiple platforms is important.
Brand popularity (measured by search volume) is the strongest predictor of brand mentions in AI chatbots, especially in ChatGPT.
Prompt structure influences brand visibility, and we don’t yet know how users phrase prompts.
Technical issues can sabotage AI visibility – ensure your site isn’t accidentally blocking LLM crawlers through robots.txt or CDN settings.
Featured Image: Paulo Bobita/Search Engine Journal
Google researchers introduced a method to improve AI search and assistants by enhancing Retrieval-Augmented Generation (RAG) models’ ability to recognize when retrieved information lacks sufficient context to answer a query. If implemented, these findings could help AI-generated responses avoid relying on incomplete information and improve answer reliability. This shift may also encourage publishers to create content with sufficient context, making their pages more useful for AI-generated answers.
Their research finds that models like Gemini and GPT often attempt to answer questions when retrieved data contains insufficient context, leading to hallucinations instead of abstaining. To address this, they developed a system to reduce hallucinations by helping LLMs determine when retrieved content contains enough information to support an answer.
Retrieval-Augmented Generation (RAG) systems augment LLMs with external context to improve question-answering accuracy, but hallucinations still occur. It wasn’t clearly understood whether these hallucinations stemmed from LLM misinterpretation or from insufficient retrieved context. The research paper introduces the concept of sufficient context and describes a method for determining when enough information is available to answer a question.
Their analysis found that proprietary models like Gemini, GPT, and Claude tend to provide correct answers when given sufficient context. However, when context is insufficient, they sometimes hallucinate instead of abstaining, but they also answer correctly 35–62% of the time. That last discovery adds another challenge: knowing when to intervene to force abstention (to not answer) and when to trust the model to get it right.
Defining Sufficient Context
The researchers define sufficient context as meaning that the retrieved information (from RAG) contains all the necessary details to derive a correct answer. The classification that something contains sufficient context doesn’t require it to be a verified answer. It’s only assessing whether an answer can be plausibly derived from the provided content.
This means that the classification is not verifying correctness. It’s evaluating whether the retrieved information provides a reasonable foundation for answering the query.
Insufficient context means the retrieved information is incomplete, misleading, or missing critical details needed to construct an answer.
Sufficient Context Autorater
The Sufficient Context Autorater is an LLM-based system that classifies query-context pairs as having sufficient or insufficient context. The best performing autorater model was Gemini 1.5 Pro (1-shot), achieving a 93% accuracy rate, outperforming other models and methods.
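The autorater is essentially an LLM prompted to make a binary judgment on each query-context pair. The sketch below shows a hypothetical prompt builder and reply parser for such a classifier; the wording and function names are my own assumptions, not Google's actual prompt.

```python
def build_autorater_prompt(query: str, context: str) -> str:
    """Build a prompt asking an LLM to judge context sufficiency.

    Hypothetical wording; the paper's exact prompt is not reproduced here.
    """
    return (
        "Judge whether the CONTEXT contains all the information needed to "
        "derive an answer to the QUERY. Do not verify correctness; only "
        "assess whether an answer can plausibly be derived.\n\n"
        f"QUERY: {query}\n"
        f"CONTEXT: {context}\n\n"
        "Reply with exactly one word: SUFFICIENT or INSUFFICIENT."
    )


def parse_autorater_reply(reply: str) -> bool:
    """Map the model's one-word reply to True (sufficient) / False (insufficient)."""
    return reply.strip().upper().startswith("SUFFICIENT")
```

In practice, the prompt would be sent to a model such as Gemini 1.5 Pro with one worked example (1-shot), and the parsed boolean becomes the sufficiency signal for each retrieved passage.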
Reducing Hallucinations With Selective Generation
The researchers discovered that RAG-based LLM responses correctly answered questions 35–62% of the time even when the retrieved data had insufficient context. In other words, sufficient context wasn't strictly necessary for accuracy, because the models returned the right answer without it a substantial share of the time.
They used their discovery about this behavior to create a Selective Generation method that uses confidence scores and sufficient context signals to decide when to generate an answer and when to abstain (to avoid making incorrect statements and hallucinating).
The confidence scores are self-rated probabilities that the answer is correct. Selective Generation balances two goals: letting the LLM answer when it is highly certain it is correct, while intervening to force abstention when the confidence and context-sufficiency signals together suggest the answer would be wrong, further increasing accuracy.
The researchers describe how it works:
“…we use these signals to train a simple linear model to predict hallucinations, and then use it to set coverage-accuracy trade-off thresholds. This mechanism differs from other strategies for improving abstention in two key ways. First, because it operates independently from generation, it mitigates unintended downstream effects…Second, it offers a controllable mechanism for tuning abstention, which allows for different operating settings in differing applications, such as strict accuracy compliance in medical domains or maximal coverage on creative generation tasks.”
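The quoted mechanism can be sketched as a linear model over the two signals whose output is compared against a tunable threshold. The weights below are illustrative placeholders, not values from the paper; a real system would fit them on labeled (confidence, sufficiency, hallucinated) examples.

```python
import math

# Illustrative weights for a linear hallucination predictor (assumed values).
BIAS, W_CONFIDENCE, W_SUFFICIENT = 2.0, -3.0, -1.5


def hallucination_risk(confidence: float, sufficient_context: bool) -> float:
    """Predicted probability that answering would produce a hallucination."""
    z = BIAS + W_CONFIDENCE * confidence + W_SUFFICIENT * float(sufficient_context)
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid of the linear score


def should_abstain(confidence: float, sufficient_context: bool,
                   threshold: float = 0.5) -> bool:
    """Abstain when predicted risk exceeds the threshold.

    Lowering the threshold trades coverage for accuracy (strict settings such
    as medical domains); raising it maximizes coverage for creative tasks.
    """
    return hallucination_risk(confidence, sufficient_context) > threshold
```

The threshold is the "controllable mechanism" the researchers describe: the model itself is untouched, and only the decision to answer or abstain is tuned per application.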
Takeaways
Before anyone starts claiming that context sufficiency is a ranking factor, it’s important to note that the research paper does not state that AI will always prioritize well-structured pages. Context sufficiency is one factor, but with this specific method, confidence scores also influence AI-generated responses by intervening with abstention decisions. The abstention thresholds dynamically adjust based on these signals, which means the model may choose to not answer if confidence and sufficiency are both low.
While pages with complete and well-structured information are more likely to contain sufficient context, other factors such as how well the AI selects and ranks relevant information, the system that determines which sources are retrieved, and how the LLM is trained also play a role. You can’t isolate one factor without considering the broader system that determines how AI retrieves and generates answers.
If these methods are implemented into an AI assistant or chatbot, it could lead to AI-generated answers that increasingly rely on web pages that provide complete, well-structured information, as these are more likely to contain sufficient context to answer a query. The key is providing enough information in a single source so that the answer makes sense without requiring additional research.
What are pages with insufficient context?
Lacking enough details to answer a query
Misleading
Incomplete
Contradictory
Requiring prior knowledge the page doesn't provide
Scattering the necessary information across different sections instead of presenting it in one unified answer
Google’s Quality Raters Guidelines (QRG), used by its third-party quality raters, contain concepts similar to context sufficiency. For example, the QRG defines low-quality pages as those that don’t achieve their purpose well because they fail to provide necessary background, details, or relevant information for the topic.
Passages from the Quality Raters Guidelines:
“Low quality pages do not achieve their purpose well because they are lacking in an important dimension or have a problematic aspect”
“A page titled ‘How many centimeters are in a meter?’ with a large amount of off-topic and unhelpful content such that the very small amount of helpful information is hard to find.”
“A crafting tutorial page with instructions on how to make a basic craft and lots of unhelpful ‘filler’ at the top, such as commonly known facts about the supplies needed or other non-crafting information.”
“…a large amount of ‘filler’ or meaningless content…”
Even if Google’s Gemini or AI Overviews never implements the inventions in this research paper, many of the concepts it describes have analogues in Google’s Quality Raters Guidelines, which themselves describe qualities of high-quality web pages that SEOs and publishers who want to rank should internalize.
For more than two years, a new concept called Agentic SEO has been emerging.
The idea is to perform SEO using agents based on large language models (LLMs) that carry out complex tasks autonomously or semi-autonomously, saving time for SEO experts.
Of course, humans remain in the loop to guide these agents and validate the results.
Today, with the advent of ChatGPT, Claude, Gemini, and other powerful LLM tools, it is easy to automate complex processes using agents.
Agentic SEO is, therefore, the use of AI agents to optimize SEO productivity. It differs from Generative Engine Optimization (GEO), which aims to improve SEO to be visible on search engines powered by LLMs such as SearchGPT, Perplexity, or AI Overviews.
This concept is based on three main levers: Ideation, Audit, and Generation.
In this first chapter, I will focus on ideation because there is so much to explore.
In our next article, we will see how this concept can be applied to auditing (full website analysis with real-time corrections), and how missing content can be generated using a “Human in the Loop” – or rather “SEO Expert in the Loop” – approach.
AI Agents And Workflows
Before presenting detailed use cases regarding ideation, it is essential to explain the concept of an agent.
AI Agent
Image from author, February 2025
AI agents need at least five key elements to function:
Tools: These are all the resources and technical functionalities available to the agent.
Memory: This is used to store all interactions so that the agent can remember information previously shared in the discussion.
Instructions: These define the agent's rules and limits.
Knowledge: This is the database that contains the concepts that the agent can use to solve problems; it can use the knowledge of the LLM or external databases.
Persona: This defines the agent's "personality" and often its level of expertise, including, in particular, its way of interacting.
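The five elements above can be captured in a simple data structure. This is a minimal sketch with field names of my own choosing, not the API of any agent framework:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Minimal sketch of the five elements an AI agent needs."""
    tools: dict[str, Callable]                          # resources and technical functions
    memory: list[str] = field(default_factory=list)     # stored interactions
    instructions: str = ""                              # rules and limits
    knowledge: list[str] = field(default_factory=list)  # LLM or external databases
    persona: str = ""                                   # personality and expertise level

    def remember(self, message: str) -> None:
        """Store an interaction so later turns can refer back to it."""
        self.memory.append(message)
```

An agent instance would then combine these elements at every turn: consult `instructions` and `persona` to shape its behavior, draw on `knowledge` and `tools` to act, and append the exchange to `memory`.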
Workflow
Workflows allow complex tasks to be broken down into simpler subtasks and chained together logically.
They are useful in SEO because they facilitate the collection and manipulation of data needed to perform specific SEO actions.
Furthermore, in recent months, AI providers (OpenAI, Anthropic, etc.) have moved from simply offering models to enriching the whole user experience.
For example, the Deep Research feature in ChatGPT or Perplexity is not a new model but a workflow that performs complex searches in several steps.
This process, which would take a human several hours, is carried out by AI agents in a few dozen minutes.
Image from author, February 2025
The diagram above illustrates a simple SEO workflow that starts with "Data & Constraints," which feeds a tool ("SEO Tool 1") that performs a specific action, such as SERP analysis or scraping.
Next, two AIs (AI 1 and AI 2) intervene to generate specific content, followed by a "HITL" (Human in the Loop) step before reaching the deliverables.
Although AI and automation play a central role, human supervision and expertise remain essential to ensure quality results.
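The workflow in the diagram can be sketched as a chain of steps, each enriching the output of the previous one. The step functions here are stand-ins for real tools and models, named to mirror the diagram:

```python
from typing import Callable

def run_workflow(state: dict, steps: list[Callable[[dict], dict]]) -> dict:
    """Chain subtasks: each step receives the accumulated state and enriches it."""
    for step in steps:
        state = step(state)
    return state

# Stand-in steps mirroring the diagram: tool -> AI 1 -> AI 2 -> human review.
def seo_tool(state):      # e.g., SERP analysis or scraping
    return {**state, "serp": ["result A", "result B"]}

def ai_writer(state):     # AI 1: drafts content from the tool's output
    return {**state, "draft": f"Content covering {len(state['serp'])} results"}

def ai_reviewer(state):   # AI 2: refines the draft
    return {**state, "draft": state["draft"] + " (refined)"}

def human_in_the_loop(state):  # HITL: the SEO expert validates before delivery
    return {**state, "approved": True}

deliverable = run_workflow(
    {"constraints": "topic: agentic seo"},
    [seo_tool, ai_writer, ai_reviewer, human_in_the_loop],
)
```

The point of the pattern is that each subtask stays simple and replaceable: you can swap the scraper, the drafting model, or the review step without touching the rest of the chain.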
Use-Case: Ideation
Let’s start with ideation. As you know, AI excels at opening up possibilities.
With the right methods, it is possible to push AI to explore every conceivable idea on a topic.
An SEO expert will then select, refine, and prioritize the best suggestions based on their experience.
Numerous experiments have demonstrated the positive impact of this synergy between human creativity and artificial intelligence.
Below, Ethan Mollick’s diagram posted on X (Twitter) illustrates a benchmark of the creative process with and without AI:
In a large representative sample of humans compared to GPT-4: “the creative ideas produced by AI chatbots are rated more creative [by humans] than those created by humans… Augmenting humans with AI improves human creativity, albeit not as much as ideas created by ChatGPT alone”
The figure shows the distribution of creativity scores (from 0 to 10) assigned to different sources: ChatGPT, Bard (now Gemini), a human control group (HumanBaseline), a human group working with AI (HumanPlusAI), and another group working against AI (HumanAgainstAI).
The horizontal axis represents the perceived level of creativity, while the vertical axis indicates the frequency of each score (density).
We can see that the curve corresponding to HumanPlusAI is generally shifted to the right, meaning that evaluators consider this human+AI collaboration to be the most creative approach.
Conversely, the average scores of ChatGPT and Gemini, although high, remain below those obtained by the human-machine synergy.
Finally, the HumanBaseline group (humans alone) is just below the performance of the Human+AI duo, while the HumanAgainstAI group is the least creative.
AI alone can produce impressive results, but it is in combination with human expertise and sensitivity that the highest levels of creativity are achieved. Let me give you some concrete examples.
Tools Like Deep Research
Among the tools available, Deep Research stands out for its ability to conduct in-depth research in several steps, providing a valuable source of inspiration for ideation.
I recommend using this open-source version; if you prefer, you can also use the OpenAI or Perplexity versions.
How Does It Work?
This diagram describes the operation of the Open Source Deep Research tool.
It generates and executes search queries, crawls the resulting pages, then recursively explores promising leads, and finally produces a detailed report in Markdown format.
Image from author, February 2025
There are several steps to using Deep Research:
Enter your query: You will be asked to enter your query; try to be as precise as possible. Don't hesitate to ask ChatGPT or Claude to help draft your Deep Research query.
Specify the breadth of the search (recommended: between 3 and 10, default: 6): How many topics should be explored in each iteration?
Specify the depth of exploration (recommended: between 1 and 5, default: 3): If the crawler finds an interesting topic, how many pages deep will it explore?
Refinement: Sometimes, you need to answer follow-up questions to refine the direction of the search.
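The parameters above can be captured in a small config object with the recommended ranges enforced. This is a sketch with parameter names of my own choosing; the actual open-source tool prompts for these values interactively:

```python
from dataclasses import dataclass

@dataclass
class DeepResearchConfig:
    """Research parameters with the recommended ranges from the steps above."""
    query: str
    search_depth: int = 6       # topics per iteration, recommended 3-10
    exploration_depth: int = 3  # pages deep per promising lead, recommended 1-5

    def __post_init__(self) -> None:
        if not 3 <= self.search_depth <= 10:
            raise ValueError("search_depth should be between 3 and 10")
        if not 1 <= self.exploration_depth <= 5:
            raise ValueError("exploration_depth should be between 1 and 5")
```

Higher values on either parameter multiply the number of pages crawled, so the defaults are a reasonable trade-off between coverage and run time.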
You can turn this open-source project into a real SEO tool. I have identified several use cases:
Competitor Content Analysis: The tool can automate the collection and analysis of competitors’ content to identify their strategies and spot opportunities for differentiation.
Long-Tail Keyword Research: By analyzing the web, it can identify specific keywords with high potential and less competition, facilitating content optimization.
SERP Analysis: It can collect and analyze search engine results to understand trends and competitors’ positioning.
Content Idea Generation: Based on in-depth research, it can identify relevant topics and frequently asked questions in a given niche.
For example, you can install CursorAI, a code generation tool, and ask it to modify the code to create a SERP analysis. The tool will easily make all the necessary changes.
With Agentic SEO, it is possible not only to customize and improve existing tools but, more importantly, to create your own tool to suit your specific needs.
On the other hand, if you are not a developer at all, I advise you to use a no-code solution.
No-Code Agent Workflow Tools
Here is an example of a no-code tool called Dng.ai.
We use a CSV file provided by Moz, which we analyze using an agent capable of processing the data, generating Python code, and extracting all the necessary information.
In blue, you have the input fields that serve as a starting point; then, in orange, you have tools like scrapers, crawlers, and keyword tools to extract all the necessary data; and finally, in purple, you have the AIs that identify all the clusters that need to be created.
Image from author, February 2025
The agent then compares this data with the topics already on your site to identify missing content.
Finally, it generates a complete list of topics to create, ensuring optimal coverage of your SEO strategy. There are many no-code tools for building Agentic workflows.
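The gap-analysis step described above can be sketched in a few lines: compare the keywords in an export (such as a Moz CSV) against the topics your site already covers, and keep the missing ones. The column name here is an assumption, not Moz's actual export schema.

```python
import csv
import io

def find_missing_topics(csv_text: str, existing_topics: set[str],
                        column: str = "Keyword") -> list[str]:
    """Return topics present in the keyword export but absent from the site."""
    rows = csv.DictReader(io.StringIO(csv_text))
    exported = {row[column].strip().lower() for row in rows}
    covered = {t.lower() for t in existing_topics}
    return sorted(exported - covered)

# Example: a tiny stand-in for a keyword export.
export = (
    "Keyword,Volume\n"
    "agentic seo,500\n"
    "keyword clustering,900\n"
    "serp analysis,700\n"
)
missing = find_missing_topics(export, existing_topics={"SERP analysis"})
```

In a real workflow, the AI agents would supply both inputs: the keyword clusters extracted from the export and the topic list crawled from your own site.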
I won’t list them all, but as you can see with this tool, an interface is automatically generated from the workflow. All you need to do is specify your topic and a URL, then press the run button to get results in less than two minutes.
Image from author, February 2025
Explore The Full Potential Of This Tool For Yourself
I'll let you judge the results of a tool built from the SEO data of any platform.
Image from author, February 2025
I could have made more than two hours of YouTube video on the ideation aspect alone, as there is so much to say and test.
I now invite you to explore the full potential of these tools and experiment with them to optimize your SEO strategy, and next time, I will cover audit use cases with Agentic SEO.
Google announced it will make its Deep Research feature available to all users for free on a limited basis, while introducing several updates to Gemini.
With this rollout, Gemini is now equipped with enhanced reasoning capabilities, personalization features, and expanded app connectivity.
Free Access with Limitations
Google’s Deep Research tool, which processes information from multiple websites and documents, will now be accessible to non-paying users “a few times a month.”
Gemini Advanced subscribers will continue to have more extensive access to the feature.
The company describes Deep Research as an AI research assistant that searches and synthesizes web information.
Google reports the feature has been updated with its Flash Thinking 2.0 model, which displays its reasoning process while browsing.
Google stated in its announcement:
“Gemini users can try Deep Research a few times a month at no cost, and Gemini Advanced users get expanded access to Deep Research.”
The feature is rolling out in more than 45 languages.
Model Updates
The Flash Thinking 2.0 model has been updated to include file upload capabilities and faster processing speeds.
For paid subscribers, the system now supports a context window of up to 1 million tokens.
Dave Citron, Senior Director of Product Management for the Gemini app, stated in the announcement that the updated model is “trained to break down prompts into a series of steps to strengthen its reasoning capabilities.”
Testing has shown the system can still make errors in both analysis and conclusions, the company acknowledged.
Additional Features
Google also announced a new experimental personalization feature that connects with users’ Google apps and services. The feature uses data from search history to provide tailored responses to queries such as restaurant recommendations.
Additional app integrations now include Calendar, Notes, Tasks, and Photos, allowing users to make requests involving multiple applications. Google Photos integration is planned for the coming weeks.
Lastly, Google announced that its Gems feature, which lets users create customized AI assistants for specific topics, is now available to all users at no cost.
These updates are available now at gemini.google.com.
Featured Image: Screenshot from blog.google.com, March 2025.