Google’s SAGE Agentic AI Research: What It Means For SEO via @sejournal, @martinibuster

Google published a research paper about creating a challenging dataset for training AI agents for deep research. The paper offers insights into how agentic AI deep research works, which in turn suggests ways to optimize content for it.

The acronym SAGE stands for Steerable Agentic Data Generation for Deep Search with Execution Feedback.

Synthetic Question And Answer Pairs

The researchers noted that the previous state-of-the-art AI training datasets (like MuSiQue and HotpotQA) required no more than four reasoning steps to answer their questions. On the number of searches needed to answer a question, MuSiQue averaged 2.7 searches per question and HotpotQA averaged 2.1. Another commonly used dataset, Natural Questions (NQ), required an average of only 1.3 searches per question.

Because these datasets are used to train AI agents, they created a training gap for deep search tasks that require more reasoning steps and a greater number of searches. How can you train an AI agent for complex real-world deep search tasks if it has never been trained to tackle genuinely difficult questions?

The researchers created a system called SAGE that automatically generates high-quality, complex question-answer pairs for training AI search agents. SAGE is a “dual-agent” system where one AI writes a question and a second “search agent” AI tries to solve it, providing feedback on the complexity of the question.

  • The goal of the first AI is to write a question that’s challenging to answer and requires many reasoning steps and multiple searches to solve.
  • The goal of the second AI is to try to measure whether the question is answerable and to calculate how difficult it is (the minimum number of search steps required).

The key to SAGE is that if the second AI solves the question too easily or gets it wrong, the specific steps and documents it found (the execution trace) are fed back to the first AI. This feedback enables the first AI to identify which of four shortcuts allowed the second AI to solve the question in fewer steps.
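
To make the loop concrete, here is a minimal, hypothetical Python sketch of the generate-solve-feedback cycle. The function names, the Trace structure, and the thresholds are my own illustrative assumptions; the paper does not publish code.

  # Hypothetical sketch of SAGE's generate-solve-feedback loop.
  # Trace, solve(), and revise() are illustrative stand-ins, not
  # the paper's actual implementation.
  from dataclasses import dataclass

  @dataclass
  class Trace:
      queries: list[str]  # searches the solver agent issued
      answer: str         # the answer it arrived at
      num_hops: int       # minimum search steps it actually needed

  def solve(question: str) -> Trace:
      # Stand-in for the second agent: it would run real searches here.
      return Trace(queries=[question], answer="...", num_hops=1)

  def revise(question: str, trace: Trace) -> str:
      # Stand-in for the first agent: it inspects the execution trace
      # and rewrites the question to remove the shortcut it reveals.
      return question + " (revised to require more hops)"

  def generate_hard_question(question: str, target_hops: int = 4,
                             max_rounds: int = 5) -> str | None:
      for _ in range(max_rounds):
          trace = solve(question)
          if trace.num_hops >= target_hops:
              return question                 # hard enough: keep for training
          question = revise(question, trace)  # execution feedback
      return None                             # discard: shortcut persists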

It’s these shortcuts that provide insights into how to rank better for deep research tasks.

Four Ways That Deep Research Was Avoided

The goal of the paper was to create a set of question-and-answer pairs so difficult that the AI agent needed multiple steps to solve them. The feedback revealed four ways the agent was able to avoid doing additional searches to find an answer.

Four Reasons Deep Research Was Unnecessary

  1. Information Co-Location
    This is the most common shortcut, accounting for 35% of the times when deep research was not necessary. This happens when two or more pieces of information needed to answer a question are located in the same document. Instead of searching twice, the AI finds both answers in one “hop”.
  2. Multi-query Collapse
    This happened in 21% of cases. It occurs when a single, clever search query retrieves enough information from different documents to solve multiple parts of the problem at once. This “collapses” what should have been a multi-step process into a single step.
  3. Superficial Complexity
    This accounts for 13% of times when deep research was not necessary. The question looks long and complicated to a human, but a search engine (that an AI agent is using) can jump straight to the answer without needing to reason through the intermediate steps.
  4. Overly Specific Questions
    31% of the failures are questions that contain so much detail that the answer becomes obvious in the very first search, removing the need for any “deep” investigation.

The researchers found that some questions look hard but are actually relatively easy because the information is “co-located” in one document. If an agent can answer a four-hop question in one hop because a single website was comprehensive enough to contain all the answers, that data point is considered a failure for training the agent to reason. But it’s still something that happens in real life, and an agent will take advantage of finding all the information on one page.

SEO Takeaways

It’s possible to gain some insight into what kinds of content satisfy deep research queries. While these aren’t necessarily tactics for ranking better in agentic AI deep search, these insights do show what kinds of scenarios caused the AI agents to find all or most of the answers on one web page.

“Information Co-location” Could Be An SEO Win
The researchers found that when multiple pieces of information required to answer a question occur in the same document, it reduces the number of search steps needed. For a publisher, this means consolidating “scattered” facts into one page prevents an AI agent from having to “hop” to a competitor’s site to find the rest of the answer.

Triggering “Multi-query Collapse”
The authors identified a phenomenon where information from different documents can be retrieved using a single query. By structuring content to answer several sub-questions at once, you enable the agent to find the full solution on your page faster, effectively “short-circuiting” the long reasoning chain the agent was prepared to undertake.

Eliminating “Shortcuts” (The Reasoning Gap)
The research paper notes that the data generator fails when it accidentally creates a “shortcut” to the answer. As an SEO, your goal is to be that shortcut—providing the specific data points like calculations, dates, or names that allow the agent to reach the final answer without further exploration.

The Goal Is Still To Rank In Classic Search

For an SEO and a publisher, these shortcuts underline the value of creating a comprehensive document because it removes the trigger for an AI agent to hop somewhere else. That doesn’t mean cramming all the information into one page is always helpful. If it makes sense for users, it may be useful to link out from one page to another for related information.

The reason I say that is because the AI agent is conducting classic search looking for answers, so the goal remains to optimize a web page for classic search. Furthermore, in this research, the AI agent is pulling from the top three ranked web pages for each query that it’s executing. I don’t know if this is how agentic AI search works in a live environment, but this is something to consider.

In fact, one of the tests that the researchers did was conducted using the Serper API to extract search results from Google.
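
For illustration, here is a minimal Python sketch of that retrieval step: pulling the top three organic results for a query through the Serper API. The endpoint and response fields follow Serper’s public documentation as I understand it; treat the specifics as assumptions to verify.

  # Minimal sketch: fetch the top three organic results for a query
  # via the Serper API (endpoint and field names per Serper's docs;
  # verify before relying on them).
  import requests

  def top_three_results(query: str, api_key: str) -> list[dict]:
      response = requests.post(
          "https://google.serper.dev/search",
          headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
          json={"q": query},
          timeout=10,
      )
      response.raise_for_status()
      organic = response.json().get("organic", [])
      return organic[:3]  # the research agent read only the top three

  # Example: top_three_results("levi's trucker jacket midnight harvest", KEY)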

So when it comes to ranking in agentic AI search, consider these takeaways:

  • It may be useful to consider the importance of ranking in the top three.
  • Do optimize web pages for classic search.
  • Do not optimize web pages for AI search.
  • If it’s possible to be comprehensive, remain on-topic, and rank in the top three, then do that.
  • Interlink to relevant pages to help those rank in classic search, preferably in the top three (to be safe).

It could be that agentic AI search will consider pulling from more than the top three in classic search. But it may be helpful to set the goal of ranking for the top 3 in classic search and to focus on ranking other pages that may be a part of the multi-hop deep research.

The research paper was published by Google on January 26, 2026. It’s available in PDF form: SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback.

Featured Image by Shutterstock/Shutterstock AI Generator

Chrome Updated With 3 AI Features Including Nano Banana via @sejournal, @martinibuster

Gemini in Chrome has just been refreshed with three new features that integrate more Gemini capabilities within Chrome for Windows, macOS, and Chromebook Plus. The update adds an AI side panel, agentic AI Auto Browse, and Nano Banana editing of whatever image is in the browser window.

AI Side Panel For Multitasking

Chrome adds a new side panel that lets users open a session with Gemini without having to jump across browser tabs. The feature is described as a way to save time by making it easier to multitask.

Google explains:

“Our testers have been using it for all sorts of things: comparing options across too-many-tabs, summarizing product reviews across different sites, and helping find time for events in even the most chaotic of calendars.”

Opt-In Requirement For AI Chat

Before enabling the side panel AI chat feature, a user must first consent to sending their URLs and browser data back to Google.

Screenshot Of Opt-In Form

Nano Banana In Chrome

Using the AI side panel, users can tell it to update and change an image in the browser window without any copying, downloading, or uploading. Nano Banana will change it right there in the open browser window.

Chrome Auto Browse (Agentic AI)

This feature is for subscribers to Google’s AI Pro and Ultra tiers. Auto Browse enables an agentic AI to take action on behalf of the user. It’s described as being able to research hotels and flights, do cost comparisons across a given range of dates, obtain quotes for work, and check whether bills are paid.

Auto Browse is multimodal, which means it can identify items in a photo, then go out and find where they can be purchased and add them to a cart, including applying any relevant discount codes. If given permission, the AI agent can also access passwords and log in to online stores and services.

Adds More Features To Existing Ones

Google announced on January 12, 2026 that Chrome’s AI was upgraded with app connections, able to connect to Calendar, Gmail, Google Shopping, Google Flights, Maps, and YouTube. This is part of Google’s Personal Intelligence initiative, which it said is Google’s first step toward a more personalized AI assistant.

Personalization And User Intent Extraction For AI Chat And Agents

On a related note, Google recently published a research paper that shows how an on-device and in-browser AI can extract a user’s intent so as to provide better personalized and proactive responses, pointing to how on-device AI may be used in the near future. Read Google’s New User Intent Extraction Method.

Featured Image by Shutterstock/f11photo

Google May Let Sites Opt Out Of AI Search Features via @sejournal, @MattGSouthern

Google says it’s exploring updates that could let websites opt out of AI-powered search features specifically.

The blog post came the same day the UK’s Competition and Markets Authority opened a consultation on potential new requirements for Google Search, including controls for websites to manage their content in Search AI features.

Ron Eden, Principal, Product Management at Google, wrote:

“Building on this framework, and working with the web ecosystem, we’re now exploring updates to our controls to let sites specifically opt out of Search generative AI features.”

Google provided no timeline, technical specifications, or firm commitment. The post frames this as exploration, not a product roadmap.

What’s New

Google currently offers several controls for how content appears in Search, but none cleanly separate AI features from traditional results.

Google-Extended lets publishers block their content from training Gemini and Vertex AI models. But Google’s documentation states Google-Extended doesn’t impact inclusion in Google Search and isn’t a ranking signal. It controls AI training, not AI Overviews appearance.

The nosnippet and max-snippet directives do apply to AI Overviews and AI Mode. But they also affect traditional snippets in regular search results. Publishers wanting to limit AI feature exposure currently lose snippet visibility everywhere.
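
For reference, the existing controls use standard syntax documented by Google. The robots.txt rule below blocks use of content for AI training via Google-Extended, while the meta tags limit snippets everywhere, including AI features (the max-snippet value of 50 is just an example):

  # robots.txt — blocks Gemini/Vertex AI training crawling,
  # but does not affect inclusion in Google Search
  User-agent: Google-Extended
  Disallow: /

The snippet directives go in the page’s HTML head:

  <!-- robots meta tags — apply to AI Overviews and AI Mode,
       but also shorten or remove traditional snippets -->
  <meta name="robots" content="nosnippet">
  <meta name="robots" content="max-snippet:50">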

Google’s post acknowledges this gap exists. Eden wrote:

“Any new controls need to avoid breaking Search in a way that leads to a fragmented or confusing experience for people.”

Why This Matters

I wrote in SEJ’s SEO Trends 2026 ebook that people would have more influence on the direction of search than platforms do. Google’s post suggests that dynamic is playing out.

Publishers and regulators have spent the past year pushing back on AI Overviews. The UK’s Independent Publishers Alliance, Foxglove, and Movement for an Open Web filed a complaint with the CMA last July, asking for the ability to opt out of AI summaries without being removed from search entirely. The US Department of Justice and South African Competition Commission have proposed similar measures.

The BuzzStream study we covered earlier this month found 79% of top news publishers block at least one AI training bot, and 71% block retrieval bots that affect AI citations. Publishers are already voting with their robots.txt files.

Google’s post suggests it’s responding to pressure from the ecosystem by exploring controls it previously didn’t offer.

Looking Ahead

Google’s language is cautious. “Exploring” and “working with the web ecosystem” are not product commitments.

The CMA consultation will gather input on potential requirements. Regulatory processes move slowly, but they do produce outcomes. The EU’s Digital Markets Act investigations have already pushed Google to make changes in Europe.

For now, publishers wanting to limit AI feature exposure can use nosnippet or max-snippet directives, but note that these affect traditional snippets as well. Google’s robots meta tag documentation covers the current options.

If Google follows through on specific opt-out controls, the technical implementation will matter. Whether it’s a new robots directive, a Search Console setting, or something else will determine how practical it is for publishers to use.


Featured Image: ANDRANIK HAKOBYAN/Shutterstock

New Yahoo Scout AI Search Delivers The Classic Search Flavor People Miss via @sejournal, @martinibuster

Yahoo has announced Yahoo Scout, a new AI-powered answer engine now available in beta to users in the United States, providing a clean Classic Search experience with the power of personalized AI. The launch also includes the Yahoo Scout Intelligence Platform, which brings AI features across Yahoo’s core products, including Mail, News, Finance, and Sports.

Screenshot Of Yahoo Scout

Yahoo’s Existing Products and User Reach

Yahoo’s announcement states that it operates some of the most popular websites and services in the United States, reaching what it says is 90% of all U.S. internet users (based on Comscore data) through its email, news, finance, and sports properties. The company says that Yahoo Scout builds on the foundation of decades of search behavior and user interaction data.

How Yahoo Scout Generates Answers

Yahoo has partnered with Anthropic to use the Claude model as the primary AI system behind Yahoo Scout. Yahoo’s announcement said it selected Claude for speed, clarity, judgment, and safety, which it described as essential qualities for a consumer-facing answer engine. Yahoo also continues its partnership with Microsoft by using Microsoft Bing’s grounding API, which connects AI-generated answers to information from across the open web. Yahoo said this approach ensures that answers are informed by authoritative sources rather than unsupported text generation.

According to Yahoo, Scout relies on a combination of traditional web search and generative AI to produce answers that are grounded using Microsoft Bing’s grounding API and informed by sources from across the open web.

According to Yahoo:

“It’s informed by 500 million user profiles, a knowledge graph spanning more than 1 billion entities, and 18 trillion consumer events that occur annually across Yahoo, which allow Yahoo Scout to provide effective and personalized answers and suggested actions.”

Yahoo’s announcement says that this data, its use of Claude, and its reliance on Bing for grounding work together to provide answers that are personalized and helpful for researching and making decisions in the “moments that matter” to people.

They explain:

“Yahoo Scout continues Yahoo’s focus on the moments that matter to people’s daily lives, such as understanding upcoming weather patterns before a vacation, getting details about an important game, tracking stock price movements after earnings, comparing products before buying, or fact-checking a news story.”

Where Yahoo Scout Appears Inside Yahoo Products

The Yahoo Scout Intelligence Platform embeds these AI capabilities directly into Yahoo’s existing services.

For example:

  • In Yahoo Mail, Scout supports AI-generated message summaries.
  • In Yahoo Sports, it produces game breakdowns.
  • In Yahoo News, it surfaces key takeaways.
  • In Yahoo Finance, Scout adds interactive tools for analysis that allow readers to explore market news and stock performance context through AI-powered questions.

According to Eric Feng, Senior Vice President and General Manager of Yahoo Research Group:

“Yahoo’s deep knowledge base, 30 years in the making, allows us to deliver guidance that our users can trust and easily understand, and will become even more personalized over the coming months. Yahoo Scout now powers a new generation of intelligence experiences across Yahoo, seamlessly integrated into the products people use every day.”

What Yahoo Says Comes Next

Yahoo said Scout will continue to develop over the coming months. Planned updates include deeper personalization, expanded capabilities within specific verticals, and new formats for search advertising designed to work in generative AI search. The company did not provide a timeline for when the beta period will end or when additional features will move beyond testing.

Yahoo explained:

“Yahoo Scout will continue to evolve in the months ahead, expanding to power new products across Yahoo. In particular, the new answer engine will become more personalized, will add new capabilities focused on deeper experiences within key verticals, and will introduce new, improved opportunities for search advertisers to effectively cross the chasm to generative AI search advertising.”

Yahoo’s Search Experience

Something that’s notable about Yahoo’s AI answer engine experience is how clean and straightforward it is. It’s like a throwback to classic search but with the sophistication of AI answers.

For example, I asked it where I could buy an esoteric version of a Levi’s trucker jacket in a specific color (Midnight Harvest), and it presented a clean summary of where to get it and a table of retailers ordered by lowest price.

Screenshot Of Yahoo Scout

Notice that there are no product images? It’s just giving me the prices. I don’t know if that’s because they don’t have a product feed, but I already know what the jacket looks like in the color I specified, so images aren’t really necessary. This is what I mean when I say that Yahoo Scout offers that Classic Search flavor without the busy, overly fussy search experience that Google has been providing lately.

With Yahoo Scout, the company is applying AI systems to tasks its users perform when they search for, read, or compare information online. Rather than positioning AI as a replacement for search or content platforms, Yahoo is using it as a tool that organizes, summarizes, and explains information in a clean and easy to read format.

Yahoo Scout is easy to like because it delivers the clean and uncluttered search experience that many people miss.

Check out Yahoo Scout at scout.yahoo.com.

The Yahoo Scout app is available for Android and Apple devices.

Google AI Overviews Now Powered By Gemini 3 via @sejournal, @MattGSouthern

Google is making Gemini 3 the default model for AI Overviews in markets where the feature is available and adding a direct path into AI Mode conversations.

The updates, shared in a Google blog post, bring Gemini 3’s reasoning capabilities to AI Overviews. Google says the feature now reaches over one billion users.

What’s New

Gemini 3 For AI Overviews

The Gemini 3 upgrade brings the same reasoning capabilities to AI Overviews that previously powered AI Mode.

Robby Stein, VP of Product for Google Search, wrote:

“We’re rolling out Gemini 3 as the default model for AI Overviews globally, so even more people will be able to access best-in-class AI responses, directly in the results page for questions where it’s helpful.”

Gemini 3 launched in November, and Google shipped it to AI Mode on release day. This expands Gemini 3 from AI Mode into AI Overviews as the default.

AI Overview To AI Mode Transition

You can now ask a follow-up question right from an AI Overview and continue into AI Mode. The context from the original response carries into the conversation, so you don’t start over.

Stein described the thinking behind the change:

“People come to Search for an incredibly wide range of questions – sometimes to find information quickly, like a sports score or the weather, where a simple result is all you need. But for complex questions or tasks where you need to explore a topic deeply, you should be able to seamlessly tap into a powerful conversational AI experience.”

He called the result “one fluid experience with prominent links to continue exploring.”

An earlier test of this flow ran globally on mobile back in December.

In testing, Google found people prefer this kind of natural flow into conversation. The company also found that keeping AI Overview context in follow-ups makes Search more helpful.

Why This Matters

The pattern has held since AI Overviews launched. Each update makes it easier to stay within AI-powered responses.

When Gemini 3 arrived in AI Mode, it brought deeper query fan-out and dynamic response layouts. AI Overviews running on the same model could produce different citation patterns.

That makes today’s update an important one to monitor. Model changes can affect which pages get cited and how responses are structured.

Looking Ahead

Google says the updates are rolling out starting today, though availability may vary by market.

Google previously indicated plans to add automatic model selection that routes complex questions to Gemini 3 while using faster models for simpler tasks. Whether that affects AI Overviews beyond today’s default model change isn’t specified.


Featured Image: Darshika Maduranga/Shutterstock

WP Go Maps Plugin Vulnerability Affects Up To 300K WordPress Sites via @sejournal, @martinibuster

A security advisory was published about a vulnerability affecting the WP Go Maps plugin for WordPress, which is installed on over 300,000 websites. The flaw enables authenticated subscribers to modify map engine settings.

WP Go Maps Plugin

The WP Go Maps plugin is used by local business WordPress sites to display customizable maps on pages and posts, including contact page maps, delivery areas, and store locations. Site owners can manage map markers and map settings without writing code.

The plugin had four vulnerabilities in 2025 and seven in 2024. Vulnerabilities were also discovered in earlier years, stretching back to 2019, but less often.

Vulnerability

The vulnerability can be exploited by authenticated attackers with Subscriber-level access or higher. The Subscriber role is the lowest WordPress permission level. This means an attacker needs only a basic user account to exploit the issue, though only on sites that offer that account level to users.

The vulnerability is caused by a missing capability check in the plugin’s processBackgroundAction() function. A capability check is used to verify whether a logged-in user is allowed to perform a specific action. Because this check is missing, the function processes requests from users who do not have permission to change plugin settings.
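
To illustrate what a capability check generally looks like, here is a hedged PHP sketch using WordPress’s core current_user_can() function. The function name below is hypothetical; this is not WP Go Maps’ actual code.

  // Hypothetical sketch, not the plugin's actual code: a background-action
  // handler should verify the user's capabilities before acting.
  function myplugin_process_background_action() {
      // This is the kind of check that was missing: stop here unless
      // the logged-in user is allowed to manage plugin settings.
      if ( ! current_user_can( 'manage_options' ) ) {
          wp_die( 'Insufficient permissions.', 403 );
      }
      // ... safe to update global map engine settings from here ...
  }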

As a result, authenticated attackers with Subscriber-level access can modify global map engine settings used by the plugin. These settings apply site-wide and affect how the plugin functions across the website.

Wordfence described the vulnerability as an unauthorized modification of data caused by a missing capability check. In practice, this means the plugin allows low-privileged users to change global settings that should be restricted to administrators.

The Wordfence advisory explains:

“The WP Go Maps (formerly WP Google Maps) plugin for WordPress is vulnerable to unauthorized modification of data due to a missing capability check on the processBackgroundAction() function in all versions up to, and including, 10.0.04. This makes it possible for authenticated attackers, with Subscriber-level access and above, to modify global map engine settings”

Any site running an affected version of the plugin with subscriber-level registration enabled is exposed to authenticated attackers.

The vulnerability affects all versions of WP Go Maps up to and including version 10.0.04. A patch is available. Site owners are advised to update the WP Go Maps plugin to version 10.0.05 or newer to fix the vulnerability.

Featured Image by Shutterstock/Dean Drobot

Sam Altman Says OpenAI “Screwed Up” GPT-5.2 Writing Quality via @sejournal, @MattGSouthern

Sam Altman said OpenAI “screwed up” GPT-5.2’s writing quality during a developer town hall Monday evening.

When asked about user feedback that GPT-5.2 produces writing that’s “unwieldy” and “hard to read” compared to GPT-4.5, Altman was blunt.

He said:

“I think we just screwed that up. We will make future versions of GPT 5.x hopefully much better at writing than 4.5 was.”

Altman explained that OpenAI made a deliberate choice to focus GPT-5.2’s development on technical capabilities:

“We did decide, and I think for good reason, to put most of our effort in 5.2 into making it super good at intelligence, reasoning, coding, engineering, that kind of thing. And we have limited bandwidth here, and sometimes we focus on one thing and neglect another.”

How OpenAI Positioned Each Model

The contrast between GPT-4.5 and GPT-5.2 shows where OpenAI focused its resources.

When OpenAI introduced GPT-4.5 in February 2025, the company emphasized natural interaction and writing. OpenAI said interacting with GPT-4.5 “feels more natural” and called it “useful for tasks like improving writing.”

GPT-5.2’s announcement took a different direction. OpenAI positioned it as the most capable model series yet for professional knowledge work, with improvements in creating spreadsheets, building presentations, writing code, and handling complex, multi-step projects.

The release post spotlights spreadsheets, presentations, tool use, and coding. Writing appears more briefly, with technical writing noted as an improvement for GPT-5.2 Instant. But Altman’s comments suggest the overall writing experience still fell short for users comparing it to GPT-4.5.

Why This Matters

We’ve covered the iterative changes to ChatGPT since GPT-5 launched in August, including updates to warmth and tone and the GPT-5.1 instruction-following improvements. OpenAI regularly adjusts model behavior based on user feedback, and regressions in one area while improving another aren’t new.

What’s unusual is hearing Altman acknowledge a tradeoff this directly. For anyone using ChatGPT output in client-facing work, drafts, or polished writing, this explains why outputs may have changed. Model upgrades don’t guarantee improvement across every capability.

If you rely on ChatGPT for writing, treat model updates like any other dependency change. Re-test your prompts when defaults change, and keep a fallback if output quality matters for your workflow.

Looking Ahead

Altman said he believes “the future is mostly going to be about very good general purpose models” and that even coding-focused models should “write well, too.”

No timeline was given for when GPT-5.x writing improvements will ship. OpenAI typically iterates on model behavior through point releases, so changes could arrive gradually rather than in a single update.

Hear Altman’s full statement in the video below:


Featured Image: FotoField/Shutterstock

Why Google Gemini Has No Ads Yet: ‘Trust In Your Assistant’ via @sejournal, @MattGSouthern

Google DeepMind CEO Demis Hassabis said Google doesn’t have any current plans to introduce advertising into its Gemini AI assistant, citing unresolved questions about user trust.

Speaking at the World Economic Forum in Davos, Hassabis said AI assistants represent a different product than search. He believes Gemini should be built for users first.

“In the realm of assistants, if you think of the chatbot as an assistant that’s meant to be helpful and ideally in my mind, as they become more powerful, the kind of technology that works for you as the individual,” Hassabis said in an interview with Axios. “That’s what I’d like to see with these systems.”

He said no one in the industry has figured out how advertising fits into that model.

“There is a question about how does ads fit into that model, where you want to have trust in your assistant,” Hassabis said. “I think no one’s really got a full answer to that yet.”

When asked directly about Google’s plans, Hassabis said: “We don’t have any current plans to do it ourselves.”

What Hassabis Said About OpenAI

The comments came days after OpenAI said it plans to begin testing ads in ChatGPT in the coming weeks for logged-in adults in the U.S. on free and Go tiers.

Hassabis said he was “a little bit surprised they’ve moved so early into that.”

He acknowledged advertising has funded much of the consumer internet and can be useful to users when done well. But he warned that poor execution in AI assistants could damage user relationships.

“I think it can be done right, but it can also be done in a way that’s not good,” Hassabis said. “In the end, what we want to do is be the most useful we can be to our users.”

Search Is Different

Hassabis drew a line between AI assistants and search when discussing advertising.

When asked whether his comments applied to Google Search, where the company already shows ads in AI Overviews, he said the two products work differently.

“But there it’s completely different use case because you’ve already just like how it’s always worked with search, you’ve already, you know, we know what your intent is basically and so we can be helpful there,” Hassabis said. “That’s a very different construct.”

Google began rolling out ads in AI Overviews in October 2024 and has continued expanding them since. The company claims AI Overviews generate ad revenue equal to traditional search results.

Why This Matters

This is the second time in two months that a Google executive has said Gemini ads aren’t currently planned.

In December, Google Ads VP Dan Taylor disputed an Adweek report claiming the company had told advertisers to expect Gemini ads in 2026. Taylor called that report “inaccurate” and said Google has “no current plans” to monetize the Gemini app.

Hassabis’s comments reinforce that position but go further by explaining the reasoning. His “technology that works for you” framing suggests Google sees a tension between advertising and the assistant relationship it wants Gemini to build.

Looking Ahead

Google is comfortable expanding ads where user intent is explicit, like search queries triggering AI Overviews. The company is holding back where intent is less defined and the relationship is more personal.

How long Google maintains its current position depends in part on how users respond to advertising in rival assistants.


Featured Image: Screenshot from: youtube.com/@axios, January 2026. 

Google’s New User Intent Extraction Method via @sejournal, @martinibuster

Google published a research paper on how to extract user intent from user interactions that can then be used for autonomous agents. The method they discovered uses on-device small models that do not need to send data back to Google, which means that a user’s privacy is protected.

The researchers discovered they were able to solve the problem by splitting it into two tasks. Their solution worked so well it was able to beat the base performance of multi-modal large language models (MLLMs) in massive data centers.

Smaller Models On Browsers And Devices

The focus of the research is on identifying the user intent through the series of actions that a user takes on their mobile device or browser while also keeping that information on the device so that no information is sent back to Google. That means the processing must happen on the device.

They accomplished this in two stages.

  1. In the first stage, a model on the device summarizes what the user was doing.
  2. The sequence of summaries is then sent to a second model that identifies the user intent.

The researchers explained:

“…our two-stage approach demonstrates superior performance compared to both smaller models and a state-of-the-art large MLLM, independent of dataset and model type.
Our approach also naturally handles scenarios with noisy data that traditional supervised fine-tuning methods struggle with.”

Intent Extraction From UI Interactions

Intent extraction from screenshots and text descriptions of user interactions was proposed in 2025 using multimodal large language models (MLLMs). The researchers say they followed this approach for their own problem but with an improved prompt.

The researchers explained that extracting intent is not a trivial problem to solve and that there are multiple errors that can happen along the steps. The researchers use the word trajectory to describe a user journey within a mobile or web application, represented as a sequence of interactions.

The user journey (trajectory) is given a formal representation in which each interaction step consists of two parts:

  1. An Observation
    This is the visual state of the screen (a screenshot) showing where the user is at that step.
  2. An Action
    The specific action that the user performed on that screen (like clicking a button, typing text, or clicking a link).

They described three qualities of a good extracted intent:

  • “faithful: only describes things that actually occur in the trajectory;
  • comprehensive: provides all of the information about the user intent required to re-enact the trajectory;
  • and relevant: does not contain extraneous information beyond what is needed for comprehensiveness.”

Challenging To Evaluate Extracted Intents

The researchers explain that grading extracted intents is difficult because user intents contain complex details (like dates or transaction data) and are inherently subjective, containing ambiguities that are hard to resolve. Trajectories are subjective because the users’ underlying motivations are ambiguous.

For example, did a user choose a product because of the price or the features? The actions are visible but the motivations are not. Previous research found that intents written by different people matched only 80% of the time on web trajectories and 76% on mobile trajectories, so a given trajectory does not always indicate one specific intent.

Two-Stage Approach

After ruling out other methods like Chain of Thought (CoT) reasoning (because small language models struggled with the reasoning), they chose a two-stage approach that emulated Chain of Thought reasoning.

The researchers explained their two-stage approach:

“First, we use prompting to generate a summary for each interaction (consisting of a visual screenshot and textual action representation) in a trajectory. This stage is prompt-based as there is currently no training data available with summary labels for individual interactions.

Second, we feed all of the interaction-level summaries into a second stage model to generate an overall intent description. We apply fine-tuning in the second stage…”

The First Stage: Screenshot Summary

For each interaction, the model produces a summary that the researchers divide into two parts, plus a third component:

  1. A description of what’s on the screen.
  2. A description of the user’s action.

The third component, labeled “speculative intent,” is the model’s guess about what the user is trying to do, and it is simply discarded. Surprisingly, allowing the model to speculate and then throwing that speculation away leads to a higher-quality result.

The researchers cycled through multiple prompting strategies, and this was the one that worked best.

The Second Stage: Generating Overall Intent Description

For the second stage, the researchers fine-tuned a model to generate an overall intent description. They fine-tuned it with training data made up of two parts:

  1. Summaries that represent all interactions in the trajectory
  2. The matching ground truth that describes the overall intent for each of the trajectories.

The model initially tended to hallucinate because the first part (the input summaries) is potentially incomplete, while the “target intents” are complete. That caused the model to learn to fill in the missing parts to make the input summaries match the target intents.

They solved this problem by “refining” the target intents by removing details that aren’t reflected in the input summaries. This trained the model to infer the intents based only on the inputs.

The researchers compared four different approaches and settled on this one because it performed best.
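
Put together, the pipeline conceptually resembles the hypothetical Python sketch below. Interaction, summarize_interaction(), and infer_intent() are illustrative stand-ins for the prompted and fine-tuned small models the paper describes; this is not published code.

  # Hypothetical sketch of the two-stage, on-device intent pipeline.
  from dataclasses import dataclass

  @dataclass
  class Interaction:
      screenshot: bytes  # observation: the visual state of the screen
      action: str        # e.g., "tapped the 'Add to cart' button"

  def summarize_interaction(step: Interaction) -> str:
      # Stage 1 (prompt-based small model): describe the screen and the
      # action. The model also writes a "speculative intent" section,
      # which is discarded before the summary moves on.
      return f"Screen state summarized; user action: {step.action}"

  def infer_intent(summaries: list[str]) -> str:
      # Stage 2 (fine-tuned small model): map the ordered summaries
      # to one overall intent description.
      return "placeholder intent inferred from summaries"

  def extract_intent(trajectory: list[Interaction]) -> str:
      summaries = [summarize_interaction(step) for step in trajectory]
      return infer_intent(summaries)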

Ethical Considerations And Limitations

The research paper ends by summarizing potential ethical issues where an autonomous agent might take actions that are not in the user’s interest and stressed the necessity to build the proper guardrails.

The authors also acknowledged limitations in the research that might limit generalizability of the results. For example, the testing was done only on Android and web environments, which means that the results might not generalize to Apple devices. Another limitation is that the research was limited to users in the United States in the English language.

There is nothing in the research paper or the accompanying blog post that suggests that these processes for extracting user intent are currently in use. The blog post ends by communicating that the described approach is helpful:

“Ultimately, as models improve in performance and mobile devices acquire more processing power, we hope that on-device intent understanding can become a building block for many assistive features on mobile devices going forward.”

Takeaways

Neither the blog post about this research nor the research paper itself describes the results of these processes as something that might be used in AI search or classic search. The paper does, however, mention the context of autonomous agents.

The research paper explicitly describes an autonomous agent on the device that observes how the user interacts with a user interface and then infers the goal (the intent) of those actions.

The paper lists two specific applications for this technology:

  1. Proactive Assistance:
    An agent that watches what a user is doing for “enhanced personalization” and “improved work efficiency”.
  2. Personalized Memory:
    The process enables a device to “remember” past activities as an intent for later.

Shows The Direction Google Is Heading In

While this might not be used right away, it shows the direction that Google is heading, where small models on a device will be watching user interactions and sometimes stepping in to assist users based on their intent. Intent here is used in the sense of understanding what a user is trying to do.

Read Google’s blog post here:

Small models, big results: Achieving superior intent extraction through decomposition

Read the PDF research paper:

Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition (PDF)

Featured Image by Shutterstock/ViDI Studio

BuddyPress WordPress Vulnerability May Impact Up To 100,000 Sites via @sejournal, @martinibuster

A newly disclosed security vulnerability affects the BuddyPress plugin, a WordPress plugin installed on over 100,000 websites. The vulnerability, given a threat level rating of 7.3 (high), enables unauthenticated attackers to execute arbitrary shortcodes.

BuddyPress WordPress Plugin

The BuddyPress plugin enables WordPress sites to create community features such as user profiles, activity streams, private messaging, and groups. It is commonly used on membership sites and online communities and is installed on more than 100,000 WordPress websites.

BuddyPress has a good track record with regard to vulnerabilities. Only one vulnerability was reported for the entire year of 2025, a relatively mild one rated at a 5.3 (medium) threat level on a scale of 1-10.

Unauthenticated Arbitrary Shortcode Execution

The vulnerability can be exploited by unauthenticated attackers. An attacker does not need a WordPress account or any level of user access to trigger the issue.

The BuddyPress plugin is vulnerable to arbitrary shortcode execution in all versions up to and including 14.3.3. That means that an attacker can execute shortcodes on the website. Shortcodes are used by WordPress to add dynamic functionality to pages and posts. Because the plugin does not properly validate input before executing shortcodes, attackers can cause the site to run shortcodes they are not authorized to use.

The vulnerability is caused by missing validation before user-supplied input is passed to the do_shortcode function.
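
As a general illustration of the missing-validation pattern (a hedged PHP sketch, not BuddyPress’s actual code or its patch), safe handling checks user-supplied input against an allowlist before passing anything to WordPress’s do_shortcode() function:

  // Hypothetical sketch, not BuddyPress's actual code or patch.
  function myplugin_run_requested_shortcode( $requested ) {
      // Validate against an explicit allowlist before execution.
      $allowed = array( 'gallery', 'caption' );
      if ( ! in_array( $requested, $allowed, true ) ) {
          return ''; // refuse anything not explicitly permitted
      }
      return do_shortcode( '[' . $requested . ']' );
  }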

Wordfence described the issue:

“The BuddyPress plugin for WordPress is vulnerable to arbitrary shortcode execution in all versions up to, and including, 14.3.3. This is due to the software allowing users to execute an action that does not properly validate a value before running do_shortcode. This makes it possible for unauthenticated attackers to execute arbitrary shortcodes.”

This means attackers can trigger a shortcode that carries out whatever action it is designed to perform, which in the worst case could expose restricted site features or functionality. Depending on the shortcodes available on a site, this can enable attackers to access sensitive information, modify site content, or interact with other plugins in unintended ways.

The vulnerability does not depend on special server settings or optional configurations. Any site running a vulnerable version of the plugin is affected.

The issue was patched in BuddyPress version 14.3.4. Users of the plugin should update to version 14.3.4 or newer to fix the vulnerability.

Featured Image by Shutterstock/Login