Google Announces AI Mode Checkout Protocol, Business Agent via @sejournal, @MattGSouthern

Google announced tools that let shoppers complete purchases directly within AI Mode and chat with branded AI agents in Search results.

Users can purchase from eligible product listings on Google. Retailers remain the seller of record, but checkout happens on Google surfaces instead of the retailer’s website.

Universal Commerce Protocol Powers AI Mode Checkout

Google launched the Universal Commerce Protocol, an open standard for what it calls “agentic commerce.” The protocol will power checkout on eligible Google product listings in AI Mode in Search and the Gemini app.

Google developed UCP with Shopify, Etsy, Wayfair, Target, and Walmart. More than 20 additional companies endorsed it, including Adyen, American Express, Best Buy, Mastercard, Stripe, The Home Depot, and Visa.

Shoppers will use Google Pay with payment methods and shipping info from Google Wallet. PayPal support is coming. UCP checkout starts with eligible U.S. retailers, with global expansion planned.

Business Agent Brings Branded Chat To Search

Business Agent lets shoppers chat with brands in Search results. Google describes it as a “virtual sales associate” that can answer product questions in the brand’s voice.

The feature goes live January 12 with Lowe’s, Michaels, Poshmark, Reebok, and others. Eligible U.S. retailers can activate and customize the agent through Merchant Center.

Google plans to add capabilities for training agents on retailer data, providing product offers, and enabling purchases within the chat experience.

Direct Offers Pilot Tests Ads In AI Mode

Google also announced Direct Offers, a new ad pilot in AI Mode. It allows advertisers to offer exclusive discounts to people searching for products.

Google gave an example of a rug search where relevant retailers could feature a special 20% discount. Retailers set up offers in campaign settings, and Google determines when to display them.

Early partners include Petco, e.l.f. Cosmetics, Samsonite, Rugs USA, and Shopify merchants.

Why This Matters

Checkout in AI Mode means a user searching for a product can research, compare, and buy without ever reaching the retailer’s site.

For ecommerce sites, this changes the traffic equation. The sale still happens, but the site visit may not. Retailers participating in UCP gain access to high-intent buyers at the moment of decision. Those who don’t participate may find their products harder to surface when users expect to complete transactions without leaving Google.

Looking Ahead

Checkout in AI Mode rolls out to eligible U.S. retailers soon. Business Agent launches January 12. Direct Offers is in pilot with select advertisers.

Google said it plans to add new Merchant Center data attributes designed for discovery in AI Mode, Gemini, and Business Agent. The company will soon roll out the new attributes with a small group of retailers before expanding more broadly.


Featured Image: hafakot/Shutterstock

Google: AI Overviews Show Less When Users Don’t Engage via @sejournal, @MattGSouthern

AI Overviews don’t show up consistently across Google Search because the system learns where they’re useful and pulls them back when people don’t engage.

Robby Stein, Vice President of Product at Google Search, described in a CNN interview how Google tests the summaries, measures interaction, and reduces their appearance for certain kinds of searches where they don’t help.

How Google Decides When To Show AI Overviews

Stein explained that AI Overviews appear based on learned usefulness rather than showing up by default.

“The system actually learns where they’re helpful and will only show them if users have engaged with that and find them useful,” Stein said. “For many questions, people just ask like a short question or they’re looking for very specific website, they won’t show up because they’re not actually helpful in many many cases.”

He gave a concrete example. When someone searches for an athlete’s name, they typically want photos, biographical details, and social media links. The system learned people didn’t engage with an AI Overview for those queries.

“The system will learn that if it tried to do an AI overview, no one really clicked on it or engaged with it or valued it,” Stein said. “We have lots of metrics we look at that and then it won’t show up.”

What “Under The Hood” Queries Mean For Visibility

Stein described the system as sometimes expanding a search beyond what you type. Google “in many cases actually issues additional Google queries under the hood to expand your search and then brings you the most relevant information for a given question,” he said.

That may help explain why pages sometimes show up in AI Overview citations even when they don’t match your exact query wording. The system pulls in content answering related sub-questions or providing context.

For image-focused queries, AI Overviews integrate with image results. For shopping queries, they connect to product information. The system adapts based on what serves the question.

Where AI Mode Fits In

Stein described AI Mode as the next step for complicated questions that need follow-up conversation. The design assumes you start in traditional Search, get an Overview if it helps, then go deeper into AI Mode when you need more.

“We really designed AI Mode to really help you go deeper with a pretty complicated question,” Stein said, citing examples like comparing cars or researching backup power options.

During AI Mode testing, Google saw “like a two to three … full increase in the query length” compared to typical Search queries. Users also started asking follow-up questions in a conversational pattern.

The longer AI Mode queries included more specificity. Stein’s example: instead of “things to do in Nashville,” users asked “restaurants to go to in Nashville if one friend has an allergy and we have dogs and we want to sit outside.”

Personalization Exists But Is Limited

Some personalization in AI Mode already exists. Users who regularly click video results might see videos ranked higher, for example.

“We are personalizing some of these experiences,” Stein said. “But right now that’s a smaller adjustment probably to the experience because we want to keep it as consistent as possible overall.”

Google’s focus is on maintaining consistency across users while allowing for individual preferences where it makes sense.

Why This Matters

In July 2024, research showed Google had dialed back AI Overview (AIO) presence by 52%, from widespread appearance to showing for just 8% of queries. Stein’s description offers one possible explanation for that pattern.

If you’re tracking AIO presence week to week, the fluctuations may reflect user behavior patterns for different question types rather than algorithm changes.

The “under the hood” query expansion means content can appear in citations even without matching your exact phrasing. That matters when you’re explaining CTR drops internally or planning content for complex queries where Overviews are more likely to surface.

Looking Ahead

Google’s AI Overviews earn placement based on usefulness rather than appearing by default.

Personalization is limited today, but the direction is moving toward more tailored experiences that maintain overall consistency.



Featured Image: nwz/Shutterstock

Google Gemini Gains Share As ChatGPT Declines In Similarweb Data via @sejournal, @MattGSouthern

ChatGPT accounted for 64% of worldwide traffic share among gen AI chatbot websites as of January, while Google’s Gemini reached 21%, according to Similarweb’s Global AI Tracker.

Similarweb’s tracker measures total visits at the domain level, so it reflects people who go to these tools directly on the web. It doesn’t capture API usage, embedded assistants, or other integrations, where much AI usage now occurs.

ChatGPT Down, Gemini Up In A Year Of Share Gains

The share movement is easiest to see year-over-year.

A year ago, Similarweb estimated ChatGPT accounted for 86% of worldwide traffic among tracked chatbot sites. Now, that figure is 64%. Over the same period, Gemini rose from 5% to 21%.

Other tools are much smaller by this measure. DeepSeek was at 3.7%, Grok at 3.4%, and Perplexity and Claude both at 2.0%.

Google has been promoting Gemini through products like Android and Workspace, which may help explain why it’s gaining share among users who access these tools directly.

Winter Break Pulled Down Total Visits

Similarweb pointed to seasonality during the holiday period, writing on X:

“Driven by the winter break, the daily average visits to all tools dropped to August-September levels.”

That context matters because it helps distinguish overall category softness from shifts in market share.

Writing Tool Domain Traffic Declines

Writing and content generation sites were down 10% over the most recent 12-week window in Similarweb’s category view.

At the individual tool level, Similarweb’s table shows steep drops for several writing platforms. Growthbarseo was down 100%, while Jasper fell 16%, Writesonic dropped 17%, and Rytr declined 9%. Originality was up 17%.

These are still domain-level visit counts, so the clearest takeaway is that fewer people are going directly to specialized writing sites. That can happen for several reasons, including users relying more on general assistants, switching to apps, or using the same models through integrations.

Code Completion Shows Mixed Results

The developer tools category looked more mixed than the writing tools.

Similarweb’s code completion table shows Bolt down 39% over 12 weeks, while Cursor (up 8%), Replit (up 2%), and Base44 (up 49%) moved in different directions.

Traditional Search Looks Close To Flat

In Similarweb’s “disrupted sectors” view, traditional search traffic is down roughly 1% to 3% year-over-year across recent periods, which doesn’t indicate a sharp drop in overall search usage in this dataset.

The same table shows Reddit up 12% year-over-year and Quora down 53%, consistent with the idea that some Q&A behavior is being redistributed even as overall search remains relatively steady.

Why This Matters

When making sense of how AI is changing discovery and demand, these numbers can help you understand where direct, web-based attention is concentrating. That can influence which assistants you monitor for brand mentions, citations, and referral behavior.

That said, you should treat this as a snapshot, not the full picture. If your audience is interacting with AI through browsers, apps, or embedded assistants, your own analytics will be a better barometer than any domain-level tracker.

Looking Ahead

The next report should clarify whether category traffic rebounds after the holiday period and whether Gemini continues to gain share at the same pace. It will also be a useful read on whether writing tools stabilize or whether more of that usage continues to consolidate into general assistants and bundled experiences.


Featured Image: vfhnb12/Shutterstock

Most Major News Publishers Block AI Training & Retrieval Bots via @sejournal, @MattGSouthern

Most top news publishers block AI training bots via robots.txt, but they’re also blocking the retrieval bots that determine whether sites appear in AI-generated answers.

BuzzStream analyzed the robots.txt files of 100 top news sites across the US and UK and found 79% block at least one training bot. More notably, 71% also block at least one retrieval or live search bot.

Training bots gather content to build AI models, while retrieval bots fetch content in real time when users ask questions. Sites blocking retrieval bots may not appear when AI tools try to cite sources, even if the underlying model was trained on their content.

What The Data Shows

BuzzStream examined the top 50 news sites in each market based on Similarweb traffic share, then deduplicated the list. The study grouped bots into three categories: training, retrieval/live search, and indexing.

Training Bot Blocks

Among training bots, Common Crawl’s CCBot was the most frequently blocked at 75%, followed by Anthropic-ai at 72%, ClaudeBot at 69%, and GPTBot at 62%.

Google-Extended, which trains Gemini, was the least blocked training bot at 46% overall. US publishers blocked it at 58%, nearly double the 29% rate among UK publishers.

Harry Clarkson-Bennett, SEO Director at The Telegraph, told BuzzStream:

“Publishers are blocking AI bots using the robots.txt because there’s almost no value exchange. LLMs are not designed to send referral traffic and publishers (still!) need traffic to survive.”

Retrieval Bot Blocks

The study found 71% of sites block at least one retrieval or live search bot.

Claude-Web was blocked by 66% of sites, while OpenAI’s OAI-SearchBot, which powers ChatGPT’s live search, was blocked by 49%. ChatGPT-User was blocked by 40%.

Perplexity-User, which handles user-initiated retrieval requests, was the least blocked at 17%.

Indexing Blocks

PerplexityBot, which Perplexity uses to index pages for its search corpus, was blocked by 67% of sites.

Only 14% of sites blocked all AI bots tracked in the study, while 18% blocked none.

The Enforcement Gap

The study acknowledges that robots.txt is a directive, not a barrier, and bots can ignore it.

We covered this enforcement gap when Google’s Gary Illyes confirmed robots.txt can’t prevent unauthorized access. It functions more like a “please keep out” sign than a locked door.

Clarkson-Bennett raised the same point in BuzzStream’s report:

“The robots.txt file is a directive. It’s like a sign that says please keep out, but doesn’t stop a disobedient or maliciously wired robot. Lots of them flagrantly ignore these directives.”

Cloudflare documented that Perplexity used stealth crawling behavior to bypass robots.txt restrictions. The company rotated IP addresses, changed ASNs, and spoofed its user agent to appear as a browser.

Cloudflare delisted Perplexity as a verified bot and now actively blocks it. Perplexity disputed Cloudflare’s claims and published a response.

For publishers serious about blocking AI crawlers, CDN-level blocking or bot fingerprinting may be necessary beyond robots.txt directives.

Why This Matters

The retrieval-blocking numbers warrant attention here. In addition to opting out of AI training, many publishers are opting out of the citation and discovery layer that AI search tools use to surface sources.

OpenAI separates its crawlers by function: GPTBot gathers training data, while OAI-SearchBot powers live search in ChatGPT. Blocking one doesn’t block the other. Perplexity makes a similar distinction between PerplexityBot for indexing and Perplexity-User for retrieval.
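
To make that distinction concrete, here is a minimal sketch using Python’s standard-library robotparser against a hypothetical robots.txt. The rules and URL are illustrative only, not taken from any publisher in the study; they show how blocking the training crawler leaves the retrieval crawler unaffected:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler (GPTBot)
# while leaving the live-search crawler (OAI-SearchBot) allowed.
rules = """
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

for bot in ("GPTBot", "OAI-SearchBot", "PerplexityBot"):
    print(bot, parser.can_fetch(bot, "https://example.com/article"))
# Prints: GPTBot False, OAI-SearchBot True, PerplexityBot True
```

Note that a bot with no matching rule (and no catch-all `User-agent: *` record) falls through to the default allow, which is why a publisher that wants to block a specific crawler must name it explicitly.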

These blocking choices affect where AI tools can pull citations from. If a site blocks retrieval bots, it may not appear when users ask AI assistants for sourced answers, even if the model already contains that site’s content from training.

The Google-Extended pattern is worth watching. US publishers block it at nearly twice the UK rate, though whether that reflects different risk calculations around Gemini’s growth or different business relationships with Google isn’t clear from the data.

Looking Ahead

The robots.txt method has limits, and sites that want to block AI crawlers may find CDN-level restrictions more effective than robots.txt alone.

Cloudflare’s Year in Review found GPTBot, ClaudeBot, and CCBot had the highest number of full disallow directives across top domains. The report also noted that most publishers use partial blocks for Googlebot and Bingbot rather than full blocks, reflecting the dual role Google’s crawler plays in search indexing and AI training.

For those tracking AI visibility, the retrieval bot category is what to watch. Training blocks affect future models, while retrieval blocks affect whether your content shows up in AI answers right now.


Featured Image: Kitinut Jinapuck/Shutterstock

Google’s Mueller Weighs In On SEO vs GEO Debate via @sejournal, @MattGSouthern

Google Search Advocate John Mueller says businesses that rely on referral traffic should think about how AI tools fit into the picture.

Mueller responded to a Reddit thread asking whether SEO is still enough or whether practitioners need to start considering GEO, a term some in the industry use for optimizing visibility in AI-powered answer engines like ChatGPT, Gemini, and Perplexity.

“If you have an online business that makes money from referred traffic, it’s definitely a good idea to consider the full picture, and prioritize accordingly,” Mueller wrote.

What Mueller Said

Mueller didn’t endorse or reject the GEO terminology. He framed the question in terms of practical business decisions rather than new optimization techniques.

“What you call it doesn’t matter, but ‘AI’ is not going away, but thinking about how your site’s value works in a world where ‘AI’ is available is worth the time,” he wrote.

He also pushed back on treating AI visibility as a universal priority. Mueller suggested practitioners look at their own data first.

Mueller added:

“Also, be realistic and look at actual usage metrics and understand your audience (what % is using ‘AI’? what % is using Facebook? what does it mean for where you spend your time?).”

Why This Matters

I’ve been tracking Mueller’s public statements for years, and this one lands differently than the usual “it depends” responses he’s known for. He’s reframing the GEO question as a resource allocation problem rather than a terminology debate.

The GEO conversation has picked up steam over the past year as AI answer engines started sending measurable referral traffic. I’ve covered the citation studies, the traffic analyses, and the research comparing Google rankings to LLM citations. What’s been missing is a clear signal from Google: is this a distinct discipline, or just rebranded SEO?

Mueller’s answer is consistent with what Google said at Search Central Live, when Gary Illyes emphasized that AI features share infrastructure with traditional Search. The message from both is that you probably don’t need a separate framework, but you do need to understand how discovery is changing.

What I find more useful is his emphasis on checking your own numbers. Current data shows ChatGPT referrals at roughly 0.19% of traffic for the average site. AI assistants combined still drive less than 1% for most publishers. That’s growing, but it’s not yet a reason to reorganize your entire strategy.

The industry has a habit of chasing trends that apply to some sites but not others. Mueller’s pushing back on that pattern. Look at what percentage of your audience actually uses AI tools before reallocating resources toward them.

Looking Ahead

The GEO terminology will likely stick, regardless of Google’s stance. Mueller’s framing puts the decision back on individual businesses to measure their own audience behavior.

For practitioners, this means the homework is in your analytics. If AI referrals are showing up in your traffic sources, they’re worth understanding. If they’re not, you have other priorities.


Featured Image: Roman Samborskyi/Shutterstock

Google’s Mueller Explains ‘Page Indexed Without Content’ Error via @sejournal, @MattGSouthern

Google Search Advocate John Mueller responded to a question about the “Page Indexed without content” error in Search Console, explaining the issue typically stems from server or CDN blocking rather than JavaScript.

The exchange took place on Reddit after a user reported their homepage dropped from position 1 to position 15 following the error’s appearance.

What’s Happening?

Mueller clarified a common misconception about the cause of “Page Indexed without content” in Search Console.

Mueller wrote:

“Usually this means your server / CDN is blocking Google from receiving any content. This isn’t related to anything JavaScript. It’s usually a fairly low level block, sometimes based on Googlebot’s IP address, so it’ll probably be impossible to test from outside of the Search Console testing tools.”

The Reddit user had already attempted several diagnostic steps. They ran curl commands to fetch the page as Googlebot, checked for JavaScript blocking, and tested with Google’s Rich Results Test. Desktop inspection tools returned “Something went wrong” errors while mobile tools worked normally.
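
For context, the header-only part of that diagnosis takes only a few lines of Python; the URL below is a placeholder. The key caveat, per Mueller, is that this spoofs only the user-agent string, so a block keyed to Googlebot’s IP ranges won’t reproduce from an outside machine:

```python
from urllib.request import Request, urlopen

# Googlebot's documented desktop user-agent string.
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

# Placeholder URL; substitute the affected page.
req = Request("https://example.com/", headers={"User-Agent": GOOGLEBOT_UA})

with urlopen(req, timeout=10) as resp:
    body = resp.read()
    # An IP-based block will NOT trigger here: the request still
    # originates from your own IP, not Google's crawl infrastructure.
    print(resp.status, len(body), "bytes received")
```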

Mueller noted that standard external testing methods won’t catch these blocks.

He added:

“Also, this would mean that pages from your site will start dropping out of the index (soon, or already), so it’s a good idea to treat this as something urgent.”

The affected site uses Webflow as its CMS and Cloudflare as its CDN. The user reported the homepage had been indexing normally with no recent changes to the site.

Why This Matters

I’ve covered this type of problem repeatedly over the years. CDN and server configurations can inadvertently block Googlebot without affecting regular users or standard testing tools. The blocks often target specific IP ranges, which means curl tests and third-party crawlers won’t reproduce the problem.

I covered when Google first added “indexed without content” to the Index Coverage report. Google’s help documentation at the time noted the status means “for some reason Google could not read the content” and specified “this is not a case of robots.txt blocking.” The underlying cause is almost always something lower in the stack.

The Cloudflare detail caught my attention. I reported on a similar pattern when Mueller advised a site owner whose crawling stopped across multiple domains simultaneously. All affected sites used Cloudflare, and Mueller pointed to “shared infrastructure” as the likely culprit. The pattern here looks familiar.

More recently, I covered a Cloudflare outage in November that triggered 5xx spikes affecting crawling. That was a widespread incident. This case appears to be something more targeted, likely a bot protection rule or firewall setting that treats Googlebot’s IP addresses differently from other traffic.

Search Console’s URL Inspection tool and Live URL test remain the primary ways to identify these blocks. When those tools return errors while external tests pass, server-level blocking becomes the likely cause. Mueller made a similar point in August when advising on crawl rate drops, suggesting site owners “double-check what actually happened” and verify “if it was a CDN that actually blocked Googlebot.”

Looking Ahead

If you’re seeing the “Page Indexed without content” error, check the CDN and server configurations for rules that affect Googlebot’s IP ranges. Google publishes its crawler IP addresses, which can help identify whether security rules are targeting them.
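
As a sketch of that verification step, the following Python checks an IP address from your logs against Googlebot’s published ranges. The JSON endpoint comes from Google’s crawler verification documentation; the sample IP is illustrative:

```python
import json
import ipaddress
from urllib.request import urlopen

# Googlebot's published IP ranges (see Google's crawler verification docs).
RANGES_URL = "https://developers.google.com/static/search/apis/ipranges/googlebot.json"

def is_googlebot_ip(ip: str) -> bool:
    """Return True if `ip` falls inside a published Googlebot range."""
    data = json.load(urlopen(RANGES_URL))
    addr = ipaddress.ip_address(ip)
    for prefix in data["prefixes"]:
        cidr = prefix.get("ipv4Prefix") or prefix.get("ipv6Prefix")
        if addr in ipaddress.ip_network(cidr):
            return True
    return False

# Illustrative check against an IP pulled from server or firewall logs.
print(is_googlebot_ip("66.249.66.1"))
```

If firewall or CDN logs show requests from these ranges being challenged or denied, that is the kind of low-level block Mueller describes.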

The Search Console URL Inspection tool is the most reliable way to see what Google receives when crawling a page. External testing tools won’t catch IP-based blocks that only affect Google’s infrastructure.

For Cloudflare users specifically, check bot management settings, firewall rules, and any IP-based access controls. The configuration may have changed through automatic updates or new default settings rather than manual changes.

Google Ads Using New AI Model To Catch Fraudulent Advertisers via @sejournal, @martinibuster

Google published a research paper about a new AI model for detecting fraud in the Google Ads system, a strong improvement over what it was previously using. What’s interesting is that the research paper, dated December 31, 2025, says the new AI is already deployed, boosting recall by over 40 percentage points on one policy and achieving 99.8% precision on another.

ALF: Advertiser Large Foundation Model

The new AI is called ALF (Advertiser Large Foundation Model). ALF is a multimodal large foundation model that analyzes text, images, and video, together with factors like account age, billing details, and historical performance metrics.

The researchers explain that many of these factors in isolation won’t flag an account as potentially problematic, but that comparing all of these factors together provides a better understanding of advertiser behavior and intent.

They write:

“A core challenge in this ecosystem is to accurately and efficiently understand advertiser intent and behavior. This understanding is critical for several key applications, including matching users with ads and identifying fraud and policy violations.

Addressing this challenge requires a holistic approach, processing diverse data types including structured account information (e.g., account age, billing details), multi-modal ad creative assets (text, images, videos), and landing page content.

For example, an advertiser might have a recently created account, have text and image ads for a well known large brand, and have had a credit card payment declined once. Although each element could exist innocently in isolation, the combination strongly suggests a fraudulent operation.”

The researchers address three challenges that previous systems were unable to overcome:

1. Heterogeneous and High-Dimensional Data
Heterogeneous data refers to the fact that advertiser data comes in multiple formats, not just one type. This includes structured data like account age and billing type and unstructured data like creative assets such as images, text, and video. High-dimensional data refers to the hundreds or thousands of data points associated with each advertiser, causing the mathematical representation of each one to become high-dimensional, which presents challenges for conventional models.

2. Unbounded Sets of Creative Assets
Advertisers could have thousands of creative assets, such as images, and hide one or two malicious ones among thousands of innocent assets. This scenario overwhelmed the previous system.

3. Real-World Reliability and Trustworthiness
The system needs to generate trustworthy confidence scores that a business has malicious intent, because a false positive would affect an innocent advertiser. It must also work reliably without constant retuning to catch mistakes.

Privacy and Safety

Although ALF analyzes sensitive signals like billing history and account details, the researchers emphasize that the system is designed with strict privacy safeguards. Before the AI processes any data, all personally identifiable information (PII) is stripped away. This ensures that the model identifies risk based on behavioral patterns rather than sensitive personal data.

The Secret Sauce: How It Spots Outliers

The model also uses a technique called “Inter-Sample Attention” to improve its detection skills. Instead of analyzing a single advertiser in a vacuum, ALF looks at “large advertiser batches” and compares advertisers against one another. This allows the AI to learn what normal activity looks like across the entire ecosystem, making it more accurate at spotting suspicious outliers that don’t fit normal behavior.
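
The paper doesn’t publish implementation details, but the general mechanism is straightforward to sketch: instead of tokens attending to tokens within one input, each sample in a batch attends to the other samples. A minimal NumPy illustration with made-up advertiser embeddings, not Google’s implementation:

```python
import numpy as np

def inter_sample_attention(batch: np.ndarray) -> np.ndarray:
    """Each row (one advertiser's embedding) attends to every other
    row in the batch, producing batch-contextualized embeddings.
    batch: (B, D) array of advertiser embeddings."""
    d = batch.shape[-1]
    scores = batch @ batch.T / np.sqrt(d)            # (B, B) similarities
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the batch
    return weights @ batch                           # (B, D) output

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(63, 16))    # typical advertisers
outlier = rng.normal(5.0, 1.0, size=(1, 16))    # one anomalous account
contextualized = inter_sample_attention(np.vstack([normal, outlier]))
```

Because every advertiser’s representation is computed relative to the rest of the batch, an account whose combined signals sit far from the cluster stands out in a way it wouldn’t if scored in isolation.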

ALF Outperforms Production Benchmarks

The researchers explain that their tests show that ALF outperforms a heavily tuned production baseline:

“Our experiments show ALF significantly outperforms a heavily tuned production baseline while also performing strongly on public benchmarks. In production, ALF delivers substantial and simultaneous gains in precision and recall, boosting recall by over 40 percentage points on one critical policy while increasing precision to 99.8% on another.”

This result demonstrates that ALF can deliver measurable gains across multiple evaluation criteria under real-world production conditions, rather than just in offline or benchmark environments.

Elsewhere they mention tradeoffs in speed:

“The effectiveness of this approach was validated against an exceptionally strong production baseline, itself the result of an extensive search across various architectures and hyperparameters, including DNNs, ensembles, GBDTs, and logistic regression with feature cross exploration.

While ALF’s latency is higher due to its larger model size, it remains well within the acceptable range for our production environment and can be further optimized using hardware accelerators. Experiments show ALF significantly outperforms the baseline on key risk detection tasks, a performance lift driven by its unique ability to holistically model content embeddings, which simpler architectures struggled to leverage. This trade-off is justified by its successful deployment, where ALF serves millions of requests daily.”

Latency refers to the amount of time the system takes to produce a response after receiving a request. The researchers’ data shows that although ALF increases this response time relative to the baseline, the latency remains acceptable for production use, and the system is already operating at scale while delivering substantially better fraud detection performance.

Improved Fraud Detection

The researchers say that ALF is now deployed in the Google Ads Safety system for identifying advertisers that violate Google Ads policies. There is no indication that the system is being used elsewhere, such as in Search or Google Business Profiles. But they did say that future work could focus on time-based factors (“temporal dynamics”) to catch evolving patterns. They also indicated that it could be useful for audience modeling and creative optimization.

Read the original PDF version of the research paper:

ALF: Advertiser Large Foundation Model for Multi-Modal Advertiser Understanding

Featured Image by Shutterstock/Login

The Guardian: Google AI Overviews Gave Misleading Health Advice via @sejournal, @MattGSouthern

The Guardian published an investigation claiming health experts found inaccurate or misleading guidance in some AI Overview responses for medical queries. Google disputes the reporting and says many examples were based on incomplete screenshots.

The Guardian said it tested health-related searches and shared AI Overview responses with charities, medical experts, and patient information groups. Google told The Guardian the “vast majority” of AI Overviews are factual and helpful.

What The Guardian Reported Finding

The Guardian said it tested a range of health queries and asked health organizations to review the AI-generated summaries. Several reviewers said the summaries included misleading or incorrect guidance.

One example involved pancreatic cancer. Anna Jewell, director of support, research and influencing at Pancreatic Cancer UK, said advising patients to avoid high-fat foods was “completely incorrect.” She added that following that guidance “could be really dangerous and jeopardise a person’s chances of being well enough to have treatment.”

The reporting also highlighted mental health queries. Stephen Buckley, head of information at Mind, said some AI summaries for conditions such as psychosis and eating disorders offered “very dangerous advice” and were “incorrect, harmful or could lead people to avoid seeking help.”

The Guardian cited a cancer screening example too. Athena Lamnisos, chief executive of the Eve Appeal cancer charity, said a pap test being listed as a test for vaginal cancer was “completely wrong information.”

Sophie Randall, director of the Patient Information Forum, said the examples showed “Google’s AI Overviews can put inaccurate health information at the top of online searches, presenting a risk to people’s health.”

The Guardian also reported that repeating the same search could produce different AI summaries at different times, pulling from different sources.

Google’s Response

Google disputed both the examples and the conclusions.

A spokesperson told The Guardian that many of the health examples shared were “incomplete screenshots,” but from what the company could assess they linked “to well-known, reputable sources and recommend seeking out expert advice.”

Google told The Guardian the “vast majority” of AI Overviews are “factual and helpful,” and that it “continuously” makes quality improvements. The company also argued that AI Overviews’ accuracy is “on a par” with other Search features, including featured snippets.

Google added that when AI Overviews misinterpret web content or miss context, it will take action under its policies.

The Broader Accuracy Context

This investigation lands in the middle of a debate that’s been running since AI Overviews expanded in 2024.

During the initial rollout, AI Overviews drew attention for bizarre results, including suggestions involving glue on pizza and eating rocks. Google later said it would reduce the scope of queries that trigger AI-written summaries and refine how the feature works.

I covered that launch, and the early accuracy problems quickly became part of the public narrative around AI summaries. The question then was whether the issues were edge cases or something more structural.

More recently, data from Ahrefs suggests medical YMYL queries are more likely than average to trigger AI Overviews. In its analysis of 146 million SERPs, Ahrefs reported that 44.1% of medical YMYL queries triggered an AI Overview. That’s more than double the overall baseline rate in the dataset.

Separate research on medical Q&A in LLMs has pointed to citation-support gaps in AI-generated answers. One evaluation framework, SourceCheckup, found that many responses were not fully supported by the sources they cited, even when systems provided links.

Why This Matters

AI Overviews appear above ranked results. When the topic is health, errors carry more weight.

Publishers have spent years investing in documented medical expertise to meet Google’s quality standards for health content. This investigation puts the same spotlight on Google’s own summaries when they appear at the top of results.

The Guardian’s reporting also highlights a practical problem. The same query can produce different summaries at different times, making it harder to verify what you saw by running the search again.

Looking Ahead

Google has previously adjusted AI Overviews after viral criticism. Its response to The Guardian indicates it expects AI Overviews to be judged like other Search features, not held to a separate standard.

Google’s Recommender System Breakthrough Detects Semantic Intent via @sejournal, @martinibuster

Google published a research paper about helping recommender systems understand what users mean when they interact with them. The goal of the new approach is to overcome limitations inherent in current state-of-the-art recommender systems and reach a finer, more detailed understanding of what individual users want to read, listen to, or watch.

Personalized Semantics

Recommender systems predict what a user would like to read or watch next. YouTube, Google Discover, and Google News are examples of systems that recommend content to users. Shopping recommendations are another kind.

Recommender systems generally work by collecting data about the kinds of things a user clicks on, rates, buys, and watches and then using that data to suggest more content that aligns with a user’s preferences.

The researchers referred to those kinds of signals as primitive user feedback because such signals do a poor job of capturing an individual’s subjective judgments about what’s funny, cute, or boring.

The intuition behind the research is that the rise of LLMs presents an opportunity to leverage natural language interactions to better understand what a user wants through identifying semantic intent.

The researchers explain:

“Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings). They allow users to express intent, preferences, constraints, and contexts in a richer fashion, often using natural language (including faceted search and dialogue).

Yet more research is needed to find the most effective ways to use this feedback. One challenge is inferring a user’s semantic intent from the open-ended terms or attributes often used to describe a desired item. This is critical for recommender systems that wish to support users in their everyday, intuitive use of natural language to refine recommendation results.”

The Soft Attributes Challenge

The researchers explained that hard attributes are something recommender systems can understand because they are objective ground truths like “genre, artist, director.” The problem lay with other kinds of attributes, called “soft attributes,” which are subjective and can’t be reliably matched to movies, content, or products.

The research paper states the following characteristics of soft attributes:

  • “There is no definitive “ground truth” source associating such soft attributes with items
  • The attributes themselves may have imprecise interpretations
  • And they may be subjective in nature (i.e., different users may interpret them differently)”

The problem of soft attributes is the problem that the researchers set out to solve and why the research paper is called Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors.

Novel Use Of Concept Activation Vectors (CAVs)

Concept Activation Vectors (CAVs) are a way to probe AI models to understand the mathematical representations (vectors) the models use internally. They provide a way for humans to connect those internal vectors to concepts.

So the standard direction of a CAV is interpreting the model. The researchers reversed that direction: the goal is now to interpret the users, translating subjective soft attributes into mathematical representations for recommender systems. They discovered that adapting CAVs to interpret users enabled vector representations that help AI models detect subtle intent and subjective human judgments personalized to an individual.

As they write:

“We demonstrate … that our CAV representation not only accurately interprets users’ subjective semantics, but can also be used to improve recommendations through interactive item critiquing.”

For example, the model can learn that users mean different things by “funny” and be better able to leverage those personalized semantics when making recommendations.

The problem the researchers are solving is figuring out how to bridge the semantic gap between how humans speak and how recommender systems “think.”

Humans think in concepts, using vague or subjective descriptions (called soft attributes).

Recommender systems “think” in math: they operate on vectors (lists of numbers) in a high-dimensional “embedding space.”

The problem then becomes making subjective human language less ambiguous without having to modify or retrain the recommender system to handle all the nuances. The CAVs do that heavy lifting.

The researchers explain:

“…we infer the semantics of soft attributes using the representation learned by the recommender system model itself.”

They list four advantages of their approach:

“(1) The recommender system’s model capacity is directed to predicting user-item preferences without further trying to predict additional side information (e.g., tags), which often does not improve recommender system performance.

(2) The recommender system model can easily accommodate new attributes without retraining should new sources of tags, keywords or phrases emerge from which to derive new soft attributes.

(3) Our approach offers a means to test whether specific soft attributes are relevant to predicting user preferences. Thus, we are able focus attention on attributes most relevant to capturing a user’s intent (e.g., when explaining recommendations, eliciting preferences, or suggesting critiques).

(4) One can learn soft attribute/tag semantics with relatively small amounts of labelled data, in the spirit of pre-training and few-shot learning.”

They then provide a high-level explanation of how the system works:

“At a high-level, our approach works as follows. we assume we are given:

(i) a collaborative filtering-style model (e.g., probabilistic matrix factorization or dual encoder) which embeds items and users in a latent space based on user-item ratings; and

(ii) a (small) set of tags (i.e., soft attribute labels) provided by a subset of users for a subset of items.

We develop methods that associate with each item the degree to which it exhibits a soft attribute, thus determining that attribute’s semantics. We do this by applying concept activation vectors (CAVs) —a recent method developed for interpretability of machine-learned models—to the collaborative filtering model to detect whether it learned a representation of the attribute.

The projection of this CAV in embedding space provides a (local) directional semantics for the attribute that can then be applied to items (and users). Moreover, the technique can be used to identify the subjective nature of an attribute, specifically, whether different users have different meanings (or tag senses) in mind when using that tag. Such a personalized semantics for subjective attributes can be vital to the sound interpretation of a user’s true intent when trying to assess her preferences.”
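
Stripped to its essentials, that recipe is a linear probe in the recommender’s embedding space: train a classifier to separate tagged from untagged items, and the classifier’s weight vector is the CAV. The sketch below uses synthetic embeddings and a made-up “funny” tag set; it follows the general CAV recipe, not the paper’s exact training setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical item embeddings from a collaborative-filtering model.
n_items, dim = 500, 32
item_embeddings = rng.normal(size=(n_items, dim))

# Pretend the model encoded "funny" along some hidden direction,
# and that users tagged 40 such items.
hidden_direction = rng.normal(size=dim)
hidden_direction /= np.linalg.norm(hidden_direction)
labels = np.zeros(n_items, dtype=int)
funny_idx = rng.choice(n_items, size=40, replace=False)
labels[funny_idx] = 1
item_embeddings[funny_idx] += 2.0 * hidden_direction

# The CAV is the weight vector of a linear probe trained to separate
# tagged from untagged items in the embedding space.
probe = LogisticRegression(max_iter=1000).fit(item_embeddings, labels)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Projecting items onto the CAV scores how strongly each one exhibits
# the attribute; fitting per-user probes on each user's own tags would
# yield the personalized senses of a subjective tag described in the paper.
scores = item_embeddings @ cav
print(scores[labels == 1].mean(), scores[labels == 0].mean())
```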

Does This System Work?

One of the interesting findings is that their test of an artificial tag (odd year) showed the system’s accuracy rate was barely above random selection, which corroborated their hypothesis that “CAVs are useful for identifying preference related attributes/tags.”

They also found that using CAVs in recommender systems was useful for understanding “critiquing-based” user behavior and improved those kinds of recommender systems.

The researchers listed four benefits:

“(i) using a collaborative filtering representation to identify attributes of greatest relevance to the recommendation task;

(ii) distinguishing objective and subjective tag usage;

(iii) identifying personalized, user-specific semantics for subjective attributes; and

(iv) relating attribute semantics to preference representations, thus allowing interactions using soft attributes/tags in example critiquing and other forms of preference elicitation.”

They found that their approach improved recommendations in situations where discovery of soft attributes is important. Whether soft attributes would also aid recommendations in settings where hard attributes are the norm, such as product shopping, is a future area of study.

Takeaways

The research paper was published in 2024, and I had to dig around to actually find it, which may explain why it went largely unnoticed in the search marketing community.

Google tested some of this approach with an algorithm called WALS (Weighted Alternating Least Squares), actual production code that is a product in Google Cloud for developers.

Two notes in a footnote and in the appendix explain:

“CAVs on MovieLens20M data with linear attributes use embeddings that were learned (via WALS) using internal production code, which is not releasable.”

“…The linear embeddings were learned (via WALS, Appendix A.3.1) using internal production code, which is not releasable.”

“Production code” refers to software that is currently running in Google’s user-facing products, in this case Google Cloud. It’s likely not the underlying engine for Google Discover; however, it’s important to note because it shows how easily this approach can be integrated into an existing recommender system.

They tested this system using the MovieLens20M dataset, a public dataset of 20 million ratings, with some of the tests done with Google’s proprietary recommendation engine (WALS). This lends credibility to the inference that the code can be used on a live system without having to retrain or modify it.

The takeaway that I see in this research paper is that this makes it possible for recommender systems to leverage semantic data about soft attributes. Google Discover is regarded by Google as a subset of search, and search patterns are some of the data that the system uses to surface content. Google doesn’t say whether they are using this kind of method, but given the positive results, it is possible that this approach could be used in Google’s recommender systems. If that’s the case, then that means Google’s recommendations may be more responsive to users’ subjective semantics.

The research paper credits Google Research (60% of the credits) as well as Amazon, Midjourney, and Meta AI.

The PDF is available here:

Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors

Featured Image by Shutterstock/Here

Reddit Introduces Max Campaigns, Its New Automated Campaign Type via @sejournal, @brookeosmundson

Reddit is rolling out Max campaigns, a new automated campaign type now available in beta for traffic and conversion objectives.

The launch comes as Reddit continues to see strong advertiser momentum, supported by rising daily active users and rapid growth in conversion activity.

While automation is now standard across most paid media platforms, Reddit is positioning Max campaigns as a way to simplify campaign management without asking advertisers to operate with limited visibility into performance or audience behavior.

How Reddit Max Campaigns Work

Max campaigns are designed to reduce setup complexity and ongoing management by automating several decisions advertisers typically make manually.

This includes the following, all within guardrails defined by the advertiser:

  • Audience targeting
  • Creative selection and rotation
  • Placements
  • Budget allocation

The system is powered by Reddit Community Intelligence™, which draws from more than 23 billion posts and comments to help predict the value of each ad impression in real time. These signals allow campaigns to adjust delivery dynamically as performance data changes, rather than relying on static rules or frequent manual intervention.

Max campaigns also introduce optional creative automation tools. Advertisers can generate headline suggestions based on trending Reddit language, automatically adapt images into Reddit-friendly thumbnails, and soon will be able to use AI-based video cropping to more easily reuse video assets from other platforms.

In the announcement, Reddit reports that more than 600 advertisers participated in alpha testing. Across 17 split tests conducted between June and August 2025, advertisers saw an average 17% lower cost per acquisition and 27% more conversions compared to business-as-usual campaigns.

In one example, Brooks Running reported a 37% decrease in cost per click and 27% more clicks over a 21-day campaign without making manual changes.

Why This Matters For Advertisers

Platforms like Google and Meta have spent the last several years pushing advertisers toward AI-driven campaign types that consolidate targeting, creative, and bidding into a single system. Performance Max, Advantage+, and similar offerings have become the default recommendation for scaling efficiency.

Reddit’s Max campaigns follow that same directional shift, but with a notable difference in emphasis. Where Google and Meta largely optimize toward outcomes while abstracting audience detail, Reddit is attempting to pair automation with clearer audience context.

On Google and Meta, advertisers often evaluate AI campaigns based on aggregate performance metrics alone, with limited insight into who is driving results beyond high-level breakdowns. Reddit is positioning Max campaigns as a way to automate delivery while still helping advertisers understand which types of users are engaging, what they care about, and how conversations influence response.

Top Audience Personas reflect this approach. Instead of relying solely on predefined segments or modeled interests, Reddit uses community and conversation signals to surface patterns in how real users engage with ads. These insights are not meant to replace targeting decisions, but to inform creative strategy, messaging, and where Reddit fits within a broader media mix.

For advertisers who have grown cautious of automation that prioritizes efficiency at the expense of understanding, this added layer of insight may be the differentiator.

What Advertisers Should Do Next

Max campaigns are now available in beta for traffic and conversion objectives to select advertisers, with wider access expected over the coming months. Top Audience Persona reporting is scheduled to roll out shortly after.

For advertisers already running Reddit campaigns, this is best treated as a controlled test. Running Max campaigns alongside existing setups can help clarify where automation improves efficiency and where hands-on input, especially around creative and community fit, still matters.

Advertisers coming from Performance Max or Advantage+ should expect familiar mechanics, but different signals. Reddit’s value is tied to conversation and context, so creative testing and message alignment will likely play a larger role than pure audience tuning.

As with any beta, things will change. The near-term opportunity is not just performance lift, but learning how Reddit’s version of automation behaves and where it fits alongside other AI-led campaigns in a broader media mix.