5 Google Analytics Reports PPC Marketers Should Actually Use

Google Analytics has never been perfect, but it used to feel familiar.

The shift to Google Analytics 4 forced PPC marketers to rethink how they pull insights, not just where to click.

Reports that once lived front and center now take more effort to find. Some require extra setup. Others feel less intuitive than before, and that creates a real problem for PPC managers who need answers quickly.

You are expected to explain performance, justify spend, and make optimization decisions, often without the luxury of rebuilding reports or navigating multiple menus.

This article focuses on five Google Analytics reports that still deliver real value for PPC. These are the reports that help you understand audience behavior, uncover expansion opportunities, and connect paid traffic to outcomes the business actually cares about.

1. Audiences Report

As keyword match types continue to loosen and automation plays a larger role in campaign delivery, audience signals matter more than ever.

The Audiences report in GA4 replaces the interest-based reports many marketers previously relied on, but with a more practical twist. Instead of inferred intent, this report is built on real user behavior.

This report shows how predefined and custom audiences perform across key engagement and conversion metrics. For PPC marketers, the value lies in analyzing audiences tied to meaningful actions, not generic demographic traits.

Use this report to:

  • Identify which audiences are driving actual conversions, not just traffic.
  • Compare performance between converters, cart viewers, repeat visitors, or high-engagement users.
  • Validate which audiences deserve more aggressive bidding or budget allocation.
  • Build and export high-performing audiences directly into Google Ads.

This report is far more actionable than legacy interest segments and aligns better with how PPC campaigns are structured today.

To find this report, navigate to: Reports > User > User Attributes > Audiences.

Navigating the Audiences report in a Google Analytics 4 property.
Screenshot by author, January 2026.

This report will only be useful if you have custom audiences set up in GA4. These are behavior-based audiences you define yourself, not prebuilt segments like In-Market or Affinity audiences you may be used to seeing in Google Ads.

GA4 audiences are built from first-party actions such as page views, events, or conversion behavior, which makes them more relevant for PPC optimization but requires upfront configuration.
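
If you want to pull these numbers outside the UI, the GA4 Data API exposes the same audience dimension. Below is a minimal sketch using the official Python client (google-analytics-data); the property ID is a placeholder, and the conversions metric is named “keyEvents” on newer properties (“conversions” on older API versions).

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()  # uses Application Default Credentials
response = client.run_report(RunReportRequest(
    property="properties/123456789",  # placeholder property ID
    dimensions=[Dimension(name="audienceName")],
    # "keyEvents" on newer properties; "conversions" on older ones.
    metrics=[Metric(name="totalUsers"), Metric(name="keyEvents")],
    date_ranges=[DateRange(start_date="28daysAgo", end_date="today")],
))
for row in response.rows:
    print(row.dimension_values[0].value,
          row.metric_values[0].value, row.metric_values[1].value)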

2. Site Search Report

The Site Search report remains one of the most underused tools for PPC expansion.
By analyzing what users search for once they land on your site, you gain direct insight into unmet expectations and intent gaps.

In GA4, Site Search data lives under event tracking rather than a standalone report.

For PPC teams, this report can:

  • Inform keyword expansion using real user language.
  • Highlight product or content gaps affecting conversion rates.
  • Reveal mismatches between ad messaging and on-site expectations.

Speaking of gaps, the Site Search report can also help product teams understand whether demand exists beyond the products currently offered.

For example, say you have a wedding invitation website that has a decent product assortment for different themed weddings.

When using the Site Search report, you see an increasing number of searches for “rustic” – but none of the website designs have that rustic feel!

This can inform product marketing that there is a demand for this type of product, and they can take action accordingly.

To find the Site Search report, navigate to Reports > Engagement > Events.

Look for the event “view_search_results” and click on it.

Screenshot by author, January 2026

Once clicked, find the “search_term” custom parameter card on the page.

A few important notes on search terms data:

  • Before using this report, you must create a new custom dimension (event-scoped) for the search term results to populate.
  • Google Analytics will only show data once it meets a minimum aggregation threshold.

While it’s not as robust as the previous Site Search report in Universal Analytics, it does provide basic data on the number of events and total users per search term.
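
Once the custom dimension above is registered, the same search terms can also be exported programmatically, which is handy for keyword mining at scale. A minimal sketch, assuming the official Python client and a placeholder property ID (an event-scoped custom dimension becomes queryable as “customEvent:parameter_name”):

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()
response = client.run_report(RunReportRequest(
    property="properties/123456789",  # placeholder property ID
    # The registered event-scoped custom dimension for "search_term".
    dimensions=[Dimension(name="customEvent:search_term")],
    metrics=[Metric(name="eventCount"), Metric(name="totalUsers")],
    date_ranges=[DateRange(start_date="90daysAgo", end_date="today")],
    limit=100,  # top 100 search terms
))
for row in response.rows:
    print(row.dimension_values[0].value, row.metric_values[0].value)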

3. Referrals Report

Referral traffic is often ignored by PPC teams, which is a missed opportunity.

The Referrals report shows which external sites send users to your website and how those users behave once they arrive.

To find this report, navigate to Reports > Acquisition > Traffic Acquisition.

Google Analytics 4 Referral report navigation
Screenshot by author, January 2026

To view the websites from the Referral channel, click the “+” in the default channel group and choose “Session source/medium.”

Isolating the Referrals report in Google Analytics 4 to identify which websites drove traffic to a website.
Screenshot by author, January 2026

Use this report to:

  • Identify third-party sites sending high-quality traffic.
  • Distinguish between low-intent and high-intent referral sources.
  • Build placement-based audiences for Display or Demand Gen testing.

Testing Display placements based on proven referral sources can be a cost-efficient way to expand reach without sacrificing traffic quality. Because the referral websites you choose are already known to send high-quality visitors, it is a responsible way to test new PPC efforts.
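
The same drill-down can be reproduced programmatically by filtering the GA4 Data API to the Referral default channel group. A minimal sketch, again assuming the official Python client and a placeholder property ID:

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Filter, FilterExpression, Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()
response = client.run_report(RunReportRequest(
    property="properties/123456789",  # placeholder property ID
    dimensions=[Dimension(name="sessionSourceMedium")],
    metrics=[Metric(name="sessions"), Metric(name="engagedSessions")],
    date_ranges=[DateRange(start_date="28daysAgo", end_date="today")],
    # Keep only sessions from the Referral default channel group.
    dimension_filter=FilterExpression(filter=Filter(
        field_name="sessionDefaultChannelGroup",
        string_filter=Filter.StringFilter(value="Referral"),
    )),
))
for row in response.rows:
    print(row.dimension_values[0].value, row.metric_values[0].value)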

4. Top Conversion Paths Report

As marketers, we’re often asked how “Top of Funnel” (TOFU) or brand awareness campaigns are performing.

Leadership typically prioritizes channels that are proven to perform, and they want to make sure marketing dollars are spent efficiently.

In today’s economy, this is more important than ever.

This Google Analytics report helps analyze and interpret TOFU behavior.

If you’re running any type of campaign beyond Search, this report is absolutely necessary.

Campaigns like YouTube and Display, along with other paid channels like social media (Meta, Instagram, TikTok, etc.), naturally have different goals and objectives.

TOFU campaigns are routinely criticized for “not performing” at the same rate as a Search campaign.

As marketers, this can be frustrating to hear over and over.

Using the Conversion Paths report provides a holistic view of how long it takes a user to move from the initial interaction to an eventual purchase.

To find this report, navigate to Advertising > Attribution > Conversion paths.

When drilling down to specific campaign performance, I recommend:

  • Adding a filter on “Session source/medium” for the specific paid channel in question (“google/cpc”, for example).
  • Including an “AND” condition on “Session campaign” for the specific TOFU campaigns in question.

Conversion Paths report in Google Analytics 4.
Screenshot by author, January 2026

In the example above, we found that our Paid Social campaigns deserved credit for more of the early and mid-funnel touchpoints!

Use this report to:

  • Identify how many touchpoints it takes to reach a final conversion.
  • Analyze complex user journey interactions when multiple channels are involved (especially for longer sales cycles).
  • Report on credited conversions based on the attribution model.

This report can uncover necessary data to support the request for additional marketing dollars in TOF channels.

A win-win for all parties involved.

5. Conversion Events Report

Most PPC accounts optimize toward a single primary conversion. That makes sense for bidding, but it rarely tells the full story of how paid traffic actually contributes to revenue.

The Conversion Events report in Google Analytics 4 allows you to step back and evaluate all meaningful actions users take, not just the final one that gets credit in-platform.

For PPC decision-making, this report helps answer questions that Google Ads alone cannot, such as:

  • Which actions consistently happen before a purchase or lead submission.
  • Whether certain campaigns drive strong intent but fail to close immediately.
  • How different paid channels influence early-stage engagement versus final conversion.

This becomes especially important when evaluating Display, YouTube, Demand Gen, or paid social campaigns. These campaigns often look inefficient when judged solely on last-click performance, but they may drive key actions like product views, pricing page visits, form starts, or repeat sessions.

To find this report, navigate to: Reports > Engagement > Events.

Screenshot by author, January 2026

Conversion analysis in GA4 depends on which events you explicitly mark as conversions in Admin settings. GA4 does not provide a standalone “conversion-only” filter inside the Events report, so accuracy starts with proper event configuration.
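
One practical workaround is the GA4 Data API, which can filter the events table down to conversions only. A minimal sketch, assuming the official Python client and a placeholder property ID; on newer properties the dimension is “isKeyEvent” (older API versions used “isConversionEvent”):

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Filter, FilterExpression, Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()
response = client.run_report(RunReportRequest(
    property="properties/123456789",  # placeholder property ID
    dimensions=[Dimension(name="eventName")],
    metrics=[Metric(name="eventCount")],
    date_ranges=[DateRange(start_date="28daysAgo", end_date="today")],
    # "isKeyEvent" on newer properties; "isConversionEvent" on older ones.
    dimension_filter=FilterExpression(filter=Filter(
        field_name="isKeyEvent",
        string_filter=Filter.StringFilter(value="true"),
    )),
))
for row in response.rows:
    print(row.dimension_values[0].value, row.metric_values[0].value)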

Another practical use of this report is diagnosing drop-off points. If a campaign drives high volumes of early conversion events but struggles to generate final conversions, the issue may lie in landing page experience, form friction, or follow-up timing rather than targeting or bidding.

When paired with campaign-level filters from Google Ads, the Conversion Events report helps PPC managers explain why a campaign matters, even when it is not the last touch.

That context is often the difference between cutting a campaign too early and scaling one that is quietly doing its job.

Turn Analytics Into Better PPC Decisions

Google Analytics is not where most PPC optimizations happen day to day. That work still lives inside ad platforms.

But these reports serve a different purpose. They help PPC managers step back and understand how paid traffic behaves once it reaches the site, how users move across channels, and which actions actually signal intent.

Used monthly or quarterly, these reports surface patterns that daily account reviews often miss. They support smarter targeting decisions, clearer performance explanations, and more confident budget conversations.

When you focus on the reports that consistently answer real PPC questions, Google Analytics becomes less of a chore and more of a strategic asset.

Featured Image: MR Chalee/Shutterstock

Why CFOs Are Cutting AI Budgets (And The 3 Metrics That Save Them)

Every AI vendor pitch follows the same script: “Our tool saves your team 40% of their time on X task.”

The demo looks impressive. The return on investment (ROI) calculator backs it up, showing millions in labor cost savings. You get budget approval. You deploy.

Six months later, your CFO asks: “Where’s the 40% productivity gain in our revenue?”

You realize the saved time went to email and meetings, not strategic work that moves the business forward.

This is the AI measurement crisis playing out in enterprises right now.

According to Fortune’s December 2025 report, 61% of CEOs report increasing pressure to show returns on AI investments. Yet most organizations are measuring the wrong things.

There’s a problem with how we’ve been tracking AI’s value.

Why ‘Time Saved’ Is A Vanity Metric

Time saved sounds compelling in a business case. It’s concrete, measurable, and easy to calculate.

But time saved doesn’t equal value created.

Anthropic’s November 2025 research analyzing 100,000 real AI conversations found that AI reduces task completion time by approximately 80%. Sounds transformative, right?

What that stat doesn’t capture is the Jevons Paradox of AI.

In economics, the Jevons Paradox occurs when technological progress increases the efficiency with which a resource is used, but the rate of consumption of that resource rises rather than falls.

In the corporate world, this is the Reallocation Fallacy. Just because AI completes a task faster doesn’t mean your team is producing more value. It means they’re producing the same output in less time, but then filling that saved time with lower-value work. Think more meetings, longer email threads, and administrative drift.

Google Cloud’s 2025 ROI of AI report, surveying 3,466 business leaders, found that 74% report seeing ROI within the first year.

But when you dig into what they’re measuring, it’s primarily efficiency gains, not outcome improvements.

CFOs understand this intuitively. That’s why “time saved” metrics don’t convince finance teams to increase AI budgets.

What does convince them is measuring what AI enables you to do that you couldn’t do before.

The Three Types Of AI Value Nobody’s Measuring

Recent research from Anthropic, OpenAI, and Google reveals a pattern: The organizations seeing real AI ROI are measuring expansion.

Three types of value actually matter:

Type 1: Quality Lift

AI can make work faster, and it makes good work better.

A marketing team using AI for email campaigns can send emails more quickly. They also have time to A/B test multiple subject lines, personalize content by segment, and analyze results to improve the next campaign.

The metric isn’t “time saved writing emails.” The metric is “15% higher email conversion rate.”

OpenAI’s State of Enterprise AI report, based on 9,000 workers across almost 100 enterprises, found that 85% of marketing and product users report faster campaign execution. But the real value shows up in campaign performance, not campaign speed.

How to measure quality lift:

  • Conversion rate improvements (not just task completion speed).
  • Customer satisfaction scores (not just response time).
  • Error reduction rates (not just throughput).
  • Revenue per campaign (not just campaigns launched).

One B2B SaaS company I talked to deployed AI for content creation.

  • Their old metric was “blog posts published per month.”
  • Their new metric became “organic traffic from AI-assisted content vs. human-only content.”

The AI-assisted content drove 23% more organic traffic because the team had time to optimize for search intent, not just word count.

That’s quality lift.

Type 2: Scope Expansion (The Shadow IT Advantage)

This is the metric most organizations completely miss.

Anthropic’s research on how their own engineers use Claude found that 27% of AI-assisted work wouldn’t have been done otherwise.

More than a quarter of the value AI creates isn’t from doing existing work faster; it’s from doing work that was previously impossible within time and budget constraints.

What does scope expansion look like? It often looks like positive Shadow IT.

The “papercuts” phenomenon: Small bugs that never got prioritized finally get fixed. Technical debt gets addressed. Internal tools that were “someday” projects actually get built because a non-engineer could scaffold them with AI.

The capability unlock: Marketing teams doing data analysis they couldn’t do before. Sales teams creating custom materials for each prospect instead of using generic decks. Customer success teams proactively reaching out instead of waiting for problems.

Google Cloud’s data shows 70% of leaders report productivity gains, with 39% seeing ROI specifically from AI enabling work that wasn’t part of the original scope.

How to measure scope expansion:

  • Track projects completed that weren’t in the original roadmap.
  • Track the ratio of backlog features cleared by non-engineers.
  • Measure customer requests fulfilled that would have been declined due to resource constraints.
  • Document internal tools built that were previously “someday” projects.

One enterprise software company used this metric to justify its AI investment. It tracked:

  • 47 customer feature requests implemented that would have been declined.
  • 12 internal process improvements that had been on the backlog for over a year.
  • 8 competitive vulnerabilities addressed that were previously “known issues.”

None of that shows up in “time saved” calculations. But it showed up clearly in customer retention rates and competitive win rates.

Type 3: Capability Unlock (The Full-Stack Employee)

We used to hire for deep specialization. AI is ushering in the era of the “Generalist-Specialist.”

Anthropic’s internal research found that security teams are building data visualizations. Alignment researchers are shipping frontend code. Engineers are creating marketing materials.

AI lowers the barrier to entry for hard skills.

A marketing manager doesn’t need to know SQL to query a database anymore; she just needs to know what question to ask the AI. This goes well beyond speed or time saved to removing the dependency bottleneck.

When a marketer can run their own analysis without waiting three weeks for the Data Science team, the velocity of the entire organization accelerates. The marketing generalist is now a front-end developer, a data analyst, and a copywriter all at once.

OpenAI’s enterprise data shows 75% of users report being able to complete new tasks they previously couldn’t perform. Coding-related messages increased 36% for workers outside of technical functions.

How to measure capability unlock:

  • Skills accessed (not skills owned).
  • Cross-functional work completed without handoffs.
  • Speed to execute on ideas that would have required hiring or outsourcing.
  • Projects launched without expanding headcount.

A marketing leader at a mid-market B2B company told me her team can now handle routine reporting and standard analyses with AI support, work that previously required weeks on the analytics team’s queue.

Their campaign optimization cycle accelerated 4x, leading to 31% higher campaign performance.

The “time saved” metric would say: “AI saves two hours per analysis.”

The capability unlock metric says: “We can now run 4x more tests per quarter, and our analytics team tackles deeper strategic work.”

Building A Finance-Friendly AI ROI Framework

CFOs care about three questions:

  • Is this increasing revenue? (Not just reducing cost.)
  • Is this creating competitive advantage? (Not just matching competitors.)
  • Is this sustainable? (Not just a short-term productivity bump.)

How to build an AI measurement framework that actually answers those questions:

Step 1: Baseline Your “Before AI” State

Don’t skip this step, or else it will be impossible to prove AI impact later. Before deploying AI, document current throughput, quality metrics, and scope limitations.

Step 2: Define Leading Vs. Lagging Indicators

You need to track both efficiency and expansion, but you need to frame them correctly to Finance.

  • Leading Indicator (Efficiency): Time saved on existing tasks. This predicts potential capacity.
  • Lagging Indicator (Expansion): New work enabled and revenue impact. This proves the value was realized.

Step 3: Track AI Impact On Revenue, Not Just Cost

Connect AI metrics directly to business outcomes:

  • If AI helps customer success teams → Track retention rate changes.
  • If AI helps sales teams → Track win rate and deal velocity changes.
  • If AI helps marketing teams → Track pipeline contribution and conversion rate changes.
  • If AI helps product teams → Track feature adoption and customer satisfaction changes.

Step 4: Measure The “Frontier” Gap

OpenAI’s enterprise research revealed a widening gap between “frontier” workers and median workers. Frontier firms send 2x more messages per seat.

This means identifying the teams extracting real value versus the teams just experimenting.

Step 5: Build The Measurement Infrastructure First

PwC’s 2026 AI predictions warn that measuring iterations instead of outcomes falls short when AI handles complex workflows.

As PwC notes: “If an outcome that once took five days and two iterations now takes fifteen iterations but only two days, you’re ahead.”

The infrastructure you need before you deploy AI involves baseline metrics, clear attribution models, and executive sponsorship to act on insights.

The Measurement Paradox

The organizations best positioned to measure AI ROI are the ones who already had good measurement infrastructure.

According to Kyndryl’s 2025 Readiness Report, most firms aren’t positioned to prove AI ROI because they lack the foundational data discipline.

Sound familiar? This connects directly to the data hygiene challenge I’ve written about previously. You can’t measure AI’s impact if your data is messy, conflicting, or siloed.

The Bottom Line

The AI productivity revolution is well underway. According to Anthropic’s research, current-generation AI could increase U.S. labor productivity growth by 1.8% annually over the next decade, roughly doubling recent rates.

But capturing that value requires measuring the right things.

Forget asking: “How much time does this save?”

Instead, focus on:

  • “What quality improvements are we seeing in output?”
  • “What work is now possible that wasn’t before?”
  • “What capabilities can we access without expanding headcount?”

These are the metrics that convince CFOs to increase AI budgets. These are the metrics that reveal whether AI is actually transforming your business or just making you busy faster.

Time saved is a vanity metric. Expansion enabled is the real ROI.

Measure accordingly.

Featured Image: SvetaZi/Shutterstock

Google’s New User Intent Extraction Method

Google published a research paper on how to extract user intent from user interactions so that the intent can then be used by autonomous agents. The method uses small on-device models that do not need to send data back to Google, which means that a user’s privacy is protected.

The researchers discovered they were able to solve the problem by splitting it into two tasks. Their solution worked so well it was able to beat the base performance of multi-modal large language models (MLLMs) in massive data centers.

Smaller Models On Browsers And Devices

The focus of the research is on identifying the user intent through the series of actions that a user takes on their mobile device or browser while also keeping that information on the device so that no information is sent back to Google. That means the processing must happen on the device.

They accomplished this in two stages.

  1. In the first stage, a model on the device summarizes what the user was doing.
  2. The sequence of summaries is then sent to a second model that identifies the user intent.

The researchers explained:

“…our two-stage approach demonstrates superior performance compared to both smaller models and a state-of-the-art large MLLM, independent of dataset and model type.
Our approach also naturally handles scenarios with noisy data that traditional supervised fine-tuning methods struggle with.”

Intent Extraction From UI Interactions

Intent extraction from screenshots and text descriptions of user interactions was proposed in 2025 using multimodal large language models. The researchers say they followed this approach for their problem, but with an improved prompt.

The researchers explained that extracting intent is not a trivial problem to solve and that errors can occur at multiple steps. The researchers use the word trajectory to describe a user journey within a mobile or web application, represented as a sequence of interactions.

The user journey (trajectory) is turned into a formula where each interaction step consists of two parts:

  1. An Observation
    This is the visual state of the screen (screenshot) of where the user is at that step.
  2. An Action
    The specific action that the user performed on that screen (like clicking a button, typing text, or clicking a link).

They described three qualities of a good extracted intent:

  • “faithful: only describes things that actually occur in the trajectory;
  • comprehensive: provides all of the information about the user intent required to re-enact the trajectory;
  • and relevant: does not contain extraneous information beyond what is needed for comprehensiveness.”

Challenging To Evaluate Extracted Intents

The researchers explain that grading extracted intents is difficult because user intents contain complex details (like dates or transaction data) and are inherently subjective, containing ambiguities. Trajectories are subjective because the underlying motivations are ambiguous.

For example, did a user choose a product because of the price or the features? The actions are visible, but the motivations are not. Previous research shows that intent descriptions written by different humans matched only 80% of the time on web trajectories and 76% on mobile trajectories, so a given trajectory cannot always indicate a specific intent.

Two-Stage Approach

After ruling out other methods like Chain of Thought (CoT) reasoning (because small language models struggled with the reasoning), they chose a two-stage approach that emulated CoT reasoning.

The researchers explained their two-stage approach:

“First, we use prompting to generate a summary for each interaction (consisting of a visual screenshot and textual action representation) in a trajectory. This stage is prompt-based as there is currently no training data available with summary labels for individual interactions.

Second, we feed all of the interaction-level summaries into a second stage model to generate an overall intent description. We apply fine-tuning in the second stage…”

The First Stage: Screenshot Summary

In the first stage, the summary of each interaction is divided into two parts, plus a third part that is generated and then discarded:

  1. A description of what’s on the screen.
  2. A description of the user’s action.

The third component, labeled “speculative intent,” is the model’s guess about what the user is trying to do, and it is simply discarded. Surprisingly, allowing the model to speculate and then removing that speculation leads to a higher-quality result.

The researchers cycled through multiple prompting strategies, and this was the one that worked best.

The Second Stage: Generating Overall Intent Description

For the second stage, the researchers fine-tuned a model to generate an overall intent description. They fine-tuned the model with training data made up of two parts:

  1. Summaries that represent all interactions in the trajectory.
  2. The matching ground truth that describes the overall intent for each trajectory.

The model initially tended to hallucinate because the first part (the input summaries) is potentially incomplete, while the “target intents” are complete. That caused the model to learn to fill in the missing parts in order to make the input summaries match the target intents.

They solved this problem by “refining” the target intents by removing details that aren’t reflected in the input summaries. This trained the model to infer the intents based only on the inputs.

The researchers compared four different approaches and settled on this approach because it performed so well.
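
To make the pipeline concrete, here is a hypothetical sketch of the two-stage flow in Python. The paper does not publish code, so the model objects, the prompt wording, and the “SPECULATIVE INTENT:” marker are illustrative assumptions only.

# Hypothetical sketch of the two-stage intent extraction described above.
# "small_mllm" and "fine_tuned_lm" stand in for on-device models; none of
# these names come from the paper.

def summarize_interaction(screenshot, action, small_mllm):
    # Stage 1: prompt an on-device model for a per-step summary.
    prompt = (
        "Describe (1) what is on the screen and (2) the user's action. "
        "Then write 'SPECULATIVE INTENT:' followed by a guess at the "
        f"user's goal.\nAction: {action}"
    )
    summary = small_mllm.generate(images=[screenshot], prompt=prompt)
    # The speculative-intent section is generated, then thrown away;
    # per the paper, this discard step improves summary quality.
    return summary.split("SPECULATIVE INTENT:")[0].strip()

def extract_intent(trajectory, small_mllm, fine_tuned_lm):
    # Stage 2: a fine-tuned model maps the ordered step summaries
    # to a single overall intent description.
    summaries = [
        summarize_interaction(screenshot, action, small_mllm)
        for screenshot, action in trajectory  # (observation, action) pairs
    ]
    return fine_tuned_lm.generate("\n".join(summaries))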

Ethical Considerations And Limitations

The research paper ends by summarizing potential ethical issues, such as an autonomous agent taking actions that are not in the user’s interest, and stresses the need to build proper guardrails.

The authors also acknowledged limitations in the research that might limit generalizability of the results. For example, the testing was done only on Android and web environments, which means that the results might not generalize to Apple devices. Another limitation is that the research was limited to users in the United States in the English language.

There is nothing in the research paper or the accompanying blog post that suggests that these processes for extracting user intent are currently in use. The blog post ends by communicating that the described approach is helpful:

“Ultimately, as models improve in performance and mobile devices acquire more processing power, we hope that on-device intent understanding can become a building block for many assistive features on mobile devices going forward.”

Takeaways

Neither the blog post about this research nor the research paper itself describes the results of these processes as something that might be used in AI search or classic search. It does, however, mention the context of autonomous agents.

The research paper explicitly mentions the context of an autonomous agent on the device that observes how the user interacts with a user interface and then infers the goal (the intent) of those actions.

The paper lists two specific applications for this technology:

  1. Proactive Assistance:
    An agent that watches what a user is doing for “enhanced personalization” and “improved work efficiency”.
  2. Personalized Memory:
    The process enables a device to “remember” past activities as an intent for later.

Shows The Direction Google Is Heading In

While this might not be used right away, it shows the direction that Google is heading, where small models on a device will be watching user interactions and sometimes stepping in to assist users based on their intent. Intent here is used in the sense of understanding what a user is trying to do.

Read Google’s blog post here:

Small models, big results: Achieving superior intent extraction through decomposition

Read the PDF research paper:

Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition (PDF)

Featured Image by Shutterstock/ViDI Studio

The Fraud Hiding in Email Signups

Ecommerce merchants know the costs in time, revenue, and inventory of illicit chargebacks.

For many sellers, however, the damage starts with new accounts. Organized fraudsters may sign up hundreds of times, employing valid but fake email addresses.

“Those fake accounts are being created for purposes like card testing with small-value transactions to see if the number is valid before attempting a bigger transaction,” said Diarmuid Thoma, the head of fraud and data strategy at AtData, an email verification and validation service.

Chargebacks

The primary risk to ecommerce shops comes from chargebacks.

When a cardholder disputes a fraudulent transaction, the store loses the sale, the product, and the shipping costs, and often incurs additional fees from processors.

Repeated disputes may even jeopardize the business’s relationship with its payment processor.

A seller can feel helpless, since the processor authorized the transaction in the first place, but holds shops responsible for accepting stolen card numbers.

Thoma and other email fraud experts believe fake email addresses are often where the problem begins.

Coupon Abuse

A second form of email-based fraud often shows up in ecommerce marketing data.

Fraudsters use fake but valid email addresses to create accounts at scale to extract promotional value.

Automated scripts submit thousands of signups, collect welcome discounts, and then abandon the accounts once the incentive is redeemed.

“A coupon has a monetary value, and when you do it at scale, it becomes a highly profitable business to use and resell,” said Thoma.

The losses from coupon abuse are massive, as much as $89 billion per year depending on the source, and they likely affect most ecommerce businesses that offer promotional discounts.

Fake Accounts

Thus fake email addresses facilitate stolen payment card testing and promotion harvesting.

This sort of behavior can be relatively difficult to detect, because “about 98% [of the email addresses used], even the fraudulent ones, will be valid,” Thoma said, “because the fraudster needs them to be valid” to receive a coupon and complete a purchase.

In other words, the earliest phase of this kind of ecommerce fraud often looks identical to the behavior of well-meaning shoppers. By the time the first chargeback appears, the damage has been building for weeks.

The upside is that this gives businesses a relatively simple defense: email validation.

Account Patterns

Creating fake accounts at scale starts with email addresses that follow recognizable patterns, allowing fraudsters to generate thousands of variations while bypassing basic validation checks.

For example, here are three common patterns.

Tumbling, where a fraudster rewrites a single underlying address many times.

  • example@example.com
  • ex.ample@example.com
  • e.x.ample@example.com
  • ex.ample+new@example.com

Small changes, such as added characters or formatting differences, allow each signup to appear unique while still routing messages to the same inbox.

Tumbling is particularly effective at evading duplicate-account controls because every address passes standard validation.
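
To see why tumbling slips past duplicate checks, and how canonicalization catches it, here is a minimal Python sketch. It assumes Gmail-style rules (dots ignored, “+tags” dropped), which do not hold for every mail provider, so treat matches as signals rather than proof.

from collections import defaultdict

# Providers known to ignore dots in the local part; extend as needed.
DOT_INSENSITIVE = {"gmail.com", "googlemail.com"}

def canonicalize(email: str) -> str:
    local, _, domain = email.lower().partition("@")
    local = local.split("+", 1)[0]          # drop "+tag" aliases
    if domain in DOT_INSENSITIVE:
        local = local.replace(".", "")      # dots route to the same inbox
    return f"{local}@{domain}"

signups = [
    "example@gmail.com", "ex.ample@gmail.com",
    "e.x.ample@gmail.com", "ex.ample+new@gmail.com",
]
groups = defaultdict(list)
for addr in signups:
    groups[canonicalize(addr)].append(addr)

# Canonical inboxes with several signup variants deserve a closer look.
suspicious = {k: v for k, v in groups.items() if len(v) > 1}
print(suspicious)  # all four variants collapse to example@gmail.com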

Gibberish emails are machine-generated addresses that appear random but follow consistent, automated structures.

Bad actors create these accounts in large batches within seconds or minutes of each other. Thoma described seeing many gibberish emails arriving simultaneously, on the same day and time.

Enumeration relies on generating large numbers of similar addresses, often based on a shared root. “They’re like user1, user2, user3, not necessarily always in sequence,” Thoma said. “It could skip to 10, 15, whatever.”

Such addresses are easy to create automatically and difficult to flag individually, especially when spread across time, domains, or merchants.

Identification

Each of these techniques produces valid, deliverable email addresses, which is why basic validation often fails to stop them.

Even monitoring for these patterns can produce false positives. The behavior of legitimate consumers may appear automated during sales events, product launches, or bulk onboarding.

Hence pattern detection works best when combined with additional signals, such as account age, name consistency, geographic alignment, device behavior, and transaction history.

The goal is not to block accounts based on a single indicator, but to isolate organized fraud before losses escalate into chargebacks.

Prevention

Fraud is often a matter of scale, which is good for very small ecommerce operations. Criminals either aren’t aware of them or see little potential in the theft.

Large online retailers, however, may want to invest in advanced email validation at the time of submission. Validation at this phase typically costs pennies, and when combined with reasonable business rules, should reduce fraud.

America’s coming war over AI regulation

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

In the final weeks of 2025, the battle over regulating artificial intelligence in the US reached a boiling point. On December 11, after Congress failed twice to pass a law banning state AI laws, President Donald Trump signed a sweeping executive order seeking to handcuff states from regulating the booming industry. Instead, he vowed to work with Congress to establish a “minimally burdensome” national AI policy, one that would position the US to win the global AI race. The move marked a qualified victory for tech titans, who have been marshaling multimillion-dollar war chests to oppose AI regulations, arguing that a patchwork of state laws would stifle innovation.

In 2026, the battleground will shift to the courts. While some states might back down from passing AI laws, others will charge ahead, buoyed by mounting public pressure to protect children from chatbots and rein in power-hungry data centers. Meanwhile, dueling super PACs bankrolled by tech moguls and AI-safety advocates will pour tens of millions into congressional and state elections to seat lawmakers who champion their competing visions for AI regulation. 

Trump’s executive order directs the Department of Justice to establish a task force that sues states whose AI laws clash with his vision for light-touch regulation. It also directs the Department of Commerce to starve states of federal broadband funding if their AI laws are “onerous.” In practice, the order may target a handful of laws in Democratic states, says James Grimmelmann, a law professor at Cornell Law School. “The executive order will be used to challenge a smaller number of provisions, mostly relating to transparency and bias in AI, which tend to be more liberal issues,” Grimmelmann says.

For now, many states aren’t flinching. On December 19, New York’s governor, Kathy Hochul, signed the Responsible AI Safety and Education (RAISE) Act, a landmark law requiring AI companies to publish the protocols used to ensure the safe development of their AI models and report critical safety incidents. On January 1, California debuted the nation’s first frontier AI safety law, SB 53—which the RAISE Act was modeled on—aimed at preventing catastrophic harms such as biological weapons or cyberattacks. While both laws were watered down from earlier iterations to survive bruising industry lobbying, they struck a rare, if fragile, compromise between tech giants and AI safety advocates.

If Trump targets these hard-won laws, Democratic states like California and New York will likely take the fight to court. Republican states like Florida with vocal champions for AI regulation might follow suit. Trump could face an uphill battle. “The Trump administration is stretching itself thin with some of its attempts to effectively preempt [legislation] via executive action,” says Margot Kaminski, a law professor at the University of Colorado Law School. “It’s on thin ice.”

But Republican states that are anxious to stay off Trump’s radar or can’t afford to lose federal broadband funding for their sprawling rural communities might retreat from passing or enforcing AI laws. Win or lose in court, the chaos and uncertainty could chill state lawmaking. Paradoxically, the Democratic states that Trump wants to rein in—armed with big budgets and emboldened by the optics of battling the administration—may be the least likely to budge.

In lieu of state laws, Trump promises to create a federal AI policy with Congress. But the gridlocked and polarized body won’t be delivering a bill this year. In July, the Senate killed a moratorium on state AI laws that had been inserted into a tax bill, and in November, the House scrapped an encore attempt in a defense bill. In fact, Trump’s bid to strong-arm Congress with an executive order may sour any appetite for a bipartisan deal. 

The executive order “has made it harder to pass responsible AI policy by hardening a lot of positions, making it a much more partisan issue,” says Brad Carson, a former Democratic congressman from Oklahoma who is building a network of super PACs backing candidates who support AI regulation. “It hardened Democrats and created incredible fault lines among Republicans,” he says. 

While AI accelerationists in Trump’s orbit—AI and crypto czar David Sacks among them—champion deregulation, populist MAGA firebrands like Steve Bannon warn of rogue superintelligence and mass unemployment. In response to Trump’s executive order, Republican state attorneys general signed a bipartisan letter urging the FCC not to supersede state AI laws.

With Americans increasingly anxious about how AI could harm mental health, jobs, and the environment, public demand for regulation is growing. If Congress stays paralyzed, states will be the only ones acting to keep the AI industry in check. In 2025, state legislators introduced more than 1,000 AI bills, and nearly 40 states enacted over 100 laws, according to the National Conference of State Legislatures.

Efforts to protect children from chatbots may inspire rare consensus. On January 7, Google and Character Technologies, a startup behind the companion chatbot Character.AI, settled several lawsuits with families of teenagers who killed themselves after interacting with the bot. Just a day later, the Kentucky attorney general sued Character Technologies, alleging that the chatbots drove children to suicide and other forms of self-harm. OpenAI and Meta face a barrage of similar suits. Expect more to pile up this year. Without AI laws on the books, it remains to be seen how product liability laws and free speech doctrines apply to these novel dangers. “It’s an open question what the courts will do,” says Grimmelmann. 

While litigation brews, states will move to pass child safety laws, which are exempt from Trump’s proposed ban on state AI laws. On January 9, OpenAI inked a deal with a former foe, the child-safety advocacy group Common Sense Media, to back a ballot initiative in California called the Parents & Kids Safe AI Act, setting guardrails around how chatbots interact with children. The measure proposes requiring AI companies to verify users’ age, offer parental controls, and undergo independent child-safety audits. If passed, it could be a blueprint for states across the country seeking to crack down on chatbots. 

Fueled by widespread backlash against data centers, states will also try to regulate the resources needed to run AI. That means bills requiring data centers to report on their power and water use and foot their own electricity bills. If AI starts to displace jobs at scale, labor groups might float AI bans in specific professions. A few states concerned about the catastrophic risks posed by AI may pass safety bills mirroring SB 53 and the RAISE Act. 

Meanwhile, tech titans will continue to use their deep pockets to crush AI regulations. Leading the Future, a super PAC backed by OpenAI president Greg Brockman and the venture capital firm Andreessen Horowitz, will try to elect candidates who endorse unfettered AI development to Congress and state legislatures. They’ll follow the crypto industry’s playbook for electing allies and writing the rules. To counter this, super PACs funded by Public First, an organization run by Carson and former Republican congressman Chris Stewart of Utah, will back candidates advocating for AI regulation. We might even see a handful of candidates running on anti-AI populist platforms.

In 2026, the slow, messy process of American democracy will grind on. And the rules written in state capitals could decide how the most disruptive technology of our generation develops far beyond America’s borders, for years to come.

Measles is surging in the US. Wastewater tracking could help.

This week marked a rather unpleasant anniversary: It’s a year since Texas reported a case of measles—the start of a significant outbreak that ended up spreading across multiple states. Since the start of January 2025, there have been over 2,500 confirmed cases of measles in the US. Three people have died.

As vaccination rates drop and outbreaks continue, scientists have been experimenting with new ways to quickly identify new cases and prevent the disease from spreading. And they are starting to see some success with wastewater surveillance.

After all, wastewater contains saliva, urine, feces, shed skin, and more. You could consider it a rich biological sample. Wastewater analysis helped scientists understand how covid was spreading during the pandemic. It’s early days, but it is starting to help us get a handle on measles.

Globally, there has been some progress toward eliminating measles, largely thanks to vaccination efforts. Such efforts led to an 88% drop in measles deaths between 2000 and 2024, according to the World Health Organization. It estimates that “nearly 59 million lives have been saved by the measles vaccine” since 2000.

Still, an estimated 95,000 people died from measles in 2024 alone—most of them young children. And cases are surging in Europe, Southeast Asia, and the Eastern Mediterranean region.

Last year, the US saw the highest levels of measles in decades. The country is on track to lose its measles elimination status—a sorry fate that met Canada in November after the country recorded over 5,000 cases in a little over a year.

Public health efforts to contain the spread of measles—which is incredibly contagious—typically involve clinical monitoring in health-care settings, along with vaccination campaigns. But scientists have started looking to wastewater, too.

Along with various bodily fluids, we all shed viruses and bacteria into wastewater, whether that’s through brushing our teeth, showering, or using the toilet. The idea of looking for these pathogens in wastewater to track diseases has been around for a while, but things really kicked into gear during the covid-19 pandemic, when scientists found that the coronavirus responsible for the disease was shed in feces.

This led Marlene Wolfe of Emory University and Alexandria Boehm of Stanford University to establish WastewaterSCAN, an academic-led program developed to analyze wastewater samples across the US. Covid was just the beginning, says Wolfe. “Over the years we have worked to expand what can be monitored,” she says.

Two years ago, for a previous edition of the Checkup, Wolfe told Cassandra Willyard that wastewater surveillance of measles was “absolutely possible,” as the virus is shed in urine. The hope was that this approach could shed light on measles outbreaks in a community, even if members of that community weren’t able to access health care and receive an official diagnosis. And that it could highlight when and where public health officials needed to act to prevent measles from spreading. Evidence that it worked as an effective public health measure was, at the time, scant.

Since then, she and her colleagues have developed a test to identify measles RNA. They trialed it at two wastewater treatment plants in Texas between December 2024 and May 2025. At each site, the team collected samples two or three times a week and tested them for measles RNA.

Over that period, the team found measles RNA in 10.5% of the samples they collected, as reported in a preprint paper published at medRxiv in July and currently under review at a peer-reviewed journal. The first detection came a week before the first case of measles was officially confirmed in the area. That’s promising—it suggests that wastewater surveillance might pick up measles cases early, giving public health officials a head start in efforts to limit any outbreaks.

There are more promising results from a team in Canada. Mike McKay and Ryland Corchis-Scott at the University of Windsor in Ontario and their colleagues have also been testing wastewater samples for measles RNA. Between February and November 2025, the team collected samples from a wastewater treatment facility serving over 30,000 people in Leamington, Ontario. 

These wastewater tests are somewhat limited—even if they do pick up measles, they won’t tell you who has measles, where exactly infections are occurring, or even how many people are infected. McKay and his colleagues have begun to make some progress here. In addition to monitoring the large wastewater plant, the team used tampons to soak up wastewater from a hospital lateral sewer.

They then compared their measles test results with the number of clinical cases in that hospital. This gave them some idea of the virus’s “shedding rate.” When they applied this to the data collected from the Leamington wastewater treatment facility, the team got estimates of measles cases that were much higher than the figures officially reported. 

Their findings track with the opinions of local health officials (who estimate that the true number of cases during the outbreak was around five to 10 times higher than the confirmed case count), the team members wrote in a paper published on medRxiv a couple of weeks ago.

There will always be limits to wastewater surveillance. “We’re looking at the pool of waste of an entire community, so it’s very hard to pull in information about individual infections,” says Corchis-Scott.

Wolfe also acknowledges that “we have a lot to learn about how we can best use the tools so they are useful.” But her team at WastewaterSCAN has been testing wastewater across the US for measles since May last year. And their findings are published online and shared with public health officials.

In some cases, the findings are already helping inform the response to measles. “We’ve seen public health departments act on this data,” says Wolfe. Some have issued alerts, or increased vaccination efforts in those areas, for example. “[We’re at] a point now where we really see public health departments, clinicians, [and] families using that information to help keep themselves and their communities safe,” she says.

McKay says his team has stopped testing for measles because the Ontario outbreak “has been declared over.” He says testing would restart if and when a single new case of measles is confirmed in the region, but he also thinks that his research makes a strong case for maintaining a wastewater surveillance system for measles.

McKay wonders if this approach might help Canada regain its measles elimination status. “It’s sort of like [we’re] a pariah now,” he says. If his approach can help limit measles outbreaks, it could be “a nice tool for public health in Canada to [show] we’ve got our act together.”

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

The Download: chatbots for health, and US fights over AI regulation

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

“Dr. Google” had its issues. Can ChatGPT Health do better?  

For the past two decades, there’s been a clear first step for anyone who starts experiencing new medical symptoms: Look them up online. The practice was so common that it gained the pejorative moniker “Dr. Google.” But times are changing, and many medical-information seekers are now using LLMs. According to OpenAI, 230 million people ask ChatGPT health-related queries each week.  

That’s the context around the launch of OpenAI’s new ChatGPT Health product, which debuted earlier this month. The big question is: can the obvious risks of using AI for health-related queries be mitigated enough for them to be a net benefit? Read the full story

—Grace Huckins

America’s coming war over AI regulation  

In the final weeks of 2025, the battle over regulating artificial intelligence in the US reached boiling point. On December 11, after Congress failed twice to pass a law banning state AI laws, President Donald Trump signed a sweeping executive order seeking to handcuff states from regulating the booming industry.  

Instead, he vowed to work with Congress to establish a “minimally burdensome” national AI policy. The move marked a victory for tech titans, who have been marshaling multimillion-dollar war chests to oppose AI regulations, arguing that a patchwork of state laws would stifle innovation.

In 2026, the battleground will shift to the courts. While some states might back down from passing AI laws, others will charge ahead. Read our story about what’s on the horizon

—Michelle Kim

This story is from MIT Technology Review’s What’s Next series of stories that look across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.  

Measles is surging in the US. Wastewater tracking could help.

This week marked a rather unpleasant anniversary: It’s a year since Texas reported a case of measles—the start of a significant outbreak that ended up spreading across multiple states. Since the start of January 2025, there have been over 2,500 confirmed cases of measles in the US. Three people have died. 

As vaccination rates drop and outbreaks continue, scientists have been experimenting with new ways to quickly identify new cases and prevent the disease from spreading. And they are starting to see some success with wastewater surveillance. Read the full story.

—Jessica Hamzelou 

This story is from The Checkup, our weekly newsletter giving you the inside track on all things health and biotech. Sign up to receive it in your inbox every Thursday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 The US is dismantling itself
A foreign enemy could not invent a better chain of events to wreck its standing in the world. (Wired $)  
+ We need to talk about whether Donald Trump might be losing it.  (New Yorker $)

2 Big Tech is taking on more debt to fund its AI aspirations
And the bubble just keeps growing. (WP $)
+ Forget unicorns. 2026 is shaping up to be the year of the “hectocorn.” (The Guardian)
+ Everyone in tech agrees we’re in a bubble. They just can’t agree on what happens when it pops. (MIT Technology Review)

3 DOGE accessed even more personal data than we thought 
Even now, the Trump administration still can’t say how much data is at risk, or what it was used for. (NPR)

4 TikTok has finalized a deal to create a new US entity 
Ending years of uncertainty about its fate in America. (CNN)
+ Why China is the big winner out of all of this. (FT $)

5 The US is now officially out of the World Health Organization 
And it’s leaving behind nearly $300 million in bills unpaid. (Ars Technica)
+ The US withdrawal from the WHO will hurt us all. (MIT Technology Review)

6 AI-powered disinformation swarms pose a threat to democracy
A would-be autocrat could use them to persuade populations to accept cancelled elections or overturn results. (The Guardian)
+ The era of AI persuasion in elections is about to begin. (MIT Technology Review)

7 We’re about to start seeing more robots everywhere
But exactly what they’ll look like remains up for debate. (Vox $)
+ Chinese companies are starting to dominate entire sectors of AI and robotics. (MIT Technology Review)

8 Some people seem to be especially vulnerable to loneliness
If you’re ‘other-directed’, you could particularly benefit from less screentime. (New Scientist $)

9 This academic lost two years of work with a single click
TL;DR: Don’t rely on ChatGPT to store your data. (Nature)

10 How animals develop a sense of direction 🦇🧭
Their ‘internal compass’ seems to be informed by landmarks that help them form a mental map. (Quanta $)

Quote of the day

“The rate at which AI is progressing, I think we have AI that is smarter than any human this year, and no later than next year.”

—Elon Musk simply cannot resist the urge to make wild predictions at Davos, Wired reports. 

One more thing

Africa fights rising hunger by looking to foods of the past

After falling steadily for decades, the prevalence of global hunger is now on the rise—nowhere more so than in sub-Saharan Africa. 

Africa’s indigenous crops are often more nutritious and better suited to the hot and dry conditions that are becoming more prevalent, yet many have been neglected by science, which means they tend to be more vulnerable to diseases and pests and yield well below their theoretical potential.

Now the question is whether researchers, governments, and farmers can work together in a way that gets these crops onto plates and provides Africans from all walks of life with the energy and nutrition that they need to thrive, whatever climate change throws their way. Read the full story.

—Jonathan W. Rosen

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ The only thing I fancy dry this January is a martini. Here’s how to make one.
+ If you absolutely adore the Bic crystal pen, you might want this lamp
+ Cozy up with a nice long book this winter. ($)
+ Want to eat healthier? Slow down and tune out food ‘noise’. ($)

M&A Advisor on Ecommerce Valuations

Frank Kosarek is the co-founder of BizPort, a mergers-and-acquisitions marketplace launched in November 2025. Before that, he was head of acquisitions for a large ecommerce aggregator.

He says buyers of ecommerce businesses today focus on discretionary earnings, not revenue, and seek recurring sales, such as subscriptions.

He addressed those items, the state of ecommerce M&A, and more in our recent conversation.

Our entire audio is embedded below. The transcript is edited for length and clarity.

Eric Bandholz: Who are you, and what do you do?

Frank Kosarek: I’m the co-founder of BizPort, a marketplace that helps founders exit their companies. I lead BizPort’s ecommerce division, connecting buyers and sellers. Before BizPort, I was the head of mergers and acquisitions at OpenStore, an aggregator in Miami, where I acquired about 50 Shopify brands. That experience exposed me to ecommerce transactions and what founders should and shouldn’t do when preparing to sell their businesses.

One of the most important concepts in exits is the seller’s discretionary earnings. It’s the foundation of most ecommerce valuations. SDE starts with a company’s annual net income (what’s on the tax return), then adds back the owner’s salary and benefits, and any one-time or non-recurring expenses.

For example, if a business earns $250,000 in net income, the founder pays herself $100,000, has $40,000 in benefits, and incurs a one-time $10,000 legal expense, the SDE would be about $400,000. That number is then multiplied by a valuation multiple, typically 2x to 2.5x for most ecommerce brands, and up to 5x for category leaders.
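For readers who want to sanity-check the math, here is a minimal Python sketch of that calculation, assuming only the figures and multiple ranges Kosarek cites above; the function is an illustration, not a tool he describes.

```python
# A minimal sketch of the SDE math described above, using Kosarek's numbers.
def seller_discretionary_earnings(net_income, owner_salary,
                                  owner_benefits, one_time_expenses):
    """SDE = net income + owner salary and benefits + one-time add-backs."""
    return net_income + owner_salary + owner_benefits + one_time_expenses

sde = seller_discretionary_earnings(
    net_income=250_000,        # annual net income (what's on the tax return)
    owner_salary=100_000,      # founder's salary, added back
    owner_benefits=40_000,     # owner benefits, added back
    one_time_expenses=10_000,  # one-time legal expense, added back
)
print(f"SDE: ${sde:,}")                                        # $400,000
print(f"Typical range (2x-2.5x): ${2 * sde:,} to ${int(2.5 * sde):,}")
print(f"Category leader (up to 5x): ${5 * sde:,}")
```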

The best advice for founders is to track SDE monthly. Know your true net income and add-backs. It gives you a clear picture of growth and future valuation.

Eric Bandholz: What’s the demand for ecommerce acquisitions?

Frank Kosarek: Ecommerce experienced extreme acceleration in 2020. We saw years of growth compressed into about 12 months as Covid reshaped consumer behavior. During that period, valuation multiples increased, and many ecommerce businesses launched that probably shouldn’t have. Some lacked product-market fit or a dependable, repeat customer base.

What’s changed since then is buyer behavior. Aggregators, in particular, have pulled back or refined their strategies. As a result, sellers can no longer assume there’s an easy, quick exit waiting for them. Acquirers are more selective and more disciplined about what they buy.

Companies that exit at top multiples tend to resemble subscription businesses. A one-time purchase product, such as a kids’ tricycle, doesn’t create much long-term value if the customer never returns. Compare that to categories such as skincare or supplements, where consumers can subscribe and reorder. Buyers focus heavily on lifetime value and how much revenue they can generate from a customer after paying to acquire them.

That’s why brands without repeat or subscription-driven revenue often see leaner valuations, while strong subscription-heavy brands can still command multiples closer to 5x SDE.

Eric Bandholz: What’s the minimum revenue level to sell an ecommerce business?

Frank Kosarek: At BizPort, we generally look for brands doing at least $1 million in annual revenue before getting involved. At that level, ecommerce margins usually provide enough cash flow to underwrite a transaction, whether through a loan, capital injection, or both. That’s typically the minimum size where an acquisition becomes feasible.

When annual revenue reaches $30 million, potential buyers include private equity firms or larger strategic buyers. Those acquirers are more likely to evaluate businesses using revenue multiples instead of earnings multiples. There isn’t a hard line, but it’s an important distinction for founders to be aware of as their brands scale.

Eric Bandholz: How do founders separate personal attachment from fair market value?

Frank Kosarek: M&A for small ecommerce brands is much more art than science. There’s no one-size-fits-all deal structure. Most ecommerce founders have very high expectations for their company’s value, often thinking in large multiples of revenue.

That’s understandable because building a brand from the ground up requires a huge amount of work, much of which doesn’t show up on an income statement. That effort is intangible, and outside buyers can’t fully appreciate it from financials alone. Plus, many founders don’t realize that ecommerce businesses are typically valued at a multiple of discretionary earnings, not top-line revenue. That often leads to a reality check.

Eric Bandholz: How often do earn-outs fail?

Frank Kosarek: Some sellers want a complete exit with no ongoing involvement, and buyers generally understand that. Still, a smart buyer will usually negotiate a transition period, often three to six months, to help transfer operations and institutional knowledge. Additional support can turn into a short-term consulting agreement in which sellers receive a fixed monthly fee. In that case, sellers no longer have equity or performance-based upside; they’re simply helping with continuity.

I’ve seen situations where sellers and buyers clash operationally or strategically. When that happens, earn-outs often suffer. Sellers miss targets and don’t receive additional payouts, and buyers struggle because the transition doesn’t go smoothly.

Bandholz: What can stop a deal or hurt valuation?

Kosarek: One major piece of advice for sellers is to sell when your numbers are strong. Don’t wait until performance starts to decline or the market turns against you. Be open to exploratory conversations, especially after a banner year. Waiting until the curve crashes makes exits much harder.

Another common mistake is overspending on marketing to inflate top-line revenue. For smaller ecommerce brands, valuation is typically based on profit, not revenue. Pumping the top line at the expense of the bottom usually doesn’t earn a premium.

Another red flag is a lack of operational structure. Buyers don’t want to walk into a business and have to build everything from scratch. They want to see systems and processes in place. That includes working with a third-party logistics provider for fulfillment and returns, clear ownership of marketing functions, and documented processes.

Buyers’ confidence in the deal increases when they can quickly understand how the company operates and distributes work.

Bandholz: Where can people follow you, reach out to you?

Kosarek: Our site is Biz-port.com. You can find me on LinkedIn.

SEO Pulse: Google’s AI Mode Gets Personal, AI Bots Blocked, Domains Matter in Search via @sejournal, @MattGSouthern

Welcome to the week’s SEO Pulse. This week’s updates affect how AI Mode personalizes answers, which AI bots can access your site, and why your domain choice still matters for search visibility.

Here’s what matters for you and your work.

Google Connects Gmail And Photos To AI Mode

Google is rolling out Personal Intelligence, a feature that connects Gmail and Google Photos to AI Mode in Search, delivering personalized responses based on users’ own data.

Key facts: The feature is available to Google AI Pro and AI Ultra subscribers who opt in. It launches as a Labs experiment for eligible users in the U.S. Google says it doesn’t train on users’ Gmail inbox or Photos library.

Why This Matters

This is the personal context feature Google promised at I/O but delayed until now. We covered the delay in December when Nick Fox, Google’s SVP of Knowledge and Information, said the feature was “still to come” with no public timeline.

For the 75 million daily active users Fox reported in AI Mode, this could reduce how much context you need to type to get tailored responses. Google’s examples include trip recommendations that factor in hotel bookings from Gmail and past travel photos, or coat suggestions that account for preferred brands and upcoming travel weather.

The SEO effects depend on how this changes query patterns. If users rely on Google pulling context from their email and photos instead of typing it, queries may get shorter and more ambiguous. That makes it harder to target long-tail searches with explicit intent signals.

What People Are Saying

The early social reaction is framing this as Google pushing AI Mode from “ask and answer” into “already knows your context.” Robby Stein, VP of Product at Google Search, positioned it as a more personal search experience driven by opt-in data connections.

On LinkedIn, the discussion quickly moved to trust and privacy tradeoffs. Michele Curtis, a content marketing specialist, framed personalization as something that only works when trust comes first.

Curtis wrote:

“Personalization only works when trust is architected before intelligence.”

Syed Shabih Haider, founder of Fluxxy AI, raised security concerns about connecting multiple apps.

Haider wrote:

“Personal Intelligence.. yeah the features/benefits look amazing.. but cant help but wonder about the data security. Once all apps are connected, the risk for breach becomes extremely high..”

Read our full coverage: Google Launches Personal Intelligence In AI Mode

AI Training Bots Lose Access While Search Bots Expand

Hostinger analyzed 66 billion bot requests across more than 5 million websites and found AI crawlers are following two different paths. Training bots are losing access as more sites block them. Search and assistant bots are expanding their reach.

Key facts: Hostinger reports 55.67% average coverage for both GPTBot and OAI-SearchBot, but their trajectories differ. GPTBot, which collects training data, fell from 84% to 12% over the measurement period. OAI-SearchBot, which powers ChatGPT search, reached the same average without that decline. Googlebot maintained 72% coverage. Apple’s bot reached 24.33%.

Why This Matters

The data confirms what we’ve tracked through multiple studies over the past year. BuzzStream found 79% of top news publishers block at least one training bot. Cloudflare’s Year in Review showed GPTBot, ClaudeBot, and CCBot had the highest number of full disallow directives. The Hostinger data puts numbers on the access gap between training and search crawlers.

The distinction matters because these bots serve different purposes. Training bots collect data to build models, while search bots retrieve content in real time when users ask questions. Blocking training bots opts you out of future model updates, and blocking search bots means you won’t appear when AI tools try to cite sources.

As a best practice, check your server logs to see what’s hitting your site, then make blocking decisions based on your goals.
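As a rough illustration of that log check, the Python sketch below tallies requests by crawler user agent from a standard access log. The log path and the bot list are assumptions; adjust both for your server and the crawlers you care about.

```python
# A hypothetical sketch: count requests from known AI and search crawlers
# in an access log before deciding what to block. The path and bot names
# are assumptions; substitute your own.
from collections import Counter

AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "CCBot", "Googlebot"]

counts = Counter()
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in AI_BOTS:
            if bot in line:  # user agent strings contain the bot token
                counts[bot] += 1
                break

for bot, hits in counts.most_common():
    print(f"{bot}: {hits} requests")
```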

What People Are Saying

On the practical SEO side, the most consistent advice is to separate “training” from “search and retrieval” in your robots decisions where you can. Aleyda Solís previously summarized the idea as blocking GPTBot while still allowing OAI-SearchBot, so your content can be surfaced in ChatGPT-style search experiences without being used for model training.

Solís wrote:

“disallow the ‘GPTbot’ user-agent but allow ‘OAI-SearchBot’”
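Translated into robots.txt directives, that suggestion looks roughly like the sketch below. GPTBot and OAI-SearchBot are the user agent tokens OpenAI documents, but confirm the current names before deploying, since crawler tokens can change.

```
# Block OpenAI's training crawler...
User-agent: GPTBot
Disallow: /

# ...but allow its search/retrieval crawler.
User-agent: OAI-SearchBot
Allow: /
```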

At the same time, developers and site operators keep emphasizing the cost side of bot traffic. In one r/webdev discussion, a commenter said AI bots made up 95% of requests before blocking and rate limiting.

A commenter in r/webdev wrote:

“95% of the requests to one of our websites was AI bots before I started blocking and rate limiting them”

Read our full coverage: OpenAI Search Crawler Passes 55% Coverage In Hostinger Study

Mueller: Free Subdomain Hosting Makes SEO Harder

Google’s John Mueller warned that free subdomain hosting services create SEO challenges even when publishers do everything else right. The advice came in response to a Reddit post from a publisher whose site is indexed by Google but doesn’t appear in normal search results.

Key facts: The publisher uses Digitalplat Domains, a free subdomain service on the Public Suffix List. Mueller explained that free subdomain services attract spam and low-effort content, making it harder for search engines to assess individual site quality. He recommended building direct traffic through promotion and community engagement rather than expecting search visibility first.

Why This Matters

Mueller’s guidance fits a pattern we’ve covered over the years. Google’s Gary Illyes previously warned against cheap TLDs for the same reason. When a domain extension becomes overrun by spam, search engines may struggle to identify legitimate sites among the noise.

Free subdomain hosting creates a specific version of this problem. While the Public Suffix List is meant to treat these subdomains as separate registrable units, the neighborhood signal can still matter. If most subdomains on a host contain spam, Google’s systems have to work harder to find yours.

This affects anyone considering free hosting as a way to test an idea before buying a real domain. The test environment itself becomes part of the evaluation. As Mueller wrote, “Being visible in popular search results is not the first step to becoming a useful & popular web presence.”

For anyone advising clients or building new projects, the domain investment is part of the SEO foundation. Starting on a free subdomain may save money upfront, but it adds friction to visibility that a proper domain avoids.

What SEO Professionals Are Saying

Most of the social sharing here is treating Mueller’s “neighborhood” analogy as the headline takeaway. In the original Reddit exchange, he said publishing on free subdomain hosts can mean opening up shop among “problematic flatmates,” which makes it harder for search systems to understand your site’s value in context.

Mueller wrote:

“opening up shop on a site that’s filled with … potentially problematic ‘flatmates’.”

On LinkedIn, the story is being recirculated as a broader reminder that “cheap or free” hosting decisions can quietly cap performance even when everything else looks right. Fernando Paez V, a digital marketing specialist, called it out as a visibility issue tied to spam-heavy environments.

Paez V wrote:

“free subdomain hosting services … attract spam and make it more difficult for legitimate sites to gain visibility”

Read our full coverage: Google’s Mueller: Free Subdomain Hosting Makes SEO Harder

Theme Of The Week: Access Is The New Advantage

This week’s stories share a common element. Access, whether to personal data, to websites via bots, or to fair evaluation by choosing the right domain, shapes outcomes before any optimization happens.

Personal Intelligence gives AI Mode access to your email and photos, changing what kinds of queries even need to happen. The Hostinger data shows search bots gaining access while training bots get locked out. Mueller’s subdomain warning reminds us that domain choice determines whether Google’s systems give your content a fair evaluation at all.

The common thread is that visibility increasingly depends on what you allow in and where you build. Blocking the wrong bots can reduce your chances of being surfaced or cited in AI tools. Building on a spam-heavy domain puts you at a disadvantage before you write a word. And Google’s AI features now have access to personal context that publishers can’t access or observe.

For practitioners, this means access decisions, both yours and the platforms’, shape results more than incremental optimization gains. Review your crawler permissions and domain choices, and watch how personal context in AI Mode changes the queries you’re trying to rank for.

Featured Image: Accogliente Design/Shutterstock

User Data Is Important In Google Search, Per Liz Reid’s DOJ Filing via @sejournal, @marie_haynes

I found some interesting things in the latest document in the DOJ vs. Google trial. Google has appealed the ruling that says they need to give proprietary information to competitors.

Image Credit: Marie Haynes

Key Takeaways:

  • Google has been ordered to share information with competitors as part of the remedy for being found an illegal monopoly. Google does not want to give its extensive user-side data away.
  • Google’s data on page quality and freshness is proprietary. They don’t want to give it away.
  • Pages that are indexed are marked up with annotations, including signals that identify spam pages.
  • If spammers got hold of those spam signals, it would make stopping spam difficult.
  • User data is important to Google’s Glue system that stores info on every query searched, what the user saw, and how they interacted with the search results.
  • User data is important for training RankEmbed BERT – one of the deep learning systems behind Search.

OK, let’s get into the interesting stuff!

Google Has Proprietary Page Quality And Freshness Signals

This really isn’t a surprise. I did find it interesting that freshness signals are at the heart of Google’s proprietary secrets.

Image Credit: Marie Haynes

Again, here’s more on the importance of Google’s proprietary freshness signals:

Image Credit: Marie Haynes

Pages That Are Crawled Are Marked Up With ‘Proprietary Page Understanding Annotations’

Every page in Google’s index is marked up with annotations to help it understand the page. These include signals to identify spam and duplicate pages. I’ve written before about how every page in the index has a spam score.

Image Credit: Marie Haynes

Spam Scores Could Be Used To Reverse Engineer Ranking Systems

Google doesn’t want to share information with its competitors on these scores.

Image Credit: Marie Haynes

If the spam scores get out, it could lead to more spamming and more difficulty for Google in fighting spam.

Image Credit: Marie Haynes

Google Builds The Index Using These Marked-Up Pages

The pages that Google has marked up with page understanding annotations are organized based on how frequently Google expects the content to be accessed and how fresh it needs to be.

Image Credit: Marie Haynes

Only A Fraction Of Pages Make It Into Google’s Index

Google argues that giving competitors a list of indexed URLs will enable them to “forgo crawling and analyzing the larger web, and to instead focus their efforts on crawling only the fraction of pages Google has included in its index.” Building this index costs Google extensive time and money. They don’t want to give that away for free.

Image Credit: Marie Haynes

The Role Of User Data In Google’s Ranking Systems

This is the most interesting part. I feel that we do not pay enough attention to Google’s use of user data. (Stay tuned to my YouTube channel, as I’m about to release a very interesting video with my thoughts on why user-side data is so important – likely the MOST important factor in Google’s ranking systems.)

User Data Is Used To Build GLUE And RankEmbed Models

Google Glue is a huge table of user activity. It collects the text of the queries searched, the user’s language, location and device type, and information on what appeared on the SERP, what the user clicked on or hovered over, how long they stayed on a SERP, and more.
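To make that concrete, here is a purely hypothetical Python sketch of what one row of such a table might contain, based only on the fields the filing lists. The field names and structure are mine; the filing does not describe Glue’s actual schema.

```python
# Hypothetical illustration only: a record shape for the kind of
# user-activity data the filing says Glue collects. Field names are
# invented; Google's actual schema is not public.
from dataclasses import dataclass, field
from typing import List

@dataclass
class GlueRecord:
    query_text: str    # the text of the query searched
    language: str      # the user's language
    location: str      # the user's location
    device_type: str   # the user's device type
    serp_items: List[str] = field(default_factory=list)  # what appeared on the SERP
    clicked: List[str] = field(default_factory=list)     # what the user clicked on
    hovered: List[str] = field(default_factory=list)     # what the user hovered over
    dwell_seconds: float = 0.0  # how long they stayed on the SERP
```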

RankEmbed BERT is even more interesting. It is one of the deep learning systems that underpin Search. In the Pandu Nayak testimony, we learned that it is used to rerank the results returned by traditional ranking systems, and that it is trained on click and query data from actual users.

The AI systems behind Search are continually learning to present searchers with more satisfying results. Google looks at what users click on and whether they return to the SERP. Google also runs live experiments that observe which results searchers click on and stay on. Those actions help train RankEmbed BERT, which is further fine-tuned by ratings from the quality raters. I will be publishing more on this soon. The take-home point I want to hammer on is that user satisfaction is by far the most important thing we should be optimizing for!
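As a generic illustration of how click logs can become training signal for a reranker, the sketch below uses a standard learning-to-rank heuristic: a clicked result is treated as preferred over the results the user skipped above it. This is textbook practice, not a description of RankEmbed BERT’s actual pipeline.

```python
# Generic learning-to-rank illustration, not Google's actual system:
# turn one SERP's click log into pairwise preference triples.
def pairs_from_click_log(query, serp_results, clicked_positions):
    """Yield (query, preferred, non_preferred): a clicked result is
    preferred over every unclicked result shown above it."""
    clicked = set(clicked_positions)
    for pos in clicked_positions:
        for above in range(pos):      # results ranked above the click
            if above not in clicked:  # ...that the user skipped
                yield (query, serp_results[pos], serp_results[above])

# Example: the user clicked result 2, skipping results 0 and 1.
for q, preferred, skipped in pairs_from_click_log(
    "best hiking boots", ["doc_a", "doc_b", "doc_c"], [2]
):
    print(q, "| preferred:", preferred, "| over:", skipped)
```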

From the Liz Reid document we are analyzing today, we can see that user data is used to train, build, and operate RankEmbed models.

Image Credit: Marie Haynes

Once again, we learn that the user data that is used to train these models includes query, location, time of search, and how the user interacted with what was displayed to them.

Image Credit: Marie Haynes

This is talking about the actions that users take from within the Google Search results. What I really want to know is how much of a role Chrome data plays. Does Google look at whether people are engaging with your pages, filling out your forms, making your recipes, and more? I think they do. The judgment summary of this trial hints that Chrome data is used in the ranking systems, but not a lot of detail is shared.

Image Credit: Marie Haynes

Google Says That If Someone Had The Glue And RankEmbed User Data, They Could Train An LLM With It

This user data is the key to Google’s success.

Image Credit: Marie Haynes

It’s worthwhile reading the whole declaration from Liz Reid.

This post was originally published on Marie Haynes Consulting.


Featured Image: N Universe/Shutterstock