Why chatbots are starting to check your age

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

How do tech companies check if their users are kids?

This question has taken on new urgency recently thanks to growing concern about the dangers that can arise when children talk to AI chatbots. For years, Big Tech companies asked users for birthdays (which anyone could make up) to avoid violating child privacy laws, but they weren’t required to moderate content accordingly. Two developments over the last week show how quickly things are changing in the US and how this issue is becoming a new battleground, even among parents and child-safety advocates.

In one corner is the Republican Party, which has supported laws passed in several states that require sites with adult content to verify users’ ages. Critics say this provides cover to block anything deemed “harmful to minors,” which could include sex education. Other states, like California, are coming after AI companies with laws to protect kids who talk to chatbots (by requiring the companies to verify which users are kids). Meanwhile, President Trump is attempting to keep AI regulation a national issue rather than allowing states to make their own rules. Support for various bills in Congress is constantly in flux.

So what might happen? The debate is quickly moving away from whether age verification is necessary and toward who will be responsible for it. This responsibility is a hot potato that no company wants to hold.

In a blog post last Tuesday, OpenAI revealed that it plans to roll out automatic age prediction. In short, the company will apply a model that uses factors like the time of day, among others, to predict whether a person chatting is under 18. For those identified as teens or children, ChatGPT will apply filters to “reduce exposure” to content like graphic violence or sexual role-play. YouTube launched something similar last year. 

If you support age verification but are concerned about privacy, this might sound like a win. But there’s a catch. The system is not perfect, of course, so it could classify a child as an adult or vice versa. People who are wrongly labeled under 18 can verify their identity by submitting a selfie or government ID to a company called Persona. 

Selfie verifications have issues: They fail more often for people of color and those with certain disabilities. Sameer Hinduja, who co-directs the Cyberbullying Research Center, says the fact that Persona will need to hold millions of government IDs and masses of biometric data is another weak point. “When those get breached, we’ve exposed massive populations all at once,” he says. 

Hinduja instead advocates for device-level verification, where a parent specifies a child’s age when setting up the child’s phone for the first time. This information is then kept on the device and shared securely with apps and websites. 

That’s more or less what Tim Cook, the CEO of Apple, recently lobbied US lawmakers to call for. Cook was fighting lawmakers who wanted to require app stores to verify ages, which would saddle Apple with lots of liability. 

More signals of where this is all headed will come on Wednesday, when the Federal Trade Commission—the agency that would be responsible for enforcing these new laws—is holding an all-day workshop on age verification. Apple’s head of government affairs, Nick Rossi, will be there. He’ll be joined by higher-ups in child safety at Google and Meta, as well as a company that specializes in marketing to children.

The FTC has become increasingly politicized under President Trump (his firing of the sole Democratic commissioner was struck down by a federal court, a decision that is now pending review by the US Supreme Court). In July, I wrote about signals that the agency is softening its stance toward AI companies. Indeed, in December, the FTC overturned a Biden-era ruling against an AI company that allowed people to flood the internet with fake product reviews, writing that it clashed with President Trump’s AI Action Plan.

Wednesday’s workshop may shed light on how partisan the FTC’s approach to age verification will be. Red states favor laws that require porn websites to verify ages (but critics warn this could be used to block a much wider range of content). Bethany Soye, a Republican state representative who is leading an effort to pass such a bill in her state of South Dakota, is scheduled to speak at the FTC meeting. The ACLU generally opposes laws requiring IDs to visit websites and has instead advocated for an expansion of existing parental controls.

While all this gets debated, though, AI has set the world of child safety on fire. We’re dealing with increased generation of child sexual abuse material, concerns (and lawsuits) about suicides and self-harm following chatbot conversations, and troubling evidence of kids’ forming attachments to AI companions. Colliding stances on privacy, politics, free expression, and surveillance will complicate any effort to find a solution. Write to me with your thoughts. 

Inside OpenAI’s big play for science 

In the three years since ChatGPT’s explosive debut, OpenAI’s technology has upended a remarkable range of everyday activities at home, at work, in schools—anywhere people have a browser open or a phone out, which is everywhere.

Now OpenAI is making an explicit play for scientists. In October, the firm announced that it had launched a whole new team, called OpenAI for Science, dedicated to exploring how its large language models could help scientists and tweaking its tools to support them.

The last couple of months have seen a slew of social media posts and academic publications in which mathematicians, physicists, biologists, and others have described how LLMs (and OpenAI’s GPT-5 in particular) have helped them make a discovery or nudged them toward a solution they might otherwise have missed. In part, OpenAI for Science was set up to engage with this community.

And yet OpenAI is also late to the party. Google DeepMind, the rival firm behind groundbreaking scientific models such as AlphaFold and AlphaEvolve, has had an AI-for-science team for years. (When I spoke to Google DeepMind’s CEO and cofounder Demis Hassabis in 2023 about that team, he told me: “This is the reason I started DeepMind … In fact, it’s why I’ve worked my whole career in AI.”)

So why now? How does a push into science fit with OpenAI’s wider mission? And what exactly is the firm hoping to achieve?

I put these questions to Kevin Weil, a vice president at OpenAI who leads the new OpenAI for Science team, in an exclusive interview last week.

On mission

Weil is a product guy. He joined OpenAI a couple of years ago as chief product officer after being head of product at Twitter and Instagram. But he started out as a scientist. He got two-thirds of the way through a PhD in particle physics at Stanford University before ditching academia for the Silicon Valley dream. Weil is keen to highlight his pedigree: “I thought I was going to be a physics professor for the rest of my life,” he says. “I still read math books on vacation.”

Asked how OpenAI for Science fits with the firm’s existing lineup of white-collar productivity tools or the viral video app Sora, Weil recites the company mantra: “The mission of OpenAI is to try and build artificial general intelligence and, you know, make it beneficial for all of humanity.”

Just imagine the future impact this technology could have on science, he says: new medicines, new materials, new devices. “Think about it helping us understand the nature of reality, helping us think through open problems. Maybe the biggest, most positive impact we’re going to see from AGI will actually be from its ability to accelerate science.”

He adds: “With GPT-5, we saw that becoming possible.” 

As Weil tells it, LLMs are now good enough to be useful scientific collaborators. They can spitball ideas, suggest novel directions to explore, and find fruitful parallels between new problems and old solutions published in obscure journals decades ago or in foreign languages.

That wasn’t the case a year or so ago. Since it announced its first so-called reasoning model—a type of LLM that can break down problems into multiple steps and work through them one by one—in December 2024, OpenAI has been pushing the envelope of what the technology can do. Reasoning models have made LLMs far better at solving math and logic problems than they used to be. “You go back a few years and we were all collectively mind-blown that the models could get an 800 on the SAT,” says Weil.

But soon LLMs were acing math competitions and solving graduate-level physics problems. Last year, OpenAI and Google DeepMind both announced that their LLMs had achieved gold-medal-level performance in the International Math Olympiad, one of the toughest math contests in the world. “These models are no longer just better than 90% of grad students,” says Weil. “They’re really at the frontier of human abilities.”

That’s a huge claim, and it comes with caveats. Still, there’s no doubt that GPT-5, which includes a reasoning model, is a big improvement on GPT-4 when it comes to complicated problem-solving. Measured against an industry benchmark known as GPQA, which includes more than 400 multiple-choice questions that test PhD-level knowledge in biology, physics, and chemistry, GPT-4 scores 39%, well below the human-expert baseline of around 70%. According to OpenAI, GPT-5.2 (the latest update to the model, released in December) scores 92%. 

Overhyped

The excitement is evident—and perhaps excessive. In October, senior figures at OpenAI, including Weil, boasted on X that GPT-5 had found solutions to several unsolved math problems. Mathematicians were quick to point out that in fact what GPT-5 appeared to have done was dig up existing solutions in old research papers, including at least one written in German. That was still useful, but it wasn’t the achievement OpenAI seemed to have claimed. Weil and his colleagues deleted their posts.

Now Weil is more careful. It is often enough to find answers that exist but have been forgotten, he says: “We collectively stand on the shoulders of giants, and if LLMs can kind of accumulate that knowledge so that we don’t spend time struggling on a problem that is already solved, that’s an acceleration all of its own.”

He plays down the idea that LLMs are about to come up with a game-changing new discovery. “I don’t think models are there yet,” he says. “Maybe they’ll get there. I’m optimistic that they will.”

But, he insists, that’s not the mission: “Our mission is to accelerate science. And I don’t think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field.”

For Weil, the question is this: “Does science actually happen faster because scientists plus models can do much more, and do it more quickly, than scientists alone? I think we’re already seeing that.”

In November, OpenAI published a series of anecdotal case studies contributed by scientists, both inside and outside the company, that illustrated how they had used GPT-5 and how it had helped. “Most of the cases were scientists that were already using GPT-5 directly in their research and had come to us one way or another saying, ‘Look at what I’m able to do with these tools,’” says Weil.

The key things that GPT-5 seems to be good at are finding references and connections to existing work that scientists were not aware of, which sometimes sparks new ideas; helping scientists sketch mathematical proofs; and suggesting ways for scientists to test hypotheses in the lab.  

“GPT 5.2 has read substantially every paper written in the last 30 years,” says Weil. “And it understands not just the field that a particular scientist is working in; it can bring together analogies from other, unrelated fields.”

“That’s incredibly powerful,” he continues. “You can always find a human collaborator in an adjacent field, but it’s difficult to find, you know, a thousand collaborators in all thousand adjacent fields that might matter. And in addition to that, I can work with the model late at night—it doesn’t sleep—and I can ask it 10 things in parallel, which is kind of awkward to do to a human.”

Solving problems

Most of the scientists OpenAI reached out to back up Weil’s position.

Robert Scherrer, a professor of physics and astronomy at Vanderbilt University, only played around with ChatGPT for fun (“I used it to rewrite the theme song for Gilligan’s Island in the style of Beowulf, which it did very well,” he tells me) until his Vanderbilt colleague Alex Lupsasca, a fellow physicist who now works at OpenAI, told him that GPT-5 had helped solve a problem he’d been working on.

Lupsasca gave Scherrer access to GPT-5 Pro, OpenAI’s $200-a-month premium subscription. “It managed to solve a problem that I and my graduate student could not solve despite working on it for several months,” says Scherrer.

It’s not perfect, he says: “GPT-5 still makes dumb mistakes. Of course, I do too, but the mistakes GPT-5 makes are even dumber.” And yet it keeps getting better, he says: “If current trends continue—and that’s a big if—I suspect that all scientists will be using LLMs soon.”

Derya Unutmaz, a professor of biology at the Jackson Laboratory, a nonprofit research institute, uses GPT-5 to brainstorm ideas, summarize papers, and plan experiments in his work studying the immune system. In the case study he shared with OpenAI, Unutmaz used GPT-5 to analyze an old data set that his team had previously looked at. The model came up with fresh insights and interpretations.  

“LLMs are already essential for scientists,” he says. “When you can complete analysis of data sets that used to take months, not using them is not an option anymore.”

Nikita Zhivotovskiy, a statistician at the University of California, Berkeley, says he has been using LLMs in his research since the first version of ChatGPT came out.

Like Scherrer, he finds LLMs most useful when they highlight unexpected connections between his own work and existing results he did not know about. “I believe that LLMs are becoming an essential technical tool for scientists, much like computers and the internet did before,” he says. “I expect a long-term disadvantage for those who do not use them.”

But he does not expect LLMs to make novel discoveries anytime soon. “I have seen very few genuinely fresh ideas or arguments that would be worth a publication on their own,” he says. “So far, they seem to mainly combine existing results, sometimes incorrectly, rather than produce genuinely new approaches.”

I also contacted a handful of scientists who are not connected to OpenAI.

Andy Cooper, a professor of chemistry at the University of Liverpool and director of the Leverhulme Research Centre for Functional Materials Design, is less enthusiastic. “We have not found, yet, that LLMs are fundamentally changing the way that science is done,” he says. “But our recent results suggest that they do have a place.”

Cooper is leading a project to develop a so-called AI scientist that can fully automate parts of the scientific workflow. He says that his team doesn’t use LLMs to come up with ideas. But the tech is starting to prove useful as part of a wider automated system where an LLM can help direct robots, for example.

“My guess is that LLMs might stick more in robotic workflows, at least initially, because I’m not sure that people are ready to be told what to do by an LLM,” says Cooper. “I’m certainly not.”

Making errors

LLMs may be becoming more and more useful, but caution is still key. In December, Jonathan Oppenheim, a scientist who works on quantum mechanics, called out a mistake that had made its way into a scientific journal. “OpenAI leadership are promoting a paper in Physics Letters B where GPT-5 proposed the main idea—possibly the first peer-reviewed paper where an LLM generated the core contribution,” Oppenheim posted on X. “One small problem: GPT-5’s idea tests the wrong thing.”

He continued: “GPT-5 was asked for a test that detects nonlinear theories. It provided a test that detects nonlocal ones. Related-sounding, but different. It’s like asking for a COVID test, and the LLM cheerfully hands you a test for chickenpox.”

It is clear that a lot of scientists are finding innovative and intuitive ways to engage with LLMs. It is also clear that the technology makes mistakes that can be so subtle even experts miss them.

Part of the problem is the way ChatGPT can flatter you into letting down your guard. As Oppenheim put it: “A core issue is that LLMs are being trained to validate the user, while science needs tools that challenge us.” In an extreme case, one individual (who was not a scientist) was persuaded by ChatGPT into thinking for months that he’d invented a new branch of mathematics.

Of course, Weil is well aware of the problem of hallucination. But he insists that newer models are hallucinating less and less. Even so, focusing on hallucination might be missing the point, he says.

“One of my teammates here, an ex math professor, said something that stuck with me,” says Weil. “He said: ‘When I’m doing research, if I’m bouncing ideas off a colleague, I’m wrong 90% of the time and that’s kind of the point. We’re both spitballing ideas and trying to find something that works.’”

“That’s actually a desirable place to be,” says Weil. “If you say enough wrong things and then somebody stumbles on a grain of truth and then the other person seizes on it and says, ‘Oh, yeah, that’s not quite right, but what if we—’ You gradually kind of find your trail through the woods.”

This is Weil’s core vision for OpenAI for Science. GPT-5 is good, but it is not an oracle. The value of this technology is in pointing people in new directions, not coming up with definitive answers, he says.

In fact, one of the things OpenAI is now looking at is making GPT-5 dial down its confidence when it delivers a response. Instead of saying Here’s the answer, it might tell scientists: Here’s something to consider.

“That’s actually something that we are spending a bunch of time on,” says Weil. “Trying to make sure that the model has some sort of epistemological humility.”

Watching the watchers

Another thing OpenAI is looking at is how to use GPT-5 to fact-check GPT-5. It’s often the case that if you feed one of GPT-5’s answers back into the model, it will pick it apart and highlight mistakes.

“You can kind of hook the model up as its own critic,” says Weil. “Then you can get a workflow where the model is thinking and then it goes to another model, and if that model finds things that it could improve, then it passes it back to the original model and says, ‘Hey, wait a minute—this part wasn’t right, but this part was interesting. Keep it.’ It’s almost like a couple of agents working together and you only see the output once it passes the critic.”
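To make the shape of that workflow concrete, here is a minimal sketch of a generator-critic loop in Python. The `generator` and `critic` callables are hypothetical stand-ins for two LLM calls; this illustrates the general pattern Weil describes, not OpenAI’s actual implementation.

```python
# Minimal sketch of a generator-critic loop: one model drafts an answer, a
# second pass critiques it, and the draft is revised until the critic finds
# nothing to flag or a retry budget runs out. Both callables are placeholders
# for real LLM API calls.
from typing import Callable

def refine_with_critic(
    question: str,
    generator: Callable[[str], str],
    critic: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    draft = generator(question)
    for _ in range(max_rounds):
        feedback = critic(question, draft)
        if feedback.strip().upper() == "OK":  # critic is satisfied
            break
        # Feed the critique back to the generator and try again.
        draft = generator(
            f"Question: {question}\nDraft: {draft}\n"
            f"Reviewer feedback: {feedback}\nRevise the draft."
        )
    return draft

# Toy demo with stand-in functions; in practice both would call an LLM.
if __name__ == "__main__":
    gen = lambda prompt: "2 + 2 = 4" if "Revise" in prompt else "2 + 2 = 5"
    crit = lambda q, d: "OK" if "= 4" in d else "Arithmetic error: recheck the sum."
    print(refine_with_critic("What is 2 + 2?", gen, crit))
```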

What Weil is describing also sounds a lot like what Google DeepMind did with AlphaEvolve, a tool that wrapped the firm’s LLM, Gemini, inside a wider system that filtered out the good responses from the bad and fed them back in again to be improved on. Google DeepMind has used AlphaEvolve to solve several real-world problems.

OpenAI faces stiff competition from rival firms, whose own LLMs can do most, if not all, of the things it claims for its own models. If that’s the case, why should scientists use GPT-5 instead of Gemini or Anthropic’s Claude, families of models that are themselves improving every year? Ultimately, OpenAI for Science may be as much an effort to plant a flag in new territory as anything else. The real innovations are still to come. 

“I think 2026 will be for science what 2025 was for software engineering,” says Weil. “At the beginning of 2025, if you were using AI to write most of your code, you were an early adopter. Whereas 12 months later, if you’re not using AI to write most of your code, you’re probably falling behind. We’re now seeing those same early flashes for science as we did for code.”

He continues: “I think that in a year, if you’re a scientist and you’re not heavily using AI, you’ll be missing an opportunity to increase the quality and pace of your thinking.”

New Microsoft Retail AI Guide Echoes SEO

Microsoft published a playbook early this month to help retailers increase visibility in AI search, browsers, and assistants.

“A guide to AEO and GEO” (PDF), from the heads of Microsoft Shopping and Copilot, and Microsoft Advertising, includes actionable tips worth the read, many of which confirm established practices.

Microsoft’s new guide aims to help retailers increase AI visibility.

GEO vs. AEO

The rise of AI platforms has created a proliferation of ill-defined acronyms. The guide attempts to clarify two of them:

  • GEO. Generative engine optimization. “Optimizes content for generative AI search environments (like LLM-powered engines) to make it discoverable, trustworthy, and authoritative.”
  • AEO. Answer/Agentic Engine Optimization. “Optimizes content for AI agents and assistants (like Copilot or ChatGPT) so they can find, understand, and present answers effectively.”

I question the need for new acronyms, as the concepts have existed for years in traditional search engine optimization. “GEO” is synonymous with “EEAT” — Experience, Expertise, Authoritativeness, Trustworthiness — Google’s term for instructing human quality raters.

“AEO” is akin to optimizing for featured snippets in traditional search results.

The key difference is that GEO and AEO focus on the data about a product that AI models ingest during pre-training, with the goal of influencing its exposure in AI answers.

And GEO extends beyond a site’s content to include external resources such as reviews, Reddit mentions, product-comparison articles, and similar.

Intent-driven product data

To me, the most useful part of the guide reinforces my article on optimizing product feeds for AI. Product feeds and on-page descriptions should clearly address use cases, such as shoes “best for day hikes above 40 degrees.”

The guide also recommends:

  • Product page titles that are detailed and descriptive,
  • Front-loading product descriptions with benefits: who it’s for, the problem it solves, and how it’s better,
  •  Q&As,
  • Comparison tables,
  • Detailed alt text for product images,
  • Complementary products that match the intent,
  • Transcripts for videos.
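Putting those feed recommendations together, here is a hypothetical feed entry sketched in Python; the field names are illustrative, not from Microsoft’s guide or any particular feed specification.

```python
# Hypothetical product feed entry that front-loads the use case and benefits,
# as the guide recommends. Field names and values are illustrative only.
import json

feed_item = {
    "title": "TrailLite Men's Waterproof Day-Hiking Boot, Sizes 8-13",
    "description": (
        "Best for day hikes above 40 degrees: a lightweight waterproof boot "
        "for hikers who want ankle support without backpacking-boot weight."
    ),
    "use_cases": ["day hikes", "wet trails", "light loads"],
    "qa": [{"q": "Is it fully waterproof?", "a": "Yes, sealed-seam membrane."}],
    "image_alt_text": "Side view of a brown waterproof hiking boot with lugged sole",
    "complementary_products": ["merino hiking socks", "trekking poles"],
    "video_transcript_url": "https://example.com/transcripts/traillite-boot.txt",
}

print(json.dumps(feed_item, indent=2))
```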

Social proof

The guide emphasizes the importance of factual entities such as verified customer reviews, certifications, sustainability badges, and partnerships. It warns against using exaggerated or unverifiable claims, stating, “AI systems penalize low-trust language.”

It advises applying social proof consistently across your site and all channels, and verifying any subjective claims about your business or product. For example, if you assert a product is the best in a category, include why, such as “according to [XYZ’s] tests.”

Structured data

Per the guide, structured data markup, such as Schema.org, is key for AI visibility.

However, I’ve seen no evidence to support that recommendation. The guide does not explain how LLMs use Schema. To my knowledge, AI training data does not store Schema markup, and AI bots crawl text-only content.

Yet for live searches, Schema may be helpful because traditional search engines support it, and LLMs rely on those platforms.

Nonetheless, the guide recommends:

  • Schema Types: Product, Offer, AggregateRating, Review, Brand, ItemList, and FAQ.
  • Dynamic fields: price, availability, color, size, SKU, GTIN, and dateModified.
  • ItemList markup for collections and category pages to clarify product groupings.
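For illustration, a minimal Product JSON-LD block using the schema types the guide lists might look like the sketch below; the values are hypothetical, and real markup should be generated from your actual product data.

```python
# Minimal sketch of Product JSON-LD covering the Product, Offer,
# AggregateRating, and Brand types the guide lists. Values are hypothetical.
import json

product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "TrailLite Waterproof Hiking Boot",
    "sku": "TL-BOOT-010",
    "gtin13": "0123456789012",
    "color": "Brown",
    "size": "10",
    "brand": {"@type": "Brand", "name": "TrailLite"},
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "212",
    },
    "offers": {
        "@type": "Offer",
        "price": "129.99",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

# Embed the output inside a <script type="application/ld+json"> tag
# in the product page template.
print(json.dumps(product_jsonld, indent=2))
```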

While helpful, Microsoft’s “A guide to AEO and GEO” doesn’t introduce anything new. The recommendations align with longstanding SEO tactics and reinforce the views of industry pros.

Why Google Gemini Has No Ads Yet: ‘Trust In Your Assistant’ via @sejournal, @MattGSouthern

Google DeepMind CEO Demis Hassabis said Google doesn’t have any current plans to introduce advertising into its Gemini AI assistant, citing unresolved questions about user trust.

Speaking at the World Economic Forum in Davos, Hassabis said AI assistants represent a different product than search. He believes Gemini should be built for users first.

“In the realm of assistants, if you think of the chatbot as an assistant that’s meant to be helpful and ideally in my mind, as they become more powerful, the kind of technology that works for you as the individual,” Hassabis said in an interview with Axios. “That’s what I’d like to see with these systems.”

He said no one in the industry has figured out how advertising fits into that model.

“There is a question about how does ads fit into that model, where you want to have trust in your assistant,” Hassabis said. “I think no one’s really got a full answer to that yet.”

When asked directly about Google’s plans, Hassabis said: “We don’t have any current plans to do it ourselves.”

What Hassabis Said About OpenAI

The comments came days after OpenAI said it plans to begin testing ads in ChatGPT in the coming weeks for logged-in adults in the U.S. on free and Go tiers.

Hassabis said he was “a little bit surprised they’ve moved so early into that.”

He acknowledged advertising has funded much of the consumer internet and can be useful to users when done well. But he warned that poor execution in AI assistants could damage user relationships.

“I think it can be done right, but it can also be done in a way that’s not good,” Hassabis said. “In the end, what we want to do is be the most useful we can be to our users.”

Search Is Different

Hassabis drew a line between AI assistants and search when discussing advertising.

When asked whether his comments applied to Google Search, where the company already shows ads in AI Overviews, he said the two products work differently.

“But there it’s completely different use case because you’ve already just like how it’s always worked with search, you’ve already, you know, we know what your intent is basically and so we can be helpful there,” Hassabis said. “That’s a very different construct.”

Google began rolling out ads in AI Overviews in October 2024 and has continued expanding them since. The company claims AI Overviews generate ad revenue equal to traditional search results.

Why This Matters

This is the second time in two months that a Google executive has said Gemini ads aren’t currently planned.

In December, Google Ads VP Dan Taylor disputed an Adweek report claiming the company had told advertisers to expect Gemini ads in 2026. Taylor called that report “inaccurate” and said Google has “no current plans” to monetize the Gemini app.

Hassabis’s comments reinforce that position but go further by explaining the reasoning. His “technology that works for you” framing suggests Google sees a tension between advertising and the assistant relationship it wants Gemini to build.

Looking Ahead

Google is comfortable expanding ads where user intent is explicit, like search queries triggering AI Overviews. The company is holding back where intent is less defined and the relationship is more personal.

How long Google maintains its current position depends in part on how users respond to advertising in rival assistants.


Featured Image: Screenshot from youtube.com/@axios, January 2026.

5 Google Analytics Reports PPC Marketers Should Actually Use via @sejournal, @brookeosmundson

Google Analytics has never been perfect, but it used to feel familiar.

The shift to Google Analytics 4 forced PPC marketers to rethink how they pull insights, not just where to click.

Reports that once lived front and center now take more effort to find. Some require extra setup. Others feel less intuitive than before, and that creates a real problem for PPC managers who need answers quickly.

You are expected to explain performance, justify spend, and make optimization decisions, often without the luxury of rebuilding reports or navigating multiple menus.

This article focuses on five Google Analytics reports that still deliver real value for PPC. These are the reports that help you understand audience behavior, uncover expansion opportunities, and connect paid traffic to outcomes the business actually cares about.

1. Audiences Report

As keyword match types continue to loosen and automation plays a larger role in campaign delivery, audience signals matter more than ever.

The Audiences report in GA4 replaces the interest-based reports many marketers previously relied on, but with a more practical twist. Instead of inferred intent, this report is built on real user behavior.

This report shows how predefined and custom audiences perform across key engagement and conversion metrics. For PPC marketers, the value lies in analyzing audiences tied to meaningful actions, not generic demographic traits.

Use this report to:

  • Identify which audiences are driving actual conversions, not just traffic.
  • Compare performance between converters, cart viewers, repeat visitors, or high-engagement users.
  • Validate which audiences deserve more aggressive bidding or budget allocation.
  • Build and export high-performing audiences directly into Google Ads.

This report is far more actionable than legacy interest segments and aligns better with how PPC campaigns are structured today.

To find this report, navigate to: Reports > User > User Attributes > Audiences.

Navigating the Audiences report in Google Analytics 4 property.
Screenshot by author, January 2026.

This report will only be useful if you have custom audiences set up in GA4. These are behavior-based audiences you define yourself, not prebuilt segments like In-Market or Affinity audiences you may be used to seeing in Google Ads.

GA4 audiences are built from first-party actions such as page views, events, or conversion behavior, which makes them more relevant for PPC optimization but requires upfront configuration.

2. Site Search Report

The Site Search report remains one of the most underused tools for PPC expansion.

By analyzing what users search for once they land on your site, you gain direct insight into unmet expectations and intent gaps.

In GA4, Site Search data lives under event tracking rather than a standalone report.

For PPC teams, this report can:

  • Inform keyword expansion using real user language.
  • Highlight product or content gaps affecting conversion rates.
  • Reveal mismatches between ad messaging and on-site expectations.

Speaking of gaps, the Site Search report can also help product teams understand if additional demands exist for the products offered.

For example, say you have a wedding invitation website that has a decent product assortment for different themed weddings.

When using the Site Search report, you see an increasing number of searches for “rustic” – but none of the website designs have that rustic feel!

This can inform product marketing that there is a demand for this type of product, and they can take action accordingly.

To find the Site Search report, navigate to Reports > Engagement > Events.

Look for the event “view_search_results” and click on it.

Screenshot by author, January 2026

Once clicked, find the “search_term” custom parameter card on the page.

A few important notes on search terms data:

  • Before using this report, you must create a new custom dimension (event-scoped) for the search term results to populate.
  • Google Analytics will only show data once it meets a minimum aggregation threshold.

While it’s not as robust as the previous Site Search report in Universal Analytics, it does provide basic data on the number of events and total users per search term.
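If you would rather pull this data programmatically than through the UI, the sketch below uses the GA4 Data API’s Python client. It assumes the event-scoped search_term custom dimension described above has already been registered (the Data API exposes it as customEvent:search_term), and the property ID is a placeholder.

```python
# Sketch: query search terms captured by the view_search_results event via the
# GA4 Data API. Requires the google-analytics-data package and credentials
# (Application Default Credentials). The property ID below is a placeholder.
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Filter, FilterExpression, Metric, RunReportRequest,
)

PROPERTY_ID = "123456789"  # replace with your GA4 property ID

client = BetaAnalyticsDataClient()

request = RunReportRequest(
    property=f"properties/{PROPERTY_ID}",
    dimensions=[Dimension(name="customEvent:search_term")],
    metrics=[Metric(name="eventCount"), Metric(name="totalUsers")],
    date_ranges=[DateRange(start_date="30daysAgo", end_date="yesterday")],
    dimension_filter=FilterExpression(
        filter=Filter(
            field_name="eventName",
            string_filter=Filter.StringFilter(value="view_search_results"),
        )
    ),
)

for row in client.run_report(request).rows:
    term = row.dimension_values[0].value
    events, users = (v.value for v in row.metric_values)
    print(f"{term}: {events} events, {users} users")
```

Keep in mind the aggregation thresholds mentioned above: low-volume search terms may not appear in the API results either.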

3. Referrals Report

Referral traffic is often ignored by PPC teams, which is a missed opportunity.

The Referrals report shows which external sites send users to your website and how those users behave once they arrive.

To find this report, navigate to Reports > Acquisition > Traffic Acquisition.

Google Analytics 4 Referral report navigation
Screenshot by author, January 2026

To view the websites from the Referral channel, click the “+” in the default channel group and choose “Session source/medium.”

Isolating the Referrals report in Google Analytics 4 to identify which websites drove traffic to a website.
Screenshot by author, January 2026

The key features of this report can:

  • Identify third-party sites sending high-quality traffic.
  • Distinguish between low-intent and high-intent referral sources.
  • Build placement-based audiences for Display or Demand Gen testing.

Testing Display placements based on proven referral sources can be a cost-efficient way to expand reach responsibly, because the referral websites chosen are already known to send high-quality traffic to your site.

4. Top Conversion Paths Report

As marketers, we’re often asked how “Top of Funnel” (TOFU) or brand awareness campaigns are performing.

Leadership typically prioritizes channels that are proven to perform, so they want to make sure marketing dollars are spent efficiently.

In today’s economy, this is more important than ever.

This Google Analytics report helps analyze and interpret TOFU behavior.

If you’re running any type of campaign beyond Search, this report is absolutely necessary.

Campaigns like YouTube and Display and other paid channels like social media (Meta, Instagram, TikTok, etc.) naturally have different goals and objectives.

TOF campaigns are often criticized for “not performing” at the same rate as a Search campaign.

As marketers, this can be frustrating to hear over and over.

Using the Conversion Paths report provides a holistic view of how long it takes a user to move from the initial interaction to an eventual purchase.

To find this report, navigate to Advertising > Attribution > Conversion paths.

When drilling down to specific campaign performance, I recommend:

  • Add a filter that contains “Session source/medium” to the specific paid channel in question (“google/cpc”, for example).
  • Include an “AND” statement to the filter for “Session campaign” specific to the TOF campaigns in question.

Conversion Paths report in Google Analytics 4.
Screenshot by author, January 2026

In the example above, we found that our Paid Social campaigns should have been credited with more of the early and mid touchpoints!

The key features of this report can:

  • Identify how many touchpoints occur before the final conversion.
  • Analyze complex user journey interactions when multiple channels are involved (especially for longer sale cycles).
  • Report on credited conversions based on the attribution model.

This report can uncover necessary data to support the request for additional marketing dollars in TOF channels.

A win-win for all parties involved.

5. Conversion Events Report

Most PPC accounts optimize toward a single primary conversion. That makes sense for bidding, but it rarely tells the full story of how paid traffic actually contributes to revenue.

The Conversion Events report in Google Analytics 4 allows you to step back and evaluate all meaningful actions users take, not just the final one that gets credit in-platform.

For PPC decision-making, this report helps answer questions that Google Ads alone cannot, such as:

  • Which actions consistently happen before a purchase or lead submission.
  • Whether certain campaigns drive strong intent but fail to close immediately.
  • How different paid channels influence early-stage engagement versus final conversion.

This becomes especially important when evaluating Display, YouTube, Demand Gen, or paid social campaigns. These campaigns often look inefficient when judged solely on last-click performance, but they may drive key actions like product views, pricing page visits, form starts, or repeat sessions.

To find this report, navigate to: Reports > Engagement > Events.

Screenshot by author, January 2026

Conversion analysis in GA4 depends on which events you explicitly mark as conversions in Admin settings. GA4 does not provide a standalone “conversion-only” filter inside the Events report, so accuracy starts with proper event configuration.

Another practical use of this report is diagnosing drop-off points. If a campaign drives high volumes of early conversion events but struggles to generate final conversions, the issue may lie in landing page experience, form friction, or follow-up timing rather than targeting or bidding.

When paired with campaign-level filters from Google Ads, the Conversion Events report helps PPC managers explain why a campaign matters, even when it is not the last touch.

That context is often the difference between cutting a campaign too early and scaling one that is quietly doing its job.

Turn Analytics Into Better PPC Decisions

Google Analytics is not where most PPC optimizations happen day to day. That work still lives inside ad platforms.

But these reports serve a different purpose. They help PPC managers step back and understand how paid traffic behaves once it reaches the site, how users move across channels, and which actions actually signal intent.

Used monthly or quarterly, these reports surface patterns that daily account reviews often miss. They support smarter targeting decisions, clearer performance explanations, and more confident budget conversations.

When you focus on the reports that consistently answer real PPC questions, Google Analytics becomes less of a chore and more of a strategic asset.

Featured Image: MR Chalee/Shutterstock

Why CFOs Are Cutting AI Budgets (And The 3 Metrics That Save Them) via @sejournal, @purnavirji

Every AI vendor pitch follows the same script: “Our tool saves your team 40% of their time on X task.”

The demo looks impressive. The return on investment (ROI) calculator backs it up, showing millions in labor cost savings. You get budget approval. You deploy.

Six months later, your CFO asks: “Where’s the 40% productivity gain in our revenue?”

You realize the saved time went to email and meetings, not strategic work that moves the business forward.

This is the AI measurement crisis playing out in enterprises right now.

According to Fortune’s December 2025 report, 61% of CEOs report increasing pressure to show returns on AI investments. Yet most organizations are measuring the wrong things.

There’s a problem with how we’ve been tracking AI’s value.

Why ‘Time Saved’ Is A Vanity Metric

Time saved sounds compelling in a business case. It’s concrete, measurable, and easy to calculate.

But time saved doesn’t equal value created.

Anthropic’s November 2025 research analyzing 100,000 real AI conversations found that AI reduces task completion time by approximately 80%. Sounds transformative, right?

What that stat doesn’t capture is the Jevons Paradox of AI.

In economics, the Jevons Paradox occurs when technological progress increases the efficiency with which a resource is used, but the rate of consumption of that resource rises rather than falls.

In the corporate world, this is the Reallocation Fallacy. Just because AI completes a task faster doesn’t mean your team is producing more value. It means they’re producing the same output in less time, but then filling that saved time with lower-value work. Think more meetings, longer email threads, and administrative drift.

Google Cloud’s 2025 ROI of AI report, surveying 3,466 business leaders, found that 74% report seeing ROI within the first year.

But when you dig into what they’re measuring, it’s primarily efficiency gains, not outcome improvements.

CFOs understand this intuitively. That’s why “time saved” metrics don’t convince finance teams to increase AI budgets.

What does convince them is measuring what AI enables you to do that you couldn’t do before.

The Three Types Of AI Value Nobody’s Measuring

Recent research from Anthropic, OpenAI, and Google reveals a pattern: The organizations seeing real AI ROI are measuring expansion.

Three types of value actually matter:

Type 1: Quality Lift

AI can make work faster, and it makes good work better.

A marketing team using AI for email campaigns can send emails quicker. And they also have time to A/B test multiple subject lines, personalize content by segment, and analyze results to improve the next campaign.

The metric isn’t “time saved writing emails.” The metric is “15% higher email conversion rate.”

OpenAI’s State of Enterprise AI report, based on 9,000 workers across almost 100 enterprises, found that 85% of marketing and product users report faster campaign execution. But the real value shows up in campaign performance, not campaign speed.

How to measure quality lift:

  • Conversion rate improvements (not just task completion speed).
  • Customer satisfaction scores (not just response time).
  • Error reduction rates (not just throughput).
  • Revenue per campaign (not just campaigns launched).

One B2B SaaS company I talked to deployed AI for content creation.

  • Their old metric was “blog posts published per month.”
  • Their new metric became “organic traffic from AI-assisted content vs. human-only content.”

The AI-assisted content drove 23% more organic traffic because the team had time to optimize for search intent, not just word count.

That’s quality lift.

Type 2: Scope Expansion (The Shadow IT Advantage)

This is the metric most organizations completely miss.

Anthropic’s research on how their own engineers use Claude found that 27% of AI-assisted work wouldn’t have been done otherwise.

More than a quarter of the value AI creates isn’t from doing existing work faster; it’s from doing work that was previously impossible within time and budget constraints.

What does scope expansion look like? It often looks like positive Shadow IT.

The “papercuts” phenomenon: Small bugs that never got prioritized finally get fixed. Technical debt gets addressed. Internal tools that were “someday” projects actually get built because a non-engineer could scaffold them with AI.

The capability unlock: Marketing teams doing data analysis they couldn’t do before. Sales teams creating custom materials for each prospect instead of using generic decks. Customer success teams proactively reaching out instead of waiting for problems.

Google Cloud’s data shows 70% of leaders report productivity gains, with 39% seeing ROI specifically from AI enabling work that wasn’t part of the original scope.

How to measure scope expansion:

  • Track projects completed that weren’t in the original roadmap.
  • Track the ratio of backlog features cleared by non-engineers.
  • Measure customer requests fulfilled that would have been declined due to resource constraints.
  • Document internal tools built that were previously “someday” projects.

One enterprise software company used this metric to justify its AI investment. It tracked:

  • 47 customer feature requests implemented that would have been declined.
  • 12 internal process improvements that had been on the backlog for over a year.
  • 8 competitive vulnerabilities addressed that were previously “known issues.”

None of that shows up in “time saved” calculations. But it showed up clearly in customer retention rates and competitive win rates.

Type 3: Capability Unlock (The Full-Stack Employee)

We used to hire for deep specialization. AI is ushering in the era of the “Generalist-Specialist.”

Anthropic’s internal research found that security teams are building data visualizations. Alignment researchers are shipping frontend code. Engineers are creating marketing materials.

AI lowers the barrier to entry for hard skills.

A marketing manager doesn’t need to know SQL to query a database anymore; she just needs to know what question to ask the AI. This goes well beyond speed or time saved to removing the dependency bottleneck.

When a marketer can run their own analysis without waiting three weeks for the Data Science team, the velocity of the entire organization accelerates. The marketing generalist is now a front-end developer, a data analyst, and a copywriter all at once.

OpenAI’s enterprise data shows 75% of users report being able to complete new tasks they previously couldn’t perform. Coding-related messages increased 36% for workers outside of technical functions.

How to measure capability unlock:

  • Skills accessed (not skills owned).
  • Cross-functional work completed without handoffs.
  • Speed to execute on ideas that would have required hiring or outsourcing.
  • Projects launched without expanding headcount.

A marketing leader at a mid-market B2B company told me her team can now handle routine reporting and standard analyses with AI support, work that previously required weeks on the analytics team’s queue.

Their campaign optimization cycle accelerated 4x, leading to 31% higher campaign performance.

The “time saved” metric would say: “AI saves two hours per analysis.”

The capability unlock metric says: “We can now run 4x more tests per quarter, and our analytics team tackles deeper strategic work.”

Building A Finance-Friendly AI ROI Framework

CFOs care about three questions:

  • Is this increasing revenue? (Not just reducing cost.)
  • Is this creating competitive advantage? (Not just matching competitors.)
  • Is this sustainable? (Not just a short-term productivity bump.)

How to build an AI measurement framework that actually answers those questions:

Step 1: Baseline Your “Before AI” State

Don’t skip this step, or else it will be impossible to prove AI impact later. Before deploying AI, document current throughput, quality metrics, and scope limitations.

Step 2: Define Leading Vs. Lagging Indicators

You need to track both efficiency and expansion, but you need to frame them correctly to Finance.

  • Leading Indicator (Efficiency): Time saved on existing tasks. This predicts potential capacity.
  • Lagging Indicator (Expansion): New work enabled and revenue impact. This proves the value was realized.

Step 3: Track AI Impact On Revenue, Not Just Cost

Connect AI metrics directly to business outcomes:

  • If AI helps customer success teams → Track retention rate changes.
  • If AI helps sales teams → Track win rate and deal velocity changes.
  • If AI helps marketing teams → Track pipeline contribution and conversion rate changes.
  • If AI helps product teams → Track feature adoption and customer satisfaction changes.

Step 4: Measure The “Frontier” Gap

OpenAI’s enterprise research revealed a widening gap between “frontier” workers and median workers. Frontier firms send 2x more messages per seat.

This means identifying the teams extracting real value versus the teams just experimenting.

Step 5: Build The Measurement Infrastructure First

PwC’s 2026 AI predictions warn that measuring iterations instead of outcomes falls short when AI handles complex workflows.

As PwC notes: “If an outcome that once took five days and two iterations now takes fifteen iterations but only two days, you’re ahead.”

The infrastructure you need before you deploy AI involves baseline metrics, clear attribution models, and executive sponsorship to act on insights.

The Measurement Paradox

The organizations best positioned to measure AI ROI are the ones who already had good measurement infrastructure.

According to Kyndryl’s 2025 Readiness Report, most firms aren’t positioned to prove AI ROI because they lack the foundational data discipline.

Sound familiar? This connects directly to the data hygiene challenge I’ve written about previously. You can’t measure AI’s impact if your data is messy, conflicting, or siloed.

The Bottom Line

The AI productivity revolution is well underway. According to Anthropic’s research, current-generation AI could increase U.S. labor productivity growth by 1.8% annually over the next decade, roughly doubling recent rates.

But capturing that value requires measuring the right things.

Forget asking: “How much time does this save?”

Instead, focus on:

  • “What quality improvements are we seeing in output?”
  • “What work is now possible that wasn’t before?”
  • “What capabilities can we access without expanding headcount?”

These are the metrics that convince CFOs to increase AI budgets. These are the metrics that reveal whether AI is actually transforming your business or just making you busy faster.

Time saved is a vanity metric. Expansion enabled is the real ROI.

Measure accordingly.

Featured Image: SvetaZi/Shutterstock

Google’s New User Intent Extraction Method via @sejournal, @martinibuster

Google published a research paper on how to extract user intent from user interactions that can then be used for autonomous agents. The method they discovered uses on-device small models that do not need to send data back to Google, which means that a user’s privacy is protected.

The researchers discovered they were able to solve the problem by splitting it into two tasks. Their solution worked so well it was able to beat the base performance of multi-modal large language models (MLLMs) in massive data centers.

Smaller Models On Browsers And Devices

The focus of the research is on identifying the user intent through the series of actions that a user takes on their mobile device or browser while also keeping that information on the device so that no information is sent back to Google. That means the processing must happen on the device.

They accomplished this in two stages.

  1. In the first stage, the model on the device summarizes what the user was doing.
  2. The sequence of summaries is then sent to a second model that identifies the user intent.

The researchers explained:

“…our two-stage approach demonstrates superior performance compared to both smaller models and a state-of-the-art large MLLM, independent of dataset and model type.
Our approach also naturally handles scenarios with noisy data that traditional supervised fine-tuning methods struggle with.”

Intent Extraction From UI Interactions

Intent extraction from screenshots and text descriptions of user interactions was proposed in 2025 as a technique using Multimodal Large Language Models (MLLMs). The researchers say they applied this approach to their problem, but with an improved prompt.

The researchers explained that extracting intent is not a trivial problem to solve and that errors can occur at multiple steps along the way. They use the word trajectory to describe a user journey within a mobile or web application, represented as a sequence of interactions.

The user journey (trajectory) is represented formally, with each interaction step consisting of two parts:

  1. An Observation
    This is the visual state of the screen (screenshot) of where the user is at that step.
  2. An Action
    The specific action that the user performed on that screen (like clicking a button, typing text, or clicking a link).
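As a rough illustration (the structure and names below are mine, not the paper’s), a trajectory can be represented as an ordered list of observation-action pairs:

```python
# Illustrative sketch of a trajectory: an ordered sequence of interaction
# steps, each pairing an observation (screenshot) with the action taken on it.
# Names and values are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class InteractionStep:
    screenshot: str  # observation: the visual state of the screen at this step
    action: str      # the action the user performed on that screen

trajectory: List[InteractionStep] = [
    InteractionStep("step_01.png", 'open app "Shopping"'),
    InteractionStep("step_02.png", 'type "hiking boots" into the search box'),
    InteractionStep("step_03.png", 'tap "Add to cart"'),
]
```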

They described three qualities of a good extracted intent:

  • “faithful: only describes things that actually occur in the trajectory;
  • comprehensive: provides all of the information about the user intent required to re-enact the trajectory;
  • and relevant: does not contain extraneous information beyond what is needed for comprehensiveness.”

Challenging To Evaluate Extracted Intents

The researchers explain that grading extracted intents is difficult because they contain complex details (like dates or transaction data) and are inherently subjective, containing ambiguities that are hard to resolve. Trajectories are subjective because the underlying motivations are ambiguous.

For example, did a user choose a product because of the price or the features? The actions are visible, but the motivations are not. Previous research shows that human-written intents matched only 80% of the time on web trajectories and 76% on mobile trajectories, so a given trajectory cannot always be mapped to a single specific intent.

Two-Stage Approach

After ruling out other methods like Chain of Thought (CoT) reasoning (because small language models struggled with the reasoning), they chose a two-stage approach that emulated Chain of Thought reasoning.

The researchers explained their two-stage approach:

“First, we use prompting to generate a summary for each interaction (consisting of a visual screenshot and textual action representation) in a trajectory. This stage is prompt-based as there is currently no training data available with summary labels for individual interactions.

Second, we feed all of the interaction-level summaries into a second stage model to generate an overall intent description. We apply fine-tuning in the second stage…”

The First Stage: Screenshot Summary

In the first stage, the summary of each interaction (the screenshot plus the action) is divided into two parts, with a third component added:

  1. A description of what’s on the screen.
  2. A description of the user’s action.

The third component, labeled “speculative intent,” captures the model’s guesses about what the user is trying to do, and it is then simply discarded. Surprisingly, allowing the model to speculate and then throwing that speculation away leads to a higher-quality result.

The researchers cycled through multiple prompting strategies and this was the one that worked the best.

The Second Stage: Generating Overall Intent Description

For the second stage, the researchers fine-tuned a model to generate an overall intent description. They fine-tuned it with training data made up of two parts:

  1. Summaries that represent all interactions in the trajectory
  2. The matching ground truth that describes the overall intent for each of the trajectories.

The model initially tended to hallucinate because the input summaries are potentially incomplete while the “target intents” are complete. That caused the model to learn to fill in missing details in order to make the input summaries match the target intents.

They solved this problem by “refining” the target intents, removing details that aren’t reflected in the input summaries. This trained the model to infer the intents based only on the inputs.

The researchers compared four different approaches and settled on this approach because it performed so well.
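A simplified sketch of that two-stage flow is shown below; the model callables are placeholders, and the prompt wording and field names are illustrative rather than taken from the paper.

```python
# Sketch of the two-stage intent extraction flow. stage1_model and stage2_model
# stand in for the on-device small models; prompts and field names are
# illustrative only.
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (screenshot path or description, user action)

def extract_intent(
    trajectory: List[Step],
    stage1_model: Callable[[str], Dict[str, str]],
    stage2_model: Callable[[str], str],
) -> str:
    summaries = []
    for screenshot, action in trajectory:
        # Stage one: prompt-based summary of a single interaction.
        out = stage1_model(
            f"Screenshot: {screenshot}\nAction: {action}\n"
            "Return screen_description, action_description, speculative_intent."
        )
        # The speculative_intent field is generated but deliberately discarded.
        summaries.append(
            f'{out["screen_description"]} User action: {out["action_description"]}'
        )
    # Stage two: a fine-tuned model maps the ordered summaries to one intent.
    return stage2_model("\n".join(summaries))
```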

Ethical Considerations And Limitations

The research paper ends by summarizing potential ethical issues where an autonomous agent might take actions that are not in the user’s interest and stressed the necessity to build the proper guardrails.

The authors also acknowledged limitations in the research that might limit generalizability of the results. For example, the testing was done only on Android and web environments, which means that the results might not generalize to Apple devices. Another limitation is that the research was limited to users in the United States in the English language.

There is nothing in the research paper or the accompanying blog post that suggests these processes for extracting user intent are currently in use. The blog post ends by noting that the approach could underpin future assistive features:

“Ultimately, as models improve in performance and mobile devices acquire more processing power, we hope that on-device intent understanding can become a building block for many assistive features on mobile devices going forward.”

Takeaways

Neither the blog post about this research nor the research paper itself describes the results of these processes as something that might be used in AI search or classic search. The paper does mention the context of autonomous agents.

The research paper explicitly mentions the context of an autonomous agent on the device that observes how the user interacts with a user interface and then infers the goal (the intent) of those actions.

The paper lists two specific applications for this technology:

  1. Proactive Assistance:
    An agent that watches what a user is doing for “enhanced personalization” and “improved work efficiency”.
  2. Personalized Memory:
    The process enables a device to “remember” past activities as an intent for later.

Shows The Direction Google Is Heading In

While this might not be used right away, it shows the direction that Google is heading, where small models on a device will be watching user interactions and sometimes stepping in to assist users based on their intent. Intent here is used in the sense of understanding what a user is trying to do.

Read Google’s blog post here:

Small models, big results: Achieving superior intent extraction through decomposition

Read the PDF research paper:

Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition (PDF)


The Fraud Hiding in Email Signups

Ecommerce merchants know the costs of illicit chargebacks in time, revenue, and inventory.

For many sellers, however, the damage starts with new accounts. Organized fraudsters may sign up hundreds of times, employing valid but fake email addresses.

“Those fake accounts are being created for purposes like card testing with small-value transactions to see if the number is valid before attempting a bigger transaction,” said Diarmuid Thoma, the head of fraud and data strategy at AtData, an email verification and validation service.

Chargebacks

The primary risk to ecommerce shops comes from chargebacks.

When a cardholder disputes a fraudulent transaction, the store loses the sale, the product, and the shipping costs, and often incurs additional fees from processors.

Repeated disputes may even jeopardize the business’s relationship with its payment processor.

A seller can feel helpless: the processor authorized the transaction in the first place, yet holds the shop responsible for accepting a stolen card number.

Thoma and other email fraud experts believe fake email addresses are often where the problem begins.

Coupon Abuse

A second form of email-based fraud often shows up in ecommerce marketing data.

Fraudsters use fake but valid email addresses to create accounts at scale to extract promotional value.

Automated scripts submit thousands of signups, collect welcome discounts, and then abandon the accounts once the incentive is redeemed.

“A coupon has a monetary value, and when you do it at scale, it becomes a highly profitable business to use and resell,” said Thoma.

The losses from coupon abuse are massive, with estimates running as high as $89 billion per year depending on the source, and they likely affect most ecommerce businesses that offer promotional discounts.

Fake Accounts

Thus fake email addresses facilitate stolen payment card testing and promotion harvesting.

This sort of behavior can be relatively difficult to detect, because “about 98% [of the email addresses used], even the fraudulent ones, will be valid,” Thoma said, “because the fraudster needs them to be valid” to receive a coupon and complete a purchase.

In other words, the earliest phase of this kind of ecommerce fraud often looks identical to ordinary shopper behavior. By the time the first chargeback appears, the damage has been building for weeks.

That reliance on valid addresses, however, gives businesses a relatively simple defense: email validation.

Account Patterns

Creating fake accounts at scale starts with email addresses that follow recognizable patterns, allowing fraudsters to generate thousands of variations while bypassing basic validation checks.

For example, here are three common patterns.

Tumbling is when a fraudster rewrites a single underlying address in many ways, for example:

  • example@example.com
  • ex.ample@example.com
  • e.x.ample@example.com
  • ex.ample+new@example.com

Small changes, such as added characters or formatting differences, allow each signup to appear unique while still routing messages to the same inbox.

Tumbling is particularly effective at evading duplicate-account controls because every address passes standard validation.
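
As a rough illustration (not AtData's method), a merchant could collapse tumbled variants into one canonical address before counting signups. This sketch assumes Gmail-style semantics, where dots in the local part are ignored and anything after a "+" is a tag; example.com is added to the dot-insensitive set purely so the sample addresses above collapse in the demo.

```python
# Domains where dots in the local part don't change the destination inbox.
# example.com is included only for the demo below.
DOT_INSENSITIVE_DOMAINS = {"gmail.com", "googlemail.com", "example.com"}

def canonicalize(address: str) -> str:
    local, _, domain = address.lower().partition("@")
    local = local.split("+", 1)[0]          # strip "+tags" used to vary the address
    if domain in DOT_INSENSITIVE_DOMAINS:
        local = local.replace(".", "")      # dots route to the same inbox anyway
    return f"{local}@{domain}"

signups = ["example@example.com", "ex.ample@example.com", "ex.ample+new@example.com"]
print({canonicalize(a) for a in signups})   # all three collapse to one address
```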

Gibberish emails are machine-generated addresses that appear random but follow consistent, automated structures.

Bad actors create these accounts in large batches, within seconds or minutes of each other. Thoma described seeing many gibberish emails arriving at essentially the same date and time.

Enumeration relies on generating large numbers of similar addresses, often based on a shared root. “They’re like user1, user2, user3, not necessarily always in sequence,” Thoma said. “It could skip to 10, 15, whatever.”

Such addresses are easy to create automatically and difficult to flag individually, especially when spread across time, domains, or merchants.
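
Here is a similarly rough sketch of catching enumeration: strip trailing digits from the local part and flag roots that accumulate suspiciously many numbered variants. The threshold is arbitrary, and as the next section notes, such a flag should be one signal among several rather than a blocking rule on its own.

```python
import re
from collections import Counter

def enumeration_roots(addresses: list[str], threshold: int = 5) -> list[str]:
    """Return address 'roots' (local part minus trailing digits) seen too often."""
    roots = Counter()
    for address in addresses:
        local, _, domain = address.lower().partition("@")
        root = re.sub(r"\d+$", "", local)   # user1, user2, user15 -> "user"
        roots[f"{root}@{domain}"] += 1
    return [root for root, count in roots.items() if count >= threshold]

signups = [f"user{i}@example.com" for i in (1, 2, 3, 10, 15, 22)]
print(enumeration_roots(signups))           # ['user@example.com']
```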

Identification

Each of these techniques produces valid, deliverable email addresses, which is why basic validation often fails to stop them.

Even monitoring for these patterns can produce false positives. The behavior of legitimate consumers may appear automated during sales events, product launches, or bulk onboarding.

Hence pattern detection works best when combined with additional signals, such as account age, name consistency, geographic alignment, device behavior, and transaction history.

The goal is not to block accounts based on a single indicator, but to isolate organized fraud before losses escalate into chargebacks.
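
A toy version of that combination might look like the following; the signal names and weights are purely illustrative and not drawn from any particular vendor's product.

```python
def signup_risk_score(signup: dict) -> float:
    """Sum the weights of whichever illustrative fraud signals are present."""
    weights = {
        "matches_tumbling_pattern": 0.30,
        "matches_enumeration_pattern": 0.30,
        "gibberish_local_part": 0.20,
        "geo_mismatch": 0.10,
        "brand_new_email_domain": 0.10,
    }
    return sum(weight for signal, weight in weights.items() if signup.get(signal))

example = {"matches_tumbling_pattern": True, "geo_mismatch": True}
print(round(signup_risk_score(example), 2))  # 0.4 -- worth review, not an automatic block
```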

Prevention

Fraud is often a matter of scale, which works in favor of very small ecommerce operations: criminals either don't notice them or see too little potential in the theft to bother.

Large online retailers, however, may want to invest in advanced email validation at the point of signup. Validation at this stage typically costs pennies per address and, combined with reasonable business rules, should reduce fraud.

America’s coming war over AI regulation

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

In the final weeks of 2025, the battle over regulating artificial intelligence in the US reached a boiling point. On December 11, after Congress failed twice to pass a law banning state AI laws, President Donald Trump signed a sweeping executive order seeking to handcuff states from regulating the booming industry. Instead, he vowed to work with Congress to establish a “minimally burdensome” national AI policy, one that would position the US to win the global AI race. The move marked a qualified victory for tech titans, who have been marshaling multimillion-dollar war chests to oppose AI regulations, arguing that a patchwork of state laws would stifle innovation.

In 2026, the battleground will shift to the courts. While some states might back down from passing AI laws, others will charge ahead, buoyed by mounting public pressure to protect children from chatbots and rein in power-hungry data centers. Meanwhile, dueling super PACs bankrolled by tech moguls and AI-safety advocates will pour tens of millions into congressional and state elections to seat lawmakers who champion their competing visions for AI regulation. 

Trump’s executive order directs the Department of Justice to establish a task force that sues states whose AI laws clash with his vision for light-touch regulation. It also directs the Department of Commerce to starve states of federal broadband funding if their AI laws are “onerous.” In practice, the order may target a handful of laws in Democratic states, says James Grimmelmann, a law professor at Cornell Law School. “The executive order will be used to challenge a smaller number of provisions, mostly relating to transparency and bias in AI, which tend to be more liberal issues,” Grimmelmann says.

For now, many states aren’t flinching. On December 19, New York’s governor, Kathy Hochul, signed the Responsible AI Safety and Education (RAISE) Act, a landmark law requiring AI companies to publish the protocols used to ensure the safe development of their AI models and report critical safety incidents. On January 1, California debuted the nation’s first frontier AI safety law, SB 53—which the RAISE Act was modeled on—aimed at preventing catastrophic harms such as biological weapons or cyberattacks. While both laws were watered down from earlier iterations to survive bruising industry lobbying, they struck a rare, if fragile, compromise between tech giants and AI safety advocates.

If Trump targets these hard-won laws, Democratic states like California and New York will likely take the fight to court. Republican states like Florida with vocal champions for AI regulation might follow suit. Trump could face an uphill battle. “The Trump administration is stretching itself thin with some of its attempts to effectively preempt [legislation] via executive action,” says Margot Kaminski, a law professor at the University of Colorado Law School. “It’s on thin ice.”

But Republican states that are anxious to stay off Trump’s radar or can’t afford to lose federal broadband funding for their sprawling rural communities might retreat from passing or enforcing AI laws. Win or lose in court, the chaos and uncertainty could chill state lawmaking. Paradoxically, the Democratic states that Trump wants to rein in—armed with big budgets and emboldened by the optics of battling the administration—may be the least likely to budge.

In lieu of state laws, Trump promises to create a federal AI policy with Congress. But the gridlocked and polarized body won’t be delivering a bill this year. In July, the Senate killed a moratorium on state AI laws that had been inserted into a tax bill, and in November, the House scrapped an encore attempt in a defense bill. In fact, Trump’s bid to strong-arm Congress with an executive order may sour any appetite for a bipartisan deal. 

The executive order “has made it harder to pass responsible AI policy by hardening a lot of positions, making it a much more partisan issue,” says Brad Carson, a former Democratic congressman from Oklahoma who is building a network of super PACs backing candidates who support AI regulation. “It hardened Democrats and created incredible fault lines among Republicans,” he says. 

While AI accelerationists in Trump’s orbit—AI and crypto czar David Sacks among them—champion deregulation, populist MAGA firebrands like Steve Bannon warn of rogue superintelligence and mass unemployment. In response to Trump’s executive order, Republican state attorneys general joined a bipartisan letter urging the FCC not to supersede state AI laws.

With Americans increasingly anxious about how AI could harm mental health, jobs, and the environment, public demand for regulation is growing. If Congress stays paralyzed, states will be the only ones acting to keep the AI industry in check. In 2025, state legislators introduced more than 1,000 AI bills, and nearly 40 states enacted over 100 laws, according to the National Conference of State Legislatures.

Efforts to protect children from chatbots may inspire rare consensus. On January 7, Google and Character Technologies, a startup behind the companion chatbot Character.AI, settled several lawsuits with families of teenagers who killed themselves after interacting with the bot. Just a day later, the Kentucky attorney general sued Character Technologies, alleging that the chatbots drove children to suicide and other forms of self-harm. OpenAI and Meta face a barrage of similar suits. Expect more to pile up this year. Without AI laws on the books, it remains to be seen how product liability laws and free speech doctrines apply to these novel dangers. “It’s an open question what the courts will do,” says Grimmelmann. 

While litigation brews, states will move to pass child safety laws, which are exempt from Trump’s proposed ban on state AI laws. On January 9, OpenAI inked a deal with a former foe, the child-safety advocacy group Common Sense Media, to back a ballot initiative in California called the Parents & Kids Safe AI Act, setting guardrails around how chatbots interact with children. The measure proposes requiring AI companies to verify users’ age, offer parental controls, and undergo independent child-safety audits. If passed, it could be a blueprint for states across the country seeking to crack down on chatbots. 

Fueled by widespread backlash against data centers, states will also try to regulate the resources needed to run AI. That means bills requiring data centers to report on their power and water use and foot their own electricity bills. If AI starts to displace jobs at scale, labor groups might float AI bans in specific professions. A few states concerned about the catastrophic risks posed by AI may pass safety bills mirroring SB 53 and the RAISE Act. 

Meanwhile, tech titans will continue to use their deep pockets to crush AI regulations. Leading the Future, a super PAC backed by OpenAI president Greg Brockman and the venture capital firm Andreessen Horowitz, will try to elect candidates who endorse unfettered AI development to Congress and state legislatures. They’ll follow the crypto industry’s playbook for electing allies and writing the rules. To counter this, super PACs funded by Public First, an organization run by Carson and former Republican congressman Chris Stewart of Utah, will back candidates advocating for AI regulation. We might even see a handful of candidates running on anti-AI populist platforms.

In 2026, the slow, messy process of American democracy will grind on. And the rules written in state capitals could decide how the most disruptive technology of our generation develops far beyond America’s borders, for years to come.

Measles is surging in the US. Wastewater tracking could help.

This week marked a rather unpleasant anniversary: It’s a year since Texas reported a case of measles—the start of a significant outbreak that ended up spreading across multiple states. Since the start of January 2025, there have been over 2,500 confirmed cases of measles in the US. Three people have died.

As vaccination rates drop and outbreaks continue, scientists have been experimenting with new ways to quickly identify new cases and prevent the disease from spreading. And they are starting to see some success with wastewater surveillance.

After all, wastewater contains saliva, urine, feces, shed skin, and more. You could consider it a rich biological sample. Wastewater analysis helped scientists understand how covid was spreading during the pandemic. It’s early days, but it is starting to help us get a handle on measles.

Globally, there has been some progress toward eliminating measles, largely thanks to vaccination efforts. Such efforts led to an 88% drop in measles deaths between 2000 and 2024, according to the World Health Organization. It estimates that “nearly 59 million lives have been saved by the measles vaccine” since 2000.

Still, an estimated 95,000 people died from measles in 2024 alone—most of them young children. And cases are surging in Europe, Southeast Asia, and the Eastern Mediterranean region.

Last year, the US saw the highest levels of measles in decades. The country is on track to lose its measles elimination status—a sorry fate that befell Canada in November after the country recorded over 5,000 cases in a little over a year.

Public health efforts to contain the spread of measles—which is incredibly contagious—typically involve clinical monitoring in health-care settings, along with vaccination campaigns. But scientists have started looking to wastewater, too.

Along with various bodily fluids, we all shed viruses and bacteria into wastewater, whether that’s through brushing our teeth, showering, or using the toilet. The idea of looking for these pathogens in wastewater to track diseases has been around for a while, but things really kicked into gear during the covid-19 pandemic, when scientists found that the coronavirus responsible for the disease was shed in feces.

This led Marlene Wolfe of Emory University and Alexandria Boehm of Stanford University to establish WastewaterSCAN, an academic-led program developed to analyze wastewater samples across the US. Covid was just the beginning, says Wolfe. “Over the years we have worked to expand what can be monitored,” she says.

Two years ago, for a previous edition of the Checkup, Wolfe told Cassandra Willyard that wastewater surveillance of measles was “absolutely possible,” as the virus is shed in urine. The hope was that this approach could shed light on measles outbreaks in a community, even if members of that community weren’t able to access health care and receive an official diagnosis. And that it could highlight when and where public health officials needed to act to prevent measles from spreading. Evidence that it worked as an effective public health measure was, at the time, scant.

Since then, she and her colleagues have developed a test to identify measles RNA. They trialed it at two wastewater treatment plants in Texas between December 2024 and May 2025. At each site, the team collected samples two or three times a week and tested them for measles RNA.

Over that period, the team found measles RNA in 10.5% of the samples they collected, as reported in a preprint paper posted on medRxiv in July and currently under review at a peer-reviewed journal. The first detection came a week before the first case of measles was officially confirmed in the area. That’s promising—it suggests that wastewater surveillance might pick up measles cases early, giving public health officials a head start in efforts to limit any outbreaks.

There are more promising results from a team in Canada. Mike McKay and Ryland Corchis-Scott at the University of Windsor in Ontario and their colleagues have also been testing wastewater samples for measles RNA. Between February and November 2025, the team collected samples from a wastewater treatment facility serving over 30,000 people in Leamington, Ontario. 

These wastewater tests are somewhat limited—even if they do pick up measles, they won’t tell you who has measles, where exactly infections are occurring, or even how many people are infected. McKay and his colleagues have begun to make some progress here. In addition to monitoring the large wastewater plant, the team used tampons to soak up wastewater from a hospital lateral sewer.

They then compared their measles test results with the number of clinical cases in that hospital. This gave them some idea of the virus’s “shedding rate.” When they applied this to the data collected from the Leamington wastewater treatment facility, the team got estimates of measles cases that were much higher than the figures officially reported. 
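
To make the arithmetic concrete, here is a deliberately simplified sketch of that back-calculation. All numbers are invented for illustration, and the team's actual analysis accounts for factors like sewer flow and signal decay that are ignored here.

```python
# Hospital sewer: a measured daily viral signal alongside a known patient count
# gives a rough per-case "shedding rate" (all figures below are made up).
hospital_signal_per_day = 4.0e9     # hypothetical RNA copies/day in the hospital sewer
hospital_known_cases = 8            # confirmed measles patients contributing to it
shedding_per_case = hospital_signal_per_day / hospital_known_cases

# Community plant: divide the measured total signal by the per-case rate
# to estimate how many people are actually shedding the virus.
community_signal_per_day = 3.0e11   # hypothetical RNA copies/day at the treatment plant
estimated_cases = community_signal_per_day / shedding_per_case
print(round(estimated_cases))       # ~600, far above the officially confirmed count
```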

Their findings track with the opinions of local health officials (who estimate that the true number of cases during the outbreak was around five to 10 times higher than the confirmed case count), the team members wrote in a paper published on medRxiv a couple of weeks ago.

There will always be limits to wastewater surveillance. “We’re looking at the pool of waste of an entire community, so it’s very hard to pull in information about individual infections,” says Corchis-Scott.

Wolfe also acknowledges that “we have a lot to learn about how we can best use the tools so they are useful.” But her team at WastewaterSCAN has been testing wastewater across the US for measles since May last year. And their findings are published online and shared with public health officials.

In some cases, the findings are already helping inform the response to measles. “We’ve seen public health departments act on this data,” says Wolfe. Some have issued alerts, or increased vaccination efforts in those areas, for example. “[We’re at] a point now where we really see public health departments, clinicians, [and] families using that information to help keep themselves and their communities safe,” she says.

McKay says his team has stopped testing for measles because the Ontario outbreak “has been declared over.” He says testing would restart if and when a single new case of measles is confirmed in the region, but he also thinks that his research makes a strong case for maintaining a wastewater surveillance system for measles.

McKay wonders if this approach might help Canada regain its measles elimination status. “It’s sort of like [we’re] a pariah now,” he says. If his approach can help limit measles outbreaks, it could be “a nice tool for public health in Canada to [show] we’ve got our act together.”

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.