How Do You Compete In Agentic Commerce? via @sejournal, @Kevin_Indig

Agentic commerce transforms organic search from a source of cheap traffic into the mandatory gatekeeper of AI verification. Marketing arbitrage dies; product truth wins.

Image Credit: Kevin Indig

This week, we’re covering:

  • Why agentic commerce filters out marketing-first brands and rewards granular product data.
  • How ChatGPT, Copilot, and Google’s protocols reshape merchant economics and customer relationships.
  • Which feeds to optimize, which protocols to prioritize, and the implementation sequence that matters.
Image Credit: Kevin Indig

Agentic commerce acts as a “great filter” for marketing arbitrage, transforming organic search from a source of cheap traffic into the mandatory gatekeeper of AI verification.

The signal is already visible in the noise. During the 2025 holiday season, AI agents powered 20% of retail sales. Even allowing for loose definitions, the era of agentic commerce has arrived.

All major LLM platforms now offer direct checkout via new commerce protocols:

  1. ChatGPT has Instant Checkout with Shopify and Etsy, and ACP (Agentic Commerce Protocol).
  2. Microsoft Copilot uses ACP and offers Copilot Checkout with PayPal, Shopify, and Stripe.
  3. Google has embedded checkout in AI Mode and Gemini via its Universal Commerce Protocol (UCP).

The infrastructure question is settled, but the strategic question remains: How do you compete when users don’t need to click through to websites to buy?

1. Agentic Commerce Has A Hole In The Middle

The phrasing “agentic commerce” sets the wrong expectation. Autonomous purchasing, where you give an agent a credit card and a monthly allowance to buy on your behalf, is unlikely to become a reality in the near future.

  • High-priced purchases like plane tickets or cars are too risky to delegate. You have idiosyncratic preferences (airline seat rules, car features) that no agent can reliably model.
  • Low-priced purchases like toilet paper or laundry detergent already have automation via subscription services (Instacart recurring orders, Subscribe & Save). An agent adds no incremental value.
  • The middle ground is smaller than the hype suggests. If high-priced resists delegation and low-priced is already “automated,” where does autonomous purchasing actually generate value?

“Conversational commerce” is a better frame. Instead of 100% automating the act of buying, LLMs compress the funnel by offering far superior research to classic search engines and showing products in the user interface.

  • Models read expert reviews, product specs, ingredient lists, and actual user feedback rather than ranking by keyword bids and conversion history.
  • The value lies in collapsing 14 clicks (Amazon’s disclosed average before purchase) into one or two.

2. Protocols Make Ecommerce “Headless”

The new commerce protocols let AI agents plug directly into the backend of your business instead of crawling your site and surfacing your pages in a list of search results. Protocols make commerce “headless” and decouple the front end from the back end (a sketch of a machine-readable product record follows the list below):

  • Websites become less important as destinations and more important as databases.
  • The game shifts from optimizing landing page design for human eyes to optimizing data feeds for machine ingestion.
  • If your shipping speed, inventory status, or return policy isn’t accessible via API, you are invisible to the agent.
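
To make the “websites as databases” idea concrete, here is a minimal sketch of the kind of structured record an agent needs to be able to query. It is written as a Python dict for illustration; the field names are assumptions, not the actual ACP or UCP schema.

```python
import json

# Illustrative product record an agent could query via API.
# Field names are assumptions, not the ACP/UCP specification.
product = {
    "id": "SKU-4821",
    "title": "Trail Running Shoe, Men's 10.5",
    "price": {"amount": 129.00, "currency": "USD"},
    "availability": "in_stock",                      # real-time inventory status
    "inventory_count": 37,
    "shipping": {"method": "ground", "estimated_days": 2},
    "returns": {"window_days": 30, "free": True},
    "specs": {"arch_support_mm": 32, "heel_drop_mm": 8},
}

# The agent consumes structured data like this, not a rendered landing page.
print(json.dumps(product, indent=2))
```

If any of these fields (shipping speed, inventory status, return terms) is missing from what you expose, the agent has nothing to reason over.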

The shift from crawling to protocols collapses the legacy 14-click funnel (search, browse, click, checkout) into just two interactions: (1) the model parses intent by matching expert reviews against real-time inventory, and (2) the user executes a single click to buy using stored credentials.

Image Credit: Kevin Indig

While both protocols, ACP and UCP, enable the same user experience, they offer vastly different terms for the merchant.

OpenAI’s ACP (Agentic Commerce Protocol)

  • The Vision: The “Walled Garden.” OpenAI aims to handle the entire transaction within the chat interface, treating merchants effectively as suppliers.
  • The Trade-off: Efficiency vs. LTV. You gain access to 700 million weekly users, but you lose the direct customer relationship. Because OpenAI currently restricts passing customer emails for marketing, you lose the ability to remarket – effectively killing the 15-20% of Lifetime Value (LTV) that typically comes from post-purchase email flows.

Google’s UCP (Universal Commerce Protocol)

  • The Vision: The “Distributed Layer.” Google extends its Shopping Graph into a transactional layer that sits on top of Search, Lens, and Gemini.
  • The Trade-off: Ownership vs. Competition. Unlike ACP, Google allows merchants to retain the full customer lifecycle, including email rights and loyalty data. The cost is significantly higher competition intensity: Instead of fighting for 10 blue links, you are fighting for one of three “slots” in an AI Overview, making the margin for error in your product data effectively zero.

3. Conversational Commerce Disrupts The Whole Ecosystem

The shift from search to conversation creates a distinct set of winners, losers, and strategic dilemmas.

Buyers get a dramatically better user experience.

  • Discovery: High-consideration purchases (e.g., specific running shoes) shift from clicking through six potentially irrelevant product listing ads to receiving top-tier recommendations based on expert reviews.
  • Cognitive Load: The model handles the research, collapsing the average 14-click journey into one to two interactions.

Merchants face a tradeoff between distribution and control.

  • On ChatGPT: You gain access to early adopters, but lose the direct customer relationship and email marketing rights. You have no leverage over commission rates or recommendation logic.
  • On Google/Copilot: You retain merchant-of-record status, but as the funnel compresses, on-site ad inventory loses value. While conversion rates may rise, total ad revenue falls.

Affiliates die when LLMs disintermediate the click.

  • The Trap: If ChatGPT synthesizes reviews without sending traffic, affiliates stop writing. This creates an “ouroboros” where models train on their own AI-generated output.
  • The Pivot: Publishers must paywall premium content or charge merchants directly for reviews.

Amazon dominates on price and speed, but faces a business model conflict.

  • The Conflict: Retail margins are thin (~1%); profitability comes from the $60 billion advertising business.
  • The Risk: Amazon’s ad machine relies on a 14-click funnel. If conversational commerce compresses this to one click, sponsored product inventory evaporates.
  • The Choice: They must either block crawlers to protect ad revenue (current strategy) or participate and cannibalize it. Walmart joining ChatGPT forces their hand.

Google is best positioned to weather the shift.

  • Parity: They are already monetizing AI Overviews at parity with legacy search.
  • Economics: Higher relevance leads to exploding conversion rates. Advertisers will pay more per click to offset the lower click volume, balancing the ecosystem.

4. SEO Shifts From Optimizing Clicks To Optimizing Ingestion

We are moving from a world of infinite shelf space (10 blue links, endless pagination) to a world of constrained shelf space (three recommendation slots in an AI response).

In this environment, SEO shifts from optimizing for clicks to optimizing for ingestion. The goal isn’t to get a human to visit your landing page; it’s to get your product data into the agent’s context window with enough authority that it recommends you.

The New “Technical SEO”: Technical SEO in the legacy model meant site speed, mobile responsiveness, and Core Web Vitals. In the protocol era, it means feed integrity. Agents don’t “browse” your site; they query your API. Your website becomes less of a visual destination and more of a structured database. The winners will be merchants who treat their product feed as their primary storefront.

The New “On-Page SEO”: Legacy SEO often rewarded articles that simply summarized what everyone else was already saying to rank for broad keywords. LLMs, however, are trained on that consensus. To be cited now, you must provide Information Gain, the delta between what the model already knows and the unique value you provide on top of the consensus.

  • You cannot “market” your way out of inferior specs. If you claim to be the “best running shoe for flat feet,” the model doesn’t look for adjectives; it validates your arch support measurements against podiatry standards in its training set.
  • Your content must shift from general engagement to structured “Product Truth.” LLMs prioritize detailed comparison tables, proprietary test results (e.g., “we dropped this phone 50 times”), and ingredient breakdowns. If your data isn’t structured for easy ingestion and verification, the model will bypass you for a source that is (see the sketch after this list).
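
As a rough illustration, here is what structured “Product Truth” can look like using schema.org’s Product and PropertyValue types. The product, measurements, and test figures are hypothetical; the point is that claims are expressed as machine-verifiable values rather than adjectives.

```python
import json

# Hypothetical example: granular, verifiable claims as structured data.
product_truth = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Stability Runner X",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "Arch support height", "value": "32", "unitText": "mm"},
        {"@type": "PropertyValue", "name": "Drop-test cycles survived", "value": "50"},
        {"@type": "PropertyValue", "name": "Upper material", "value": "Recycled polyester mesh"},
    ],
}

# Embed as JSON-LD on the product page, or expose the same values in your feed.
print(json.dumps(product_truth, indent=2))
```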

The New “Off-Page SEO”: Backlinks still matter, but their function changes. Instead of passing “link juice” for ranking, they now serve as verification sources for reputation synthesis, together with reviews and web mentions.

  • LLMs scrape third-party sites (e.g., Reddit, specialized forums, expert review sites) to form a consensus. A high volume of verified, specific reviews on trusted third-party platforms is the strongest signal you can send.
  • In a world where an AI suggests three options, brand familiarity becomes a tie-breaker. Brand advertising and organic brand building return as a critical lever to ensure users recognize the recommendation the AI provides.

5. The End Of “Marketing Brands”

The last decade allowed white-label brands to arbitrage their way to growth via ads, but agentic commerce acts as the quality filter for this model. While humans are swayed by slick branding, LLMs are dispassionate readers of data that will not recommend a “premium” product when the specs prove it is identical to a generic alternative.

The shift to protocols creates a paradox: Models understand long-tail intent perfectly but fulfill it with fat head inventory.

  • Safety Bias: Models prefer consensus to avoid hallucinations. A niche brand looks like noise; a Category King looks like truth.
  • The RAG Reality: RAG tools typically only scan the top 10-20 search results. Since search engines already favor authority, RAG often just reinforces the incumbents.

The only force that overrides this bias is granular data. Your merchant feed acts as the Claim, but RAG acts as the Trust Layer to verify it.

The market bifurcates:

  • The Incumbents win general intent via “trust” (consensus).
  • The Specialists win specific intent via “granularity” (specs), but only if they rank in the top search results.

If you expose data points the giants ignore (e.g., exact sourcing, chemical analysis), the model’s reasoning engine must select you to fulfill the constraint, but only if you rank on page 1 to be fetched.

Organic search is no longer about the click; it is the prerequisite for agentic verification.


Featured Image: Paulo Bobita/Search Engine Journal

Breaking Into The Black Box: Unlocking Meta’s Product-Level Ad Data

Ecommerce and Meta often go hand in hand. You can give Meta a 20,000-item catalog and a budget, and with its AI-powered Advantage+ campaigns, it’ll try to pair the right person with the right product, whether that’s a new customer or someone who’s already viewed those products before.

But what’s actually happening inside that ad? And is there a way to optimize this “black box” Dynamic Product Ad (DPA) format?

Advertisers can see ad-level performance, but have no platform-native insights on which specific products are being shown, clicked, or ignored within a broad DPA.

Is The Algorithm Making The Right Decisions?

That’s exactly the question we wanted to answer.

There are three common traps brands fall into:

1. Over-segmentation: Brands that want more insight break apart their catalog into niche product sets with tons of DPAs.

  • Pros: You can give each ad a bespoke name, which tells you exactly what’s being served. Nice!
  • Cons: This reduces data density and can kill ROI. There’s also a tendency to try to predict which audiences will respond to which products, which is no longer effective for most brands following Meta’s Andromeda updates.

2. Convoluted reporting: Brands try to infer which products Meta is prioritizing by pairing Google Analytics 4 session data (sessions by product) with Meta ads data (the campaigns/ads that sent these users).

  • Pros: Enables some analysis without falling into the “over-segmentation” pitfall.
  • Cons: Time-consuming to set up, and incomplete. This method doesn’t tell us anything about product-specific engagement within Meta; we would only be guessing at click-through rate, spend, and impressions.

3. “Set it and forget it”: Brands give up all control and let Meta take the wheel.

  • Pros: Avoids over-segmentation issues.
  • Cons: There’s a big risk in trusting the algorithm. You might be pushing products that get high impressions but low sales, effectively burning your budget and losing efficiency.

Trying to make decisions from just Meta Ads Manager UI data is a risk. Many marketers are still not confident in AI-powered campaigns.

At my agency, we created technology to solve this challenge, but fear not, I can walk you through the exact steps so you can do the same for your brand.

Our pilot client for the new technology was a major bathroom retailer investing heavily in DPAs within conversion campaigns.

Let’s go through the three phases in our journey to overcoming this ecommerce challenge.

Phase One: Surfacing Engagement Data

The first stage was visibility: understanding what was happening now within these “black box” DPA formats.

As I said above, Meta doesn’t directly report which specific product led to a specific purchase within a DPA in the Ads Manager interface. It’s simply not an available breakdown in the same way that age, placement, etc. are offered.

But the good news is that a treasure trove of insight is buried in the Meta APIs:

  1. Meta Marketing API (specifically the Insights API) is the main API we use to get all ad performance data. It’s how we’re pulling the key metrics like spend, impressions, and clicks for each ad_id and product_id (a minimal pull sketch follows this list).
  2. Meta Commerce Platform API (or Catalog API). This API provides the list of all product_ids and their associated details (like name, price, category, etc.).
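
For orientation, here is a minimal sketch of that Insights pull as a raw Graph API call. The API version, token, account ID, and date preset are placeholders, and it assumes the product_id breakdown is available for your dynamic product ad campaigns; an ETL connector can replace this step entirely.

```python
import requests

ACCESS_TOKEN = "YOUR_TOKEN"          # placeholder
AD_ACCOUNT_ID = "act_1234567890"     # placeholder

# Pull ad-level performance broken down by product_id.
resp = requests.get(
    f"https://graph.facebook.com/v19.0/{AD_ACCOUNT_ID}/insights",
    params={
        "level": "ad",
        "fields": "ad_id,spend,impressions,clicks",
        "breakdowns": "product_id",
        "date_preset": "last_30d",
        "access_token": ACCESS_TOKEN,
    },
    timeout=30,
)
rows = resp.json().get("data", [])   # one row per ad_id x product_id combination
```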

Here are the steps:

  1. You first need to pipe API data into a data warehouse (we used BigQuery). Make sure you’re pulling the following fields from the Insights API: impressions, clicks, spend, ad_id, and product_id. If you aren’t a developer, you can use ETL connectors (like Supermetrics or Funnel.io) to get this data into BigQuery or Google Sheets, or use Python scripts if you have a data team.
  2. Once you have these two data streams, join them into a single table using a specific join key. We used Product ID; this is the common thread that must exist in both the ad data and the catalog data to make the connection work (a minimal join sketch follows this list).
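
The join itself is simple once both streams are in the warehouse. This pandas sketch uses made-up rows standing in for the two tables; in practice the frames come from your BigQuery tables or sheets.

```python
import pandas as pd

# Insights rows (Marketing API) and catalog rows (Catalog API), joined on product_id.
insights = pd.DataFrame([
    {"ad_id": "A1", "product_id": "SKU-1", "impressions": 5400, "clicks": 120, "spend": 85.0},
    {"ad_id": "A1", "product_id": "SKU-2", "impressions": 900, "clicks": 40, "spend": 15.0},
])
catalog = pd.DataFrame([
    {"product_id": "SKU-1", "name": "Luxury Bath - Green", "product_type": "Baths"},
    {"product_id": "SKU-2", "name": "Luxury Bath - White", "product_type": "Baths"},
])

product_performance = insights.merge(catalog, on="product_id", how="left")
print(product_performance)
```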

Once you’ve done this, you can view your ad performance data (clicks, impressions), but now with a breakdown by product.

This new, combined dataset was then visualized in a Looker Studio report template. Again, other reporting options are available.

To make sense of the data, we needed an easily navigable report rather than pages of raw data. We built the following visualizations:

Product Scatter Chart, Impression Dynamic Product Explorer (DPEx) (Image from author, December 2025)

Product Scatter Chart: Separating each product into four distinct categories (a classification sketch follows the list):

  • “Star Performers”: High impressions and high clicks.
  • “Promising Products”: Low impressions but a high click-through rate.
  • “Window Shoppers”: High impressions but very low clicks.
  • “Low Priority”: Low clicks and impressions.
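
As a rough sketch, here is one way to reproduce that segmentation in code. The report’s exact thresholds aren’t published, so the median cut-offs below are an assumption; swap in whatever boundaries fit your account.

```python
import pandas as pd

def segment_products(df: pd.DataFrame) -> pd.DataFrame:
    """Bucket products into the four scatter-chart segments.

    Expects one row per product with 'impressions' and 'clicks' columns.
    Median-based thresholds are an assumption, not the report's exact method.
    """
    df = df.copy()
    df["ctr"] = df["clicks"] / df["impressions"].clip(lower=1)
    high_impr = df["impressions"] >= df["impressions"].median()
    high_ctr = df["ctr"] >= df["ctr"].median()

    df["segment"] = "Low Priority"                        # low impressions, low CTR
    df.loc[high_impr & high_ctr, "segment"] = "Star Performer"
    df.loc[~high_impr & high_ctr, "segment"] = "Promising Product"
    df.loc[high_impr & ~high_ctr, "segment"] = "Window Shopper"
    return df
```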
Top 10 Product Types Chart (Image from author, December 2025)
Bottom 10 Product Types (Image from author, December 2025)

Top/Bottom Products Bar Charts: See at a glance the top 10 and bottom 10 products by engagement.

Product Details Table: View detailed metrics for each product.

This could all be filtered by product name, product type, availability, and any other metrics we wanted (color, price, etc.).

We produced our first-ever client report for product-level ad engagement, and even with just engagement data, we learned a lot:

Creative: We used the data to improve creative briefs.

  • In our client data report, it was interesting to see how much Meta was pushing non-white products (orange sinks, green baths), despite the fact that 95% of their product sales are traditional white variations.
  • We hadn’t prioritized these products initially for the client, but have now created lots more video and creator content featuring these highly clickable variations.

Product Segmentation: We built powerful, data-driven product sets based on real engagement metrics.

  • For example, we tested showing only our most engaging “Star Performer” products in feed-powered collection ads in our upper funnel campaigns, where usually the algorithm has fewer signals to optimize towards.

Efficiency: This automated a complex analysis that was previously unwieldy and time-consuming.

Crucially, for the first time, we had enough evidence to challenge Meta’s “best practice” of using the widest possible product set.

Pitfalls & Key Considerations

This was a great first step, but we knew there were some key areas that tapping into Meta’s APIs alone wouldn’t solve:

  • Engagement Vs. Conversions: The major downfall with this is that product-level breakdowns are only available for clicks and impression data, not revenue or conversions. The “Window Shoppers” category, for example, identifies products that get low clicks, but we couldn’t (in this phase) definitively say they don’t lead to sales.
  • Context Is Key: This data is a powerful new diagnostic tool. It tells us what Meta is showing and what users are clicking, which is a huge step forward. The why (e.g., “is this high-impression, low-click item just a high-value product?”) still requires our team’s analysis.

Phase Two: Evolving Meta Engagement Data With GA4 Revenue Data

We knew the Meta-only data above explores just one part of the journey. To evolve, we needed to join it with GA4 data to find out what customers actually buy after they interact with our feed-powered dynamic product ads.

The Technical Bridge: How We Joined the Data

While Phase One relied on ETL connectors to pull Meta’s API data, Phase Two required a different stream for GA4. We tapped into the native GA4 BigQuery export specifically for purchase events. This provides the raw event-level data (revenue and units sold) for every transaction.

The join isn’t a single step; it relies on two primary keys to connect the datasets (a minimal join sketch follows this list):

  • The Ad ID Bridge: To link a GA4 session back to a specific Meta ad, we captured the ad_id via dynamic UTM parameters. By setting your URL parameters to utm_content={{ad.id}}, you create a magic bridge between the click and the session.
  • The Item ID Match: Once the session is linked, we use the Item ID. This must be perfectly aligned so that your Meta product_id and GA4 item_id are identical; otherwise, the model breaks.
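
Here is a minimal pandas sketch of how those two keys line up, using made-up rows. The column names are assumptions based on the parameters above; note that joining on the ad alone (rather than ad plus item) is what surfaces the halo effects described later.

```python
import pandas as pd

# GA4 purchases carry the Meta ad_id in utm_content and the product bought in item_id;
# Meta rows carry ad_id and product_id. Values are illustrative.
ga4_purchases = pd.DataFrame([
    {"utm_content": "A1", "item_id": "SKU-2", "item_revenue": 499.0, "quantity": 1},
])
meta_products = pd.DataFrame([
    {"ad_id": "A1", "product_id": "SKU-1", "clicks": 120, "spend": 85.0},
    {"ad_id": "A1", "product_id": "SKU-2", "clicks": 40, "spend": 15.0},
])

# Strict product-level join: did the clicked product itself get bought?
strict = meta_products.merge(
    ga4_purchases,
    left_on=["ad_id", "product_id"],
    right_on=["utm_content", "item_id"],
    how="left",
)

# Session-level join on the ad alone: what did clickers of this ad end up buying?
halo = meta_products[["ad_id"]].drop_duplicates().merge(
    ga4_purchases, left_on="ad_id", right_on="utm_content", how="left"
)
```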

Pitfalls & Key Considerations

Joining Meta and GA4 data sounds easy enough, but there were some key blockers to overcome.

Clean Data. The whole model breaks if your Meta ID doesn’t cleanly match your GA4 IDs. You must ensure your product catalogs and your GA4 tagging are perfectly aligned before you start.
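
A quick diff of the two ID sets is enough to catch most alignment problems before you build anything on the join. The example values below are placeholders for the full ID lists from your catalog feed and GA4 export.

```python
# Placeholder ID sets; replace with the full lists from your feed and GA4 export.
meta_ids = {"SKU-1", "SKU-2", "SKU-3"}
ga4_ids = {"SKU-1", "SKU-2"}

unmatched = sorted(meta_ids - ga4_ids)
if unmatched:
    print(f"{len(unmatched)} Meta product_ids have no matching GA4 item_id: {unmatched}")
```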

However, the second issue is harder to overcome: attribution. The GA4 data will almost always show lower conversion numbers than Meta’s UI.

This is because, in our experience, Meta often “over-credits.” It benefits from longer attribution windows, including view-through conversions, and it gives itself full credit for each conversion it measures (rather than spreading out across multiple channels).

GA4 often “under-credits” channels like Meta. It uses data-driven attribution to try and give credit to multiple touchpoints. However, it is unable to completely follow user journeys, especially those that don’t include clicks to the site. This means GA4 doesn’t know to credit a social ad, even if that ad was the deciding factor in the purchase journey.

Although we’d love to be able to get a 1:1 match from each product purchase back to a specific product interacted with on Meta, neither GA4 nor Meta can achieve this insight easily. However, there’s still value in the relative insights and trends.

Here’s an example:

  • Meta’s UI: Reported our “Luxury Bath – Green” product was our top performer last month, with high volumes of clicks and impressions in our dynamic ads.
  • The Problem: When we joined our GA4 data, we saw no sales for that specific bath last month, at all, from any channel!
  • The Assumption: If we only used ad engagement data, we’d assume this product was wasting spend by generating low-quality traffic.

But, by looking at all items purchased in those GA4 sessions that originated from the “Luxury Bath – Green” product, we discover that many users who clicked the bath went on to convert, just for the white variation instead.

The Insight: The “Luxury Bath” ad wasn’t a failure; it was a highly effective halo product for our client, drawing in aspirational customers who then converted on other products.

The Action: We can confidently commission creator content focusing on the green bath to draw in new users, even though we know they’re likely to buy a different color when it comes to purchase.

Phase Three: Performance-Enhanced Feeds

Once we had this data at our fingertips, the temptation was to use it purely for insights.

The next level was even better, using this data to create automated supplementary feeds.

It was time to bring back those four product performance segments from our scatter charts.

Using our feed management tools, we pushed the product performance segments into our Meta product feed as new custom labels. This meant we could dynamically build new product sets based on product performance; for example, one rule created a product set where Custom Label 0 equals “Star Performer.”
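
Here is a minimal sketch of what such a supplementary feed can look like. It assumes you map each catalog item id to its segment and upload the CSV as a supplementary data source; custom_label_0 is Meta’s standard catalog field, while the segment names are our own labels.

```python
import pandas as pd

# Segment labels produced in the earlier analysis, keyed by catalog item id.
segments = pd.DataFrame([
    {"id": "SKU-1", "custom_label_0": "Star Performer"},
    {"id": "SKU-2", "custom_label_0": "Window Shopper"},
])

# Upload this file as a supplementary feed; rows are matched to the main catalog on id.
segments.to_csv("supplementary_feed.csv", index=False)
```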

We could then conduct the following product set tests:

  • “Window Shoppers”: (High impressions, low clicks/sales). Feed these into an exclusion set to understand if efficiency improves when we remove from the feed.
  • “Promising Products”: (High CTR, high CVR, low impressions). Feed these into a scaling set with more budget to understand if demand is hidden.
  • “Star Performers”: (High impressions, high clicks). Feed these into a retargeting set to recapture engaged users with our signature ranges.

Pitfalls & Key Considerations

The tests above are simply example hypotheses. However, your mileage may vary! We strongly recommend structured experimentation to understand the impact on overall performance.

Is Your Brand Ready To Break Out Of The ‘Black Box’?

You can partially break out of Meta’s “black box,” and this can be a strategic move for ecommerce brands.

The journey moves from surfacing basic engagement data (Phase One) to joining it with sales data for true, profit-driven insights (Phase Two), and ultimately, to automating your strategy with performance-enhanced feeds (Phase Three).

This is how you move from trusting the algorithm to challenging it with evidence. If you’re a decision-maker wondering where to start, here are the three questions to ask:

  1. “Can you show me which specific products in our catalog are being prioritized by Meta?”
  2. “Are our Meta product_ids and GA4 item_ids identical?”
  3. “Are we capturing the ad.id in our UTM parameters on every single ad?”

If the answers to these questions are “I don’t know,” you’re probably still operating inside the black box. Breaking it open is possible. It just requires the right data, the right technical expertise, and the will to finally see what’s truly driving performance.

Featured Image: Roman Samborskyi/Shutterstock

WP Go Maps Plugin Vulnerability Affects Up To 300K WordPress Sites via @sejournal, @martinibuster

A security advisory was published about a vulnerability affecting the WP Go Maps plugin for WordPress installed on over 300,000 websites. The flaw enables authenticated subscribers to modify map engine settings.

WP Go Maps Plugin

The WP Go Maps plugin is used by local business WordPress sites to display customizable maps on pages and posts, including contact page maps, delivery areas, and store locations. Site owners can manage map markers and map settings without writing code.

The plugin had four vulnerabilities in 2025 and seven in 2024. Vulnerabilities were also discovered in earlier years, stretching back to 2019, though less frequently.

Vulnerability

The vulnerability can be exploited by authenticated attackers with Subscriber-level access or higher. The Subscriber role is the lowest WordPress user role, which means an attacker only needs a basic user account to exploit the issue, but only if affected websites allow users to register at that level.

The vulnerability is caused by a missing capability check in the plugin’s processBackgroundAction() function. A capability check is used to verify whether a logged-in user is allowed to perform a specific action. Because this check is missing, the function processes requests from users who do not have permission to change plugin settings.

As a result, authenticated attackers with Subscriber-level access can modify global map engine settings used by the plugin. These settings apply site-wide and affect how the plugin functions across the website.

Wordfence described the vulnerability as an unauthorized modification of data caused by a missing capability check. In practice, this means the plugin allows low-privileged users to change global settings that should be restricted to administrators.

The Wordfence advisory explains:

“The WP Go Maps (formerly WP Google Maps) plugin for WordPress is vulnerable to unauthorized modification of data due to a missing capability check on the processBackgroundAction() function in all versions up to, and including, 10.0.04. This makes it possible for authenticated attackers, with Subscriber-level access and above, to modify global map engine settings”

Any site running an affected version of the plugin with Subscriber-level registration enabled is exposed to authenticated attackers.

The vulnerability affects all versions of WP Go Maps up to and including version 10.0.04. A patch is available. Site owners are advised to update the WP Go Maps plugin to version 10.0.05 or newer to fix the vulnerability.

Featured Image by Shutterstock/Dean Drobot

Sam Altman Says OpenAI “Screwed Up” GPT-5.2 Writing Quality via @sejournal, @MattGSouthern

Sam Altman said OpenAI “screwed up” GPT-5.2’s writing quality during a developer town hall Monday evening.

When asked about user feedback that GPT-5.2 produces writing that’s “unwieldy” and “hard to read” compared to GPT-4.5, Altman was blunt.

He said:

“I think we just screwed that up. We will make future versions of GPT 5.x hopefully much better at writing than 4.5 was.”

Altman explained that OpenAI made a deliberate choice to focus GPT-5.2’s development on technical capabilities:

“We did decide, and I think for good reason, to put most of our effort in 5.2 into making it super good at intelligence, reasoning, coding, engineering, that kind of thing. And we have limited bandwidth here, and sometimes we focus on one thing and neglect another.”

How OpenAI Positioned Each Model

The contrast between GPT-4.5 and GPT-5.2 shows where OpenAI focused its resources.

When OpenAI introduced GPT-4.5 in February 2025, the company emphasized natural interaction and writing. OpenAI said interacting with GPT-4.5 “feels more natural” and called it “useful for tasks like improving writing.”

GPT-5.2’s announcement took a different direction. OpenAI positioned it as the most capable model series yet for professional knowledge work, with improvements in creating spreadsheets, building presentations, writing code, and handling complex, multi-step projects.

The release post spotlights spreadsheets, presentations, tool use, and coding. Writing appears more briefly, with technical writing noted as an improvement for GPT-5.2 Instant. But Altman’s comments suggest the overall writing experience still fell short for users comparing it to GPT-4.5.

Why This Matters

We’ve covered the iterative changes to ChatGPT since GPT-5 launched in August, including updates to warmth and tone and the GPT-5.1 instruction-following improvements. OpenAI regularly adjusts model behavior based on user feedback, and regressions in one area while improving another aren’t new.

What’s unusual is hearing Altman acknowledge a tradeoff this directly. For anyone using ChatGPT output in client-facing work, drafts, or polished writing, this explains why outputs may have changed. Model upgrades don’t guarantee improvement across every capability.

If you rely on ChatGPT for writing, treat model updates like any other dependency change. Re-test your prompts when defaults change, and keep a fallback if output quality matters for your workflow.

Looking Ahead

Altman said he believes “the future is mostly going to be about very good general purpose models” and that even coding-focused models should “write well, too.”

No timeline was given for when GPT-5.x writing improvements will ship. OpenAI typically iterates on model behavior through point releases, so changes could arrive gradually rather than in a single update.

Featured Image: FotoField/Shutterstock

The Download: why LLMs are like aliens, and the future of head transplants

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Meet the new biologists treating LLMs like aliens  

How large is a large language model? We now coexist with machines so vast and so complicated that nobody quite understands what they are, how they work, or what they can really do—not even the people who build them.

That’s a problem. Even though nobody fully understands how it works—and thus exactly what its limitations might be—hundreds of millions of people now use this technology every day. 

To help overcome our ignorance, researchers are studying LLMs as if they were doing biology or neuroscience on vast living creatures—city-size xenomorphs that have appeared in our midst. And they’re discovering that large language models are even weirder than they thought. Read the full story.

—Will Douglas Heaven

This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we publish each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

And mechanistic interpretability, the technique these researchers are using to try and understand AI models, is one of our 10 Breakthrough Technologies for 2026. Check out the rest of the list here!

Job titles of the future: Head-transplant surgeon

The Italian neurosurgeon Sergio Canavero has been preparing for a surgery that might never happen. His idea? Swap a sick person’s head—or perhaps just the brain—onto a younger, healthier body.

Canavero caused a stir in 2017 when he announced that a team he advised in China had exchanged heads between two corpses. But he never convinced skeptics that his technique could succeed—or to believe his claim that a procedure on a live person was imminent.

Canavero may have withdrawn from the spotlight, but the idea of head transplants isn’t going away. Instead, he says, the concept has recently been getting a fresh look from life-extension enthusiasts and stealth Silicon Valley startups. Read the full story.

—Antonio Regalado

This story is from the latest print issue of MIT Technology Review magazine, which is all about exciting innovations. If you haven’t already, subscribe now to receive future issues once they land.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Big Tech is facing multiple high-profile social media addiction lawsuits 
Meta, TikTok and YouTube will face parents’ accusations in court this week. (WP $)
+ It’s the first time they’re defending against these claims before a jury in a court of law. (CNN)

2 Power prices are surging in the world’s largest data center hub
Virginia is struggling to meet record demand during a winter storm, partly because of the centers’ electricity demands. (Reuters)
+ Why these kinds of violent storms are getting harder to forecast. (Vox)
+ AI is changing the grid. Could it help more than it harms? (MIT Technology Review)

3 TikTok has started collecting even more data on its users
Including precise information about their location. (Wired $)

4 ICE-watching groups are successfully fighting DHS efforts to unmask them
An anonymous account holder sued to block ICE from identifying them—and won. (Ars Technica)

5 A new wave of AI companies want to use AI to make AI better
The AI ouroboros is never-ending. (NYT $)
+ Is AI really capable of making bona fide scientific advancements? (Undark)
+ AI trained on AI garbage spits out AI garbage. (MIT Technology Review)

6 Iran is testing a two-tier internet
Meaning its current blackout could become permanent. (Rest of World)

7 Don’t believe the humanoid robot hype
Even a leading robot maker admits that at best, they’re only half as efficient as humans. (FT $)
+ Tesla wants to put its Optimus bipedal machine to work in its Austin factory. (Insider)
+ Why the humanoid workforce is running late. (MIT Technology Review)

8 AI is changing how manufacturers create new products
Including thinner chewing gum containers and new body wash odors. (WSJ $)
+ AI could make better beer. Here’s how. (MIT Technology Review)

9 New Jersey has had enough of e-bikes 🚲
But will other US states follow its lead? (The Verge)

10 Sci-fi writers are cracking down on AI
Human-produced works only, please. (TechCrunch)
+ San Diego Comic-Con was previously a safe space for AI-generated art. (404 Media)
+ Generative AI is reshaping South Korea’s webcomics industry. (MIT Technology Review)

Quote of the day

“Choosing American digital technology by default is too easy and must stop.”

—Nicolas Dufourcq, head of French state-owned investment bank Bpifrance, makes his case for why Big European companies should use European-made software as tensions with the US rise, the Wall Street Journal reports.

One more thing

The return of pneumatic tubes

Pneumatic tubes were once touted as something that would revolutionize the world. In science fiction, they were envisioned as a fundamental part of the future—even in dystopias like George Orwell’s 1984, where they help to deliver orders for the main character, Winston Smith, in his job rewriting history to fit the ruling party’s changing narrative.

In real life, the tubes were expected to transform several industries in the late 19th century through the mid-20th. For a while, the United States took up the systems with gusto.

But by the mid to late 20th century, use of the technology had largely fallen by the wayside, and pneumatic tube technology became virtually obsolete. Except in hospitals. Read the full story.

—Vanessa Armstrong

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ You really can’t beat the humble jacket potato for a cheap, comforting meal. 
+ These tips might help you whenever anxiety strikes. ($)
+ There are some amazing photos in this year’s Capturing Ecology awards.
+ You can benefit from meditation any time, anywhere. Give it a go!

The power of sound in a virtual world

In an era where business, education, and even casual conversations occur via screens, sound has become a differentiating factor. We obsess over lighting, camera angles, and virtual backgrounds, but how we sound can be just as critical to credibility, trust, and connection.

That’s the insight driving Erik Vaveris, vice president of product management and chief marketing officer at Shure, and Brian Scholl, director of the Perception & Cognition Laboratory at Yale University. Both see audio as more than a technical layer: It’s a human factor shaping how people perceive intelligence, trustworthiness, and authority in virtual settings.

“If you’re willing to take a little bit of time with your audio set up, you can really get across the full power of your message and the full power of who you are to your peers, to your employees, your boss, your suppliers, and of course, your customers,” says Vaveris.

Scholl’s research shows that poor audio quality can make a speaker seem less persuasive, less hireable, and even less credible.

“We know that [poor] sound doesn’t reflect the people themselves, but we really just can’t stop ourselves from having those impressions,” says Scholl. “We all understand intuitively that if we’re having difficulty being understood while we’re talking, then that’s bad. But we sort of think that as long as you can make out the words I’m saying, then that’s probably all fine. And this research showed in a somewhat surprising way, to a surprising degree, that this is not so.”

For organizations navigating hybrid work, training, and marketing, the stakes have become high.

Vaveris points out that the pandemic was a watershed moment for audio technology. As classrooms, boardrooms, and conferences shifted online almost overnight, demand accelerated for advanced noise suppression, echo cancellation, and AI-driven processing tools that make meetings more seamless. Today, machine learning algorithms can strip away keyboard clicks or reverberation and isolate a speaker’s voice in noisy environments. That clarity underpins the accuracy of AI meeting assistants that can step in to transcribe, summarize, and analyze discussions.

The implications are rippling across industries. Clearer audio levels the playing field for remote participants, enabling inclusive collaboration. It empowers executives and creators alike to produce broadcast-quality content from the comfort of their home office. And it offers companies new ways to build credibility with customers and employees without the costly overhead of traditional production.

Looking forward, the convergence of audio innovation and AI promises an even more dynamic landscape: from real-time captioning in your native language to audio filtering, to smarter meeting tools that capture not only what is said but how it’s said, and to technologies that disappear into the background while amplifying the human voice at the center.

“There’s a future out there where this technology can really be something that helps bring people together,” says Vaveris. “Now that we have so many years of history with the internet, we know there’s usually two sides to the coin of technology, but there’s definitely going to be a positive side to this, and I’m really looking forward to it.”

In a world increasingly mediated by screens, sound may prove to be the most powerful connector of all.

This episode of Business Lab is produced in partnership with Shure.

Full Transcript

Megan Tatum: From MIT Technology Review, I’m Megan Tatum, and this is Business Lab, the show that helps business leaders make sense of new technologies coming out of the lab and into the marketplace.

This episode is produced in partnership with Shure.

Our topic today is the power of sound. As our personal and professional lives become increasingly virtual, audio is emerging as an essential tool for everything from remote work to virtual conferences to virtual happy hour. While appearance is often top of mind in video conferencing and streaming, audio can be as or even more important, not only to effective communication, but potentially to brand equity for both the speaker and the company.

Two words for you: crystal clear.

My guests today are Erik Vaveris, VP of Product Management and Chief Marketing Officer at Shure, and Brian Scholl, Director of the Perception & Cognition Laboratory at Yale University.

Welcome, Erik and Brian.

Erik Vaveris: Thank you, Megan. And hello, Brian. Thrilled to be here today.

Brian Scholl: Good afternoon, everyone.

Megan: Fantastic. Thank you both so much for being here. Erik, let’s open with a bit of background. I imagine the pandemic changed the audio industry in some significant ways, given the pivot to our modern remote hybrid lifestyles. Could you talk a bit about that journey and some of the interesting audio advances that arose from that transformative shift?

Erik: Absolutely, Megan. That’s an interesting thing to think about now being here in 2025. And if you put yourself back in those moments in 2020, when things were fully shut down and everything was fully remote, the importance of audio quality became immediately obvious. As people adopted Zoom or Teams or platforms like that overnight, there were a lot of technical challenges that people experienced, but the importance of how they were presenting themselves to people via their audio quality was a bit less obvious. As Brian’s noted in a lot of the press that he’s received for his wonderful study, we know how we look on video. We can see ourselves back on the screen, but we don’t know how we sound to the people with whom we’re speaking.

If a meeting participant on the other side can manage to parse the words that you’re saying, they’re not likely to speak up and say, “Hey, I’m having a little bit of trouble hearing you.” They’ll just let the meeting continue. And if you don’t have a really strong level of audio quality, you’re asking the people that you’re talking to devote way too much brainpower to just determining the words that you’re saying. And you’re going to be fatiguing to listen to. And your message won’t come across. In contrast, if you’re willing to take a little bit of time with your audio set up, you can really get across the full power of your message and the full power of who you are to your peers, to your employees, your boss, your suppliers, and of course your customers. Back in 2020, this very quickly became a marketing story that we had to tell immediately.

And I have to say, it’s so gratifying to see Brian’s research in the news because, to me, it was like, “Yes, this is what we’ve been experiencing. And this is what we’ve been trying to educate people about.” Having the real science to back it up means a lot. But from that, development on improvements to key audio processing algorithms accelerated across the whole AV industry.

I think, Megan and Brian, you probably remember hearing loud keyboard clicking when you were on calls and meetings, or people eating potato chips and things like that back on those. But you don’t hear that much today because most platforms have invested in AI-trained algorithms to remove undesirable noises. And I know we’re going to talk more about that later on.

But the other thing that happened, thankfully, was that as we got into the late spring and summer of 2020, was that educational institutions, especially universities, and also businesses realized that things were going to need to change quickly. Nothing was going to be the same. And universities realized that all classrooms were going to need hybrid capabilities for both remote students and students in the classroom. And that helped the market for professional AV equipment start to recover because we had been pretty much completely shut down in the earlier months. But that focus on hybrid meeting spaces of all types accelerated more investment and more R&D into making equipment and further developing those key audio processing algorithms for more and different types of spaces and use cases. And since then, we’ve really seen a proliferation of different types of unobtrusive audio capture devices based on arrays of microphones and the supporting signal processing behind them. And right now, machine-learning-trained signal processing is really the norm. And that all accelerated, unfortunately, because of the pandemic.

Megan: Yeah. Such an interesting period of change, as you say. And Brian, what did you observe and experience in academia during that time? How did that time period affect the work at your lab?

Brian: I’ll admit, Megan, I had never given a single thought to audio quality or anything like that, certainly until the pandemic hit. I was thrown into this, just like the rest of the world was. I don’t believe I’d ever had a single video conference with a student or with a class or anything like that before the pandemic hit. But in some ways, our experience in universities was quite extreme. I went on a Tuesday from teaching an in-person class with 300 students to being on Zoom with everyone suddenly on a Thursday. Business meetings come in all shapes and sizes. But this was quite extreme. This was a case where suddenly I’m talking to hundreds and hundreds of people over Zoom. And every single one of them knows exactly what I sound like, except for me, because I’m just speaking my normal voice and I have no idea how it’s being translated through all the different levels of technology.

I will say, part of the general rhetoric we have about the pandemic focuses on all the negatives and the lack of personal connection and nuance and the fact that we can’t see how everyone’s paying attention to each other. Our experience was a bit more mixed. I’ll just tell you one anecdote. Shortly after the pandemic started, I started teaching a seminar with about 20 students. And of course, this was still online. What I did is I just invited, for whatever topic we were discussing on any given day, I sent a note to whoever was the clear world leader in the study of whatever that topic was. I said, “Hey, don’t prepare a talk. You don’t have to answer any questions. But just come join us on Zoom and just participate in the conversation. The students will have read some of your work.”

Every single one of them said, “Let me check my schedule. Oh, I’m stuck at home for a year. Sure. I’d be happy to do that.” And that was quite a positive. The students got to meet a who’s who of cognitive science from this experience. And it’s true that there were all these technological difficulties, but that would never, ever have happened if we were teaching the class in real life. That would’ve just been way too much travel and airfare and hotel and scheduling and all of that. So, it was a mixed bag for us.

Megan: That’s fascinating.

Erik: Yeah. Megan, can I add?

Megan: Of course.

Erik: That is really interesting. And that’s such a cool idea. And it’s so wonderful that that worked out. I would say that working for a global company, we like to think that, “Oh, we’re all together. And we’re having these meetings. And we’re in the same room,” but the reality was we weren’t in the same room. And there hadn’t been enough attention paid to the people who were conferencing in speaking not their native language in a different time zone, maybe pretty deep into the evening, in some cases. And the remote work that everybody got thrown into immediately at the start of the pandemic did force everybody to start to think more about those types of interactions and put everybody on a level playing field.

And that was insightful. And that helped some people have stronger voices in the work that we were doing than they maybe did before. And it’s also led businesses really across the board, there’s a lot written about this, to be much more focused on making sure that participants from those who may be remote at home, may be in the office, may be in different offices, may be in different time zones, are all able to participate and collaborate on really a level playing field. And that is a positive. That’s a good thing.

Megan: Yeah. There are absolutely some positive side effects there, aren’t there? And it inspired you, Brian, to look at this more closely. And you’ve done a study that shows poor audio quality can actually affect the perception of listeners. So, I wonder what prompted the study, in particular. And what kinds of data did you gather? What methodology did you use?

Brian: Yeah. The motivation for this study was actually a real-world experience, just like we’ve been talking about. In addition to all of our classes moving online with no notice whatsoever, the same thing was true of our departmental faculty meetings. Very early on in the pandemic, we had one of these meetings. And we were talking about some contentious issue about hiring or whatever. And two of my colleagues, who I’d known very well and for many, many years, spoke up to offer their opinions. And one of these colleagues is someone who I’m very close with. We almost always see eye to eye. He was actually a former graduate student of mine once upon a time. And we almost always see eye to eye on things. He happened to be participating in that meeting from an old not-so-hot laptop. His audio quality had that sort of familiar tinny quality that we’re all familiar with. I could totally understand everything he was saying, but I found myself just being a little skeptical.

I didn’t find his points so compelling as usual. Meanwhile, I had another colleague, someone who I deeply respect, I’ve collaborated with, but we don’t always see eye to eye on these things. And he was participating in this first virtual faculty meeting from his home recording studio. Erik, I don’t know if his equipment would be up to your level or not, but he sounded better than real life. He sounded like he was all around us. And I found myself just sort of naturally agreeing with his points, which sort of was notable and a little surprising in that context. And so, we turned this into a study.

We played people a number of short audio clips, maybe like 30 seconds or so. And we had these being played in the context of very familiar situations and decisions. One of them might be like a hiring decision. You would have to listen to this person telling you why they think they might be a good fit for your job. And then afterwards, you had to make a simple judgment. It might be of a trait. How intelligent did that person seem? Or it might be a real-world decision like, “Hey, based on this, how likely would you be to pursue trying to hire them?” And critically, we had people listen to exactly the same sort of scripts, but with a little bit of work behind the scenes to affect the audio quality. In one case, the audio sounded crisp and clear. Recorded with a decent microphone. And here’s what it sounded like.

Audio Clip: After eight years in sales, I’m currently seeking a new challenge which will utilize my meticulous attention to detail and friendly professional manner. I’m an excellent fit for your company and will be an asset to your team as a senior sales manager.

Brian: Okay. Whatever you think of the content of that message, at least it’s nice and clear. Other subjects listened to exactly the same recording. But again, it had that sort of tinny quality that we’re all familiar with when people’s voices are filtered through a microphone or a recording setup that’s not so hot. That sounded like this.

Audio Clip: After eight years in sales, I’m currently seeking a new challenge which will utilize my meticulous attention to detail and friendly professional manner. I’m an excellent fit for your company and will be an asset to your team as a senior sales manager.

Brian: All right. Now, the thing that I hope you can get from that recording there is that although it clearly has this what we would call, as a technical term, a disfluent sound, it’s just a little harder to process, you are ultimately successful, right? Megan, Erik, you were able to understand the words in that second recording.

Megan: Yeah.

Erik: Mm-hmm.

Brian: And we made sure this was true for all of our subjects. We had them do word-for-word transcription after they made these judgments. And I’ll also just point out that this kind of manipulation clearly can’t be about the person themselves, right? You couldn’t make your voices sound like that in real world conversation if you tried. Voices just don’t do those sorts of things. Nevertheless, in a way that sort of didn’t make sense, that was kind of irrational because this couldn’t reflect the person, this affected all sorts of judgments about people.

So, people were judged to be about 8% less hirable. They were judged to be about 8% less intelligent. We also did this in other contexts. We did this in the context of dateability as if you were listening to a little audio clip from someone who was maybe interested in dating you, and then you had to make a judgment of how likely would you be to date this person. Same exact result. People were a little less datable when their audio was a little more tinny, even though they were completely understandable.

The experiment, the result that I thought was in some ways most striking is one of the clips was about someone who had been in a car accident. It was a little narrative about what had happened in the car accident. And they were talking as if to the insurance agent. They were saying, “Hey, it wasn’t my fault. This is what happened.” And afterwards, we simply had people make a natural intuitive judgment of how credible do you think the person’s story was. And when it was recorded with high-end audio, these messages were judged to be about 8% more credible in this context. So those are our experiments. What it shows really is something about the power of perception. We know that that sort of sound doesn’t reflect the people themselves, but we really just can’t stop ourselves from having those impressions made. And I don’t know about you guys, but, Erik, I think you’re right, that we all understand intuitively that if we’re having difficulty being understood while we’re talking, then that’s bad. But we sort of think that as long as you can make out the words I’m saying, then that’s probably all fine. And this research showed in a somewhat surprising way to a surprising degree that this is not so.

Megan: It’s absolutely fascinating.

Erik: Wow.

Megan: From an industry perspective, Erik, what are your thoughts on those study results? Did it surprise you as well?

Erik: No, like I said, I found it very, very gratifying because we invest a lot in trying to make sure that people understand the importance of quality audio, but we kind of come about that intuitively. Our entire company is audio people. So of course, we think that. And it’s our mission to help other people achieve those higher levels of audio in everything that they do, whether you’re a minister at a church or you’re teaching a class or you’re performing on stage. When I first saw in the news about Brian’s study, I think it was the NPR article that just came up in one of my feeds. I read it and it made me feel like my life’s work has been validated to some extent. I wouldn’t say we were surprised by it, but it made a lot of sense to us. Let’s put it that way.

Megan: And how-

Brian: This is what we’re hearing. Oh, sorry. Megan, I was going to say this is what we’re hearing from a lot of the audio professionals as they’re saying, “Hey, you scientists, you finally caught up to us.” But of course-

Erik: I wouldn’t say it that way, Brian.

Brian: Erik, you’re in an unusual circumstance because you guys think about audio every day. When we’re on Zoom, look, I can see the little rectangle as well as you can. I can see exactly how I look like. I can check the lighting. I check my hair. We all do that every day. But I would say most people really, they use whatever microphone came with their setup, and never give a second thought to what they sound like because they don’t know what they sound like.

Megan: Yeah. Absolutely.

Erik: Absolutely.

Megan: Avoid listening to yourself back as well. I think that’s common. We don’t scrutinize audio as much as we should. I wonder, Erik, since the study came out, how are you seeing that research play out across industry? Can you talk a bit about the importance of strong, clear audio in today’s virtual world and the challenges that companies and employees are facing as well?

Erik: Yeah. Sure, Megan. That’s a great question. And studies kind of back this up: businesses understand that collaboration is the key to many things that we do. They know that that’s critical. And they are investing in making the experiences for the people at work better because of that knowledge, that intuitive understanding. But there are challenges. It can be expensive. You need solutions that people who walk into a room or join a meeting on their personal device are motivated to use, and can use, because they’re simple. You also have to overcome the barriers to investment. We in the AV industry have had to look a lot at how we can bring down the overall cost of ownership of setting up AV technology because, as we’ve seen, the prices of everything that goes into making a product are not coming down.

Simplifying deployment and management is critical. Beyond just audio technology, IoT and cloud technology that let IT teams easily deploy and manage classrooms across an entire university campus, or conference rooms across a global enterprise, are really, really critical. And those are quickly evolving. And integrations with more standard, common IT tools are coming out. And that’s one area. Another thing, just for the end user, is that having the same user interface in each conference room that is familiar to everyone from their personal devices is also important. For many, many years, a lot of people had the experience where, “Hey, it’s time we’re going to actually do a conference meeting.” And you might have a few rooms in your company or in your office area that could do that. And you walk into the meeting room. And how long does it take you to actually get connected to the people you’re going to talk with?

There was always a joke that you’d have to spend the first 15 minutes of a meeting working all of that out. And that’s because the technology was fragmented and you had to do a lot of custom work to make that happen. But these days, I would say platforms like Zoom and Teams and Google and others are doing a really great job with this. If you have the latest and greatest in your meeting rooms and you know how to join from your own personal device, it’s basically the same experience. And that is streamlining the process for everyone. Bringing down the costs of owning it so that companies can get to those benefits to collaboration is kind of the key.

Megan: I was going to ask if we could dive a little deeper into that kind of audio quality, the technological advancements that AI has made possible, which you did touch on slightly there, Erik. What are the most significant advancements, in your view? And how are those impacting the ways we use audio and the things we can do with it?

Erik: Okay. Let me try to break that down into-

Megan: That’s a big question. Sorry.

Erik: … a couple different sections. Yeah. No, and one that’s just so exciting. Machine-learning-based digital signal processing, or DSP, is here and is the norm now. If you think about the beginning of telephones and teleconferencing, just going way back, one of the initial problems you had whenever you tried to get something out of a dedicated handset onto a table was echo. And I’m sure we’ve all heard that at some point in our life. You need to have a way to cancel echo. But by the way, you also want people to be able to speak at the same time on both ends of a call. You get to some of those very rudimentary things. Machine learning is really supercharging those algorithms to provide better performance with fewer trade-offs, fewer artifacts in the actual audio signal.

Noise reduction has come a long way. I mentioned earlier on, keyboard sounds and the sounds of people eating, and how you just don’t hear that anymore, at least I don’t when I’m on conference calls. But only a few years ago, that could be a major problem. The machine-learning-trained digital signal processing is in the market now and it’s doing a better job than ever in removing things that you don’t want from your sound. We have a new de-reverberation algorithm, so if you have a reverberant room with echoes and reflections that’s getting into the audio signal, that can degrade the experience there. We can remove that now. Another thing, the flip side of that is that there’s also a focus on isolating the sound that you do want and the signal that you do want.

Microsoft has rolled out a voice print feature in Teams that allows you, if you’re willing, to provide them with a sample of your voice. And then whenever you’re talking from your device, it will take out anything else that the microphone may be picking up so that even if you’re in a really noisy environment outdoors or, say, in an airport, the people that you’re speaking with are going to hear you and only you. And it’s pretty amazing as well. So those are some of the things that are happening today and are available today.

Another thing that’s emerged from all of this is we’ve been talking about how important audio quality is to the people participating in a discussion, the people speaking, the people listening, how everyone is perceived, but a new consumer, if you will, of audio in a discussion or a meeting has emerged, and that is in the form of the AI agent that can summarize meetings and create action plans, do those sorts of things. But for it to work, a clean transcription of what was said is already table stakes. It can’t garbled. It can’t miss key things. It needs to get it word for word, sentence for sentence throughout the entire meeting. And the ability to attribute who said what to the meeting participants, even if they’re all in the same room, is quickly upon us. And the ability to detect and integrate sentiment and emotion of the participants is going to become very important as well for us to really get the full value out of those kinds of AI agents.

So audio quality is as important as ever for humans, as Brian notes, and in some ways more important because this is now the normal way that we talk and meet, but it’s also critical for AI agents to work properly. And it’s different, right? It’s a different set of considerations. And there’s a lot of emerging thought and work that’s going into that as well. And boy, Megan, there’s so much more we could say about this beyond meetings and video conferences. There are AI tools to simplify the production process. And of course, there’s generative AI for music content. I know that’s beyond the scope of what we’re talking about. But it’s really pretty incredible when you look around at the work that’s happening and the capabilities that are emerging.

Megan: Yeah. Absolutely. Sounds like there are so many elements to consider and work going on. It’s all fascinating. Brian, what kinds of emerging capabilities and use cases around AI and audio quality are you seeing in your lab as well?

Brian: Yeah. Well, I’m sorry that Brian himself was not able to be here today, but I’m an AI agent.

Megan: You got me for a second there.

Brian: Just kidding. The fascinating thing that we’re seeing from the lab, from the study of people’s impressions is that all of this technology that Erik has described, when it works best, it’s completely invisible. Erik, I loved your point about not hearing potato chips being eaten or rain in the background or something like that. You’re totally right. I used to notice that all the time. I don’t think I’ve noticed that recently, but I also didn’t notice that I haven’t noticed that recently, right? It just kind of disappears. The interesting thing about these perceptual impressions, we’re constantly drawing intuitive conclusions about people based on how they sound. And that might be a good thing or a bad thing when we’re judging things like trustworthiness, for example, on the basis of a short audio clip.

But clearly, some of these things are valid, right? We can judge the size of someone or even of an animal based on how they sound, right? A chihuahua can’t make the sound of a lion. A lion can’t make the sound of a chihuahua. And that’s always been true because we’re producing audio signals that go right into each other’s ears. And now, of course, everything that Erik is talking about, that’s not true. It goes through all of these different layers of technology increasingly fueled by AI. But when that technology works the best way, it’s as if it isn’t there at all and we’re just hearing each other directly.

Erik: That’s the goal, right? That it’s seamless open communication and we don’t have to think about the technology anymore.

Brian: It’s a tough business to be in, I think, though, Erik, because people have to know what’s going on behind the surface in order to value it. Otherwise, we just expect it to work.

Erik: Well, that’s why we try to put the logo of our products on the side of them so they show up in the videos. But yeah, it’s a good point.

Brian: Very good. Very good.

Erik: Yeah.

Megan: And we’ve talked about virtual meetings and conversations quite a bit, but there’s also streamed and recorded content, which are increasingly important at work as well. I wondered, Erik, if you could talk a bit about how businesses are leveraging audio in new ways for things like marketing campaigns and internal upskilling and training and areas like that?

Erik: Yeah. Well, one of the things I think we’ve all seen in marketing is that not everything is a high production value commercial anymore. And there’s still a place for that, for sure. But people tend to trust influencers that they follow. People search on TikTok, on YouTube for topics. Those can be the place that they start. And as technology’s gotten more accessible, not just audio, but of course, the video technology too, content creators can produce satisfying content on their own or with just a couple of people with them. And Brian’s study shows that it doesn’t really matter what the origins of the content are for it to be compelling.

For the person delivering the message to be compelling, the audio quality does have to hit a certain level. But because the tools are simpler to use and you need fewer things to connect and pull together a decent production system, creator-driven content is becoming even more and more integral to a marketing campaign. And so not just what they maybe post on their Instagram page or post on LinkedIn, for example, but us as a brand being able to take that content and actually use it in paid media and things like that is all entirely possible because of the overall quality of the content. So that’s something that’s been a trend that’s been in process really, I would say, maybe since the advent of podcasts. But it’s been an evolution. And it’s come a long, long way.

Another thing, and this is really interesting, and this hits home personally, but I remember when I first entered the workforce, and I hope I’m not showing my age too badly here, but I remember the word processing department. And you would write down on a piece of paper, like a memo, and you would give it to the word processing department and somebody would type it up for you. That was a thing. And these days, we’re seeing actually more and more video production with audio, of course, transfer to the actual producers of the content.

In my company, at Shure, we make videos for different purposes to talk about different initiatives or product launches or things that we’re doing just for internal use. And right now, everybody, including our CEO, she makes these videos just at her own desk. She has a little software tool and she can show a PowerPoint and herself and speak to things. And with very, very limited amount of editing, you can put that out there. And I’ve seen friends and colleagues at other companies in very high-level roles just kind of doing their own production. Being able to buy a very high quality microphone with really advanced signal processing built right in, but just plug it in via USB and have it be handled as simply as any consumer device, has made it possible to do really very useful production where you are going to actually sound good and get your message across, but without having to make such a big production out of it, which is kind of cool.

Megan: Yeah. Really democratizes access to sort of creating high quality content, doesn’t it? And of course, no technology discussion is complete without a mention of return on investment, particularly nowadays. Erik, what are some ways companies can get returns on their audio tech investments as well? Where are the most common places you see cost savings?

Erik: Yeah. Well, we collaborated on a study with IDC Research. And they came up with some really interesting findings on this. And one of them was, no surprise, two-thirds or more of companies have taken action on improving their communication and collaboration technology, and even more have additional or initial investments still planned. But the ROI of those initiatives isn’t really tied to the initiative itself. It’s not like when you come out with a new product, you look at how that product performs, and that’s the driver of your ROI. The benefits of smoother collaboration come in the form of shorter meetings, more productive meetings, better decision-making, faster decision-making, stronger teamwork. And so to build an ROI model, what IDC concluded was that you have to build your model to account for those advantages really across the enterprise or across your university, or whatever it may be, and kind of up and down the different set of activities where they’re actually going to be utilized.

So that can be complex. Quantifying things can always be a challenge. But like I said, companies do seem to understand this. And I think that’s because, this is just my hunch, but because everybody, including the CEO and the CFO and the whole finance department, uses and benefits from collaboration technology too. Perhaps that’s one reason why the value is easier to convey. Even if they have not taken the time to articulate things like we’re doing here today, they know when a meeting is good and when it’s not good. And maybe that’s one of the things that’s helping companies to justify these investments. But it’s always tricky to do ROI on projects like that. But again, focusing on the broader benefits of collaboration and breaking it down into what it means for specific activities and types of meetings, I think, is the way to go about doing that.

Megan: Absolutely. And Brian, what kinds of advancements are you seeing in the lab that perhaps one day might contribute to those cost savings?

Brian: Well, I don’t know anything about cost savings, Megan. I’m a college professor. I live a pure life of the mind.

Megan: Of course.

Brian: ROI does not compute for me. No, I would say we are in an extremely exciting frontier right now because of AI and many different technologies. The studies that we talked about earlier, in one sense, they were broad. We explored many different traits from dating to hiring to credibility. And we isolated them in all sorts of ways we didn’t talk about. We showed that it wasn’t due to overall affect or pessimism or something like that. But in those studies, we really only tested one very particular set of dimensions along which an audio signal can vary, which is some sort of model of clarity. But in reality, the audio signal is so multi-dimensional. And as we’re getting more and more tools these days, we can not only change audio along the lines of clarity, as we’ve been talking about, but we can potentially manipulate it in all sorts of ways.

We’re very interested in pushing these studies forward and in exploring how people’s sort of brute impressions that they make are affected by all sorts of things. Meg and Erik, we walk around the world all the time making these judgments about people, right? You meet someone and you’re like, “Wow, I could really be friends with them. They seem like a great person.” And you know that you’re making that judgment, but you have no idea why, right? It just seems kind of intuitive. Well, in an audio signal, when you’re talking to someone, you can think of, “What if their signal is more bass heavy? What if it’s a little more treble heavy? What if we manipulate it in this way? In that way?”

When we talked about the faculty meeting that motivated this whole research program, I mentioned that my colleague, who was speaking from his home recording studio, didn’t just sound as clear as he does in real life. He sounded better than in real life. He sounded like he was all around us. What is the implication of that? I think there are so many different dimensions of an audio signal that we’re just now able to readily control and manipulate that it’s going to be very exciting to see how all of these sorts of things impact our impressions of each other.

Megan: And there may be some overlap with this as well, but I wondered if we could close with a future forward look, Brian. What are you looking forward to in emerging audio technology? What are some exciting opportunities on the horizon, perhaps related to what you were just talking about there?

Brian: Well, we’re interested in studying this from a scientific perspective. Erik, you talked about how when you started. When I started doing this science, we didn’t have a word processing department. We had a stone tablet department. But I hear tell that the current generation, when they send photos back and forth to each other, that they, as a matter, of course, they apply all sorts of filters-

Erik: Oh, yes.

Brian: … to those video signals, those video or just photographic signals. We’re all familiar with that. That hasn’t quite happened with the audio signals yet, but I think that’s coming up as well. You can imagine that you record yourself saying a little message and then you filter it this way or that way. And that’s going to become the Wild West when it comes to the kinds of impressions we make on each other, especially if and when you don’t know that those filters have been operating in the first place.

Megan: That’s so interesting. Erik, what are you looking forward to in audio technology as well?

Erik: Well, I’m still thinking about what Brian said.

Megan: Yeah. That’s-

Erik: That’s very interesting.

Megan: It’s terrifying.

Erik: I have to go back again. I’ll go back to the past, maybe 15 to 20 years. And I remember at work, we had meeting rooms with the Starfish phones in the middle of the table. And I remember that we would have international meetings with our partners there that were selling our products in different countries, including in Japan and in China, and the people actually in our own company in those countries. We knew the time zone was bad. And we knew that English wasn’t their native language, and tried to be as courteous as possible with written materials and things like that. But I went over to China, and I had to actually be on the other end of one of those calls. And I’m a native English speaker, or at least a native Chicago dialect of American English speaker. And really understanding how challenging it was for them to participate in those meetings just hit me right between the eyes.

We’ve come so far, which is wonderful. But I think of a scenario, and this is not far off, there are many companies working on this right now, where not only can you get a real time captioning in your native language, no matter what the language of the participant, you can actually hear the person who’s speaking’s voice manipulated into your native language.

I’m never going to be a fluent Japanese or Chinese speaker, that’s for sure. But I love the thought that I could actually talk with people and they could understand me as though I were speaking their native language, and that they could communicate to me and I could understand them in the way that they want to be understood. I think there’s a future out there where this technology can really be something that helps bring people together. Now that we have so many years of history with the internet, we know there’s usually two sides to the coin of technology, but there’s definitely going to be a positive side to this, and I’m really looking forward to it.

Megan: Gosh, that sounds absolutely fascinating. Thank you both so much for such an interesting discussion.

That was Erik Vaveris, the VP of product management and chief marketing officer at Shure, and Brian Scholl, director of the Perception & Cognition Laboratory at Yale University, whom I spoke with from Brighton in England.

That’s it for this episode of Business Lab. I’m your host, Megan Tatum. I’m a contributing editor at Insights, the custom publishing division of MIT Technology Review. We were founded in 1899 at the Massachusetts Institute of Technology. And you can find us in print on the web and at events each year around the world. For more information about us and the show, please check out our website at technologyreview.com.

This show is available wherever you get your podcasts. If you enjoyed this episode, we hope you’ll take a moment to rate and review us. Business Lab is a production of MIT Technology Review. And this episode was produced by Giro Studios. Thanks for listening.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written entirely by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Why chatbots are starting to check your age

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

How do tech companies check if their users are kids?

This question has taken on new urgency recently thanks to growing concern about the dangers that can arise when children talk to AI chatbots. For years Big Tech asked for birthdays (that one could make up) to avoid violating child privacy laws, but they weren’t required to moderate content accordingly. Two developments over the last week show how quickly things are changing in the US and how this issue is becoming a new battleground, even among parents and child-safety advocates.

In one corner is the Republican Party, which has supported laws passed in several states that require sites with adult content to verify users’ ages. Critics say this provides cover to block anything deemed “harmful to minors,” which could include sex education. Other states, like California, are coming after AI companies with laws to protect kids who talk to chatbots (by requiring them to verify who’s a kid). Meanwhile, President Trump is attempting to keep AI regulation a national issue rather than allowing states to make their own rules. Support for various bills in Congress is constantly in flux.

So what might happen? The debate is quickly moving away from whether age verification is necessary and toward who will be responsible for it. This responsibility is a hot potato that no company wants to hold.

In a blog post last Tuesday, OpenAI revealed that it plans to roll out automatic age prediction. In short, the company will apply a model that uses factors like the time of day, among others, to predict whether a person chatting is under 18. For those identified as teens or children, ChatGPT will apply filters to “reduce exposure” to content like graphic violence or sexual role-play. YouTube launched something similar last year. 

If you support age verification but are concerned about privacy, this might sound like a win. But there’s a catch. The system is not perfect, of course, so it could classify a child as an adult or vice versa. People who are wrongly labeled under 18 can verify their identity by submitting a selfie or government ID to a company called Persona. 

Selfie verifications have issues: They fail more often for people of color and those with certain disabilities. Sameer Hinduja, who co-directs the Cyberbullying Research Center, says the fact that Persona will need to hold millions of government IDs and masses of biometric data is another weak point. “When those get breached, we’ve exposed massive populations all at once,” he says. 

Hinduja instead advocates for device-level verification, where a parent specifies a child’s age when setting up the child’s phone for the first time. This information is then kept on the device and shared securely with apps and websites. 

That’s more or less what Tim Cook, the CEO of Apple, recently lobbied US lawmakers to call for. Cook was fighting lawmakers who wanted to require app stores to verify ages, which would saddle Apple with lots of liability. 

More signals of where this is all headed will come on Wednesday, when the Federal Trade Commission—the agency that would be responsible for enforcing these new laws—is holding an all-day workshop on age verification. Apple’s head of government affairs, Nick Rossi, will be there. He’ll be joined by higher-ups in child safety at Google and Meta, as well as a company that specializes in marketing to children.

The FTC has become increasingly politicized under President Trump (his firing of the sole Democratic commissioner was struck down by a federal court, a decision that is now pending review by the US Supreme Court). In July, I wrote about signals that the agency is softening its stance toward AI companies. Indeed, in December, the FTC overturned a Biden-era ruling against an AI company that allowed people to flood the internet with fake product reviews, writing that it clashed with President Trump’s AI Action Plan.

Wednesday’s workshop may shed light on how partisan the FTC’s approach to age verification will be. Red states favor laws that require porn websites to verify ages (but critics warn this could be used to block a much wider range of content). Bethany Soye, a Republican state representative who is leading an effort to pass such a bill in her state of South Dakota, is scheduled to speak at the FTC meeting. The ACLU generally opposes laws requiring IDs to visit websites and has instead advocated for an expansion of existing parental controls.

While all this gets debated, though, AI has set the world of child safety on fire. We’re dealing with increased generation of child sexual abuse material, concerns (and lawsuits) about suicides and self-harm following chatbot conversations, and troubling evidence of kids’ forming attachments to AI companions. Colliding stances on privacy, politics, free expression, and surveillance will complicate any effort to find a solution. Write to me with your thoughts. 

Inside OpenAI’s big play for science 

In the three years since ChatGPT’s explosive debut, OpenAI’s technology has upended a remarkable range of everyday activities at home, at work, in schools—anywhere people have a browser open or a phone out, which is everywhere.

Now OpenAI is making an explicit play for scientists. In October, the firm announced that it had launched a whole new team, called OpenAI for Science, dedicated to exploring how its large language models could help scientists and tweaking its tools to support them.

The last couple of months have seen a slew of social media posts and academic publications in which mathematicians, physicists, biologists, and others have described how LLMs (and OpenAI’s GPT-5 in particular) have helped them make a discovery or nudged them toward a solution they might otherwise have missed. In part, OpenAI for Science was set up to engage with this community.

And yet OpenAI is also late to the party. Google DeepMind, the rival firm behind groundbreaking scientific models such as AlphaFold and AlphaEvolve, has had an AI-for-science team for years. (When I spoke to Google DeepMind’s CEO and cofounder Demis Hassabis in 2023 about that team, he told me: “This is the reason I started DeepMind … In fact, it’s why I’ve worked my whole career in AI.”)

So why now? How does a push into science fit with OpenAI’s wider mission? And what exactly is the firm hoping to achieve?

I put these questions to Kevin Weil, a vice president at OpenAI who leads the new OpenAI for Science team, in an exclusive interview last week.

On mission

Weil is a product guy. He joined OpenAI a couple of years ago as chief product officer after being head of product at Twitter and Instagram. But he started out as a scientist. He got two-thirds of the way through a PhD in particle physics at Stanford University before ditching academia for the Silicon Valley dream. Weil is keen to highlight his pedigree: “I thought I was going to be a physics professor for the rest of my life,” he says. “I still read math books on vacation.”

Asked how OpenAI for Science fits with the firm’s existing lineup of white-collar productivity tools or the viral video app Sora, Weil recites the company mantra: “The mission of OpenAI is to try and build artificial general intelligence and, you know, make it beneficial for all of humanity.”

Just imagine the future impact this technology could have on science, he says: new medicines, new materials, new devices. “Think about it helping us understand the nature of reality, helping us think through open problems. Maybe the biggest, most positive impact we’re going to see from AGI will actually be from its ability to accelerate science.”

He adds: “With GPT-5, we saw that becoming possible.” 

As Weil tells it, LLMs are now good enough to be useful scientific collaborators. They can spitball ideas, suggest novel directions to explore, and find fruitful parallels between new problems and old solutions published in obscure journals decades ago or in foreign languages.

That wasn’t the case a year or so ago. Since it announced its first so-called reasoning model—a type of LLM that can break down problems into multiple steps and work through them one by one—in December 2024, OpenAI has been pushing the envelope of what the technology can do. Reasoning models have made LLMs far better at solving math and logic problems than they used to be. “You go back a few years and we were all collectively mind-blown that the models could get an 800 on the SAT,” says Weil.

But soon LLMs were acing math competitions and solving graduate-level physics problems. Last year, OpenAI and Google DeepMind both announced that their LLMs had achieved gold-medal-level performance in the International Math Olympiad, one of the toughest math contests in the world. “These models are no longer just better than 90% of grad students,” says Weil. “They’re really at the frontier of human abilities.”

That’s a huge claim, and it comes with caveats. Still, there’s no doubt that GPT-5, which includes a reasoning model, is a big improvement on GPT-4 when it comes to complicated problem-solving. Measured against an industry benchmark known as GPQA, which includes more than 400 multiple-choice questions that test PhD-level knowledge in biology, physics, and chemistry, GPT-4 scores 39%, well below the human-expert baseline of around 70%. According to OpenAI, GPT-5.2 (the latest update to the model, released in December) scores 92%. 

Overhyped

The excitement is evident—and perhaps excessive. In October, senior figures at OpenAI, including Weil, boasted on X that GPT-5 had found solutions to several unsolved math problems. Mathematicians were quick to point out that in fact what GPT-5 appeared to have done was dig up existing solutions in old research papers, including at least one written in German. That was still useful, but it wasn’t the achievement OpenAI seemed to have claimed. Weil and his colleagues deleted their posts.

Now Weil is more careful. It is often enough to find answers that exist but have been forgotten, he says: “We collectively stand on the shoulders of giants, and if LLMs can kind of accumulate that knowledge so that we don’t spend time struggling on a problem that is already solved, that’s an acceleration all of its own.”

He plays down the idea that LLMs are about to come up with a game-changing new discovery. “I don’t think models are there yet,” he says. “Maybe they’ll get there. I’m optimistic that they will.”

But, he insists, that’s not the mission: “Our mission is to accelerate science. And I don’t think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field.”

For Weil, the question is this: “Does science actually happen faster because scientists plus models can do much more, and do it more quickly, than scientists alone? I think we’re already seeing that.”

In November, OpenAI published a series of anecdotal case studies contributed by scientists, both inside and outside the company, that illustrated how they had used GPT-5 and how it had helped. “Most of the cases were scientists that were already using GPT-5 directly in their research and had come to us one way or another saying, ‘Look at what I’m able to do with these tools,’” says Weil.

The key things that GPT-5 seems to be good at are finding references and connections to existing work that scientists were not aware of, which sometimes sparks new ideas; helping scientists sketch mathematical proofs; and suggesting ways for scientists to test hypotheses in the lab.  

“GPT-5.2 has read substantially every paper written in the last 30 years,” says Weil. “And it understands not just the field that a particular scientist is working in; it can bring together analogies from other, unrelated fields.”

“That’s incredibly powerful,” he continues. “You can always find a human collaborator in an adjacent field, but it’s difficult to find, you know, a thousand collaborators in all thousand adjacent fields that might matter. And in addition to that, I can work with the model late at night—it doesn’t sleep—and I can ask it 10 things in parallel, which is kind of awkward to do to a human.”

Solving problems

Most of the scientists OpenAI reached out to back up Weil’s position.

Robert Scherrer, a professor of physics and astronomy at Vanderbilt University, only played around with ChatGPT for fun (“I used it to rewrite the theme song for Gilligan’s Island in the style of Beowulf, which it did very well,” he tells me) until his Vanderbilt colleague Alex Lupsasca, a fellow physicist who now works at OpenAI, told him that GPT-5 had helped solve a problem he’d been working on.

Lupsasca gave Scherrer access to GPT-5 Pro, OpenAI’s $200-a-month premium subscription. “It managed to solve a problem that I and my graduate student could not solve despite working on it for several months,” says Scherrer.

It’s not perfect, he says: “GPT-5 still makes dumb mistakes. Of course, I do too, but the mistakes GPT-5 makes are even dumber.” And yet it keeps getting better, he says: “If current trends continue—and that’s a big if—I suspect that all scientists will be using LLMs soon.”

Derya Unutmaz, a professor of biology at the Jackson Laboratory, a nonprofit research institute, uses GPT-5 to brainstorm ideas, summarize papers, and plan experiments in his work studying the immune system. In the case study he shared with OpenAI, Unutmaz used GPT-5 to analyze an old data set that his team had previously looked at. The model came up with fresh insights and interpretations.  

“LLMs are already essential for scientists,” he says. “When you can complete analysis of data sets that used to take months, not using them is not an option anymore.”

Nikita Zhivotovskiy, a statistician at the University of California, Berkeley, says he has been using LLMs in his research since the first version of ChatGPT came out.

Like Scherrer, he finds LLMs most useful when they highlight unexpected connections between his own work and existing results he did not know about. “I believe that LLMs are becoming an essential technical tool for scientists, much like computers and the internet did before,” he says. “I expect a long-term disadvantage for those who do not use them.”

But he does not expect LLMs to make novel discoveries anytime soon. “I have seen very few genuinely fresh ideas or arguments that would be worth a publication on their own,” he says. “So far, they seem to mainly combine existing results, sometimes incorrectly, rather than produce genuinely new approaches.”

I also contacted a handful of scientists who are not connected to OpenAI.

Andy Cooper, a professor of chemistry at the University of Liverpool and director of the Leverhulme Research Centre for Functional Materials Design, is less enthusiastic. “We have not found, yet, that LLMs are fundamentally changing the way that science is done,” he says. “But our recent results suggest that they do have a place.”

Cooper is leading a project to develop a so-called AI scientist that can fully automate parts of the scientific workflow. He says that his team doesn’t use LLMs to come up with ideas. But the tech is starting to prove useful as part of a wider automated system where an LLM can help direct robots, for example.

“My guess is that LLMs might stick more in robotic workflows, at least initially, because I’m not sure that people are ready to be told what to do by an LLM,” says Cooper. “I’m certainly not.”

Making errors

LLMs may be becoming more and more useful, but caution is still key. In December, Jonathan Oppenheim, a scientist who works on quantum mechanics, called out a mistake that had made its way into a scientific journal. “OpenAI leadership are promoting a paper in Physics Letters B where GPT-5 proposed the main idea—possibly the first peer-reviewed paper where an LLM generated the core contribution,” Oppenheim posted on X. “One small problem: GPT-5’s idea tests the wrong thing.”

He continued: “GPT-5 was asked for a test that detects nonlinear theories. It provided a test that detects nonlocal ones. Related-sounding, but different. It’s like asking for a COVID test, and the LLM cheerfully hands you a test for chickenpox.”

It is clear that a lot of scientists are finding innovative and intuitive ways to engage with LLMs. It is also clear that the technology makes mistakes that can be so subtle even experts miss them.

Part of the problem is the way ChatGPT can flatter you into letting down your guard. As Oppenheim put it: “A core issue is that LLMs are being trained to validate the user, while science needs tools that challenge us.” In an extreme case, one individual (who was not a scientist) was persuaded by ChatGPT into thinking for months that he’d invented a new branch of mathematics.

Of course, Weil is well aware of the problem of hallucination. But he insists that newer models are hallucinating less and less. Even so, focusing on hallucination might be missing the point, he says.

“One of my teammates here, an ex math professor, said something that stuck with me,” says Weil. “He said: ‘When I’m doing research, if I’m bouncing ideas off a colleague, I’m wrong 90% of the time and that’s kind of the point. We’re both spitballing ideas and trying to find something that works.’”

“That’s actually a desirable place to be,” says Weil. “If you say enough wrong things and then somebody stumbles on a grain of truth and then the other person seizes on it and says, ‘Oh, yeah, that’s not quite right, but what if we—’ You gradually kind of find your trail through the woods.”

This is Weil’s core vision for OpenAI for Science. GPT-5 is good, but it is not an oracle. The value of this technology is in pointing people in new directions, not coming up with definitive answers, he says.

In fact, one of the things OpenAI is now looking at is making GPT-5 dial down its confidence when it delivers a response. Instead of saying “Here’s the answer,” it might tell scientists: “Here’s something to consider.”

“That’s actually something that we are spending a bunch of time on,” says Weil. “Trying to make sure that the model has some sort of epistemological humility.”

Watching the watchers

Another thing OpenAI is looking at is how to use GPT-5 to fact-check GPT-5. It’s often the case that if you feed one of GPT-5’s answers back into the model, it will pick it apart and highlight mistakes.

“You can kind of hook the model up as its own critic,” says Weil. “Then you can get a workflow where the model is thinking and then it goes to another model, and if that model finds things that it could improve, then it passes it back to the original model and says, ‘Hey, wait a minute—this part wasn’t right, but this part was interesting. Keep it.’ It’s almost like a couple of agents working together and you only see the output once it passes the critic.”
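
The workflow Weil describes is essentially a generate, critique, and revise loop. The sketch below is not OpenAI’s implementation; it is a minimal illustration of the pattern, assuming a hypothetical `ask_model()` helper that you would wire up to whichever LLM API you actually use.

```python
# Minimal sketch of the generate/critique/revise loop described above.
# ask_model() is a hypothetical placeholder, not a real OpenAI SDK call;
# connect it to whichever LLM API you actually use.

def ask_model(prompt: str) -> str:
    """Placeholder for a single LLM call. Replace with your provider's client."""
    raise NotImplementedError("Wire this up to an LLM API of your choice.")


def answer_with_critic(question: str, max_rounds: int = 3) -> str:
    """Draft an answer, let a critic model review it, and revise until it passes."""
    draft = ask_model(f"Answer this research question:\n{question}")
    for _ in range(max_rounds):
        critique = ask_model(
            "Act as a skeptical reviewer. List any factual or logical errors "
            f"in the answer below, or reply 'OK' if you find none.\n\n{draft}"
        )
        if critique.strip().upper() == "OK":
            break  # the critic found nothing to fix
        draft = ask_model(
            "Revise the answer to address the critique.\n\n"
            f"Question: {question}\nAnswer: {draft}\nCritique: {critique}"
        )
    return draft  # only output that has survived (or exhausted) the critic loop
```

The point of the pattern is that the user only sees the answer once it has been passed back and forth between the generator and the critic, which is what Weil means by hooking the model up as its own reviewer.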

What Weil is describing also sounds a lot like what Google DeepMind did with AlphaEvolve, a tool that wrapped the firm’s LLM, Gemini, inside a wider system that filtered out the good responses from the bad and fed them back in again to be improved on. Google DeepMind has used AlphaEvolve to solve several real-world problems.

OpenAI faces stiff competition from rival firms, whose own LLMs can do most, if not all, of the things it claims for its own models. If that’s the case, why should scientists use GPT-5 instead of Gemini or Anthropic’s Claude, families of models that are themselves improving every year? Ultimately, OpenAI for Science may be as much an effort to plant a flag in new territory as anything else. The real innovations are still to come. 

“I think 2026 will be for science what 2025 was for software engineering,” says Weil. “At the beginning of 2025, if you were using AI to write most of your code, you were an early adopter. Whereas 12 months later, if you’re not using AI to write most of your code, you’re probably falling behind. We’re now seeing those same early flashes for science as we did for code.”

He continues: “I think that in a year, if you’re a scientist and you’re not heavily using AI, you’ll be missing an opportunity to increase the quality and pace of your thinking.”

New Microsoft Retail AI Guide Echoes SEO

Microsoft published a playbook early this month to help retailers increase visibility in AI search, browsers, and assistants.

“A guide to AEO and GEO” (PDF), from the heads of Microsoft Shopping and Copilot and of Microsoft Advertising, includes actionable tips worth reading and confirms established practices.

Microsoft’s new guide aims to help retailers increase AI visibility.

GEO vs. AEO

The rise of AI platforms has created a proliferation of ill-defined acronyms. The guide attempts to clarify two of them:

  • GEO. Generative engine optimization. “Optimizes content for generative AI search environments (like LLM-powered engines) to make it discoverable, trustworthy, and authoritative.”
  • AEO. Answer/Agentic Engine Optimization. “Optimizes content for AI agents and assistants (like Copilot or ChatGPT) so they can find, understand, and present answers effectively.”

I question the need for new acronyms, as the concepts have existed for years in traditional search engine optimization. “GEO” is synonymous with “EEAT” — Experience, Expertise, Authoritativeness, Trustworthiness — Google’s term for instructing human quality raters.

“AEO” is akin to optimizing for featured snippets in traditional search results.

The key difference is that GEO and AEO focus on getting product information into an LLM’s pre-training data to impact exposure in AI answers.

And GEO extends beyond a site’s content to include external resources such as reviews, Reddit mentions, product-comparison articles, and similar.

Intent-driven product data

To me, the most useful part of the guide reinforces my article on optimizing product feeds for AI. Product feeds and on-page descriptions should clearly address use cases, such as shoes “best for day hikes above 40 degrees.”
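
To make that concrete, here is an illustrative feed entry of my own, not an example from Microsoft’s guide, that encodes use cases as explicit, machine-readable fields rather than burying them in marketing copy. The field names are hypothetical; map them to whatever schema your feed platform expects.

```python
import json

# Hypothetical feed entry illustrating intent-driven product data.
# Field names are illustrative; map them to your feed platform's schema.
feed_item = {
    "id": "SKU-12345",
    "title": "TrailLite Men's Waterproof Hiking Shoe, Low-Cut, Vibram Sole",
    "description": (
        "Best for day hikes above 40°F on rocky or muddy trails. "
        "Solves wet-foot discomfort with a waterproof membrane; "
        "lighter than a full boot for hikers who prioritize speed."
    ),
    "use_cases": ["day hikes", "cool weather above 40°F", "rocky trails"],
    "audience": "day hikers carrying light packs",
    "availability": "in_stock",
    "price": "129.00 USD",
}

# Emit the entry as JSON for ingestion by a feed pipeline.
print(json.dumps(feed_item, indent=2, ensure_ascii=False))
```

The design idea is simply that the use case lives in a dedicated field, so an agent matching “shoes for a cool-weather day hike” does not have to infer it from prose.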

The guide also recommends:

  • Product page titles that are detailed and descriptive,
  • Front-loading product descriptions with benefits: who it’s for, the problem it solves, and how it’s better,
  •  Q&As,
  • Comparison tables,
  • Detailed alt text for product images,
  • Complementary products that match the intent,
  • Transcripts for videos.

Social proof

The guide emphasizes the importance of factual entities such as verified customer reviews, certifications, sustainability badges, and partnerships. It warns against using exaggerated or unverifiable claims, stating, “AI systems penalize low-trust language.”

It advises applying social proof consistently across your site and all channels, and verifying any subjective claims about your business or product. For example, if you assert a product is the best in a category, include why, such as “according to [XYZ’s] tests.”

Structured data

Per the guide, structured data markup, such as Schema.org, is key for AI visibility.

However, I’ve seen no evidence to support that recommendation. The guide does not explain how LLMs use Schema. To my knowledge, AI training data does not store Schema markup, and AI bots crawl text-only content.

Yet for live searches, Schema may be helpful because traditional search engines support it, and LLMs rely on those platforms.

Nonetheless, the guide recommends the following (an illustrative markup example appears after the list):

  • Schema Types: Product, Offer, AggregateRating, Review, Brand, ItemList, and FAQ.
  • Dynamic fields: price, availability, color, size, SKU, GTIN, and dateModified.
  • ItemList markup for collections and category pages to clarify product groupings.
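
For illustration only, here is roughly what that markup can look like. The values below are invented, not taken from the guide; the script just emits Schema.org Product JSON-LD of the kind you would embed in a product page’s script tag of type application/ld+json.

```python
import json

# Illustrative Schema.org Product markup covering several of the types and
# dynamic fields the guide lists. All values are invented placeholders.
product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "TrailLite Waterproof Hiking Shoe",
    "sku": "SKU-12345",
    "gtin13": "0123456789012",
    "brand": {"@type": "Brand", "name": "TrailLite"},
    "color": "Slate Gray",
    "size": "10",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "212",
    },
    "offers": {
        "@type": "Offer",
        "price": "129.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

# Paste the output into a <script type="application/ld+json"> block on the page.
print(json.dumps(product_jsonld, indent=2))
```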

While helpful, Microsoft’s “A guide to AEO and GEO” doesn’t introduce anything new. The recommendations align with longstanding SEO tactics and reinforce the views of industry pros.

Why Google Gemini Has No Ads Yet: ‘Trust In Your Assistant’ via @sejournal, @MattGSouthern

Google DeepMind CEO Demis Hassabis said Google doesn’t have any current plans to introduce advertising into its Gemini AI assistant, citing unresolved questions about user trust.

Speaking at the World Economic Forum in Davos, Hassabis said AI assistants represent a different product than search. He believes Gemini should be built for users first.

“In the realm of assistants, if you think of the chatbot as an assistant that’s meant to be helpful and ideally in my mind, as they become more powerful, the kind of technology that works for you as the individual,” Hassabis said in an interview with Axios. “That’s what I’d like to see with these systems.”

He said no one in the industry has figured out how advertising fits into that model.

“There is a question about how does ads fit into that model, where you want to have trust in your assistant,” Hassabis said. “I think no one’s really got a full answer to that yet.”

When asked directly about Google’s plans, Hassabis said: “We don’t have any current plans to do it ourselves.”

What Hassabis Said About OpenAI

The comments came days after OpenAI said it plans to begin testing ads in ChatGPT in the coming weeks for logged-in adults in the U.S. on free and Go tiers.

Hassabis said he was “a little bit surprised they’ve moved so early into that.”

He acknowledged advertising has funded much of the consumer internet and can be useful to users when done well. But he warned that poor execution in AI assistants could damage user relationships.

“I think it can be done right, but it can also be done in a way that’s not good,” Hassabis said. “In the end, what we want to do is be the most useful we can be to our users.”

Search Is Different

Hassabis drew a line between AI assistants and search when discussing advertising.

When asked whether his comments applied to Google Search, where the company already shows ads in AI Overviews, he said the two products work differently.

“But there it’s completely different use case because you’ve already just like how it’s always worked with search, you’ve already, you know, we know what your intent is basically and so we can be helpful there,” Hassabis said. “That’s a very different construct.”

Google began rolling out ads in AI Overviews in October 2024 and has continued expanding them since. The company claims AI Overviews generate ad revenue equal to traditional search results.

Why This Matters

This is the second time in two months that a Google executive has said Gemini ads aren’t currently planned.

In December, Google Ads VP Dan Taylor disputed an Adweek report claiming the company had told advertisers to expect Gemini ads in 2026. Taylor called that report “inaccurate” and said Google has “no current plans” to monetize the Gemini app.

Hassabis’s comments reinforce that position but go further by explaining the reasoning. His “technology that works for you” framing suggests Google sees a tension between advertising and the assistant relationship it wants Gemini to build.

Looking Ahead

Google is comfortable expanding ads where user intent is explicit, like search queries triggering AI Overviews. The company is holding back where intent is less defined and the relationship is more personal.

How long Google maintains its current position depends in part on how users respond to advertising in rival assistants.


Featured Image: Screenshot from: youtube.com/@axios, January 2026.