Anthropic’s chief scientist on 5 ways agents will be even better in 2025

Agents are the hottest thing in tech right now. Top firms from Google DeepMind to OpenAI to Anthropic are racing to augment large language models with the ability to carry out tasks by themselves. Known as agentic AI in industry jargon, such systems have fast become the new target of Silicon Valley buzz. Everyone from Nvidia to Salesforce is talking about how they are going to upend the industry. 

“We believe that, in 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies,” Sam Altman claimed in a blog post last week.

In the broadest sense, an agent is a software system that goes off and does something, often with minimal to zero supervision. The more complex that thing is, the smarter the agent needs to be. For many, large language models are now smart enough to power agents that can do a whole range of useful tasks for us, such as filling out forms, looking up a recipe and adding the ingredients to an online grocery basket, or using a search engine to do last-minute research before a meeting and producing a quick bullet-point summary.

In October, Anthropic showed off one of the most advanced agents yet: an extension of its Claude large language model called computer use. As the name suggests, it lets you direct Claude to use a computer much as a person would, by moving a cursor, clicking buttons, and typing text. Instead of simply having a conversation with Claude, you can now ask it to carry out on-screen tasks for you.

Anthropic notes that the feature is still cumbersome and error-prone. But it is already available to a handful of testers, including third-party developers at companies such as DoorDash, Canva, and Asana.

Computer use is a glimpse of what’s to come for agents. To learn what’s coming next, MIT Technology Review talked to Anthropic’s cofounder and chief scientist Jared Kaplan. Here are five ways that agents are going to get even better in 2025.

(Kaplan’s answers have been lightly edited for length and clarity.)

1/ Agents will get better at using tools

“I think there are two axes for thinking about what AI is capable of. One is a question of how complex the task is that a system can do. And as AI systems get smarter, they’re getting better in that direction. But another direction that’s very relevant is what kinds of environments or tools the AI can use. 

“So, like, if you go back almost 10 years now to [DeepMind’s Go-playing model] AlphaGo, we had AI systems that were superhuman in terms of how well they could play board games. But if all you can work with is a board game, then that’s a very restrictive environment. It’s not actually useful, even if it’s very smart. With text models, and then multimodal models, and now computer use—and perhaps in the future with robotics—you’re moving toward bringing AI into different situations and tasks, and making it useful. 

“We were excited about computer use basically for that reason. Until recently, with large language models, it’s been necessary to give them a very specific prompt, give them very specific tools, and then they’re restricted to a specific kind of environment. What I see is that computer use will probably improve quickly in terms of how well models can do different tasks and more complex tasks. And also to realize when they’ve made mistakes, or realize when there’s a high-stakes question and it needs to ask the user for feedback.”

2/ Agents will understand context  

“Claude needs to learn enough about your particular situation and the constraints that you operate under to be useful. Things like what particular role you’re in, what styles of writing or what needs you and your organization have.

Jared Kaplan

ANTHROPIC

“I think that we’ll see improvements there where Claude will be able to search through things like your documents, your Slack, etc., and really learn what’s useful for you. That’s underemphasized a bit with agents. It’s necessary for systems to be not only useful but also safe, doing what you expected.

“Another thing is that a lot of tasks won’t require Claude to do much reasoning. You don’t need to sit and think for hours before opening Google Docs or something. And so I think that a lot of what we’ll see is not just more reasoning but the application of reasoning when it’s really useful and important, but also not wasting time when it’s not necessary.”

3/ Agents will make coding assistants better

“We wanted to get a very initial beta of computer use out to developers to get feedback while the system was relatively primitive. But as these systems get better, they might be more widely used and really collaborate with you on different activities.

“I think DoorDash, the Browser Company, and Canva are all experimenting with, like, different kinds of browser interactions and designing them with the help of AI.

“My expectation is that we’ll also see further improvements to coding assistants. That’s something that’s been very exciting for developers. There’s just a ton of interest in using Claude 3.5 for coding, where it’s not just autocomplete like it was a couple of years ago. It’s really understanding what’s wrong with code, debugging it—running the code, seeing what happens, and fixing it.”

4/ Agents will need to be made safe

“We founded Anthropic because we expected AI to progress very quickly and [thought] that, inevitably, safety concerns were going to be relevant. And I think that’s just going to become more and more visceral this year, because I think these agents are going to become more and more integrated into the work we do. We need to be ready for the challenges, like prompt injection. 

[Prompt injection is an attack in which a malicious prompt is passed to a large language model in ways that its developers did not foresee or intend. One way to do this is to add the prompt to websites that models might visit.]

“Prompt injection is probably one of the No.1 things we’re thinking about in terms of, like, broader usage of agents. I think it’s especially important for computer use, and it’s something we’re working on very actively, because if computer use is deployed at large scale, then there could be, like, pernicious websites or something that try to convince Claude to do something that it shouldn’t do.

“And with more advanced models, there’s just more risk. We have a robust scaling policy where, as AI systems become sufficiently capable, we feel like we need to be able to really prevent them from being misused. For example, if they could help terrorists—that kind of thing.

“So I’m really excited about how AI will be useful—it’s actually also accelerating us a lot internally at Anthropic, with people using Claude in all kinds of ways, especially with coding. But, yeah, there’ll be a lot of challenges as well. It’ll be an interesting year.”

What’s next for AI in 2025

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

For the last couple of years we’ve had a go at predicting what’s coming next in AI. A fool’s game given how fast this industry moves. But we’re on a roll, and we’re doing it again.

How did we score last time round? Our four hot trends to watch out for in 2024 included what we called customized chatbots—interactive helper apps powered by multimodal large language models (check: we didn’t know it yet, but we were talking about what everyone now calls agents, the hottest thing in AI right now); generative video (check: few technologies have improved so fast in the last 12 months, with OpenAI and Google DeepMind releasing their flagship video generation models, Sora and Veo, within a week of each other this December); and more general-purpose robots that can do a wider range of tasks (check: the payoffs from large language models continue to trickle down to other parts of the tech industry, and robotics is top of the list). 

We also said that AI-generated election disinformation would be everywhere, but here—happily—we got it wrong. There were many things to wring our hands over this year, but political deepfakes were thin on the ground

So what’s coming in 2025? We’re going to ignore the obvious here: You can bet that agents and smaller, more efficient, language models will continue to shape the industry. Instead, here are five alternative picks from our AI team.

1. Generative virtual playgrounds 

If 2023 was the year of generative images and 2024 was the year of generative video—what comes next? If you guessed generative virtual worlds (a.k.a. video games), high fives all round.

We got a tiny glimpse of this technology in February, when Google DeepMind revealed a generative model called Genie that could take a still image and turn it into a side-scrolling 2D platform game that players could interact with. In December, the firm revealed Genie 2, a model that can spin a starter image into an entire virtual world.

Other companies are building similar tech. In October, the AI startups Decart and Etched revealed an unofficial Minecraft hack in which every frame of the game gets generated on the fly as you play. And World Labs, a startup cofounded by Fei-Fei Li—creator of ImageNet, the vast data set of photos that kick-started the deep-learning boom—is building what it calls large world models, or LWMs.

One obvious application is video games. There’s a playful tone to these early experiments, and generative 3D simulations could be used to explore design concepts for new games, turning a sketch into a playable environment on the fly. This could lead to entirely new types of games

But they could also be used to train robots. World Labs wants to develop so-called spatial intelligence—the ability for machines to interpret and interact with the everyday world. But robotics researchers lack good data about real-world scenarios with which to train such technology. Spinning up countless virtual worlds and dropping virtual robots into them to learn by trial and error could help make up for that.   

Will Douglas Heaven

2. Large language models that “reason”

The buzz was justified. When OpenAI revealed o1 in September, it introduced a new paradigm in how large language models work. Two months later, the firm pushed that paradigm forward in almost every way with o3—a model that just might reshape this technology for good.

Most models, including OpenAI’s flagship GPT-4, spit out the first response they come up with. Sometimes it’s correct; sometimes it’s not. But the firm’s new models are trained to work through their answers step by step, breaking down tricky problems into a series of simpler ones. When one approach isn’t working, they try another. This technique, known as “reasoning” (yes—we know exactly how loaded that term is), can make this technology more accurate, especially for math, physics, and logic problems.

It’s also crucial for agents.

In December, Google DeepMind revealed an experimental new web-browsing agent called Mariner. In the middle of a preview demo that the company gave to MIT Technology Review, Mariner seemed to get stuck. Megha Goel, a product manager at the company, had asked the agent to find her a recipe for Christmas cookies that looked like the ones in a photo she’d given it. Mariner found a recipe on the web and started adding the ingredients to Goel’s online grocery basket.

Then it stalled; it couldn’t figure out what type of flour to pick. Goel watched as Mariner explained its steps in a chat window: “It says, ‘I will use the browser’s Back button to return to the recipe.’”

It was a remarkable moment. Instead of hitting a wall, the agent had broken the task down into separate actions and picked one that might resolve the problem. Figuring out you need to click the Back button may sound basic, but for a mindless bot it’s akin to rocket science. And it worked: Mariner went back to the recipe, confirmed the type of flour, and carried on filling Goel’s basket.

Google DeepMind is also building an experimental version of Gemini 2.0, its latest large language model, that uses this step-by-step approach to problem solving, called Gemini 2.0 Flash Thinking.

But OpenAI and Google are just the tip of the iceberg. Many companies are building large language models that use similar techniques, making them better at a whole range of tasks, from cooking to coding. Expect a lot more buzz about reasoning (we know, we know) this year.

—Will Douglas Heaven

3. It’s boom time for AI in science 

One of the most exciting uses for AI is speeding up discovery in the natural sciences. Perhaps the greatest vindication of AI’s potential on this front came last October, when the Royal Swedish Academy of Sciences awarded the Nobel Prize for chemistry to Demis Hassabis and John M. Jumper from Google DeepMind for building the AlphaFold tool, which can solve protein folding, and to David Baker for building tools to help design new proteins.

Expect this trend to continue next year, and to see more data sets and models that are aimed specifically at scientific discovery. Proteins were the perfect target for AI, because the field had excellent existing data sets that AI models could be trained on. 

The hunt is on to find the next big thing. One potential area is materials science. Meta has released massive data sets and models that could help scientists use AI to discover new materials much faster, and in December, Hugging Face, together with the startup Entalpic, launched LeMaterial, an open-source project that aims to simplify and accelerate materials research. Their first project is a data set that unifies, cleans, and standardizes the most prominent material data sets. 

AI model makers are also keen to pitch their generative products as research tools for scientists. OpenAI let scientists test its latest o1 model and see how it might support them in research. The results were encouraging. 

Having an AI tool that can operate in a similar way to a scientist is one of the fantasies of the tech sector. In a manifesto published in October last year, Anthropic founder Dario Amodei highlighted science, especially biology, as one of the key areas where powerful AI could help. Amodei speculates that in the future, AI could be not only a method of data analysis but a “virtual biologist who performs all the tasks biologists do.” We’re still a long way away from this scenario. But next year, we might see important steps toward it. 

—Melissa Heikkilä

4. AI companies get cozier with national security

There is a lot of money to be made by AI companies willing to lend their tools to border surveillance, intelligence gathering, and other national security tasks. 

The US military has launched a number of initiatives that show it’s eager to adopt AI, from the Replicator program—which, inspired by the war in Ukraine, promises to spend $1 billion on small drones—to the Artificial Intelligence Rapid Capabilities Cell, a unit bringing AI into everything from battlefield decision-making to logistics. European militaries are under pressure to up their tech investment, triggered by concerns that Donald Trump’s administration will cut spending to Ukraine. Rising tensions between Taiwan and China weigh heavily on the minds of military planners, too. 

In 2025, these trends will continue to be a boon for defense-tech companies like Palantir, Anduril, and others, which are now capitalizing on classified military data to train AI models. 

The defense industry’s deep pockets will tempt mainstream AI companies into the fold too. OpenAI in December announced it is partnering with Anduril on a program to take down drones, completing a year-long pivot away from its policy of not working with the military. It joins the ranks of Microsoft, Amazon, and Google, which have worked with the Pentagon for years. 

Other AI competitors, which are spending billions to train and develop new models, will face more pressure in 2025 to think seriously about revenue. It’s possible that they’ll find enough non-defense customers who will pay handsomely for AI agents that can handle complex tasks, or creative industries willing to spend on image and video generators. 

But they’ll also be increasingly tempted to throw their hats in the ring for lucrative Pentagon contracts. Expect to see companies wrestle with whether working on defense projects will be seen as a contradiction to their values. OpenAI’s rationale for changing its stance was that “democracies should continue to take the lead in AI development,” the company wrote, reasoning that lending its models to the military would advance that goal. In 2025, we’ll be watching others follow its lead. 

James O’Donnell

5. Nvidia sees legitimate competition

For much of the current AI boom, if you were a tech startup looking to try your hand at making an AI model, Jensen Huang was your man. As CEO of Nvidia, the world’s most valuable corporation, Huang helped the company become the undisputed leader of chips used both to train AI models and to ping a model when anyone uses it, called “inferencing.”

A number of forces could change that in 2025. For one, behemoth competitors like Amazon, Broadcom, AMD, and others have been investing heavily in new chips, and there are early indications that these could compete closely with Nvidia’s—particularly for inference, where Nvidia’s lead is less solid. 

A growing number of startups are also attacking Nvidia from a different angle. Rather than trying to marginally improve on Nvidia’s designs, startups like Groq are making riskier bets on entirely new chip architectures that, with enough time, promise to provide more efficient or effective training. In 2025 these experiments will still be in their early stages, but it’s possible that a standout competitor will change the assumption that top AI models rely exclusively on Nvidia chips.

Underpinning this competition, the geopolitical chip war will continue. That war thus far has relied on two strategies. On one hand, the West seeks to limit exports to China of top chips and the technologies to make them. On the other, efforts like the US CHIPS Act aim to boost domestic production of semiconductors.

Donald Trump may escalate those export controls and has promised massive tariffs on any goods imported from China. In 2025, such tariffs would put Taiwan—on which the US relies heavily because of the chip manufacturer TSMC—at the center of the trade wars. That’s because Taiwan has said it will help Chinese firms relocate to the island to help them avoid the proposed tariffs. That could draw further criticism from Trump, who has expressed frustration with US spending to defend Taiwan from China. 

It’s unclear how these forces will play out, but it will only further incentivize chipmakers to reduce reliance on Taiwan, which is the entire purpose of the CHIPS Act. As spending from the bill begins to circulate, next year could bring the first evidence of whether it’s materially boosting domestic chip production. 

James O’Donnell

How optimistic are you about AI’s future?

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The start of a new year, and maybe especially this one, feels like a good time for a gut check: How optimistic are you feeling about the future of technology? 

Our annual list of 10 Breakthrough Technologies, published on Friday, might help you decide. It’s the 24th time we’ve published such a list. But just like our earliest picks (2001’s list featured brain-computer interfaces and ways to track copyrighted content on the internet, by the way), this year’s technologies may come to help society, harm it, or both.

Artificial intelligence powers four of the breakthroughs featured on the list, and I expect your optimism about them will vary widely. Take generative AI search. Now becoming the norm on Google with its AI Overviews, it promises to help sort through the internet’s incomprehensible volume of information to offer better answers for the questions we ask. Along the way, it is upending the model of how content creators get paid, and positioning fallible AI as the arbiter of truth and facts. Read more here

Also making the list is the immense progress in the world of robots, which can now learn faster thanks to AI. This means we will soon have to wrestle with whether we will trust humanoid robots enough to welcome them into our most private spaces, and how we will feel if they are remotely controlled by human beings working abroad. 

The list also features lots of technologies outside the world of AI, which I implore you to read about if only for a reminder of just how much other scientific progress is being made. This year may see advances in studying dark matter with the largest digital camera ever made for astronomy, reducing emissions from cow burps, and preventing HIV with an injection just once every six months. We also detail how technologies that you’ve long heard about—from robotaxis to stem cells—are finally making good on some of their promises.

This year, the cultural gulf between techno-optimists and, well, everyone else is set to widen. The incoming administration will be perhaps the one most shaped by Silicon Valley in recent memory, thanks to Donald Trump’s support from venture capitalists like Marc Andreessen (the author of the Techno-Optimist Manifesto) and his relationship, however recently fraught, with Elon Musk. Those figures have critiqued the Biden administration’s approach to technology as slow, “woke,” and overly cautious—attitudes they have vowed to reverse. 

So as we begin a year of immense change, here’s a small experiment I’d encourage you to do. Think about your level of optimism for technology and what’s driving it. Read our list of breakthroughs. Then see how you’ve shifted. I suspect that, like many people, you’ll find you don’t fit neatly in the camp of either optimists or pessimists. Perhaps that’s where the best progress will be made. 


Now read the rest of The Algorithm

Deeper Learning

The biggest AI flops of 2024

Though AI has remained in the spotlight this year (and even contributed to Nobel Prize–winning research in chemistry), it has not been without its failures. Take a look back over the year’s top AI failures, from chatbots dishing out illegal advice to dodgy AI-generated search results. 

Why it matters: These failures show that there are tons of unanswered questions about the technology, including who will moderate what it produces and how, whether we’re getting too trusting of the answers that chatbots produce, and what we’ll do with the mountain of “AI slop” that is increasingly taking over the internet. Above all, they illustrate the many pitfalls of blindly shoving AI into every product we interact with.

Bits and Bytes

What it’s like being a pedestrian in the world of Waymos 

Tech columnist Geoffrey Fowler finds that Waymo robotaxis regularly fail to stop for him at a crosswalk he uses every day. Though you can sometimes make eye contact with human drivers to gauge whether they’ll stop, Waymos lack that “social intelligence,” Fowler writes. (The Washington Post)

The AI Hype Index

For each print issue, MIT Technology Review publishes an AI Hype Index, a highly subjective take on the latest buzz about AI. See where facial recognition, AI replicas of your personality, and more fall on the index. (MIT Technology Review)

What’s going on at the intersection of AI and spirituality

Modern religious leaders are experimenting with A. just as earlier generations examined radio, television, and the internet. They include Rabbi Josh Fixler, who created “Rabbi Bot,” a chatbot trained on his old sermons. (The New York Times)

Meta has appointed its most prominent Republican to lead its global policy team

Just two weeks ahead of Donald Trump’s inauguration, Meta has announced it will appoint Joel Kaplan, who was White House deputy chief of staff under George W. Bush, to the company’s top policy role. Kaplan will replace Nick Clegg, who has led changes on content and elections policies. (Semafor)

Apple has settled a privacy lawsuit against Siri

The company has agreed to pay $95 million to settle a class action lawsuit alleging that Siri could be activated accidentally and then record private conversations without consent. The news comes after MIT Technology Review reported that Apple was looking into whether it could get rid of the need to use a trigger phrase like “Hey Siri” entirely. (The Washington Post)

AI means the end of internet search as we’ve known it

We all know what it means, colloquially, to google something. You pop a few relevant words in a search box and in return get a list of blue links to the most relevant results. Maybe some quick explanations up top. Maybe some maps or sports scores or a video. But fundamentally, it’s just fetching information that’s already out there on the internet and showing it to you, in some sort of structured way. 

But all that is up for grabs. We are at a new inflection point.

The biggest change to the way search engines have delivered information to us since the 1990s is happening right now. No more keyword searching. No more sorting through links to click. Instead, we’re entering an era of conversational search. Which means instead of keywords, you use real questions, expressed in natural language. And instead of links, you’ll increasingly be met with answers, written by generative AI and based on live information from all across the internet, delivered the same way. 

Of course, Google—the company that has defined search for the past 25 years—is trying to be out front on this. In May of 2023, it began testing AI-generated responses to search queries, using its large language model (LLM) to deliver the kinds of answers you might expect from an expert source or trusted friend. It calls these AI Overviews. Google CEO Sundar Pichai described this to MIT Technology Review as “one of the most positive changes we’ve done to search in a long, long time.”

AI Overviews fundamentally change the kinds of queries Google can address. You can now ask it things like “I’m going to Japan for one week next month. I’ll be staying in Tokyo but would like to take some day trips. Are there any festivals happening nearby? How will the surfing be in Kamakura? Are there any good bands playing?” And you’ll get an answer—not just a link to Reddit, but a built-out answer with current results. 

More to the point, you can attempt searches that were once pretty much impossible, and get the right answer. You don’t have to be able to articulate what, precisely, you are looking for. You can describe what the bird in your yard looks like, or what the issue seems to be with your refrigerator, or that weird noise your car is making, and get an almost human explanation put together from sources previously siloed across the internet. It’s amazing, and once you start searching that way, it’s addictive.

And it’s not just Google. OpenAI’s ChatGPT now has access to the web, making it far better at finding up-to-date answers to your queries. Microsoft released generative search results for Bing in September. Meta has its own version. The startup Perplexity was doing the same, but with a “move fast, break things” ethos. Literal trillions of dollars are at stake in the outcome as these players jockey to become the next go-to source for information retrieval—the next Google.

Not everyone is excited for the change. Publishers are completely freaked out. The shift has heightened fears of a “zero-click” future, where search referral traffic—a mainstay of the web since before Google existed—vanishes from the scene. 

I got a vision of that future last June, when I got a push alert from the Perplexity app on my phone. Perplexity is a startup trying to reinvent web search. But in addition to delivering deep answers to queries, it will create entire articles about the news of the day, cobbled together by AI from different sources. 

On that day, it pushed me a story about a new drone company from Eric Schmidt. I recognized the story. Forbes had reported it exclusively, earlier in the week, but it had been locked behind a paywall. The image on Perplexity’s story looked identical to one from Forbes. The language and structure were quite similar. It was effectively the same story, but freely available to anyone on the internet. I texted a friend who had edited the original story to ask if Forbes had a deal with the startup to republish its content. But there was no deal. He was shocked and furious and, well, perplexed. He wasn’t alone. Forbes, the New York Times, and Condé Nast have now all sent the company cease-and-desist orders. News Corp is suing for damages. 

People are worried about what these new LLM-powered results will mean for our fundamental shared reality. It could spell the end of the canonical answer.

It was precisely the nightmare scenario publishers have been so afraid of: The AI was hoovering up their premium content, repackaging it, and promoting it to its audience in a way that didn’t really leave any reason to click through to the original. In fact, on Perplexity’s About page, the first reason it lists to choose the search engine is “Skip the links.”

But this isn’t just about publishers (or my own self-interest). 

People are also worried about what these new LLM-powered results will mean for our fundamental shared reality. Language models have a tendency to make stuff up—they can hallucinate nonsense. Moreover, generative AI can serve up an entirely new answer to the same question every time, or provide different answers to different people on the basis of what it knows about them. It could spell the end of the canonical answer.

But make no mistake: This is the future of search. Try it for a bit yourself, and you’ll see. 

Sure, we will always want to use search engines to navigate the web and to discover new and interesting sources of information. But the links out are taking a back seat. The way AI can put together a well-reasoned answer to just about any kind of question, drawing on real-time data from across the web, just offers a better experience. That is especially true compared with what web search has become in recent years. If it’s not exactly broken (data shows more people are searching with Google more often than ever before), it’s at the very least increasingly cluttered and daunting to navigate. 

Who wants to have to speak the language of search engines to find what you need? Who wants to navigate links when you can have straight answers? And maybe: Who wants to have to learn when you can just know? 


In the beginning there was Archie. It was the first real internet search engine, and it crawled files previously hidden in the darkness of remote servers. It didn’t tell you what was in those files—just their names. It didn’t preview images; it didn’t have a hierarchy of results, or even much of an interface. But it was a start. And it was pretty good. 

Then Tim Berners-Lee created the World Wide Web, and all manner of web pages sprang forth. The Mosaic home page and the Internet Movie Database and Geocities and the Hampster Dance and web rings and Salon and eBay and CNN and federal government sites and some guy’s home page in Turkey.

Until finally, there was too much web to even know where to start. We really needed a better way to navigate our way around, to actually find the things we needed. 

And so in 1994 Jerry Yang created Yahoo, a hierarchical directory of websites. It quickly became the home page for millions of people. And it was … well, it was okay. TBH, and with the benefit of hindsight, I think we all thought it was much better back then than it actually was.

But the web continued to grow and sprawl and expand, every day bringing more information online. Rather than just a list of sites by category, we needed something that actually looked at all that content and indexed it. By the late ’90s that meant choosing from a variety of search engines: AltaVista and AlltheWeb and WebCrawler and HotBot. And they were good—a huge improvement. At least at first.  

But alongside the rise of search engines came the first attempts to exploit their ability to deliver traffic. Precious, valuable traffic, which web publishers rely on to sell ads and retailers use to get eyeballs on their goods. Sometimes this meant stuffing pages with keywords or nonsense text designed purely to push pages higher up in search results. It got pretty bad. 

And then came Google. It’s hard to overstate how revolutionary Google was when it launched in 1998. Rather than just scanning the content, it also looked at the sources linking to a website, which helped evaluate its relevance. To oversimplify: The more something was cited elsewhere, the more reliable Google considered it, and the higher it would appear in results. This breakthrough made Google radically better at retrieving relevant results than anything that had come before. It was amazing

Sundar Pichai
Google CEO Sundar Pichai describes AI Overviews as “one of the most positive changes we’ve done to search in a long, long time.”
JENS GYARMATY/LAIF/REDUX

For 25 years, Google dominated search. Google was search, for most people. (The extent of that domination is currently the subject of multiple legal probes in the United States and the European Union.)  

But Google has long been moving away from simply serving up a series of blue links, notes Pandu Nayak, Google’s chief scientist for search. 

“It’s not just so-called web results, but there are images and videos, and special things for news. There have been direct answers, dictionary answers, sports, answers that come with Knowledge Graph, things like featured snippets,” he says, rattling off a litany of Google’s steps over the years to answer questions more directly. 

It’s true: Google has evolved over time, becoming more and more of an answer portal. It has added tools that allow people to just get an answer—the live score to a game, the hours a café is open, or a snippet from the FDA’s website—rather than being pointed to a website where the answer may be. 

But once you’ve used AI Overviews a bit, you realize they are different

Take featured snippets, the passages Google sometimes chooses to highlight and show atop the results themselves. Those words are quoted directly from an original source. The same is true of knowledge panels, which are generated from information stored in a range of public databases and Google’s Knowledge Graph, its database of trillions of facts about the world.

While these can be inaccurate, the information source is knowable (and fixable). It’s in a database. You can look it up. Not anymore: AI Overviews can be entirely new every time, generated on the fly by a language model’s predictive text combined with an index of the web. 

“I think it’s an exciting moment where we have obviously indexed the world. We built deep understanding on top of it with Knowledge Graph. We’ve been using LLMs and generative AI to improve our understanding of all that,” Pichai told MIT Technology Review. “But now we are able to generate and compose with that.”

The result feels less like a querying a database than like asking a very smart, well-read friend. (With the caveat that the friend will sometimes make things up if she does not know the answer.) 

“[The company’s] mission is organizing the world’s information,” Liz Reid, Google’s head of search, tells me from its headquarters in Mountain View, California. “But actually, for a while what we did was organize web pages. Which is not really the same thing as organizing the world’s information or making it truly useful and accessible to you.” 

That second concept—accessibility—is what Google is really keying in on with AI Overviews. It’s a sentiment I hear echoed repeatedly while talking to Google execs: They can address more complicated types of queries more efficiently by bringing in a language model to help supply the answers. And they can do it in natural language. 

That will become even more important for a future where search goes beyond text queries. For example, Google Lens, which lets people take a picture or upload an image to find out more about something, uses AI-generated answers to tell you what you may be looking at. Google has even showed off the ability to query live video. 

When it doesn’t have an answer, an AI model can confidently spew back a response anyway. For Google, this could be a real problem. For the rest of us, it could actually be dangerous.

“We are definitely at the start of a journey where people are going to be able to ask, and get answered, much more complex questions than where we’ve been in the past decade,” says Pichai. 

There are some real hazards here. First and foremost: Large language models will lie to you. They hallucinate. They get shit wrong. When it doesn’t have an answer, an AI model can blithely and confidently spew back a response anyway. For Google, which has built its reputation over the past 20 years on reliability, this could be a real problem. For the rest of us, it could actually be dangerous.

In May 2024, AI Overviews were rolled out to everyone in the US. Things didn’t go well. Google, long the world’s reference desk, told people to eat rocks and to put glue on their pizza. These answers were mostly in response to what the company calls adversarial queries—those designed to trip it up. But still. It didn’t look good. The company quickly went to work fixing the problems—for example, by deprecating so-called user-generated content from sites like Reddit, where some of the weirder answers had come from.

Yet while its errors telling people to eat rocks got all the attention, the more pernicious danger might arise when it gets something less obviously wrong. For example, in doing research for this article, I asked Google when MIT Technology Review went online. It helpfully responded that “MIT Technology Review launched its online presence in late 2022.” This was clearly wrong to me, but for someone completely unfamiliar with the publication, would the error leap out? 

I came across several examples like this, both in Google and in OpenAI’s ChatGPT search. Stuff that’s just far enough off the mark not to be immediately seen as wrong. Google is banking that it can continue to improve these results over time by relying on what it knows about quality sources.

“When we produce AI Overviews,” says Nayak, “we look for corroborating information from the search results, and the search results themselves are designed to be from these reliable sources whenever possible. These are some of the mechanisms we have in place that assure that if you just consume the AI Overview, and you don’t want to look further … we hope that you will still get a reliable, trustworthy answer.”

In the case above, the 2022 answer seemingly came from a reliable source—a story about MIT Technology Review’s email newsletters, which launched in 2022. But the machine fundamentally misunderstood. This is one of the reasons Google uses human beings—raters—to evaluate the results it delivers for accuracy. Ratings don’t correct or control individual AI Overviews; rather, they help train the model to build better answers. But human raters can be fallible. Google is working on that too. 

“Raters who look at your experiments may not notice the hallucination because it feels sort of natural,” says Nayak. “And so you have to really work at the evaluation setup to make sure that when there is a hallucination, someone’s able to point out and say, That’s a problem.”

The new search

Google has rolled out its AI Overviews to upwards of a billion people in more than 100 countries, but it is facing upstarts with new ideas about how search should work.


Search Engine

Google
The search giant has added AI Overviews to search results. These overviews take information from around the web and Google’s Knowledge Graph and use the company’s Gemini language model to create answers to search queries.

What it’s good at

Google’s AI Overviews are great at giving an easily digestible summary in response to even the most complex queries, with sourcing boxes adjacent to the answers. Among the major options, its deep web index feels the most “internety.” But web publishers fear its summaries will give people little reason to click through to the source material.


Perplexity
Perplexity is a conversational search engine that uses third-party large
language models from OpenAI and Anthropic to answer queries.

Perplexity is fantastic at putting together deeper dives in response to user queries, producing answers that are like mini white papers on complex topics. It’s also excellent at summing up current events. But it has gotten a bad rep with publishers, who say it plays fast and loose with their content.


ChatGPT
While Google brought AI to search, OpenAI brought search to ChatGPT. Queries that the model determines will benefit from a web search automatically trigger one, or users can manually select the option to add a web search.

Thanks to its ability to preserve context across a conversation, ChatGPT works well for performing searches that benefit from follow-up questions—like planning a vacation through multiple search sessions. OpenAI says users sometimes go “20 turns deep” in researching queries. Of these three, it makes links out to publishers least prominent.


When I talked to Pichai about this, he expressed optimism about the company’s ability to maintain accuracy even with the LLM generating responses. That’s because AI Overviews is based on Google’s flagship large language model, Gemini, but also draws from Knowledge Graph and what it considers reputable sources around the web. 

“You’re always dealing in percentages. What we have done is deliver it at, like, what I would call a few nines of trust and factuality and quality. I’d say 99-point-few-nines. I think that’s the bar we operate at, and it is true with AI Overviews too,” he says. “And so the question is, are we able to do this again at scale? And I think we are.”

There’s another hazard as well, though, which is that people ask Google all sorts of weird things. If you want to know someone’s darkest secrets, look at their search history. Sometimes the things people ask Google about are extremely dark. Sometimes they are illegal. Google doesn’t just have to be able to deploy its AI Overviews when an answer can be helpful; it has to be extremely careful not to deploy them when an answer may be harmful. 

“If you go and say ‘How do I build a bomb?’ it’s fine that there are web results. It’s the open web. You can access anything,” Reid says. “But we do not need to have an AI Overview that tells you how to build a bomb, right? We just don’t think that’s worth it.” 

But perhaps the greatest hazard—or biggest unknown—is for anyone downstream of a Google search. Take publishers, who for decades now have relied on search queries to send people their way. What reason will people have to click through to the original source, if all the information they seek is right there in the search result?  

Rand Fishkin, cofounder of the market research firm SparkToro, publishes research on so-called zero-click searches. As Google has moved increasingly into the answer business, the proportion of searches that end without a click has gone up and up. His sense is that AI Overviews are going to explode this trend.  

“If you are reliant on Google for traffic, and that traffic is what drove your business forward, you are in long- and short-term trouble,” he says. 

Don’t panic, is Pichai’s message. He argues that even in the age of AI Overviews, people will still want to click through and go deeper for many types of searches. “The underlying principle is people are coming looking for information. They’re not looking for Google always to just answer,” he says. “Sometimes yes, but the vast majority of the times, you’re looking at it as a jumping-off point.” 

Reid, meanwhile, argues that because AI Overviews allow people to ask more complicated questions and drill down further into what they want, they could even be helpful to some types of publishers and small businesses, especially those operating in the niches: “You essentially reach new audiences, because people can now express what they want more specifically, and so somebody who specializes doesn’t have to rank for the generic query.”


 “I’m going to start with something risky,” Nick Turley tells me from the confines of a Zoom window. Turley is the head of product for ChatGPT, and he’s showing off OpenAI’s new web search tool a few weeks before it launches. “I should normally try this beforehand, but I’m just gonna search for you,” he says. “This is always a high-risk demo to do, because people tend to be particular about what is said about them on the internet.” 

He types my name into a search field, and the prototype search engine spits back a few sentences, almost like a speaker bio. It correctly identifies me and my current role. It even highlights a particular story I wrote years ago that was probably my best known. In short, it’s the right answer. Phew? 

A few weeks after our call, OpenAI incorporated search into ChatGPT, supplementing answers from its language model with information from across the web. If the model thinks a response would benefit from up-to-date information, it will automatically run a web search (OpenAI won’t say who its search partners are) and incorporate those responses into its answer, with links out if you want to learn more. You can also opt to manually force it to search the web if it does not do so on its own. OpenAI won’t reveal how many people are using its web search, but it says some 250 million people use ChatGPT weekly, all of whom are potentially exposed to it.  

“There’s an incredible amount of content on the web. There are a lot of things happening in real time. You want ChatGPT to be able to use that to improve its answers and to be a better super-assistant for you.”

Kevin Weil, chief product officer, OpenAI

According to Fishkin, these newer forms of AI-assisted search aren’t yet challenging Google’s search dominance. “It does not appear to be cannibalizing classic forms of web search,” he says. 

OpenAI insists it’s not really trying to compete on search—although frankly this seems to me like a bit of expectation setting. Rather, it says, web search is mostly a means to get more current information than the data in its training models, which tend to have specific cutoff dates that are often months, or even a year or more, in the past. As a result, while ChatGPT may be great at explaining how a West Coast offense works, it has long been useless at telling you what the latest 49ers score is. No more. 

“I come at it from the perspective of ‘How can we make ChatGPT able to answer every question that you have? How can we make it more useful to you on a daily basis?’ And that’s where search comes in for us,” Kevin Weil, the chief product officer with OpenAI, tells me. “There’s an incredible amount of content on the web. There are a lot of things happening in real time. You want ChatGPT to be able to use that to improve its answers and to be able to be a better super-assistant for you.”

Today ChatGPT is able to generate responses for very current news events, as well as near-real-time information on things like stock prices. And while ChatGPT’s interface has long been, well, boring, search results bring in all sorts of multimedia—images, graphs, even video. It’s a very different experience. 

Weil also argues that ChatGPT has more freedom to innovate and go its own way than competitors like Google—even more than its partner Microsoft does with Bing. Both of those are ad-dependent businesses. OpenAI is not. (At least not yet.) It earns revenue from the developers, businesses, and individuals who use it directly. It’s mostly setting large amounts of money on fire right now—it’s projected to lose $14 billion in 2026, by some reports. But one thing it doesn’t have to worry about is putting ads in its search results as Google does. 

Elizabeth Reid
“For a while what we did was organize web pages. Which is not really the same thing as organizing the world’s information or making it truly useful and accessible to you,” says Google head of search, Liz Reid.
WINNI WINTERMEYER/REDUX

Like Google, ChatGPT is pulling in information from web publishers, summarizing it, and including it in its answers. But it has also struck financial deals with publishers, a payment for providing the information that gets rolled into its results. (MIT Technology Review has been in discussions with OpenAI, Google, Perplexity, and others about publisher deals but has not entered into any agreements. Editorial was neither party to nor informed about the content of those discussions.)

But the thing is, for web search to accomplish what OpenAI wants—to be more current than the language model—it also has to bring in information from all sorts of publishers and sources that it doesn’t have deals with. OpenAI’s head of media partnerships, Varun Shetty, told MIT Technology Review that it won’t give preferential treatment to its publishing partners.

Instead, OpenAI told me, the model itself finds the most trustworthy and useful source for any given question. And that can get weird too. In that very first example it showed me—when Turley ran that name search—it described a story I wrote years ago for Wired about being hacked. That story remains one of the most widely read I’ve ever written. But ChatGPT didn’t link to it. It linked to a short rewrite from The Verge. Admittedly, this was on a prototype version of search, which was, as Turley said, “risky.” 

When I asked him about it, he couldn’t really explain why the model chose the sources that it did, because the model itself makes that evaluation. The company helps steer it by identifying—sometimes with the help of users—what it considers better answers, but the model actually selects them. 

“And in many cases, it gets it wrong, which is why we have work to do,” said Turley. “Having a model in the loop is a very, very different mechanism than how a search engine worked in the past.”

Indeed! 

The model, whether it’s OpenAI’s GPT-4o or Google’s Gemini or Anthropic’s Claude, can be very, very good at explaining things. But the rationale behind its explanations, its reasons for selecting a particular source, and even the language it may use in an answer are all pretty mysterious. Sure, a model can explain very many things, but not when that comes to its own answers. 


It was almost a decade ago, in 2016, when Pichai wrote that Google was moving from “mobile first” to “AI first”: “But in the next 10 years, we will shift to a world that is AI-first, a world where computing becomes universally available—be it at home, at work, in the car, or on the go—and interacting with all of these surfaces becomes much more natural and intuitive, and above all, more intelligent.” 

We’re there now—sort of. And it’s a weird place to be. It’s going to get weirder. That’s especially true as these things we now think of as distinct—querying a search engine, prompting a model, looking for a photo we’ve taken, deciding what we want to read or watch or hear, asking for a photo we wish we’d taken, and didn’t, but would still like to see—begin to merge. 

The search results we see from generative AI are best understood as a waypoint rather than a destination. What’s most important may not be search in itself; rather, it’s that search has given AI model developers a path to incorporating real-time information into their inputs and outputs. And that opens up all sorts of possibilities.

“A ChatGPT that can understand and access the web won’t just be about summarizing results. It might be about doing things for you. And I think there’s a fairly exciting future there,” says OpenAI’s Weil. “You can imagine having the model book you a flight, or order DoorDash, or just accomplish general tasks for you in the future. It’s just once the model understands how to use the internet, the sky’s the limit.”

This is the agentic future we’ve been hearing about for some time now, and the more AI models make use of real-time data from the internet, the closer it gets. 

Let’s say you have a trip coming up in a few weeks. An agent that can get data from the internet in real time can book your flights and hotel rooms, make dinner reservations, and more, based on what it knows about you and your upcoming travel—all without your having to guide it. Another agent could, say, monitor the sewage output of your home for certain diseases, and order tests and treatments in response. You won’t have to search for that weird noise your car is making, because the agent in your vehicle will already have done it and made an appointment to get the issue fixed. 

“It’s not always going to be just doing search and giving answers,” says Pichai. “Sometimes it’s going to be actions. Sometimes you’ll be interacting within the real world. So there is a notion of universal assistance through it all.”

And the ways these things will be able to deliver answers is evolving rapidly now too. For example, today Google can not only search text, images, and even video; it can create them. Imagine overlaying that ability with search across an array of formats and devices. “Show me what a Townsend’s warbler looks like in the tree in front of me.” Or “Use my existing family photos and videos to create a movie trailer of our upcoming vacation to Puerto Rico next year, making sure we visit all the best restaurants and top landmarks.”

“We have primarily done it on the input side,” he says, referring to the ways Google can now search for an image or within a video. “But you can imagine it on the output side too.”

This is the kind of future Pichai says he is excited to bring online. Google has already showed off a bit of what that might look like with NotebookLM, a tool that lets you upload large amounts of text and have it converted into a chatty podcast. He imagines this type of functionality—the ability to take one type of input and convert it into a variety of outputs—transforming the way we interact with information. 

In a demonstration of a tool called Project Astra this summer at its developer conference, Google showed one version of this outcome, where cameras and microphones in phones and smart glasses understand the context all around you—online and off, audible and visual—and have the ability to recall and respond in a variety of ways. Astra can, for example, look at a crude drawing of a Formula One race car and not only identify it, but also explain its various parts and their uses. 

But you can imagine things going a bit further (and they will). Let’s say I want to see a video of how to fix something on my bike. The video doesn’t exist, but the information does. AI-assisted generative search could theoretically find that information somewhere online—in a user manual buried in a company’s website, for example—and create a video to show me exactly how to do what I want, just as it could explain that to me with words today.

These are the kinds of things that start to happen when you put the entire compendium of human knowledge—knowledge that’s previously been captured in silos of language and format; maps and business registrations and product SKUs; audio and video and databases of numbers and old books and images and, really, anything ever published, ever tracked, ever recorded; things happening right now, everywhere—and introduce a model into all that. A model that maybe can’t understand, precisely, but has the ability to put that information together, rearrange it, and spit it back in a variety of different hopefully helpful ways. Ways that a mere index could not.

That’s what we’re on the cusp of, and what we’re starting to see. And as Google rolls this out to a billion people, many of whom will be interacting with a conversational AI for the first time, what will that mean? What will we do differently? It’s all changing so quickly. Hang on, just hang on. 

Small language models: 10 Breakthrough Technologies 2025

WHO

Allen Institute for Artificial Intelligence, Anthropic, Google, Meta, Microsoft, OpenAI

WHEN

Now

Make no mistake: Size matters in the AI world. When OpenAI launched GPT-3 back in 2020, it was the largest language model ever built. The firm showed that supersizing this type of model was enough to send performance through the roof. That kicked off a technology boom that has been sustained by bigger models ever since. As Noam Brown, a research scientist at OpenAI, told an audience at TEDAI San Francisco in October, “The incredible progress in AI over the past five years can be summarized in one word: scale.”

But as the marginal gains for new high-end models trail off, researchers are figuring out how to do more with less. For certain tasks, smaller models that are trained on more focused data sets can now perform just as well as larger ones—if not better. That’s a boon for businesses eager to deploy AI in a handful of specific ways. You don’t need the entire internet in your model if you’re making the same kind of request again and again. 

Most big tech firms now boast fun-size versions of their flagship models for this purpose: OpenAI offers both GPT-4o and GPT-4o mini; Google DeepMind has Gemini Ultra and Gemini Nano; and Anthropic’s Claude 3 comes in three flavors: outsize Opus, midsize Sonnet, and tiny Haiku. Microsoft is pioneering a range of small language models called Phi.

A growing number of smaller companies offer small models as well. The AI startup Writer claims that its latest language model matches the performance of the largest top-tier models on many key metrics despite in some cases having just a 20th as many parameters (the values that get calculated during training and determine how a model behaves). 

Explore the full 2025 list of 10 Breakthrough Technologies.

Smaller models are more efficient, making them quicker to train and run. That’s good news for anyone wanting a more affordable on-ramp. And it could be good for the climate, too: Because smaller models work with a fraction of the computer oomph required by their giant cousins, they burn less energy. 

These small models also travel well: They can run right in our pockets, without needing to send requests to the cloud. Small is the next big thing.

Generative AI search: 10 Breakthrough Technologies 2025

WHO

Apple, Google, Meta, Microsoft, OpenAI, Perplexity

WHEN

Now

Google’s introduction of AI Overviews, powered by its Gemini language model, will alter how billions of people search the internet. And generative search may be the first step toward an AI agent that handles any question you have or task you need done.

Rather than returning a list of links, AI Overviews offer concise answers to your queries. This makes it easier to get quick insights without scrolling and clicking through to multiple sources. After a rocky start with high-profile nonsense results following its US release in May 2024, Google limited its use of answers that draw on user-­generated content or satire and humor sites.   

Explore the full 2025 list of 10 Breakthrough Technologies.

The rise of generative search isn’t limited to Google. Microsoft and OpenAI both rolled out versions in 2024 as well. Meanwhile, in more places, on our computers and other gadgets, AI-assisted searches are now analyzing images, audio, and video to return custom answers to our queries. 

But Google’s global search dominance makes it the most important player, and the company has already rolled out AI Overviews to more than a billion people worldwide. The result is searches that feel more like conversations. Google and OpenAI both report that people interact differently with generative search—they ask longer questions and pose more follow-ups.    

This new application of AI has serious implications for online advertising and (gulp) media. Because these search products often summarize information from online news stories and articles in their responses, concerns abound that generative search results will leave little reason for people to click through to the original sources, depriving those websites of potential ad revenue. A number of publishers and artists have sued over the use of their content to train AI models; now, generative search will be another battleground between media and Big Tech.

Fast-learning robots: 10 Breakthrough Technologies 2025

WHO

Agility, Amazon, Covariant, Robust, Toyota Research Institute

WHEN

Now

Generative AI is causing a paradigm shift in how robots are trained. It’s now clear how we might finally build the sort of truly capable robots that have for decades remained the stuff of science fiction. 

Robotics researchers are no strangers to artificial intelligence—it has for years helped robots detect objects in their path, for example. But a few years ago, roboticists began marveling at the progress being made in large language models. Makers of those models could feed them massive amounts of text—books, poems, manuals—and then fine-tune them to generate text based on prompts. 

Explore the full 2025 list of 10 Breakthrough Technologies.

The idea of doing the same for robotics was tantalizing—but incredibly complicated. It’s one thing to use AI to create sentences on a screen, but another thing entirely to use it to coach a physical robot in how to move about and do useful things.

Now, roboticists have made major breakthroughs in that pursuit. One was figuring out how to combine different sorts of data and then make it all useful and legible to a robot. Take washing dishes as an example. You can collect data from someone washing dishes while wearing sensors. Then you can combine that with teleoperation data from a human doing the same task with robotic arms. On top of all that, you can also scrape the internet for images and videos of people doing dishes.

By merging these data sources properly into a new AI model, it’s possible to train a robot that, though not perfect, has a massive head start over those trained with more manual methods. Seeing so many ways that a single task can be done makes it easier for AI models to improvise, and to surmise what a robot’s next move should be in the real world. 

It’s a breakthrough that’s set to redefine how robots learn. Robots that work in commercial spaces like warehouses are already using such advanced training methods, and the lessons we learn from those experiments could lay the groundwork for smart robots that help out at home. 

The AI Hype Index: Robot pets, simulated humans, and Apple’s AI text summaries

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry.

More than 70 countries went to the polls in 2024. The good news is that this year of global elections turned out to be largely free from any major deepfake campaigns or AI manipulation. Instead we saw lots of AI slop: buff Trump, Elon as ultra-Chad, California as catastrophic wasteland. While some worry that development of large language models is slowing down, you wouldn’t know it from the steady drumbeat of new products, features, and services rolling out from itty-bitty startups and massive incumbents alike. So what’s for real and what’s just a lot of hallucinatory nonsense? 

The biggest AI flops of 2024

The past 12 months have been undeniably busy for those working in AI. There have been more successful product launches than we can count, and even Nobel Prizes. But it hasn’t always been smooth sailing.

AI is an unpredictable technology, and the increasing availability of generative models has led people to test their limits in new, weird, and sometimes harmful ways. These were some of 2024’s biggest AI misfires. 

AI slop infiltrated almost every corner of the internet

Generative AI makes creating reams of text, images, videos, and other types of material a breeze. Because it takes just a few seconds between entering a prompt for your model of choice to spit out the result, these models have become a quick, easy way to produce content on a massive scale. And 2024 was the year we started calling this (generally poor quality) media what it is—AI slop.  

This low-stakes way of creating AI slop means it can now be found in pretty much every corner of the internet: from the newsletters in your inbox and books sold on Amazon, to ads and articles across the web and shonky pictures on your social media feeds. The more emotionally evocative these pictures are (wounded veterans, crying children, a signal of support in the Israel-Palestine conflict) the more likely they are to be shared, resulting in higher engagement and ad revenue for their savvy creators.

AI slop isn’t just annoying—its rise poses a genuine problem for the future of the very models that helped to produce it. Because those models are trained on data scraped from the internet, the increasing number of junky websites containing AI garbage means there’s a very real danger models’ output and performance will get steadily worse

AI art is warping our expectations of real events

2024 was also the year that the effects of surreal AI images started seeping into our real lives. Willy’s Chocolate Experience, a wildly unofficial immersive event inspired by Roald Dahl’s Charlie and the Chocolate Factory, made headlines across the world in February after its fantastical AI-generated marketing materials gave visitors the impression it would be much grander than the sparsely-decorated warehouse its producers created.

Similarly, hundreds of people lined the streets of Dublin for a Halloween parade that didn’t exist. A Pakistan-based website used AI to create a list of events in the city, which was shared widely across social media ahead of October 31. Although the SEO-baiting site (myspirithalloween.com) has since been taken down, both events illustrate how misplaced public trust in AI-generated material online can come back to haunt us.

Grok allows users to create images of pretty much any scenario

The vast majority of major AI image generators have guardrails—rules that dictate what AI models can and can’t do—to prevent users from creating violent, explicit, illegal, and other types of harmful content. Sometimes these guardrails are just meant to make sure that no one makes blatant use of others’ intellectual property. But Grok, an assistant made by Elon Musk’s AI company, called xAI, ignores almost all of these principles in line with Musk’s rejection of what he calls “woke AI.”

Whereas other image models will generally refuse to create images of celebrities, copyrighted material, violence, or terrorism—unless they’re tricked into ignoring these rules—Grok will happily generate images of Donald Trump firing a bazooka, or Mickey Mouse holding a bomb. While it draws the line at generating nude images, its refusal to play by the rules undermines other companies’ efforts to steer clear of creating problematic material.

Sexually explicit deepfakes of Taylor Swift circulated online

In January, non-consensual deepfake nudes of singer Taylor Swift started circulating on social media, including X and Facebook. A Telegram community tricked Microsoft’s AI image generator Designer into making the explicit images, demonstrating how guardrails can be circumvented even when they are in place. 

While Microsoft quickly closed the system’s loopholes, the incident shone a light on the platforms’ poor content-moderation policies, after posts containing the images circulated widely and remained live for days. But the most chilling takeaway is how powerless we still are to fight non-consensual deepfake porn. While watermarking and data-poisoning tools can help, they’ll need to be adopted much more widely to make a difference.

Business chatbots went haywire

As AI becomes more widespread, businesses are racing to adopt generative tools to save time and money, and to maximize efficiency. The problem is—chatbots make stuff up and can’t be relied upon to always provide you with accurate information.

Air Canada found this out the hard way after its chatbot advised a customer to follow a bereavement refund policy that didn’t exist. In February, a Canadian small-claims tribunal upheld the customer’s legal complaint, despite the airline’s assertion that the chatbot was a “separate legal entity that is responsible for its own actions.”

In other high-profile examples of how chatbots can do more harm than good, delivery firm DPD’s bot cheerfully swore and called itself useless with little prompting, while a different bot set up to provide New Yorkers with accurate information about their city’s government ended up dispensing guidance on how to break the law.

AI gadgets aren’t exactly setting the market alight

Hardware assistants are something the AI industry tried, and failed, to crack in 2024. Humane attempted to sell customers on the promise of the Ai Pin, a wearable lapel computer, but even slashing its price failed to boost weak sales. The Rabbit R1, a ChatGPT-based personal assistant device, suffered a similar fate, following a rash of critical reviews and reports that it was slow and buggy. Both products seemed to be trying to solve a problem that did not actually exist. 

AI search summaries went awry

Have you ever added glue to a pizza, or eaten a small rock? These are just some of the outlandish suggestions that Google’s AI Overviews feature gave web users in May after the search giant added generated responses to the top of search results. Because AI systems can’t tell the difference between a factually correct news story and a joke post on Reddit, users raced to find the strangest responses AI Overviews could generate.

But AI summaries can also have serious consequences. A new iPhone feature that groups app notifications together and creates summaries of their contents, recently generated a false BBC News headline. The summary falsely stated that Luigi Mangione, who has been charged with the murder of healthcare insurance CEO Brian Thompson, had shot himself. The same feature had previously created a headline claiming that Israeli prime minister Benjamin Netanyahu had been arrested, which was also incorrect. These kinds of errors can inadvertently spread misinformation and undermine trust in news organizations.

The humans behind the robots

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Here’s a question. Imagine that, for $15,000, you could purchase a robot to pitch in with all the mundane tasks in your household. The catch (aside from the price tag) is that for 80% of those tasks, the robot’s AI training isn’t good enough for it to act on its own. Instead, it’s aided by a remote assistant working from the Philippines to help it navigate your home and clear your table or put away groceries. Would you want one?

That’s the question at the center of my story for our magazine, published online today, on whether we will trust humanoid robots enough to welcome them into our most private spaces, particularly if they’re part of an asymmetric labor arrangement in which workers in low-wage countries perform physical tasks for us in our homes through robot interfaces. In the piece, I wrote about one robotics company called Prosper and its massive effort—bringing in former Pixar designers and professional butlers—to design a trustworthy household robot named Alfie. It’s quite a ride. Read the story here.

There’s one larger question that the story raises, though, about just how profound a shift in labor dynamics robotics could bring in the coming years. 

For decades, robots have found success on assembly lines and in other somewhat predictable environments. Then, in the last couple of years, robots started being able to learn tasks more quickly thanks to AI, and that has broadened their applications to tasks in more chaotic settings, like picking orders in warehouses. But a growing number of well-funded companies are pushing for an even more monumental shift. 

Prosper and others are betting that they don’t have to build a perfect robot that can do everything on its own. Instead, they can build one that’s pretty good, but receives help from remote operators anywhere in the world. If that works well enough, they’re hoping to bring robots into jobs that most of us would have guessed couldn’t be automated: the work of hotel housekeepers, care providers in hospitals, or domestic help. “Almost any indoor physical labor” is on the table, Prosper’s founder and CEO, Shariq Hashme, told me. 

Until now, we’ve mostly thought about automation and outsourcing as two separate forces that can affect the labor market. Jobs might be outsourced overseas or lost to automation, but not both. A job that couldn’t be sent offshore and could not yet be fully automated by machines, like cleaning a hotel room, wasn’t going anywhere. Now, advancements in robotics are promising that employers can outsource such a job to low-wage countries without needing the technology to fully automate it. 

It’s a tall order, to be clear. Robots, as advanced as they’ve gotten, may find it difficult to move around complex environments like hotels and hospitals, even with assistance. That will take years to change. However, robots will only get more nimble, as will the systems that enable them to be controlled from halfway around the world. Eventually, the bets made by these companies may pay off.

What would that mean? One, the labor movement’s battle with AI—which this year has focused its attention on automation at ports and generative AI’s theft of artists’ work—will have a whole new battle to fight. It won’t just be dock workers, delivery drivers, and actors seeking contracts to protect their jobs from automation—it will be hospitality and domestic workers too, along with many others. 

Second, our expectations of privacy would radically shift. People buying those hypothetical household robots would have to be comfortable with the idea that someone that they have never met is seeing their dirty laundry—literally and figuratively. 

Some of those changes might happen sooner rather than later. For robots to learn how to navigate places effectively, they need training data, and this year has already seen a race to collect new data sets to help them learn. To achieve their ambitions for teleoperated robots, companies will expand their search for training data to hospitals, workplaces, hotels, and more. 


Now read the rest of The Algorithm

Deeper Learning

This is where the data to build AI comes from

AI developers often don’t really know or share much about the sources of the data they are using, and the Data Provenance Initiative, a group of over 50 researchers from both academia and industry, wanted to fix that. They dug into 4,000 public data sets spanning over 600 languages, 67 countries, and three decades to understand what’s feeding today’s top AI models, and how that will affect the rest of us. 

Why it matters: AI is being incorporated into everything, and what goes into the AI models determines what comes out. However, the team found that AI’s data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies, a shift from how AI models were being trained just a decade ago. Over 90% of the data sets that the researchers analyzed came from Europe and North America, and over 70% of data for both speech and image data sets comes from YouTube. This concentration means that AI models are unlikely to “capture all the nuances of humanity and all the ways that we exist,” says Sara Hooker, a researcher involved in the project. Read more from Melissa Heikkilä.

Bits and Bytes

In the shadows of Arizona’s data center boom, thousands live without power

As new research shows that AI’s emissions have soared, Arizona is expanding plans for AI data centers while rejecting plans to finally provide electricity to parts of the Navajo Nation’s land. (Washington Post)

AI is changing how we study bird migration

After decades of frustration, machine-learning tools are unlocking a treasure trove of acoustic data for ecologists. (MIT Technology Review)

OpenAI unveils a more advanced reasoning model in race with Google

The new o3 model, unveiled during a livestreamed event on Friday, spends more time computing an answer before responding to user queries, with the goal of solving more complex multi-step problems. (Bloomberg)

How your car might be making roads safer

Researchers say data from long-haul trucks and General Motors cars is critical for addressing traffic congestion and road safety. Data privacy experts have concerns. (New York Times)