OpenAI has upped its lobbying efforts nearly sevenfold

OpenAI spent $1.76 million on government lobbying in 2024 and $510,000 in the last three months of the year alone, according to a new disclosure filed on Tuesday—a significant jump from 2023, when the company spent just $260,000 on Capitol Hill. The company also disclosed a new in-house lobbyist, Meghan Dorn, who worked for five years for Senator Lindsey Graham and started at OpenAI in October. The filing also shows activity related to two new pieces of legislation in the final months of the year: the House’s AI Advancement and Reliability Act, which would set up a government center for AI research, and the Senate’s Future of Artificial Intelligence Innovation Act, which would create shared benchmark tests for AI models. 

OpenAI did not respond to questions about its lobbying efforts.

But perhaps more important, the disclosure is a clear signal of the company’s arrival as a political player, as its first year of serious lobbying ends and Republican control of Washington begins. While OpenAI’s lobbying spending is still dwarfed by its peers’—Meta tops the list of Big Tech spenders, with more than $24 million in 2024—the uptick comes as it and other AI companies have helped redraw the shape of AI policy. 

For the past few years, AI policy has been something like a whack-a-mole response to the risks posed by deepfakes and misinformation. But over the last year, AI companies have started to position the success of the technology as pivotal to national security and American competitiveness, arguing that the government must therefore support the industry’s growth. As a result, OpenAI and others now seem poised to gain access to cheaper energy, lucrative national security contracts, and a more lax regulatory environment that’s unconcerned with the minutiae of AI safety.

While the big players seem more or less aligned on this grand narrative, messy divides on other issues are still threatening to break through the harmony on display at President Trump’s inauguration this week.

AI regulation really began in earnest after ChatGPT launched in November 2022. At that point, “a lot of the conversation was about responsibility,” says Liana Keesing, campaigns manager for technology reform at Issue One, a democracy nonprofit that tracks Big Tech’s influence. 

Companies were asked what they’d do about sexually abusive deepfake images and election disinformation. “Sam Altman did a very good job coming in and painting himself early as a supporter of that process,” Keesing says. 

OpenAI started its official lobbying effort around October 2023, hiring Chan Park—a onetime Senate Judiciary Committee counsel and Microsoft lobbyist—to lead the effort. Lawmakers, particularly then Senate majority leader Chuck Schumer, were vocal about wanting to curb these particular harms; OpenAI hired Schumer’s former legal counsel, Reginald Babin, as a lobbyist, according to data from OpenSecrets. This past summer, the company hired the veteran political operative Chris Lehane as its head of global policy. 

OpenAI’s previous disclosures confirm that the company’s lobbyists subsequently focused much of last year on legislation like the No Fakes Act and the Protect Elections from Deceptive AI Act. The bills did not materialize into law. But as the year went on, the regulatory goals of AI companies began to change. “One of the biggest shifts that we’ve seen,” Keesing says, “is that they’ve really started to focus on energy.” 

In September, Altman, along with leaders from Nvidia, Anthropic, and Google, visited the White House and pitched the vision that US competitiveness in AI will depend on subsidized energy infrastructure to train the best models. Altman proposed to the Biden administration the construction of multiple five-gigawatt data centers, which would each consume as much electricity as New York City. 

Around the same time, companies like Meta and Microsoft started to say that nuclear energy will provide the path forward for AI, announcing deals aimed at firing up new nuclear power plants.

It seems likely OpenAI’s policy team was already planning for this particular shift. In April, the company hired lobbyist Matthew Rimkunas, who worked for Bill Gates’s sustainable energy effort Breakthrough Energies and, before that, spent 16 years working for Senator Graham; the South Carolina Republican serves on the Senate subcommittee that manages nuclear safety. 

This new AI energy race is inseparable from the positioning of AI as essential for national security and US competitiveness with China. OpenAI laid out its position in a blog post in October, writing, “AI is a transformational technology that can be used to strengthen democratic values or to undermine them. That’s why we believe democracies should continue to take the lead in AI development.” Then in December, the company went a step further and reversed its policy against working with the military, announcing it would develop AI models with the defense-tech company Anduril to help take down drones around military bases. 

That same month, Sam Altman said during an interview with The Free Press that the Biden administration was “not that effective” in shepherding AI: “The things that I think should have been the administration’s priorities, and I hope will be the next administration’s priorities, are building out massive AI infrastructure in the US, having a supply chain in the US, things like that.”

That characterization glosses over the CHIPS Act, a $52 billion stimulus to the domestic chips industry that is, at least on paper, aligned with Altman’s vision. (It also preceded an executive order Biden issued just last week, to lease federal land to host the type of gigawatt-scale data centers that Altman had been asking for.)

Intentionally or not, Altman’s posture aligned him with the growing camaraderie between President Trump and Silicon Valley. Mark Zuckerberg, Elon Musk, Jeff Bezos, and Sundar Pichai all sat directly behind Trump’s family at the inauguration on Monday, and Altman also attended. Many of them had also made sizable donations to Trump’s inaugural fund, with Altman personally throwing in $1 million.

It’s easy to view the inauguration as evidence that these tech leaders are aligned with each other, and with other players in Trump’s orbit. But there are still some key dividing lines that will be worth watching. Notably, there’s the clash over H-1B visas, which allow many noncitizen AI researchers to work in the US. Musk and Vivek Ramaswamy (who is, as of this week, no longer a part of the so-called Department of Government Efficiency) have been pushing for that visa program to be expanded. This sparked backlash from some allies of the Trump administration, perhaps most loudly Steve Bannon.

Another fault line is the battle between open- and closed-source AI. Google and OpenAI prevent anyone from knowing exactly what’s in their most powerful models, often arguing that this keeps them from being used improperly by bad actors. Musk has sued OpenAI and Microsoft over the issue, alleging that closed-source models are antithetical to OpenAI’s hybrid nonprofit structure. Meta, whose Llama model is open-source, recently sided with Musk in that lawsuit. Venture capitalist and Trump ally Marc Andreessen echoed these criticisms of OpenAI on X just hours after the inauguration. (Andreessen has also said that making AI models open-source “makes overbearing regulations unnecessary.”) 

Finally, there are the battles over bias and free speech. The vastly different approaches that social media companies have taken to moderating content—including Meta’s recent announcement that it would end its US fact-checking program—raise questions about whether the way AI models are moderated will continue to splinter too. Musk has lamented what he calls the “wokeness” of many leading models, and Andreessen said on Tuesday that “Chinese LLMs are much less censored than American LLMs” (though that’s not quite true, given that many Chinese AI models have government-mandated censorship in place that forbids particular topics). Altman has been more equivocal: “No two people are ever going to agree that one system is perfectly unbiased,” he told The Free Press.

It’s only the start of a new era in Washington, but the White House has been busy. It has repealed many executive orders signed by President Biden, including the landmark order on AI that imposed rules for government use of the technology (while it appears to have kept Biden’s order on leasing land for more data centers). Altman is busy as well. OpenAI, Oracle, and SoftBank reportedly plan to spend up to $500 billion on a joint venture for new data centers; the project was announced by President Trump, with Altman standing alongside. And according to Axios, Altman will also be part of a closed-door briefing with government officials on January 30, reportedly about OpenAI’s development of a powerful new AI agent.

The second wave of AI coding is here

Ask people building generative AI what generative AI is good for right now—what they’re really fired up about—and many will tell you: coding. 

“That’s something that’s been very exciting for developers,” Jared Kaplan, chief scientist at Anthropic, told MIT Technology Review this month: “It’s really understanding what’s wrong with code, debugging it.”

Copilot, a tool built on top of OpenAI’s large language models and launched by Microsoft-backed GitHub in 2022, is now used by millions of developers around the world. Millions more turn to general-purpose chatbots like Anthropic’s Claude, OpenAI’s ChatGPT, and Google DeepMind’s Gemini for everyday help.

“Today, more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers,” Alphabet CEO Sundar Pichai claimed on an earnings call in October: “This helps our engineers do more and move faster.” Expect other tech companies to catch up, if they haven’t already.

It’s not just the big beasts rolling out AI coding tools. A bunch of new startups have entered this buzzy market too. Newcomers such as Zencoder, Merly, Cosine, Tessl (valued at $750 million within months of being set up), and Poolside (valued at $3 billion before it even released a product) are all jostling for their slice of the pie. “It actually looks like developers are willing to pay for copilots,” says Nathan Benaich, an analyst at investment firm Air Street Capital: “And so code is one of the easiest ways to monetize AI.”

Such companies promise to take generative coding assistants to the next level. Instead of providing developers with a kind of supercharged autocomplete, like most existing tools, this next generation can prototype, test, and debug code for you. The upshot is that developers could essentially turn into managers, who may spend more time reviewing and correcting code written by a model than writing it from scratch themselves. 

But there’s more. Many of the people building generative coding assistants think that they could be a fast track to artificial general intelligence (AGI), the hypothetical superhuman technology that a number of top firms claim to have in their sights.

“The first time we will see a massively economically valuable activity to have reached human-level capabilities will be in software development,” says Eiso Kant, CEO and cofounder of Poolside. (OpenAI has already boasted that its latest o3 model beat the company’s own chief scientist in a competitive coding challenge.)

Welcome to the second wave of AI coding. 

Correct code 

Software engineers talk about two types of correctness. There’s the sense in which a program’s syntax (its grammar) is correct—meaning all the words, numbers, and mathematical operators are in the right place. This matters a lot more than grammatical correctness in natural language. Get one tiny thing wrong in thousands of lines of code and none of it will run.

The first generation of coding assistants are now pretty good at producing code that’s correct in this sense. Trained on billions of pieces of code, they have assimilated the surface-level structures of many types of programs.  

But there’s also the sense in which a program’s function is correct: Sure, it runs, but does it actually do what you wanted it to? It’s that second level of correctness that the new wave of generative coding assistants are aiming for—and this is what will really change the way software is made.
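The two senses of correctness can be made concrete with a minimal sketch. The `average` task and both checker functions below are hypothetical, chosen only for illustration: the candidate code parses without complaint, yet it returns the wrong answer.

```python
import ast

# Hypothetical task: return the average of a list of numbers.
candidate = '''
def average(values):
    return sum(values) / (len(values) + 1)  # parses fine, but off by one
'''

def is_syntactically_correct(source: str) -> bool:
    """First sense: does the source parse at all?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def is_functionally_correct(source: str) -> bool:
    """Second sense: does the program give the wanted answer on a known case?"""
    namespace = {}
    exec(source, namespace)  # load the definition into its own namespace
    return namespace["average"]([2, 4, 6]) == 4

syntax_ok = is_syntactically_correct(candidate)
function_ok = is_functionally_correct(candidate)
print(syntax_ok, function_ok)
```

A single test case is a weak check, of course; the point is only that passing the parser says nothing about whether the program does what was asked.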

“Large language models can write code that compiles, but they may not always write the program that you wanted,” says Alistair Pullen, a cofounder of Cosine. “To do that, you need to re-create the thought processes that a human coder would have gone through to get that end result.”

The problem is that the data most coding assistants have been trained on—the billions of pieces of code taken from online repositories—doesn’t capture those thought processes. It represents a finished product, not what went into making it. “There’s a lot of code out there,” says Kant. “But that data doesn’t represent software development.”

What Pullen, Kant, and others are finding is that to build a model that does a lot more than autocomplete—one that can come up with useful programs, test them, and fix bugs—you need to show it a lot more than just code. You need to show it how that code was put together.  

In short, companies like Cosine and Poolside are building models that don’t just mimic what good code looks like—whether it works well or not—but mimic the process that produces such code in the first place. Get it right and the models will come up with far better code and far better bug fixes. 

Breadcrumbs

But you first need a data set that captures that process—the steps that a human developer might take when writing code. Think of these steps as a breadcrumb trail that a machine could follow to produce a similar piece of code itself.

Part of that is working out what materials to draw from: Which sections of the existing codebase are needed for a given programming task? “Context is critical,” says Zencoder founder Andrew Filev. “The first generation of tools did a very poor job on the context, they would basically just look at your open tabs. But your repo [code repository] might have 5000 files and they’d miss most of it.”

Zencoder has hired a bunch of search engine veterans to help it build a tool that can analyze large codebases and figure out what is and isn’t relevant. This detailed context reduces hallucinations and improves the quality of code that large language models can produce, says Filev: “We call it repo grokking.”
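Zencoder has not published how its tool works, so the following is only a toy sketch of the general idea of picking relevant files for a task. It ranks files by simple word overlap with the task description; the `rank_files` function, the scoring rule, and the three-file repository are all assumptions made for illustration.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word-like tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_files(task: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank file paths by how many task tokens appear in the path or contents."""
    task_tokens = tokenize(task)
    scores = {
        path: len(task_tokens & tokenize(path + " " + body))
        for path, body in files.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [p for p in ranked if scores[p] > 0][:top_k]

# A made-up repository with three files.
repo = {
    "billing/invoice.py": "def total(invoice): return sum(line.amount for line in invoice.lines)",
    "billing/tax.py": "def apply_tax(amount, rate): return amount * (1 + rate)",
    "ui/theme.py": "PRIMARY_COLOR = 'blue'",
}
relevant = rank_files("fix tax rounding on invoice total", repo)
print(relevant)
```

A production system would presumably rely on far richer signals than shared words, such as call graphs, imports, and learned embeddings.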

Cosine also thinks context is key. But it draws on that context to create a new kind of data set. The company has asked dozens of coders to record what they were doing as they worked through hundreds of different programming tasks. “We asked them to write down everything,” says Pullen: “Why did you open that file? Why did you scroll halfway through? Why did you close it?” They also asked coders to annotate finished pieces of code, marking up sections that would have required knowledge of other pieces of code or specific documentation to write.

Cosine then takes all that information and generates a large synthetic data set that maps the typical steps coders take, and the sources of information they draw on, to finished pieces of code. They use this data set to train a model to figure out what breadcrumb trail it might need to follow to produce a particular program, and then how to follow it.  

Poolside, based in San Francisco, is also creating a synthetic data set that captures the process of coding, but it leans more on a technique called RLCE—reinforcement learning from code execution. (Cosine uses this too, but to a lesser degree.)

RLCE is analogous to the technique used to make chatbots like ChatGPT slick conversationalists, known as RLHF—reinforcement learning from human feedback. With RLHF, a model is trained to produce text that’s more like the kind human testers say they favor. With RLCE, a model is trained to produce code that’s more like the kind that does what it is supposed to do when it is run (or executed).  
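Neither company has described its training setup in detail, but the core signal can be sketched under simple assumptions: run a candidate program against test cases and score it by the fraction it passes. The `execution_reward` function and the `add` task below are hypothetical, and the sketch covers only the reward, not the step that updates a model based on it.

```python
def execution_reward(source: str, func_name: str, cases: list[tuple[tuple, object]]) -> float:
    """Run candidate code and return the fraction of test cases it passes."""
    namespace = {}
    try:
        exec(source, namespace)  # a real system would sandbox this step
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to load earns no reward
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on this case counts as a failure
    return passed / len(cases)

# Test cases as (arguments, expected result) pairs for a hypothetical add().
cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
candidates = {
    "good": "def add(a, b):\n    return a + b",
    "buggy": "def add(a, b):\n    return a - b",
    "broken": "def add(a, b)\n    return a + b",  # missing colon
}
rewards = {name: execution_reward(src, "add", cases) for name, src in candidates.items()}
print(rewards)
```

In a reinforcement learning loop, scores like these would steer the model toward producing more code resembling the "good" candidate and less resembling the other two.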

Gaming the system

Cosine and Poolside both say they are inspired by the approach DeepMind took with its game-playing model AlphaZero. AlphaZero was given the steps it could take, the moves in a game, and then left to play against itself over and over again, figuring out via trial and error which sequences of moves were winning ones and which were not.

“They let it explore moves at every possible turn, simulate as many games as you can throw compute at—that led all the way to beating Lee Sedol,” says Pengming Wang, a founding scientist at Poolside, referring to the Korean Go grandmaster who was beaten in 2016 by AlphaGo, a predecessor of AlphaZero. Before Poolside, Wang worked at Google DeepMind on applications of AlphaZero beyond board games, including FunSearch, a version trained to solve advanced math problems.

When that AlphaZero approach is applied to coding, the steps involved in producing a piece of code—the breadcrumbs—become the available moves in a game, and a correct program becomes winning that game. Left to play by itself, a model can improve far faster than a human could. “A human coder tries and fails one failure at a time,” says Kant. “Models can try things 100 times at once.”

A key difference between Cosine and Poolside is that Cosine is using a custom version of GPT-4o provided by OpenAI, which makes it possible to train on a larger data set than the base model can cope with, but Poolside is building its own large language model from scratch.

Poolside’s Kant thinks that training a model on code from the start will give better results than adapting an existing model that has sucked up not only billions of pieces of code but most of the internet. “I’m perfectly fine with our model forgetting about butterfly anatomy,” he says.  

Cosine claims that its generative coding assistant, called Genie, tops the leaderboard on SWE-Bench, a standard set of tests for coding models. Poolside is still building its model but claims that what it has so far already matches the performance of GitHub’s Copilot.

“I personally have a very strong belief that large language models will get us all the way to being as capable as a software developer,” says Kant.

Not everyone takes that view, however.

Illogical LLMs

To Justin Gottschlich, the CEO and founder of Merly, large language models are the wrong tool for the job—period. He invokes his dog: “No amount of training for my dog will ever get him to be able to code, it just won’t happen,” he says. “He can do all kinds of other things, but he’s just incapable of that deep level of cognition.”  

Having worked on code generation for more than a decade, Gottschlich has a similar sticking point with large language models. Programming requires the ability to work through logical puzzles with unwavering precision. No matter how well large language models may learn to mimic what human programmers do, at their core they are still essentially statistical slot machines, he says: “I can’t train an illogical system to become logical.”

Instead of training a large language model to generate code by feeding it lots of examples, Merly does not show its system human-written code at all. That’s because to really build a model that can generate code, Gottschlich argues, you need to work at the level of the underlying logic that code represents, not the code itself. Merly’s system is therefore trained on an intermediate representation—something like the machine-readable notation that most programming languages get translated into before they are run.

Gottschlich won’t say exactly what this looks like or how the process works. But he throws out an analogy: There’s this idea in mathematics that the only numbers that have to exist are prime numbers, because you can calculate all other numbers using just the primes. “Take that concept and apply it to code,” he says.
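The mathematical idea behind the analogy is that every whole number greater than 1 can be written as a product of primes. The short sketch below illustrates only that fact; it says nothing about Merly's undisclosed representation, and the `prime_factors` helper is written here purely for illustration.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (n >= 2) in ascending order."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:  # divide out each prime as often as it fits
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors

print(prime_factors(360))
print(prime_factors(97))
```

Multiplying the factors back together recovers the original number, which is the sense in which the primes are enough to build everything else.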

Not only does this approach get straight to the logic of programming; it’s also fast, because millions of lines of code are reduced to a few thousand lines of intermediate language before the system analyzes them.

Shifting mindsets

What you think of these rival approaches may depend on what you want generative coding assistants to be.  

In November, Cosine banned its engineers from using tools other than its own products. It is now seeing the impact of Genie on its own engineers, who often find themselves watching the tool as it comes up with code for them. “You now give the model the outcome you would like, and it goes ahead and worries about the implementation for you,” says Yang Li, another Cosine cofounder.

Pullen admits that it can be baffling, requiring a switch of mindset. “We have engineers doing multiple tasks at once, flitting between windows,” he says. “While Genie is running code in one, they might be prompting it to do something else in another.”

These tools also make it possible to prototype multiple versions of a system at once. Say you’re developing software that needs a payment system built in. You can get a coding assistant to simultaneously try out several different options (Stripe, Mango, Checkout) instead of having to code them by hand one at a time.

Genie can be left to fix bugs around the clock. Most software teams use bug-reporting tools that let people upload descriptions of errors they have encountered. Genie can read these descriptions and come up with fixes. Then a human just needs to review them before updating the code base.

No single human understands the trillions of lines of code in today’s biggest software systems, says Li, “and as more and more software gets written by other software, the amount of code will only get bigger.”

This will make coding assistants that maintain that code for us essential. “The bottleneck will become how fast humans can review the machine-generated code,” says Li.

How do Cosine’s engineers feel about all this? According to Pullen, at least, just fine. “If I give you a hard problem, you’re still going to think about how you want to describe that problem to the model,” he says. “Instead of writing the code, you have to write it in natural language. But there’s still a lot of thinking that goes into that, so you’re not really taking the joy of engineering away. The itch is still scratched.”

Some may adapt faster than others. Cosine likes to invite potential hires to spend a few days coding with its team. A couple of months ago it asked one such candidate to build a widget that would let employees share cool bits of software they were working on to social media. 

The task wasn’t straightforward, requiring working knowledge of multiple sections of Cosine’s millions of lines of code. But the candidate got it done in a matter of hours. “This person who had never seen our code base turned up on Monday and by Tuesday afternoon he’d shipped something,” says Li. “We thought it would take him all week.” (They hired him.)

But there’s another angle too. Many companies will use this technology to cut down on the number of programmers they hire. Li thinks we will soon see tiers of software engineers. At one end there will be elite developers with million-dollar salaries who can diagnose problems when the AI goes wrong. At the other end, smaller teams of 10 to 20 people will do a job that once required hundreds of coders. “It will be like how ATMs transformed banking,” says Li.

“Anything you want to do will be determined by compute and not head count,” he says. “I think it’s generally accepted that the era of adding another few thousand engineers to your organization is over.”

Warp drives

Indeed, for Gottschlich, machines that can code better than humans are going to be essential. For him, that’s the only way we will build the vast, complex software systems that he thinks we will eventually need. Like many in Silicon Valley, he anticipates a future in which humans move to other planets. That’s only going to be possible if we get AI to build the software required, he says: “Merly’s real goal is to get us to Mars.”

Gottschlich prefers to talk about “machine programming” rather than “coding assistants,” because he thinks that term frames the problem the wrong way. “I don’t think that these systems should be assisting humans—I think humans should be assisting them,” he says. “They can move at the speed of AI. Why restrict their potential?”

“There’s this cartoon called The Flintstones where they have these cars, but they only move when the drivers use their feet,” says Gottschlich. “This is sort of how I feel most people are doing AI for software systems.”

“But what Merly’s building is, essentially, spaceships,” he adds. He’s not joking. “And I don’t think spaceships should be powered by humans on a bicycle. Spaceships should be powered by a warp engine.”

If that sounds wild—it is. But there’s a serious point to be made about what the people building this technology think the end goal really is.

Gottschlich is not an outlier with his galaxy-brained take. Despite their focus on products that developers will want to use today, most of these companies have their sights on a far bigger payoff. Visit Cosine’s website and the company introduces itself as a “Human Reasoning Lab.” It sees coding as just the first step toward a more general-purpose model that can mimic human problem-solving in a number of domains.

Poolside has similar goals: The company states upfront that it is building AGI. “Code is a way of formalizing reasoning,” says Kant.

Wang invokes agents. Imagine a system that can spin up its own software to do any task on the fly, he says. “If you get to a point where your agent can really solve any computational task that you want through the means of software—that is a display of AGI, essentially.”

Down here on Earth, such systems may remain a pipe dream. And yet software engineering is changing faster than many at the cutting edge expected. 

“We’re not at a point where everything’s just done by machines, but we’re definitely stepping away from the usual role of a software engineer,” says Cosine’s Pullen. “We’re seeing the sparks of that new workflow—what it means to be a software engineer going into the future.”

Meta’s new AI model can translate speech from more than 100 languages

Meta has released a new AI model that can translate speech from 101 different languages. It represents a step toward real-time, simultaneous interpretation, where words are translated as soon as they come out of someone’s mouth. 

Typically, translation models for speech use a multistep approach. First they translate speech into text. Then they translate that text into text in another language. Finally, that translated text is turned into speech in the new language. This method can be inefficient, and at each step, errors and mistranslations can creep in. But Meta’s new model, called SeamlessM4T, enables more direct translation from speech in one language to speech in another. The model is described in a paper published today in Nature.

Seamless can translate text with 23% more accuracy than the top existing models. And although another model, Google’s AudioPaLM, can technically translate more languages—113 of them, versus 101 for Seamless—it can translate them only into English. SeamlessM4T can translate into 36 other languages.

The key is a process called parallel data mining, which searches crawled web data for instances where the sound in a video or audio clip matches a subtitle in another language. The model learned to associate those sounds in one language with the matching pieces of text in another. This opened up a whole new trove of translation examples for the model to learn from.
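One way to picture this kind of mining is as a matching problem: represent each audio clip and each subtitle as a vector, then keep the pairs that land close together. The sketch below assumes such a shared vector space exists and uses made-up three-number vectors; the `mine_pairs` function, the 0.9 threshold, and the data are illustrative assumptions, not Meta's published pipeline.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def mine_pairs(audio: dict[str, list[float]], text: dict[str, list[float]],
               threshold: float = 0.9) -> list[tuple[str, str]]:
    """Pair each audio clip with its closest subtitle if they are similar enough."""
    pairs = []
    for audio_id, a_vec in audio.items():
        best_id = max(text, key=lambda t: cosine(a_vec, text[t]))
        if cosine(a_vec, text[best_id]) >= threshold:
            pairs.append((audio_id, best_id))
    return pairs

# Toy embeddings: clips in one language, subtitles in another (values invented).
audio_clips = {"clip_1": [0.9, 0.1, 0.0], "clip_2": [0.0, 0.2, 0.9], "clip_3": [0.5, 0.5, 0.5]}
subtitles = {"sub_a": [1.0, 0.0, 0.0], "sub_b": [0.0, 0.0, 1.0]}
mined = mine_pairs(audio_clips, subtitles)
print(mined)
```

Here the third clip matches nothing well enough and is dropped, which is the point of the threshold: it trades coverage for cleaner training pairs.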

“Meta has done a great job having a breadth of different things they support, like text-to-speech, speech-to-text, even automatic speech recognition,” says Chetan Jaiswal, a professor of computer science at Quinnipiac University, who was not involved in the research. “The mere number of languages they are supporting is a tremendous achievement.”

Human translators are still a vital part of the translation process, the researchers say in the paper, because they can grapple with diverse cultural contexts and make sure the same meaning is conveyed from one language into another. This step is important, says Lynne Bowker, Canada Research Chair in Translation, Technologies and Society at Université Laval in Quebec, who didn’t work on Seamless. “Languages are a reflection of cultures, and cultures have their own ways of knowing things,” she says. 

When it comes to applications like medicine or law, machine translations need to be thoroughly checked by a human, she says. If not, misunderstandings can result. For example, when Google Translate was used to translate public health information about the covid-19 vaccine from the Virginia Department of Health in January 2021, it translated “not mandatory” in English into “not necessary” in Spanish, changing the whole meaning of the message.

AI models have many more examples to train on in some languages than in others. This means current speech-to-speech models may be able to translate a language like Greek into English, where there may be many examples, but cannot translate from Swahili to Greek. The team behind Seamless aimed to solve this problem by pre-training the model on millions of hours of spoken audio in different languages. This pre-training allowed it to recognize general patterns in language, making it easier to process less widely spoken languages because it already had some baseline for what spoken language is supposed to sound like.

The system is open-source, which the researchers hope will encourage others to build upon its current capabilities. But some are skeptical of how useful it may be compared with available alternatives. “Google’s translation model is not as open-source as Seamless, but it’s way more responsive and fast, and it doesn’t cost anything as an academic,” says Jaiswal.

The most exciting thing about Meta’s system is that it points to the possibility of instant interpretation across languages in the not-too-distant future—like the Babel fish in Douglas Adams’ cult novel The Hitchhiker’s Guide to the Galaxy. SeamlessM4T is faster than existing models but still not instant. That said, Meta claims to have a newer version of Seamless that’s as fast as human interpreters. 

“While having this kind of delayed translation is okay and useful, I think simultaneous translation will be even more useful,” says Kenny Zhu, director of the Arlington Computational Linguistics Lab at the University of Texas at Arlington, who is not affiliated with the new research.

Here’s our forecast for AI this year

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

In December, our small but mighty AI reporting team was asked by our editors to make a prediction: What’s coming next for AI? 

In 2024, AI contributed both to Nobel Prize–winning chemistry breakthroughs and a mountain of cheaply made content that few people asked for but that nonetheless flooded the internet. Take AI-generated Shrimp Jesus images, among other examples. There was also a spike in greenhouse-gas emissions last year that can be attributed partly to the surge in energy-intensive AI. Our team got to thinking about how all of this will shake out in the year to come. 

As we look ahead, certain things are a given. We know that agents—AI models that do more than just converse with you and can actually go off and complete tasks for you—are the focus of many AI companies right now. Building them will raise lots of privacy questions about how much of our data and preferences we’re willing to give up in exchange for tools that will (allegedly) save us time. Similarly, the need to make AI faster and more energy efficient is putting so-called small language models in the spotlight. 

We instead wanted to focus on less obvious predictions. Mine were about how AI companies that previously shunned work in defense and national security might be tempted this year by contracts from the Pentagon, and how Donald Trump’s attitudes toward China could escalate the global race for the best semiconductors. Read the full list.

What’s not evident in that story is that the other predictions were not so clear-cut. Arguments ensued about whether or not 2025 will be the year of intimate relationships with chatbots, AI throuples, or traumatic AI breakups. To witness the fallout from our team’s lively debates (and hear more about what didn’t make the list), you can join our upcoming LinkedIn Live this Thursday, January 16. I’ll be talking it all over with Will Douglas Heaven, our senior editor for AI, and our news editor, Charlotte Jee. 

There are a couple of other things I’ll be watching closely in 2025. One is how little the major AI players—namely OpenAI, Microsoft, and Google—are disclosing about the environmental burden of their models. Lots of evidence suggests that asking an AI model like ChatGPT about knowable facts, like the capital of Mexico, consumes much more energy (and releases far more emissions) than simply asking a search engine. Nonetheless, OpenAI’s Sam Altman in recent interviews has spoken positively about the idea of ChatGPT replacing the googling that we’ve all learned to do in the past two decades. It’s already happening, in fact. 

The environmental cost of all this will be top of mind for me in 2025, as will the possible cultural cost. We will go from searching for information by clicking links and (hopefully) evaluating sources to simply reading the responses that AI search engines serve up for us. As our editor in chief, Mat Honan, said in his piece on the subject, “Who wants to have to learn when you can just know?”


Now read the rest of The Algorithm

Deeper Learning

What’s next for our privacy?

The US Federal Trade Commission has taken a number of enforcement actions against data brokers, some of which have tracked and sold geolocation data from users at sensitive locations like churches, hospitals, and military installations without explicit consent. Though limited in nature, these actions may offer some new and improved protections for Americans’ personal information. 

Why it matters: A consensus is growing that Americans need better privacy protections—and that the best way to deliver them would be for Congress to pass comprehensive federal privacy legislation. Unfortunately, that’s not going to happen anytime soon. Enforcement actions from agencies like the FTC might be the next best thing in the meantime. Read more in Eileen Guo’s excellent story here.

Bits and Bytes

Meta trained its AI on a notorious piracy database

New court records, Wired reports, reveal that Meta used “a notorious so-called shadow library of pirated books that originated in Russia” to train its generative AI models. (Wired)

OpenAI’s top reasoning model struggles with the NYT Connections game

The game requires players to identify how groups of words are related. OpenAI’s o1 reasoning model had a hard time. (Mind Matters)

Anthropic’s chief scientist on 5 ways agents will be even better in 2025

The AI company Anthropic is now worth $60 billion. The company’s cofounder and chief scientist, Jared Kaplan, shared how AI agents will develop in the coming year. (MIT Technology Review)

A New York legislator attempts to regulate AI with a new bill

Last year, a high-profile bill in California to regulate the AI industry was vetoed by Governor Gavin Newsom. Now, a legislator in New York is trying to revive the effort in his own state. (MIT Technology Review)

Training robots in the AI-powered industrial metaverse

Imagine the bustling floors of tomorrow’s manufacturing plant: Robots, well-versed in multiple disciplines through adaptive AI education, work seamlessly and safely alongside human counterparts. These robots can transition effortlessly between tasks—from assembling intricate electronic components to handling complex machinery assembly. Each robot’s unique education enables it to predict maintenance needs, optimize energy consumption, and innovate processes on the fly, dictated by real-time data analyses and learned experiences in their digital worlds.

Training for robots like this will happen in a “virtual school,” a meticulously simulated environment within the industrial metaverse. Here, robots learn complex skills on accelerated timeframes, acquiring in hours what might take humans months or even years.

Beyond traditional programming

Training for industrial robots was once like a traditional school: rigid, predictable, and limited to practicing the same tasks over and over. But now we’re at the threshold of the next era. Robots can learn in “virtual classrooms”—immersive environments in the industrial metaverse that use simulation, digital twins, and AI to mimic real-world conditions in detail. This digital world can provide an almost limitless training ground that mirrors real factories, warehouses, and production lines, allowing robots to practice tasks, encounter challenges, and develop problem-solving skills. 

What once took days or even weeks of real-world programming, with engineers painstakingly adjusting commands to get the robot to perform one simple task, can now be learned in hours in virtual spaces. This approach, known as simulation to reality (Sim2Real), blends virtual training with real-world application, bridging the gap between simulated learning and actual performance.

Although the industrial metaverse is still in its early stages, its potential to reshape robotic training is clear, and these new ways of upskilling robots can enable unprecedented flexibility.

Italian automation provider EPF found that AI shifted the company’s entire approach to developing robots. “We changed our development strategy from designing entire solutions from scratch to developing modular, flexible components that could be combined to create complete solutions, allowing for greater coherence and adaptability across different sectors,” says EPF’s chairman and CEO Franco Filippi.

Learning by doing

AI models gain power when trained on vast amounts of data, such as large sets of labeled examples, learning categories or classes by trial and error. In robotics, however, this approach would require hundreds of hours of robot time and human oversight to train a single task. Even the simplest of instructions, like “grab a bottle,” for example, could result in many varied outcomes depending on the bottle’s shape, color, and environment. Training then becomes a monotonous loop that yields little significant progress for the time invested.

Building AI models that can generalize and then successfully complete a task regardless of the environment is key for advancing robotics. Researchers from New York University, Meta, and Hello Robot have introduced robot utility models that achieve a 90% success rate in performing basic tasks across unfamiliar environments without additional training. Large language models are used in combination with computer vision to provide continuous feedback to the robot on whether it has successfully completed the task. This feedback loop accelerates the learning process by combining multiple AI techniques—and avoids repetitive training cycles.

Robotics companies are now implementing advanced perception systems capable of training and generalizing across tasks and domains. For example, EPF worked with Siemens to integrate visual AI and object recognition into its robotics to create solutions that can adapt to varying product geometries and environmental conditions without mechanical reconfiguration.

Learning by imagining

Scarcity of training data is a constraint for AI, especially in robotics. However, innovations that use digital twins and synthetic data to train robots have significantly improved on previously costly approaches.

For example, Siemens’ SIMATIC Robot Pick AI expands on this vision of adaptability, transforming standard industrial robots—once limited to rigid, repetitive tasks—into complex machines. Trained on synthetic data—virtual simulations of shapes, materials, and environments—the AI prepares robots to handle unpredictable tasks, like picking unknown items from chaotic bins, with over 98% accuracy. When mistakes happen, the system learns, improving through real-world feedback. Crucially, this isn’t just a one-robot fix. Software updates scale across entire fleets, upgrading robots to work more flexibly and meet the rising demand for adaptive production.

Another example is the robotics firm ANYbotics, which generates 3D models of industrial environments that function as digital twins of real environments. Operational data, such as temperature, pressure, and flow rates, are integrated to create virtual replicas of physical facilities where robots can train. An energy plant, for example, can use its site plans to generate simulations of inspection tasks it needs robots to perform in its facilities. This speeds the robots’ training and deployment, allowing them to perform successfully with minimal on-site setup.

Simulation also allows for the near-costless multiplication of robots for training. “In simulation, we can create thousands of virtual robots to practice tasks and optimize their behavior. This allows us to accelerate training time and share knowledge between robots,” says Péter Fankhauser, CEO and co-founder of ANYbotics.

Because robots need to understand their environment regardless of orientation or lighting, ANYbotics and partner Digica created a method of generating thousands of synthetic images for robot training. By removing the painstaking work of collecting huge numbers of real images from the shop floor, the time needed to teach robots what they need to know is drastically reduced.

Similarly, Siemens leverages synthetic data to generate simulated environments to train and validate AI models digitally before deployment into physical products. “By using synthetic data, we create variations in object orientation, lighting, and other factors to ensure the AI adapts well across different conditions,” says Vincenzo De Paola, project lead at Siemens. “We simulate everything from how the pieces are oriented to lighting conditions and shadows. This allows the model to train under diverse scenarios, improving its ability to adapt and respond accurately in the real world.”

Digital twins and synthetic data have proven powerful antidotes to data scarcity and costly robot training. Robots that train in artificial environments can be prepared quickly and inexpensively for wide varieties of visual possibilities and scenarios they may encounter in the real world. “We validate our models in this simulated environment before deploying them physically,” says De Paola. “This approach allows us to identify any potential issues early and refine the model with minimal cost and time.”

This technology’s impact can extend beyond initial robot training. If the robot’s real-world performance data is used to update its digital twin and analyze potential optimizations, it can create a dynamic cycle of improvement to systematically enhance the robot’s learning, capabilities, and performance over time.

The well-educated robot at work

With AI and simulation powering a new era in robot training, organizations will reap the benefits. Digital twins allow companies to deploy advanced robotics with dramatically reduced setup times, and the enhanced adaptability of AI-powered vision systems makes it easier for companies to alter product lines in response to changing market demands.

The new ways of schooling robots are also transforming investment in the field by reducing risk. “It’s a game-changer,” says De Paola. “Our clients can now offer AI-powered robotics solutions as services, backed by data and validated models. This gives them confidence when presenting their solutions to customers, knowing that the AI has been tested extensively in simulated environments before going live.”

Filippi envisions this flexibility enabling today’s robots to make tomorrow’s products. “The need in one or two years’ time will be for processing new products that are not known today. With digital twins and this new data environment, it is possible to design today a machine for products that are not known yet,” says Filippi.

Fankhauser takes this idea a step further. “I expect our robots to become so intelligent that they can independently generate their own missions based on the knowledge accumulated from digital twins,” he says. “Today, a human still guides the robot initially, but in the future, they’ll have the autonomy to identify tasks themselves.”

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

Anthropic’s chief scientist on 5 ways agents will be even better in 2025

Agents are the hottest thing in tech right now. Top firms from Google DeepMind to OpenAI to Anthropic are racing to augment large language models with the ability to carry out tasks by themselves. Known as agentic AI in industry jargon, such systems have fast become the new target of Silicon Valley buzz. Everyone from Nvidia to Salesforce is talking about how they are going to upend the industry. 

“We believe that, in 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies,” Sam Altman claimed in a blog post last week.

In the broadest sense, an agent is a software system that goes off and does something, often with minimal to zero supervision. The more complex that thing is, the smarter the agent needs to be. For many, large language models are now smart enough to power agents that can do a whole range of useful tasks for us, such as filling out forms, looking up a recipe and adding the ingredients to an online grocery basket, or using a search engine to do last-minute research before a meeting and producing a quick bullet-point summary.

In October, Anthropic showed off one of the most advanced agents yet: an extension of its Claude large language model called computer use. As the name suggests, it lets you direct Claude to use a computer much as a person would, by moving a cursor, clicking buttons, and typing text. Instead of simply having a conversation with Claude, you can now ask it to carry out on-screen tasks for you.

Anthropic notes that the feature is still cumbersome and error-prone. But it is already available to a handful of testers, including third-party developers at companies such as DoorDash, Canva, and Asana.

Computer use is a glimpse of what’s to come for agents. To learn what’s coming next, MIT Technology Review talked to Anthropic’s cofounder and chief scientist Jared Kaplan. Here are five ways that agents are going to get even better in 2025.

(Kaplan’s answers have been lightly edited for length and clarity.)

1/ Agents will get better at using tools

“I think there are two axes for thinking about what AI is capable of. One is a question of how complex the task is that a system can do. And as AI systems get smarter, they’re getting better in that direction. But another direction that’s very relevant is what kinds of environments or tools the AI can use. 

“So, like, if you go back almost 10 years now to [DeepMind’s Go-playing model] AlphaGo, we had AI systems that were superhuman in terms of how well they could play board games. But if all you can work with is a board game, then that’s a very restrictive environment. It’s not actually useful, even if it’s very smart. With text models, and then multimodal models, and now computer use—and perhaps in the future with robotics—you’re moving toward bringing AI into different situations and tasks, and making it useful. 

“We were excited about computer use basically for that reason. Until recently, with large language models, it’s been necessary to give them a very specific prompt, give them very specific tools, and then they’re restricted to a specific kind of environment. What I see is that computer use will probably improve quickly in terms of how well models can do different tasks and more complex tasks. And also to realize when they’ve made mistakes, or realize when there’s a high-stakes question and it needs to ask the user for feedback.”

2/ Agents will understand context  

“Claude needs to learn enough about your particular situation and the constraints that you operate under to be useful. Things like what particular role you’re in, what styles of writing or what needs you and your organization have.

“I think that we’ll see improvements there where Claude will be able to search through things like your documents, your Slack, etc., and really learn what’s useful for you. That’s underemphasized a bit with agents. It’s necessary for systems to be not only useful but also safe, doing what you expected.

“Another thing is that a lot of tasks won’t require Claude to do much reasoning. You don’t need to sit and think for hours before opening Google Docs or something. And so I think that a lot of what we’ll see is not just more reasoning but the application of reasoning when it’s really useful and important, but also not wasting time when it’s not necessary.”

3/ Agents will make coding assistants better

“We wanted to get a very initial beta of computer use out to developers to get feedback while the system was relatively primitive. But as these systems get better, they might be more widely used and really collaborate with you on different activities.

“I think DoorDash, the Browser Company, and Canva are all experimenting with, like, different kinds of browser interactions and designing them with the help of AI.

“My expectation is that we’ll also see further improvements to coding assistants. That’s something that’s been very exciting for developers. There’s just a ton of interest in using Claude 3.5 for coding, where it’s not just autocomplete like it was a couple of years ago. It’s really understanding what’s wrong with code, debugging it—running the code, seeing what happens, and fixing it.”

4/ Agents will need to be made safe

“We founded Anthropic because we expected AI to progress very quickly and [thought] that, inevitably, safety concerns were going to be relevant. And I think that’s just going to become more and more visceral this year, because I think these agents are going to become more and more integrated into the work we do. We need to be ready for the challenges, like prompt injection. 

[Prompt injection is an attack in which a malicious prompt is passed to a large language model in ways that its developers did not foresee or intend. One way to do this is to add the prompt to websites that models might visit.]

“Prompt injection is probably one of the No. 1 things we’re thinking about in terms of, like, broader usage of agents. I think it’s especially important for computer use, and it’s something we’re working on very actively, because if computer use is deployed at large scale, then there could be, like, pernicious websites or something that try to convince Claude to do something that it shouldn’t do.

“And with more advanced models, there’s just more risk. We have a robust scaling policy where, as AI systems become sufficiently capable, we feel like we need to be able to really prevent them from being misused. For example, if they could help terrorists—that kind of thing.

“So I’m really excited about how AI will be useful—it’s actually also accelerating us a lot internally at Anthropic, with people using Claude in all kinds of ways, especially with coding. But, yeah, there’ll be a lot of challenges as well. It’ll be an interesting year.”

What’s next for AI in 2025

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

For the last couple of years we’ve had a go at predicting what’s coming next in AI. A fool’s game given how fast this industry moves. But we’re on a roll, and we’re doing it again.

How did we score last time round? Our four hot trends to watch out for in 2024 included what we called customized chatbots—interactive helper apps powered by multimodal large language models (check: we didn’t know it yet, but we were talking about what everyone now calls agents, the hottest thing in AI right now); generative video (check: few technologies have improved so fast in the last 12 months, with OpenAI and Google DeepMind releasing their flagship video generation models, Sora and Veo, within a week of each other this December); and more general-purpose robots that can do a wider range of tasks (check: the payoffs from large language models continue to trickle down to other parts of the tech industry, and robotics is top of the list). 

We also said that AI-generated election disinformation would be everywhere, but here—happily—we got it wrong. There were many things to wring our hands over this year, but political deepfakes were thin on the ground.

So what’s coming in 2025? We’re going to ignore the obvious here: You can bet that agents and smaller, more efficient language models will continue to shape the industry. Instead, here are five alternative picks from our AI team.

1. Generative virtual playgrounds 

If 2023 was the year of generative images and 2024 was the year of generative video—what comes next? If you guessed generative virtual worlds (a.k.a. video games), high fives all round.

We got a tiny glimpse of this technology in February, when Google DeepMind revealed a generative model called Genie that could take a still image and turn it into a side-scrolling 2D platform game that players could interact with. In December, the firm revealed Genie 2, a model that can spin a starter image into an entire virtual world.

Other companies are building similar tech. In October, the AI startups Decart and Etched revealed an unofficial Minecraft hack in which every frame of the game gets generated on the fly as you play. And World Labs, a startup cofounded by Fei-Fei Li—creator of ImageNet, the vast data set of photos that kick-started the deep-learning boom—is building what it calls large world models, or LWMs.

One obvious application is video games. There’s a playful tone to these early experiments, and generative 3D simulations could be used to explore design concepts for new games, turning a sketch into a playable environment on the fly. This could lead to entirely new types of games.

But they could also be used to train robots. World Labs wants to develop so-called spatial intelligence—the ability for machines to interpret and interact with the everyday world. But robotics researchers lack good data about real-world scenarios with which to train such technology. Spinning up countless virtual worlds and dropping virtual robots into them to learn by trial and error could help make up for that.   

—Will Douglas Heaven

2. Large language models that “reason”

The buzz was justified. When OpenAI revealed o1 in September, it introduced a new paradigm in how large language models work. Two months later, the firm pushed that paradigm forward in almost every way with o3—a model that just might reshape this technology for good.

Most models, including OpenAI’s flagship GPT-4, spit out the first response they come up with. Sometimes it’s correct; sometimes it’s not. But the firm’s new models are trained to work through their answers step by step, breaking down tricky problems into a series of simpler ones. When one approach isn’t working, they try another. This technique, known as “reasoning” (yes—we know exactly how loaded that term is), can make this technology more accurate, especially for math, physics, and logic problems.

It’s also crucial for agents.

In December, Google DeepMind revealed an experimental new web-browsing agent called Mariner. In the middle of a preview demo that the company gave to MIT Technology Review, Mariner seemed to get stuck. Megha Goel, a product manager at the company, had asked the agent to find her a recipe for Christmas cookies that looked like the ones in a photo she’d given it. Mariner found a recipe on the web and started adding the ingredients to Goel’s online grocery basket.

Then it stalled; it couldn’t figure out what type of flour to pick. Goel watched as Mariner explained its steps in a chat window: “It says, ‘I will use the browser’s Back button to return to the recipe.’”

It was a remarkable moment. Instead of hitting a wall, the agent had broken the task down into separate actions and picked one that might resolve the problem. Figuring out you need to click the Back button may sound basic, but for a mindless bot it’s akin to rocket science. And it worked: Mariner went back to the recipe, confirmed the type of flour, and carried on filling Goel’s basket.

Google DeepMind is also building an experimental version of Gemini 2.0, its latest large language model, that uses this step-by-step approach to problem solving, called Gemini 2.0 Flash Thinking.

But OpenAI and Google are just the tip of the iceberg. Many companies are building large language models that use similar techniques, making them better at a whole range of tasks, from cooking to coding. Expect a lot more buzz about reasoning (we know, we know) this year.

—Will Douglas Heaven

3. It’s boom time for AI in science 

One of the most exciting uses for AI is speeding up discovery in the natural sciences. Perhaps the greatest vindication of AI’s potential on this front came last October, when the Royal Swedish Academy of Sciences awarded the Nobel Prize for chemistry to Demis Hassabis and John M. Jumper from Google DeepMind for building the AlphaFold tool, which can solve protein folding, and to David Baker for building tools to help design new proteins.

Expect this trend to continue next year, and to see more data sets and models that are aimed specifically at scientific discovery. Proteins were the perfect target for AI, because the field had excellent existing data sets that AI models could be trained on. 

The hunt is on to find the next big thing. One potential area is materials science. Meta has released massive data sets and models that could help scientists use AI to discover new materials much faster, and in December, Hugging Face, together with the startup Entalpic, launched LeMaterial, an open-source project that aims to simplify and accelerate materials research. Their first project is a data set that unifies, cleans, and standardizes the most prominent material data sets. 

AI model makers are also keen to pitch their generative products as research tools for scientists. OpenAI let scientists test its latest o1 model and see how it might support them in research. The results were encouraging. 

Having an AI tool that can operate in a similar way to a scientist is one of the fantasies of the tech sector. In a manifesto published in October last year, Anthropic founder Dario Amodei highlighted science, especially biology, as one of the key areas where powerful AI could help. Amodei speculates that in the future, AI could be not only a method of data analysis but a “virtual biologist who performs all the tasks biologists do.” We’re still a long way away from this scenario. But next year, we might see important steps toward it. 

—Melissa Heikkilä

4. AI companies get cozier with national security

There is a lot of money to be made by AI companies willing to lend their tools to border surveillance, intelligence gathering, and other national security tasks. 

The US military has launched a number of initiatives that show it’s eager to adopt AI, from the Replicator program—which, inspired by the war in Ukraine, promises to spend $1 billion on small drones—to the Artificial Intelligence Rapid Capabilities Cell, a unit bringing AI into everything from battlefield decision-making to logistics. European militaries are under pressure to up their tech investment, triggered by concerns that Donald Trump’s administration will cut spending to Ukraine. Rising tensions between Taiwan and China weigh heavily on the minds of military planners, too. 

In 2025, these trends will continue to be a boon for defense-tech companies like Palantir, Anduril, and others, which are now capitalizing on classified military data to train AI models. 

The defense industry’s deep pockets will tempt mainstream AI companies into the fold too. OpenAI in December announced it is partnering with Anduril on a program to take down drones, completing a year-long pivot away from its policy of not working with the military. It joins the ranks of Microsoft, Amazon, and Google, which have worked with the Pentagon for years. 

Other AI competitors, which are spending billions to train and develop new models, will face more pressure in 2025 to think seriously about revenue. It’s possible that they’ll find enough non-defense customers who will pay handsomely for AI agents that can handle complex tasks, or creative industries willing to spend on image and video generators. 

But they’ll also be increasingly tempted to throw their hats in the ring for lucrative Pentagon contracts. Expect to see companies wrestle with whether working on defense projects will be seen as a contradiction to their values. OpenAI’s rationale for changing its stance was that “democracies should continue to take the lead in AI development,” the company wrote, reasoning that lending its models to the military would advance that goal. In 2025, we’ll be watching others follow its lead. 

—James O’Donnell

5. Nvidia sees legitimate competition

For much of the current AI boom, if you were a tech startup looking to try your hand at making an AI model, Jensen Huang was your man. As CEO of Nvidia, the world’s most valuable corporation, Huang helped the company become the undisputed leader in chips used both to train AI models and to ping a model when anyone uses it, a process called “inferencing.”

A number of forces could change that in 2025. For one, behemoth competitors like Amazon, Broadcom, AMD, and others have been investing heavily in new chips, and there are early indications that these could compete closely with Nvidia’s—particularly for inference, where Nvidia’s lead is less solid. 

A growing number of startups are also attacking Nvidia from a different angle. Rather than trying to marginally improve on Nvidia’s designs, startups like Groq are making riskier bets on entirely new chip architectures that, with enough time, promise to provide more efficient or effective training. In 2025 these experiments will still be in their early stages, but it’s possible that a standout competitor will change the assumption that top AI models rely exclusively on Nvidia chips.

Underpinning this competition, the geopolitical chip war will continue. That war thus far has relied on two strategies. On one hand, the West seeks to limit exports to China of top chips and the technologies to make them. On the other, efforts like the US CHIPS Act aim to boost domestic production of semiconductors.

Donald Trump may escalate those export controls and has promised massive tariffs on any goods imported from China. In 2025, such tariffs would put Taiwan—on which the US relies heavily because of the chip manufacturer TSMC—at the center of the trade wars. That’s because Taiwan has said it will help Chinese firms relocate to the island so they can avoid the proposed tariffs. That could draw further criticism from Trump, who has expressed frustration with US spending to defend Taiwan from China. 

It’s unclear how these forces will play out, but they will only further incentivize chipmakers to reduce reliance on Taiwan, which is the entire purpose of the CHIPS Act. As spending from the bill begins to circulate, next year could bring the first evidence of whether it’s materially boosting domestic chip production. 

James O’Donnell

How optimistic are you about AI’s future?

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The start of a new year, and maybe especially this one, feels like a good time for a gut check: How optimistic are you feeling about the future of technology? 

Our annual list of 10 Breakthrough Technologies, published on Friday, might help you decide. It’s the 24th time we’ve published such a list. But just like our earliest picks (2001’s list featured brain-computer interfaces and ways to track copyrighted content on the internet, by the way), this year’s technologies may come to help society, harm it, or both.

Artificial intelligence powers four of the breakthroughs featured on the list, and I expect your optimism about them will vary widely. Take generative AI search. Now becoming the norm on Google with its AI Overviews, it promises to help sort through the internet’s incomprehensible volume of information to offer better answers for the questions we ask. Along the way, it is upending the model of how content creators get paid, and positioning fallible AI as the arbiter of truth and facts.

Also making the list is the immense progress in the world of robots, which can now learn faster thanks to AI. This means we will soon have to wrestle with whether we will trust humanoid robots enough to welcome them into our most private spaces, and how we will feel if they are remotely controlled by human beings working abroad. 

The list also features lots of technologies outside the world of AI, which I implore you to read about if only for a reminder of just how much other scientific progress is being made. This year may see advances in studying dark matter with the largest digital camera ever made for astronomy, reducing emissions from cow burps, and preventing HIV with an injection just once every six months. We also detail how technologies that you’ve long heard about—from robotaxis to stem cells—are finally making good on some of their promises.

This year, the cultural gulf between techno-optimists and, well, everyone else is set to widen. The incoming administration will be perhaps the one most shaped by Silicon Valley in recent memory, thanks to Donald Trump’s support from venture capitalists like Marc Andreessen (the author of the Techno-Optimist Manifesto) and his relationship, however recently fraught, with Elon Musk. Those figures have critiqued the Biden administration’s approach to technology as slow, “woke,” and overly cautious—attitudes they have vowed to reverse. 

So as we begin a year of immense change, here’s a small experiment I’d encourage you to do. Think about your level of optimism for technology and what’s driving it. Read our list of breakthroughs. Then see how you’ve shifted. I suspect that, like many people, you’ll find you don’t fit neatly in the camp of either optimists or pessimists. Perhaps that’s where the best progress will be made. 


Now read the rest of The Algorithm

Deeper Learning

The biggest AI flops of 2024

Though AI has remained in the spotlight this year (and even contributed to Nobel Prize–winning research in chemistry), it has not been without its failures. Take a look back over the year’s top AI failures, from chatbots dishing out illegal advice to dodgy AI-generated search results. 

Why it matters: These failures show that there are tons of unanswered questions about the technology, including who will moderate what it produces and how, whether we’re getting too trusting of the answers that chatbots produce, and what we’ll do with the mountain of “AI slop” that is increasingly taking over the internet. Above all, they illustrate the many pitfalls of blindly shoving AI into every product we interact with.

Bits and Bytes

What it’s like being a pedestrian in the world of Waymos 

Tech columnist Geoffrey Fowler finds that Waymo robotaxis regularly fail to stop for him at a crosswalk he uses every day. Though you can sometimes make eye contact with human drivers to gauge whether they’ll stop, Waymos lack that “social intelligence,” Fowler writes. (The Washington Post)

The AI Hype Index

For each print issue, MIT Technology Review publishes an AI Hype Index, a highly subjective take on the latest buzz about AI. See where facial recognition, AI replicas of your personality, and more fall on the index. (MIT Technology Review)

What’s going on at the intersection of AI and spirituality

Modern religious leaders are experimenting with AI just as earlier generations examined radio, television, and the internet. They include Rabbi Josh Fixler, who created “Rabbi Bot,” a chatbot trained on his old sermons. (The New York Times)

Meta has appointed its most prominent Republican to lead its global policy team

Just two weeks ahead of Donald Trump’s inauguration, Meta has announced it will appoint Joel Kaplan, who was White House deputy chief of staff under George W. Bush, to the company’s top policy role. Kaplan will replace Nick Clegg, who has led changes on content and elections policies. (Semafor)

Apple has settled a privacy lawsuit against Siri

The company has agreed to pay $95 million to settle a class action lawsuit alleging that Siri could be activated accidentally and then record private conversations without consent. The news comes after MIT Technology Review reported that Apple was looking into whether it could get rid of the need to use a trigger phrase like “Hey Siri” entirely. (The Washington Post)

AI means the end of internet search as we’ve known it

We all know what it means, colloquially, to google something. You pop a few relevant words in a search box and in return get a list of blue links to the most relevant results. Maybe some quick explanations up top. Maybe some maps or sports scores or a video. But fundamentally, it’s just fetching information that’s already out there on the internet and showing it to you, in some sort of structured way. 

But all that is up for grabs. We are at a new inflection point.

The biggest change to the way search engines have delivered information to us since the 1990s is happening right now. No more keyword searching. No more sorting through links to click. Instead, we’re entering an era of conversational search. Which means instead of keywords, you use real questions, expressed in natural language. And instead of links, you’ll increasingly be met with answers, written by generative AI and based on live information from all across the internet, delivered the same way. 

Of course, Google—the company that has defined search for the past 25 years—is trying to be out front on this. In May of 2023, it began testing AI-generated responses to search queries, using its large language model (LLM) to deliver the kinds of answers you might expect from an expert source or trusted friend. It calls these AI Overviews. Google CEO Sundar Pichai described this to MIT Technology Review as “one of the most positive changes we’ve done to search in a long, long time.”

AI Overviews fundamentally change the kinds of queries Google can address. You can now ask it things like “I’m going to Japan for one week next month. I’ll be staying in Tokyo but would like to take some day trips. Are there any festivals happening nearby? How will the surfing be in Kamakura? Are there any good bands playing?” And you’ll get an answer—not just a link to Reddit, but a built-out answer with current results. 

More to the point, you can attempt searches that were once pretty much impossible, and get the right answer. You don’t have to be able to articulate what, precisely, you are looking for. You can describe what the bird in your yard looks like, or what the issue seems to be with your refrigerator, or that weird noise your car is making, and get an almost human explanation put together from sources previously siloed across the internet. It’s amazing, and once you start searching that way, it’s addictive.

And it’s not just Google. OpenAI’s ChatGPT now has access to the web, making it far better at finding up-to-date answers to your queries. Microsoft released generative search results for Bing in September. Meta has its own version. The startup Perplexity was doing the same, but with a “move fast, break things” ethos. Literal trillions of dollars are at stake in the outcome as these players jockey to become the next go-to source for information retrieval—the next Google.

Not everyone is excited for the change. Publishers are completely freaked out. The shift has heightened fears of a “zero-click” future, where search referral traffic—a mainstay of the web since before Google existed—vanishes from the scene. 

I got a vision of that future last June, when I got a push alert from the Perplexity app on my phone. Perplexity is a startup trying to reinvent web search. But in addition to delivering deep answers to queries, it will create entire articles about the news of the day, cobbled together by AI from different sources. 

On that day, it pushed me a story about a new drone company from Eric Schmidt. I recognized the story. Forbes had reported it exclusively, earlier in the week, but it had been locked behind a paywall. The image on Perplexity’s story looked identical to one from Forbes. The language and structure were quite similar. It was effectively the same story, but freely available to anyone on the internet. I texted a friend who had edited the original story to ask if Forbes had a deal with the startup to republish its content. But there was no deal. He was shocked and furious and, well, perplexed. He wasn’t alone. Forbes, the New York Times, and Condé Nast have now all sent the company cease-and-desist orders. News Corp is suing for damages. 

It was precisely the nightmare scenario publishers have been so afraid of: The AI was hoovering up their premium content, repackaging it, and promoting it to its audience in a way that didn’t really leave any reason to click through to the original. In fact, on Perplexity’s About page, the first reason it lists to choose the search engine is “Skip the links.”

But this isn’t just about publishers (or my own self-interest). 

People are also worried about what these new LLM-powered results will mean for our fundamental shared reality. Language models have a tendency to make stuff up—they can hallucinate nonsense. Moreover, generative AI can serve up an entirely new answer to the same question every time, or provide different answers to different people on the basis of what it knows about them. It could spell the end of the canonical answer.

But make no mistake: This is the future of search. Try it for a bit yourself, and you’ll see. 

Sure, we will always want to use search engines to navigate the web and to discover new and interesting sources of information. But the links out are taking a back seat. The way AI can put together a well-reasoned answer to just about any kind of question, drawing on real-time data from across the web, just offers a better experience. That is especially true compared with what web search has become in recent years. If it’s not exactly broken (data shows more people are searching with Google more often than ever before), it’s at the very least increasingly cluttered and daunting to navigate. 

Who wants to have to speak the language of search engines to find what you need? Who wants to navigate links when you can have straight answers? And maybe: Who wants to have to learn when you can just know? 


In the beginning there was Archie. It was the first real internet search engine, and it crawled files previously hidden in the darkness of remote servers. It didn’t tell you what was in those files—just their names. It didn’t preview images; it didn’t have a hierarchy of results, or even much of an interface. But it was a start. And it was pretty good. 

Then Tim Berners-Lee created the World Wide Web, and all manner of web pages sprang forth. The Mosaic home page and the Internet Movie Database and Geocities and the Hampster Dance and web rings and Salon and eBay and CNN and federal government sites and some guy’s home page in Turkey.

Until finally, there was too much web to even know where to start. We really needed a better way to navigate our way around, to actually find the things we needed. 

And so in 1994 Jerry Yang and David Filo created Yahoo, a hierarchical directory of websites. It quickly became the home page for millions of people. And it was … well, it was okay. TBH, and with the benefit of hindsight, I think we all thought it was much better back then than it actually was.

But the web continued to grow and sprawl and expand, every day bringing more information online. Rather than just a list of sites by category, we needed something that actually looked at all that content and indexed it. By the late ’90s that meant choosing from a variety of search engines: AltaVista and AlltheWeb and WebCrawler and HotBot. And they were good—a huge improvement. At least at first.  

But alongside the rise of search engines came the first attempts to exploit their ability to deliver traffic. Precious, valuable traffic, which web publishers rely on to sell ads and retailers use to get eyeballs on their goods. Sometimes this meant stuffing pages with keywords or nonsense text designed purely to push pages higher up in search results. It got pretty bad. 

And then came Google. It’s hard to overstate how revolutionary Google was when it launched in 1998. Rather than just scanning the content, it also looked at the sources linking to a website, which helped evaluate its relevance. To oversimplify: The more something was cited elsewhere, the more reliable Google considered it, and the higher it would appear in results. This breakthrough made Google radically better at retrieving relevant results than anything that had come before. It was amazing.
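To make that oversimplification concrete, here is a minimal sketch of link-based ranking in the spirit of what is described above. It is not Google's actual algorithm: the four-page web, the `link_rank` function, and the damping value of 0.85 are all assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Toy link-based ranking: a page scores higher when well-ranked pages cite it.

    `links` maps each page to the list of pages it links out to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    rank = {page: 1.0 / len(pages) for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score across all pages.
                share = damping * rank[page] / len(pages)
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages cite "wiki", two cite "news".
links = {
    "blog": ["news", "wiki"],
    "forum": ["wiki"],
    "news": ["wiki"],
    "wiki": ["news"],
}

scores = link_rank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy web the most-cited page rises to the top, and a citation counts for more when it comes from a page that is itself well cited, which is the intuition behind ranking by links rather than by page content alone.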

For 25 years, Google dominated search. Google was search, for most people. (The extent of that domination is currently the subject of multiple legal probes in the United States and the European Union.)  

But Google has long been moving away from simply serving up a series of blue links, notes Pandu Nayak, Google’s chief scientist for search. 

“It’s not just so-called web results, but there are images and videos, and special things for news. There have been direct answers, dictionary answers, sports, answers that come with Knowledge Graph, things like featured snippets,” he says, rattling off a litany of Google’s steps over the years to answer questions more directly. 

It’s true: Google has evolved over time, becoming more and more of an answer portal. It has added tools that allow people to just get an answer—the live score to a game, the hours a café is open, or a snippet from the FDA’s website—rather than being pointed to a website where the answer may be. 

But once you’ve used AI Overviews a bit, you realize they are different.

Take featured snippets, the passages Google sometimes chooses to highlight and show atop the results themselves. Those words are quoted directly from an original source. The same is true of knowledge panels, which are generated from information stored in a range of public databases and Google’s Knowledge Graph, its database of trillions of facts about the world.

While these can be inaccurate, the information source is knowable (and fixable). It’s in a database. You can look it up. Not anymore: AI Overviews can be entirely new every time, generated on the fly by a language model’s predictive text combined with an index of the web. 

“I think it’s an exciting moment where we have obviously indexed the world. We built deep understanding on top of it with Knowledge Graph. We’ve been using LLMs and generative AI to improve our understanding of all that,” Pichai told MIT Technology Review. “But now we are able to generate and compose with that.”

The result feels less like querying a database than like asking a very smart, well-read friend. (With the caveat that the friend will sometimes make things up if she does not know the answer.) 

“[The company’s] mission is organizing the world’s information,” Liz Reid, Google’s head of search, tells me from its headquarters in Mountain View, California. “But actually, for a while what we did was organize web pages. Which is not really the same thing as organizing the world’s information or making it truly useful and accessible to you.” 

That second concept—accessibility—is what Google is really keying in on with AI Overviews. It’s a sentiment I hear echoed repeatedly while talking to Google execs: They can address more complicated types of queries more efficiently by bringing in a language model to help supply the answers. And they can do it in natural language. 

That will become even more important for a future where search goes beyond text queries. For example, Google Lens, which lets people take a picture or upload an image to find out more about something, uses AI-generated answers to tell you what you may be looking at. Google has even shown off the ability to query live video. 

“We are definitely at the start of a journey where people are going to be able to ask, and get answered, much more complex questions than where we’ve been in the past decade,” says Pichai. 

There are some real hazards here. First and foremost: Large language models will lie to you. They hallucinate. They get shit wrong. When it doesn’t have an answer, an AI model can blithely and confidently spew back a response anyway. For Google, which has built its reputation over the past 20 years on reliability, this could be a real problem. For the rest of us, it could actually be dangerous.

In May 2024, AI Overviews were rolled out to everyone in the US. Things didn’t go well. Google, long the world’s reference desk, told people to eat rocks and to put glue on their pizza. These answers were mostly in response to what the company calls adversarial queries—those designed to trip it up. But still. It didn’t look good. The company quickly went to work fixing the problems—for example, by deprecating so-called user-generated content from sites like Reddit, where some of the weirder answers had come from.

Yet while its errors telling people to eat rocks got all the attention, the more pernicious danger might arise when it gets something less obviously wrong. For example, in doing research for this article, I asked Google when MIT Technology Review went online. It helpfully responded that “MIT Technology Review launched its online presence in late 2022.” This was clearly wrong to me, but for someone completely unfamiliar with the publication, would the error leap out? 

I came across several examples like this, both in Google and in OpenAI’s ChatGPT search. Stuff that’s just far enough off the mark not to be immediately seen as wrong. Google is banking that it can continue to improve these results over time by relying on what it knows about quality sources.

“When we produce AI Overviews,” says Nayak, “we look for corroborating information from the search results, and the search results themselves are designed to be from these reliable sources whenever possible. These are some of the mechanisms we have in place that assure that if you just consume the AI Overview, and you don’t want to look further … we hope that you will still get a reliable, trustworthy answer.”

In the case above, the 2022 answer seemingly came from a reliable source—a story about MIT Technology Review’s email newsletters, which launched in 2022. But the machine fundamentally misunderstood. This is one of the reasons Google uses human beings—raters—to evaluate the results it delivers for accuracy. Ratings don’t correct or control individual AI Overviews; rather, they help train the model to build better answers. But human raters can be fallible. Google is working on that too. 

“Raters who look at your experiments may not notice the hallucination because it feels sort of natural,” says Nayak. “And so you have to really work at the evaluation setup to make sure that when there is a hallucination, someone’s able to point out and say, That’s a problem.”

The new search

Google has rolled out its AI Overviews to upwards of a billion people in more than 100 countries, but it is facing upstarts with new ideas about how search should work.


Search engine: Google
The search giant has added AI Overviews to search results. These overviews take information from around the web and Google’s Knowledge Graph and use the company’s Gemini language model to create answers to search queries.

What it’s good at: Google’s AI Overviews are great at giving an easily digestible summary in response to even the most complex queries, with sourcing boxes adjacent to the answers. Among the major options, its deep web index feels the most “internety.” But web publishers fear its summaries will give people little reason to click through to the source material.


Search engine: Perplexity
Perplexity is a conversational search engine that uses third-party large language models from OpenAI and Anthropic to answer queries.

What it’s good at: Perplexity is fantastic at putting together deeper dives in response to user queries, producing answers that are like mini white papers on complex topics. It’s also excellent at summing up current events. But it has gotten a bad rep with publishers, who say it plays fast and loose with their content.


Search engine: ChatGPT
While Google brought AI to search, OpenAI brought search to ChatGPT. Queries that the model determines will benefit from a web search automatically trigger one, or users can manually select the option to add a web search.

What it’s good at: Thanks to its ability to preserve context across a conversation, ChatGPT works well for performing searches that benefit from follow-up questions—like planning a vacation through multiple search sessions. OpenAI says users sometimes go “20 turns deep” in researching queries. Of these three, it makes links out to publishers least prominent.


When I talked to Pichai about this, he expressed optimism about the company’s ability to maintain accuracy even with the LLM generating responses. That’s because AI Overviews is based on Google’s flagship large language model, Gemini, but also draws from Knowledge Graph and what it considers reputable sources around the web. 

“You’re always dealing in percentages. What we have done is deliver it at, like, what I would call a few nines of trust and factuality and quality. I’d say 99-point-few-nines. I think that’s the bar we operate at, and it is true with AI Overviews too,” he says. “And so the question is, are we able to do this again at scale? And I think we are.”
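A quick back-of-the-envelope calculation shows why “percentages” matter at this scale. The sketch below is illustrative only: the figure of one billion answers is a hypothetical round number, not a reported volume, and the function name `expected_wrong_answers` is invented for this example.

```python
def expected_wrong_answers(answers, nines):
    """Estimate how many answers are wrong at a given accuracy.

    `nines` counts the 9s in the accuracy figure: 3 means 99.9 percent,
    4 means 99.99 percent, and so on, so one answer in 10**nines is wrong.
    """
    return answers // 10 ** nines


# Hypothetical volume: one billion AI-generated answers.
answers = 1_000_000_000

for nines in (3, 4, 5):
    wrong = expected_wrong_answers(answers, nines)
    print(f"{nines} nines: about {wrong:,} wrong answers")
```

Under these assumptions, even five nines of accuracy would leave thousands of wrong answers per billion served, which is why a question of percentages is also a question of scale.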

There’s another hazard as well, though, which is that people ask Google all sorts of weird things. If you want to know someone’s darkest secrets, look at their search history. Sometimes the things people ask Google about are extremely dark. Sometimes they are illegal. Google doesn’t just have to be able to deploy its AI Overviews when an answer can be helpful; it has to be extremely careful not to deploy them when an answer may be harmful. 

“If you go and say ‘How do I build a bomb?’ it’s fine that there are web results. It’s the open web. You can access anything,” Reid says. “But we do not need to have an AI Overview that tells you how to build a bomb, right? We just don’t think that’s worth it.” 

But perhaps the greatest hazard—or biggest unknown—is for anyone downstream of a Google search. Take publishers, who for decades now have relied on search queries to send people their way. What reason will people have to click through to the original source, if all the information they seek is right there in the search result?  

Rand Fishkin, cofounder of the market research firm SparkToro, publishes research on so-called zero-click searches. As Google has moved increasingly into the answer business, the proportion of searches that end without a click has gone up and up. His sense is that AI Overviews are going to explode this trend.  

“If you are reliant on Google for traffic, and that traffic is what drove your business forward, you are in long- and short-term trouble,” he says. 

Don’t panic, is Pichai’s message. He argues that even in the age of AI Overviews, people will still want to click through and go deeper for many types of searches. “The underlying principle is people are coming looking for information. They’re not looking for Google always to just answer,” he says. “Sometimes yes, but the vast majority of the times, you’re looking at it as a jumping-off point.” 

Reid, meanwhile, argues that because AI Overviews allow people to ask more complicated questions and drill down further into what they want, they could even be helpful to some types of publishers and small businesses, especially those operating in the niches: “You essentially reach new audiences, because people can now express what they want more specifically, and so somebody who specializes doesn’t have to rank for the generic query.”


“I’m going to start with something risky,” Nick Turley tells me from the confines of a Zoom window. Turley is the head of product for ChatGPT, and he’s showing off OpenAI’s new web search tool a few weeks before it launches. “I should normally try this beforehand, but I’m just gonna search for you,” he says. “This is always a high-risk demo to do, because people tend to be particular about what is said about them on the internet.” 

He types my name into a search field, and the prototype search engine spits back a few sentences, almost like a speaker bio. It correctly identifies me and my current role. It even highlights a particular story I wrote years ago that was probably my best known. In short, it’s the right answer. Phew? 

A few weeks after our call, OpenAI incorporated search into ChatGPT, supplementing answers from its language model with information from across the web. If the model thinks a response would benefit from up-to-date information, it will automatically run a web search (OpenAI won’t say who its search partners are) and incorporate those responses into its answer, with links out if you want to learn more. You can also opt to manually force it to search the web if it does not do so on its own. OpenAI won’t reveal how many people are using its web search, but it says some 250 million people use ChatGPT weekly, all of whom are potentially exposed to it.  

According to Fishkin, these newer forms of AI-assisted search aren’t yet challenging Google’s search dominance. “It does not appear to be cannibalizing classic forms of web search,” he says. 

OpenAI insists it’s not really trying to compete on search—although frankly this seems to me like a bit of expectation setting. Rather, it says, web search is mostly a means to get more current information than its models’ training data, which tends to have a specific cutoff date that is often months, or even a year or more, in the past. As a result, while ChatGPT may be great at explaining how a West Coast offense works, it has long been useless at telling you what the latest 49ers score is. No more. 

“I come at it from the perspective of ‘How can we make ChatGPT able to answer every question that you have? How can we make it more useful to you on a daily basis?’ And that’s where search comes in for us,” Kevin Weil, the chief product officer with OpenAI, tells me. “There’s an incredible amount of content on the web. There are a lot of things happening in real time. You want ChatGPT to be able to use that to improve its answers and to be able to be a better super-assistant for you.”

Today ChatGPT is able to generate responses for very current news events, as well as near-real-time information on things like stock prices. And while ChatGPT’s interface has long been, well, boring, search results bring in all sorts of multimedia—images, graphs, even video. It’s a very different experience. 

Weil also argues that ChatGPT has more freedom to innovate and go its own way than competitors like Google—even more than its partner Microsoft does with Bing. Both of those are ad-dependent businesses. OpenAI is not. (At least not yet.) It earns revenue from the developers, businesses, and individuals who use it directly. It’s mostly setting large amounts of money on fire right now—it’s projected to lose $14 billion in 2026, by some reports. But one thing it doesn’t have to worry about is putting ads in its search results as Google does. 

Like Google, ChatGPT is pulling in information from web publishers, summarizing it, and including it in its answers. But it has also struck financial deals with publishers, a payment for providing the information that gets rolled into its results. (MIT Technology Review has been in discussions with OpenAI, Google, Perplexity, and others about publisher deals but has not entered into any agreements. Editorial was neither party to nor informed about the content of those discussions.)

But the thing is, for web search to accomplish what OpenAI wants—to be more current than the language model—it also has to bring in information from all sorts of publishers and sources that it doesn’t have deals with. OpenAI’s head of media partnerships, Varun Shetty, told MIT Technology Review that it won’t give preferential treatment to its publishing partners.

Instead, OpenAI told me, the model itself finds the most trustworthy and useful source for any given question. And that can get weird too. In that very first example it showed me—when Turley ran that name search—it described a story I wrote years ago for Wired about being hacked. That story remains one of the most widely read I’ve ever written. But ChatGPT didn’t link to it. It linked to a short rewrite from The Verge. Admittedly, this was on a prototype version of search, which was, as Turley said, “risky.” 

When I asked him about it, he couldn’t really explain why the model chose the sources that it did, because the model itself makes that evaluation. The company helps steer it by identifying—sometimes with the help of users—what it considers better answers, but the model actually selects them. 

“And in many cases, it gets it wrong, which is why we have work to do,” said Turley. “Having a model in the loop is a very, very different mechanism than how a search engine worked in the past.”

Indeed! 

The model, whether it’s OpenAI’s GPT-4o or Google’s Gemini or Anthropic’s Claude, can be very, very good at explaining things. But the rationale behind its explanations, its reasons for selecting a particular source, and even the language it may use in an answer are all pretty mysterious. Sure, a model can explain a great many things, but not when it comes to its own answers. 


It was almost a decade ago, in 2016, when Pichai wrote that Google was moving from “mobile first” to “AI first”: “But in the next 10 years, we will shift to a world that is AI-first, a world where computing becomes universally available—be it at home, at work, in the car, or on the go—and interacting with all of these surfaces becomes much more natural and intuitive, and above all, more intelligent.” 

We’re there now—sort of. And it’s a weird place to be. It’s going to get weirder. That’s especially true as these things we now think of as distinct—querying a search engine, prompting a model, looking for a photo we’ve taken, deciding what we want to read or watch or hear, asking for a photo we wish we’d taken, and didn’t, but would still like to see—begin to merge. 

The search results we see from generative AI are best understood as a waypoint rather than a destination. What’s most important may not be search in itself; rather, it’s that search has given AI model developers a path to incorporating real-time information into their inputs and outputs. And that opens up all sorts of possibilities.

“A ChatGPT that can understand and access the web won’t just be about summarizing results. It might be about doing things for you. And I think there’s a fairly exciting future there,” says OpenAI’s Weil. “You can imagine having the model book you a flight, or order DoorDash, or just accomplish general tasks for you in the future. It’s just once the model understands how to use the internet, the sky’s the limit.”

This is the agentic future we’ve been hearing about for some time now, and the more AI models make use of real-time data from the internet, the closer it gets. 

Let’s say you have a trip coming up in a few weeks. An agent that can get data from the internet in real time can book your flights and hotel rooms, make dinner reservations, and more, based on what it knows about you and your upcoming travel—all without your having to guide it. Another agent could, say, monitor the sewage output of your home for certain diseases, and order tests and treatments in response. You won’t have to search for that weird noise your car is making, because the agent in your vehicle will already have done it and made an appointment to get the issue fixed. 

“It’s not always going to be just doing search and giving answers,” says Pichai. “Sometimes it’s going to be actions. Sometimes you’ll be interacting within the real world. So there is a notion of universal assistance through it all.”

And the ways these things will be able to deliver answers are evolving rapidly now too. For example, today Google can not only search text, images, and even video; it can create them. Imagine overlaying that ability with search across an array of formats and devices. “Show me what a Townsend’s warbler looks like in the tree in front of me.” Or “Use my existing family photos and videos to create a movie trailer of our upcoming vacation to Puerto Rico next year, making sure we visit all the best restaurants and top landmarks.”

“We have primarily done it on the input side,” he says, referring to the ways Google can now search for an image or within a video. “But you can imagine it on the output side too.”

This is the kind of future Pichai says he is excited to bring online. Google has already shown off a bit of what that might look like with NotebookLM, a tool that lets you upload large amounts of text and have it converted into a chatty podcast. He imagines this type of functionality—the ability to take one type of input and convert it into a variety of outputs—transforming the way we interact with information.

In a demonstration of a tool called Project Astra this summer at its developer conference, Google showed one version of this outcome, where cameras and microphones in phones and smart glasses understand the context all around you—online and off, audible and visual—and have the ability to recall and respond in a variety of ways. Astra can, for example, look at a crude drawing of a Formula One race car and not only identify it, but also explain its various parts and their uses. 

But you can imagine things going a bit further (and they will). Let’s say I want to see a video of how to fix something on my bike. The video doesn’t exist, but the information does. AI-assisted generative search could theoretically find that information somewhere online—in a user manual buried in a company’s website, for example—and create a video to show me exactly how to do what I want, just as it could explain that to me with words today.

These are the kinds of things that start to happen when you take the entire compendium of human knowledge—knowledge that’s previously been captured in silos of language and format; maps and business registrations and product SKUs; audio and video and databases of numbers and old books and images and, really, anything ever published, ever tracked, ever recorded; things happening right now, everywhere—and introduce a model into all that. A model that maybe can’t understand, precisely, but has the ability to put that information together, rearrange it, and spit it back in a variety of different, hopefully helpful ways. Ways that a mere index could not.

That’s what we’re on the cusp of, and what we’re starting to see. And as Google rolls this out to a billion people, many of whom will be interacting with a conversational AI for the first time, what will that mean? What will we do differently? It’s all changing so quickly. Hang on, just hang on. 

Small language models: 10 Breakthrough Technologies 2025

WHO

Allen Institute for Artificial Intelligence, Anthropic, Google, Meta, Microsoft, OpenAI

WHEN

Now

Make no mistake: Size matters in the AI world. When OpenAI launched GPT-3 back in 2020, it was the largest language model ever built. The firm showed that supersizing this type of model was enough to send performance through the roof. That kicked off a technology boom that has been sustained by bigger models ever since. As Noam Brown, a research scientist at OpenAI, told an audience at TEDAI San Francisco in October, “The incredible progress in AI over the past five years can be summarized in one word: scale.”

But as the marginal gains for new high-end models trail off, researchers are figuring out how to do more with less. For certain tasks, smaller models that are trained on more focused data sets can now perform just as well as larger ones—if not better. That’s a boon for businesses eager to deploy AI in a handful of specific ways. You don’t need the entire internet in your model if you’re making the same kind of request again and again. 

Most big tech firms now boast fun-size versions of their flagship models for this purpose: OpenAI offers both GPT-4o and GPT-4o mini; Google DeepMind has Gemini Ultra and Gemini Nano; and Anthropic’s Claude 3 comes in three flavors: outsize Opus, midsize Sonnet, and tiny Haiku. Microsoft is pioneering a range of small language models called Phi.

A growing number of smaller companies offer small models as well. The AI startup Writer claims that its latest language model matches the performance of the largest top-tier models on many key metrics despite in some cases having just a 20th as many parameters (the values that get calculated during training and determine how a model behaves). 

Smaller models are more efficient, making them quicker to train and run. That’s good news for anyone wanting a more affordable on-ramp. And it could be good for the climate, too: Because smaller models work with a fraction of the computer oomph required by their giant cousins, they burn less energy. 

These small models also travel well: They can run right in our pockets, without needing to send requests to the cloud. Small is the next big thing.