Inside OpenAI’s big play for science 

In the three years since ChatGPT’s explosive debut, OpenAI’s technology has upended a remarkable range of everyday activities at home, at work, in schools—anywhere people have a browser open or a phone out, which is everywhere.

Now OpenAI is making an explicit play for scientists. In October, the firm announced that it had launched a whole new team, called OpenAI for Science, dedicated to exploring how its large language models could help scientists and tweaking its tools to support them.

The last couple of months have seen a slew of social media posts and academic publications in which mathematicians, physicists, biologists, and others have described how LLMs (and OpenAI’s GPT-5 in particular) have helped them make a discovery or nudged them toward a solution they might otherwise have missed. In part, OpenAI for Science was set up to engage with this community.

And yet OpenAI is also late to the party. Google DeepMind, the rival firm behind groundbreaking scientific models such as AlphaFold and AlphaEvolve, has had an AI-for-science team for years. (When I spoke to Google DeepMind’s CEO and cofounder Demis Hassabis in 2023 about that team, he told me: “This is the reason I started DeepMind … In fact, it’s why I’ve worked my whole career in AI.”)

So why now? How does a push into science fit with OpenAI’s wider mission? And what exactly is the firm hoping to achieve?

I put these questions to Kevin Weil, a vice president at OpenAI who leads the new OpenAI for Science team, in an exclusive interview last week.

On mission

Weil is a product guy. He joined OpenAI a couple of years ago as chief product officer after being head of product at Twitter and Instagram. But he started out as a scientist. He got two-thirds of the way through a PhD in particle physics at Stanford University before ditching academia for the Silicon Valley dream. Weil is keen to highlight his pedigree: “I thought I was going to be a physics professor for the rest of my life,” he says. “I still read math books on vacation.”

Asked how OpenAI for Science fits with the firm’s existing lineup of white-collar productivity tools or the viral video app Sora, Weil recites the company mantra: “The mission of OpenAI is to try and build artificial general intelligence and, you know, make it beneficial for all of humanity.”

Just imagine the future impact this technology could have on science, he says: new medicines, new materials, new devices. “Think about it helping us understand the nature of reality, helping us think through open problems. Maybe the biggest, most positive impact we’re going to see from AGI will actually be from its ability to accelerate science.”

He adds: “With GPT-5, we saw that becoming possible.” 

As Weil tells it, LLMs are now good enough to be useful scientific collaborators. They can spitball ideas, suggest novel directions to explore, and find fruitful parallels between new problems and old solutions published in obscure journals decades ago or in foreign languages.

That wasn’t the case a year or so ago. Since it announced its first so-called reasoning model—a type of LLM that can break down problems into multiple steps and work through them one by one—in December 2024, OpenAI has been pushing the envelope of what the technology can do. Reasoning models have made LLMs far better at solving math and logic problems than they used to be. “You go back a few years and we were all collectively mind-blown that the models could get an 800 on the SAT,” says Weil.

But soon LLMs were acing math competitions and solving graduate-level physics problems. Last year, OpenAI and Google DeepMind both announced that their LLMs had achieved gold-medal-level performance at the International Mathematical Olympiad, one of the toughest math contests in the world. “These models are no longer just better than 90% of grad students,” says Weil. “They’re really at the frontier of human abilities.”

That’s a huge claim, and it comes with caveats. Still, there’s no doubt that GPT-5, which includes a reasoning model, is a big improvement on GPT-4 when it comes to complicated problem-solving. Measured against an industry benchmark known as GPQA, which includes more than 400 multiple-choice questions that test PhD-level knowledge in biology, physics, and chemistry, GPT-4 scores 39%, well below the human-expert baseline of around 70%. According to OpenAI, GPT-5.2 (the latest update to the model, released in December) scores 92%. 

Overhyped

The excitement is evident—and perhaps excessive. In October, senior figures at OpenAI, including Weil, boasted on X that GPT-5 had found solutions to several unsolved math problems. Mathematicians were quick to point out that in fact what GPT-5 appeared to have done was dig up existing solutions in old research papers, including at least one written in German. That was still useful, but it wasn’t the achievement OpenAI seemed to have claimed. Weil and his colleagues deleted their posts.

Now Weil is more careful. It is often enough to find answers that exist but have been forgotten, he says: “We collectively stand on the shoulders of giants, and if LLMs can kind of accumulate that knowledge so that we don’t spend time struggling on a problem that is already solved, that’s an acceleration all of its own.”

He plays down the idea that LLMs are about to come up with a game-changing new discovery. “I don’t think models are there yet,” he says. “Maybe they’ll get there. I’m optimistic that they will.”

But, he insists, that’s not the mission: “Our mission is to accelerate science. And I don’t think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field.”

For Weil, the question is this: “Does science actually happen faster because scientists plus models can do much more, and do it more quickly, than scientists alone? I think we’re already seeing that.”

In November, OpenAI published a series of anecdotal case studies contributed by scientists, both inside and outside the company, that illustrated how they had used GPT-5 and how it had helped. “Most of the cases were scientists that were already using GPT-5 directly in their research and had come to us one way or another saying, ‘Look at what I’m able to do with these tools,’” says Weil.

The key things that GPT-5 seems to be good at are finding references and connections to existing work that scientists were not aware of, which sometimes sparks new ideas; helping scientists sketch mathematical proofs; and suggesting ways for scientists to test hypotheses in the lab.  

“GPT-5.2 has read substantially every paper written in the last 30 years,” says Weil. “And it understands not just the field that a particular scientist is working in; it can bring together analogies from other, unrelated fields.”

“That’s incredibly powerful,” he continues. “You can always find a human collaborator in an adjacent field, but it’s difficult to find, you know, a thousand collaborators in all thousand adjacent fields that might matter. And in addition to that, I can work with the model late at night—it doesn’t sleep—and I can ask it 10 things in parallel, which is kind of awkward to do to a human.”

Solving problems

Weil’s position is backed up by most of the scientists OpenAI reached out to.

Robert Scherrer, a professor of physics and astronomy at Vanderbilt University, had only played around with ChatGPT for fun (“I used it to rewrite the theme song for Gilligan’s Island in the style of Beowulf, which it did very well,” he tells me) until his Vanderbilt colleague Alex Lupsasca, a fellow physicist who now works at OpenAI, told him that GPT-5 had helped solve a problem he’d been working on.

Lupsasca gave Scherrer access to GPT-5 Pro, OpenAI’s $200-a-month premium subscription. “It managed to solve a problem that I and my graduate student could not solve despite working on it for several months,” says Scherrer.

It’s not perfect, he says: “GPT-5 still makes dumb mistakes. Of course, I do too, but the mistakes GPT-5 makes are even dumber.” And yet it keeps getting better, he says: “If current trends continue—and that’s a big if—I suspect that all scientists will be using LLMs soon.”

Derya Unutmaz, a professor of biology at the Jackson Laboratory, a nonprofit research institute, uses GPT-5 to brainstorm ideas, summarize papers, and plan experiments in his work studying the immune system. In the case study he shared with OpenAI, Unutmaz used GPT-5 to analyze an old data set that his team had previously looked at. The model came up with fresh insights and interpretations.  

“LLMs are already essential for scientists,” he says. “When you can complete analysis of data sets that used to take months, not using them is not an option anymore.”

Nikita Zhivotovskiy, a statistician at the University of California, Berkeley, says he has been using LLMs in his research since the first version of ChatGPT came out.

Like Scherrer, he finds LLMs most useful when they highlight unexpected connections between his own work and existing results he did not know about. “I believe that LLMs are becoming an essential technical tool for scientists, much like computers and the internet did before,” he says. “I expect a long-term disadvantage for those who do not use them.”

But he does not expect LLMs to make novel discoveries anytime soon. “I have seen very few genuinely fresh ideas or arguments that would be worth a publication on their own,” he says. “So far, they seem to mainly combine existing results, sometimes incorrectly, rather than produce genuinely new approaches.”

I also contacted a handful of scientists who are not connected to OpenAI.

Andy Cooper, a professor of chemistry at the University of Liverpool and director of the Leverhulme Research Centre for Functional Materials Design, is less enthusiastic. “We have not found, yet, that LLMs are fundamentally changing the way that science is done,” he says. “But our recent results suggest that they do have a place.”

Cooper is leading a project to develop a so-called AI scientist that can fully automate parts of the scientific workflow. He says that his team doesn’t use LLMs to come up with ideas. But the tech is starting to prove useful as part of a wider automated system where an LLM can help direct robots, for example.

“My guess is that LLMs might stick more in robotic workflows, at least initially, because I’m not sure that people are ready to be told what to do by an LLM,” says Cooper. “I’m certainly not.”

Making errors

LLMs may be becoming more and more useful, but caution is still key. In December, Jonathan Oppenheim, a scientist who works on quantum mechanics, called out a mistake that had made its way into a scientific journal. “OpenAI leadership are promoting a paper in Physics Letters B where GPT-5 proposed the main idea—possibly the first peer-reviewed paper where an LLM generated the core contribution,” Oppenheim posted on X. “One small problem: GPT-5’s idea tests the wrong thing.”

He continued: “GPT-5 was asked for a test that detects nonlinear theories. It provided a test that detects nonlocal ones. Related-sounding, but different. It’s like asking for a COVID test, and the LLM cheerfully hands you a test for chickenpox.”

It is clear that a lot of scientists are finding innovative and intuitive ways to engage with LLMs. It is also clear that the technology makes mistakes that can be so subtle even experts miss them.

Part of the problem is the way ChatGPT can flatter you into letting down your guard. As Oppenheim put it: “A core issue is that LLMs are being trained to validate the user, while science needs tools that challenge us.” In an extreme case, one individual (who was not a scientist) was persuaded by ChatGPT into thinking for months that he’d invented a new branch of mathematics.

Of course, Weil is well aware of the problem of hallucination. But he insists that newer models are hallucinating less and less. Even so, focusing on hallucination might be missing the point, he says.

“One of my teammates here, a former math professor, said something that stuck with me,” says Weil. “He said: ‘When I’m doing research, if I’m bouncing ideas off a colleague, I’m wrong 90% of the time and that’s kind of the point. We’re both spitballing ideas and trying to find something that works.’”

“That’s actually a desirable place to be,” says Weil. “If you say enough wrong things and then somebody stumbles on a grain of truth and then the other person seizes on it and says, ‘Oh, yeah, that’s not quite right, but what if we—’ You gradually kind of find your trail through the woods.”

This is Weil’s core vision for OpenAI for Science. GPT-5 is good, but it is not an oracle. The value of this technology is in pointing people in new directions, not coming up with definitive answers, he says.

In fact, one of the things OpenAI is now looking at is making GPT-5 dial down its confidence when it delivers a response. Instead of saying “Here’s the answer,” it might tell scientists, “Here’s something to consider.”

“That’s actually something that we are spending a bunch of time on,” says Weil. “Trying to make sure that the model has some sort of epistemological humility.”
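
As a sketch of what that could look like in practice, the snippet below steers a model toward hedged answers with a system prompt, using the OpenAI Python SDK. The model name and prompt wording are illustrative placeholders, not OpenAI's actual approach.

```python
# Minimal sketch: nudging a chat model toward hedged, calibrated answers
# with a system prompt. Illustrative only -- not OpenAI's internal method.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

HUMBLE_SYSTEM_PROMPT = (
    "You are a scientific collaborator, not an oracle. Label each claim "
    "with your confidence (high / medium / low), flag any step you have "
    "not verified, and frame speculative suggestions as 'something to "
    "consider' rather than definitive answers."
)

response = client.chat.completions.create(
    model="gpt-5",  # placeholder model name for illustration
    messages=[
        {"role": "system", "content": HUMBLE_SYSTEM_PROMPT},
        {"role": "user", "content": "Could this identity simplify my proof?"},
    ],
)
print(response.choices[0].message.content)
```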

Watching the watchers

Another thing OpenAI is looking at is how to use GPT-5 to fact-check GPT-5. It’s often the case that if you feed one of GPT-5’s answers back into the model, it will pick it apart and highlight mistakes.

“You can kind of hook the model up as its own critic,” says Weil. “Then you can get a workflow where the model is thinking and then it goes to another model, and if that model finds things that it could improve, then it passes it back to the original model and says, ‘Hey, wait a minute—this part wasn’t right, but this part was interesting. Keep it.’ It’s almost like a couple of agents working together and you only see the output once it passes the critic.”
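
The workflow Weil sketches can be mocked up in a few lines. The loop below is a generic draft-critique-revise pattern; the `ask` helper, the prompts, and the model name are assumptions for illustration, not OpenAI's actual agent setup.

```python
# Generic draft -> critique -> revise loop of the kind Weil describes.
# Sketch only: the helper, prompts, and model name are placeholders.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single-turn call to the model; returns the text of its reply."""
    response = client.chat.completions.create(
        model="gpt-5",  # placeholder model name for illustration
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def solve_with_critic(problem: str, max_rounds: int = 3) -> str:
    draft = ask(problem)
    for _ in range(max_rounds):
        critique = ask(
            "Act as a skeptical reviewer. Critique the solution below, "
            "and reply with just APPROVED if you find no errors.\n\n"
            f"Problem: {problem}\n\nSolution: {draft}"
        )
        if "APPROVED" in critique:
            break  # the critic passed the draft; surface it to the user
        draft = ask(  # otherwise revise the draft using the critique
            "Revise the solution in light of this critique.\n\n"
            f"Problem: {problem}\n\nSolution: {draft}\n\nCritique: {critique}"
        )
    return draft
```

The user only sees the final draft once the critic stops objecting (or the round budget runs out), which is the "you only see the output once it passes the critic" behavior Weil describes.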

What Weil is describing also sounds a lot like what Google DeepMind did with AlphaEvolve, a tool that wrapped the firm’s LLM, Gemini, inside a wider system that filtered the good responses from the bad and fed them back in again to be improved on. Google DeepMind has used AlphaEvolve to solve several real-world problems.

OpenAI faces stiff competition from rival firms, whose own LLMs can do most, if not all, of the things it claims for its own models. If that’s the case, why should scientists use GPT-5 instead of Gemini or Anthropic’s Claude, families of models that are themselves improving every year? Ultimately, OpenAI for Science may be as much an effort to plant a flag in new territory as anything else. The real innovations are still to come. 

“I think 2026 will be for science what 2025 was for software engineering,” says Weil. “At the beginning of 2025, if you were using AI to write most of your code, you were an early adopter. Whereas 12 months later, if you’re not using AI to write most of your code, you’re probably falling behind. We’re now seeing those same early flashes for science as we did for code.”

He continues: “I think that in a year, if you’re a scientist and you’re not heavily using AI, you’ll be missing an opportunity to increase the quality and pace of your thinking.”

America’s coming war over AI regulation

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

In the final weeks of 2025, the battle over regulating artificial intelligence in the US reached a boiling point. On December 11, after Congress failed twice to pass a law banning state AI laws, President Donald Trump signed a sweeping executive order seeking to block states from regulating the booming industry. Instead, he vowed to work with Congress to establish a “minimally burdensome” national AI policy, one that would position the US to win the global AI race. The move marked a qualified victory for tech titans, who have been marshaling multimillion-dollar war chests to oppose AI regulations, arguing that a patchwork of state laws would stifle innovation.

In 2026, the battleground will shift to the courts. While some states might back down from passing AI laws, others will charge ahead, buoyed by mounting public pressure to protect children from chatbots and rein in power-hungry data centers. Meanwhile, dueling super PACs bankrolled by tech moguls and AI-safety advocates will pour tens of millions into congressional and state elections to seat lawmakers who champion their competing visions for AI regulation. 

Trump’s executive order directs the Department of Justice to establish a task force that sues states whose AI laws clash with his vision for light-touch regulation. It also directs the Department of Commerce to starve states of federal broadband funding if their AI laws are “onerous.” In practice, the order may target a handful of laws in Democratic states, says James Grimmelmann, a law professor at Cornell Law School. “The executive order will be used to challenge a smaller number of provisions, mostly relating to transparency and bias in AI, which tend to be more liberal issues,” Grimmelmann says.

For now, many states aren’t flinching. On December 19, New York’s governor, Kathy Hochul, signed the Responsible AI Safety and Education (RAISE) Act, a landmark law requiring AI companies to publish the protocols used to ensure the safe development of their AI models and report critical safety incidents. On January 1, California debuted the nation’s first frontier AI safety law, SB 53—which the RAISE Act was modeled on—aimed at preventing catastrophic harms such as biological weapons or cyberattacks. While both laws were watered down from earlier iterations to survive bruising industry lobbying, they struck a rare, if fragile, compromise between tech giants and AI safety advocates.

If Trump targets these hard-won laws, Democratic states like California and New York will likely take the fight to court. Republican states like Florida with vocal champions for AI regulation might follow suit. Trump could face an uphill battle. “The Trump administration is stretching itself thin with some of its attempts to effectively preempt [legislation] via executive action,” says Margot Kaminski, a law professor at the University of Colorado Law School. “It’s on thin ice.”

But Republican states that are anxious to stay off Trump’s radar or can’t afford to lose federal broadband funding for their sprawling rural communities might retreat from passing or enforcing AI laws. Win or lose in court, the chaos and uncertainty could chill state lawmaking. Paradoxically, the Democratic states that Trump wants to rein in—armed with big budgets and emboldened by the optics of battling the administration—may be the least likely to budge.

In lieu of state laws, Trump promises to create a federal AI policy with Congress. But the gridlocked and polarized body won’t be delivering a bill this year. In July, the Senate killed a moratorium on state AI laws that had been inserted into a tax bill, and in November, the House scrapped an encore attempt in a defense bill. In fact, Trump’s bid to strong-arm Congress with an executive order may sour any appetite for a bipartisan deal. 

The executive order “has made it harder to pass responsible AI policy by hardening a lot of positions, making it a much more partisan issue,” says Brad Carson, a former Democratic congressman from Oklahoma who is building a network of super PACs backing candidates who support AI regulation. “It hardened Democrats and created incredible fault lines among Republicans,” he says. 

While AI accelerationists in Trump’s orbit—AI and crypto czar David Sacks among them—champion deregulation, populist MAGA firebrands like Steve Bannon warn of rogue superintelligence and mass unemployment. In response to Trump’s executive order, Republican state attorneys general signed a bipartisan letter urging the FCC not to supersede state AI laws.

With Americans increasingly anxious about how AI could harm mental health, jobs, and the environment, public demand for regulation is growing. If Congress stays paralyzed, states will be the only ones acting to keep the AI industry in check. In 2025, state legislators introduced more than 1,000 AI bills, and nearly 40 states enacted over 100 laws, according to the National Conference of State Legislatures.

Efforts to protect children from chatbots may inspire rare consensus. On January 7, Google and Character Technologies, a startup behind the companion chatbot Character.AI, settled several lawsuits with families of teenagers who killed themselves after interacting with the bot. Just a day later, the Kentucky attorney general sued Character Technologies, alleging that the chatbots drove children to suicide and other forms of self-harm. OpenAI and Meta face a barrage of similar suits. Expect more to pile up this year. Without AI laws on the books, it remains to be seen how product liability laws and free speech doctrines apply to these novel dangers. “It’s an open question what the courts will do,” says Grimmelmann. 

While litigation brews, states will move to pass child safety laws, which are exempt from Trump’s proposed ban on state AI laws. On January 9, OpenAI inked a deal with a former foe, the child-safety advocacy group Common Sense Media, to back a ballot initiative in California called the Parents & Kids Safe AI Act, setting guardrails around how chatbots interact with children. The measure proposes requiring AI companies to verify users’ age, offer parental controls, and undergo independent child-safety audits. If passed, it could be a blueprint for states across the country seeking to crack down on chatbots. 

Fueled by widespread backlash against data centers, states will also try to regulate the resources needed to run AI. That means bills requiring data centers to report on their power and water use and foot their own electricity bills. If AI starts to displace jobs at scale, labor groups might float AI bans in specific professions. A few states concerned about the catastrophic risks posed by AI may pass safety bills mirroring SB 53 and the RAISE Act. 

Meanwhile, tech titans will continue to use their deep pockets to crush AI regulations. Leading the Future, a super PAC backed by OpenAI president Greg Brockman and the venture capital firm Andreessen Horowitz, will try to elect candidates who endorse unfettered AI development to Congress and state legislatures. They’ll follow the crypto industry’s playbook for electing allies and writing the rules. To counter this, super PACs funded by Public First, an organization run by Carson and former Republican congressman Chris Stewart of Utah, will back candidates advocating for AI regulation. We might even see a handful of candidates running on anti-AI populist platforms.

In 2026, the slow, messy process of American democracy will grind on. And the rules written in state capitals could decide how the most disruptive technology of our generation develops far beyond America’s borders, for years to come.

Yann LeCun’s new venture is a contrarian bet against large language models  

Yann LeCun is a Turing Award recipient and a top AI researcher, but he has long been a contrarian figure in the tech world. He believes that the industry’s current obsession with large language models is wrong-headed and will ultimately fail to solve many pressing problems. 

Instead, he thinks we should be betting on world models—a different type of AI that accurately reflects the dynamics of the real world. He is also a staunch advocate for open-source AI and criticizes the closed approach of frontier labs like OpenAI and Anthropic. 

Perhaps it’s no surprise, then, that he recently left Meta, where he had served as chief scientist for FAIR (Fundamental AI Research), the company’s influential research lab that he founded. Meta has struggled to gain much traction with its open-source AI model Llama and has seen internal shake-ups, including the controversial acquisition of Scale AI.

LeCun sat down with MIT Technology Review in an exclusive online interview from his Paris apartment to discuss his new venture, life after Meta, the future of artificial intelligence, and why he thinks the industry is chasing the wrong ideas. 

Both the questions and answers below have been edited for clarity and brevity.

You’ve just announced a new company, Advanced Machine Intelligence (AMI).  Tell me about the big ideas behind it.

It is going to be a global company, but headquartered in Paris. You pronounce it “ami”—it means “friend” in French. I am excited. There is a very high concentration of talent in Europe, but it is not always given a proper environment to flourish. And there is certainly a huge demand from the industry and governments for a credible frontier AI company that is neither Chinese nor American. I think that is going to be to our advantage.

So an ambitious alternative to the US-China binary we currently have. What made you want to pursue that third path?

Well, there are sovereignty issues for a lot of countries, and they want some control over AI. What I’m advocating is that AI is going to become a platform, and most platforms tend to become open-source. Unfortunately, that’s not really the direction the American industry is taking. Right? As the competition increases, they feel like they have to be secretive. I think that is a strategic mistake.

It’s certainly true for OpenAI, which went from very open to very closed, and Anthropic has always been closed. Google was sort of a little open. And then Meta, we’ll see. My sense is that it’s not going in a positive direction at this moment.

Simultaneously, China has completely embraced this open approach. So all leading open-source AI platforms are Chinese, and the result is that academia and startups, outside of the US, have basically embraced Chinese models. There’s nothing wrong with that—you know, Chinese models are good. Chinese engineers and scientists are great. But you know, if there is a future in which all of our information diet is being mediated by AI assistants, and the choice is either English-speaking models produced by proprietary companies close to the US or Chinese models which may be open-source but need to be fine-tuned so that they answer questions about Tiananmen Square in 1989—you know, it’s not a very pleasant and engaging future.

They [the future models] should be able to be fine-tuned by anyone and produce a very high diversity of AI assistants, with different linguistic abilities and value systems and political biases and centers of interest. You need a high diversity of assistants for the same reason that you need a high diversity of press.

That is certainly a compelling pitch. How are investors buying that idea so far?

They really like it. A lot of venture capitalists are very much in favor of this idea of open-source, because they know that a lot of small startups really rely on open-source models. Those startups don’t have the means to train their own model, and it’s kind of dangerous for them strategically to embrace a proprietary model.

You recently left Meta. What’s your view on the company and Mark Zuckerberg’s leadership? There’s a perception that Meta has fumbled its AI advantage.

I think FAIR [LeCun’s lab at Meta] was extremely successful in the research part. Where Meta was less successful is in picking up on that research and pushing it into practical technology and products. Mark made some choices that he thought were the best for the company. I may not have agreed with all of them. For example, the robotics group at FAIR was let go, which I think was a strategic mistake. But I’m not the director of FAIR. People make decisions rationally, and there’s no reason to be upset.

So, no bad blood? Could Meta be a future client for AMI?

Meta might be our first client! We’ll see. The work we are doing is not in direct competition. Our focus on world models for the physical world is very different from their focus on generative AI and LLMs.

You were working on AI long before LLMs became a mainstream approach. But since ChatGPT broke out, LLMs have become almost synonymous with AI.

Yes, and we are going to change that. The public face of AI, perhaps, is mostly LLMs and chatbots of various types. But the latest ones of those are not pure LLMs. They are LLMs plus a lot of things, like perception systems and code that solves particular problems. So we are going to see LLMs as kind of the orchestrator in these systems.

Beyond LLMs, there is a lot of AI behind the scenes that runs a big chunk of our society. There are driver-assistance systems in cars, quick-turnaround MRI imaging, algorithms that drive social media—that’s all AI.

You have been vocal in arguing that LLMs can only get us so far. Do you think LLMs are overhyped these days? Can you summarize to our readers why you believe that LLMs are not enough?

There is a sense in which they have not been overhyped, which is that they are extremely useful to a lot of people, particularly if you write text, do research, or write code. LLMs manipulate language really well. But people have had this illusion, or delusion, that it is a matter of time until we can scale them up to having human-level intelligence, and that is simply false.

The truly difficult part is understanding the real world. This is Moravec’s paradox (a phenomenon observed by the computer scientist Hans Moravec in 1988): What’s easy for us, like perception and navigation, is hard for computers, and vice versa. LLMs are limited to the discrete world of text. They can’t truly reason or plan, because they lack a model of the world. They can’t predict the consequences of their actions. This is why we don’t have a domestic robot that is as agile as a house cat, or a truly autonomous car.

We are going to have AI systems that have humanlike and human-level intelligence, but they’re not going to be built on LLMs, and it’s not going to happen next year or two years from now. It’s going to take a while. There are major conceptual breakthroughs that have to happen before we have AI systems that have human-level intelligence. And that is what I’ve been working on. And this company, AMI Labs, is focusing on the next generation.

And your solution is world models and JEPA architecture (JEPA, or “joint embedding predictive architecture,” is a learning framework that trains AI models to understand the world, created by LeCun while he was at Meta). What’s the elevator pitch?

The world is unpredictable. If you try to build a generative model that predicts every detail of the future, it will fail.  JEPA is not generative AI. It is a system that learns to represent videos really well. The key is to learn an abstract representation of the world and make predictions in that abstract space, ignoring the details you can’t predict. That’s what JEPA does. It learns the underlying rules of the world from observation, like a baby learning about gravity. This is the foundation for common sense, and it’s the key to building truly intelligent systems that can reason and plan in the real world. The most exciting work so far on this is coming from academia, not the big industrial labs stuck in the LLM world.
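
To make the idea concrete, here is a minimal JEPA-style training step written in PyTorch. It is a toy sketch under assumed shapes and architectures, not Meta's published models; the point is only that the prediction loss lives in embedding space, never in pixel space.

```python
# Toy JEPA-style step: predict the target's embedding from the context's.
# Assumed toy architectures and shapes -- not Meta's published models.
import torch
import torch.nn as nn

dim = 128
context_encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, dim))
target_encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, dim))
predictor = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

x = torch.randn(32, 784)  # context view (e.g., visible patches of a frame)
y = torch.randn(32, 784)  # target view (e.g., masked patches or a later frame)

with torch.no_grad():                # target encoder gets no gradient from this loss
    target_repr = target_encoder(y)  # (in practice it tracks the context encoder via EMA)

pred = predictor(context_encoder(x))       # predict the target embedding from context
loss = ((pred - target_repr) ** 2).mean()  # error in abstract space, not pixel space
loss.backward()
```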

The lack of non-text data has been a problem in taking AI systems further in understanding the physical world. JEPA is trained on videos. What other kinds of data will you be using?

Our systems will be trained on video, audio, and sensor data of all kinds—not just text. We are working with various modalities, from the position of a robot arm to lidar data to audio. I’m also involved in a project using JEPA to model complex physical and clinical phenomena. 

What are some of the concrete, real-world applications you envision for world models?

The applications are vast. Think about complex industrial processes where you have thousands of sensors, like in a jet engine, a steel mill, or a chemical factory. There is no technique right now to build a complete, holistic model of these systems. A world model could learn this from the sensor data and predict how the system will behave. Or think of smart glasses that can watch what you’re doing, identify your actions, and then predict what you’re going to do next to assist you. This is what will finally make agentic systems reliable. An agentic system that is supposed to take actions in the world cannot work reliably unless it has a world model to predict the consequences of its actions. Without it, the system will inevitably make mistakes. This is the key to unlocking everything from truly useful domestic robots to Level 5 autonomous driving.

Humanoid robots are all the rage recently, especially ones built by companies from China. What’s your take?

There are all these brute-force ways to get around the limitations of learning systems, which require inordinate amounts of training data to do anything. So the secret of all the companies getting robots to do kung fu or dance is that the routines are all planned in advance. But frankly, nobody—absolutely nobody—knows how to make those robots smart enough to be useful. Take my word for it.

You need an enormous amount of tele-operation training data for every single task, and when the environment changes a little bit, it doesn’t generalize very well. What this tells us is we are missing something very big. The reason why a 17-year-old can learn to drive in 20 hours is because they already know a lot about how the world behaves. If we want a generally useful domestic robot, we need systems to have a kind of good understanding of the physical world. That’s not going to happen until we have good world models and planning.

There’s a growing sentiment that it’s becoming harder to do foundational AI research in academia because of the massive computing resources required. Do you think the most important innovations will now come from industry?

No. LLMs are now technology development, not research. It’s true that it’s very difficult for academics to play an important role there because of the requirements for computation, data access, and engineering support. But it’s a product now. It’s not something academia should even be interested in. It’s like speech recognition in the early 2010s—it was a solved problem, and the progress was in the hands of industry. 

What academia should be working on is long-term objectives that go beyond the capabilities of current systems. That’s why I tell people in universities: Don’t work on LLMs. There is no point. You’re not going to be able to rival what’s going on in industry. Work on something else. Invent new techniques. The breakthroughs are not going to come from scaling up LLMs. The most exciting work on world models is coming from academia, not the big industrial labs. The whole idea of using attention circuits in neural nets came out of the University of Montreal. That research paper started the whole revolution. Now that the big companies are closing up, the breakthroughs are going to slow down. Academia needs access to computing resources, but they should be focused on the next big thing, not on refining the last one.

You wear many hats: professor, researcher, educator, public thinker … Now you just took on a new one. What is that going to look like for you?

I am going to be the executive chairman of the company, and Alex LeBrun [a former colleague from Meta AI] will be the CEO. It’s going to be LeCun and LeBrun—it’s nice if you pronounce it the French way.

I am going to keep my position at NYU. I teach one class per year, and I have PhD students and postdocs, so I am going to stay based in New York. But I go to Paris pretty often because of my lab.

Does that mean that you won’t be very hands-on?

Well, there’s two ways to be hands-on. One is to manage people day to day, and another is to actually get your hands dirty in research projects, right? 

I can do management, but I don’t like doing it. This is not my mission in life. It’s really to make science and technology progress as far as we can, inspire other people to work on things that are interesting, and then contribute to those things. So that has been my role at Meta for the last seven years. I founded FAIR and led it for four to five years. I kind of hated being a director. I am not good at this career management thing. I’m much more visionary and a scientist.

What makes Alex LeBrun the right fit?

Alex is a serial entrepreneur; he’s built three successful AI companies. The first he sold to Microsoft; the second to Facebook, where he was head of the engineering division of FAIR in Paris. He then left to create Nabla, a very successful company in the health-care space. When I offered him the chance to join me in this effort, he accepted almost immediately. He has the experience to build the company, allowing me to focus on science and technology. 

You’re headquartered in Paris. Where else do you plan to have offices?

We are a global company. There’s going to be an office in North America.

New York, hopefully?

New York is great. That’s where I am, right? And it’s not Silicon Valley. Silicon Valley is a bit of a monoculture.

What about Asia? I’m guessing Singapore, too?

Probably, yeah. I’ll let you guess. 

And how are you attracting talent?

We don’t have any issue recruiting. There are a lot of people in the AI research community who think the future of AI is in world models. Those people, regardless of pay package, will be motivated to come work for us because they believe in the technological future we are building. We’ve already recruited people from places like OpenAI, Google DeepMind, and xAI.

I heard that Saining Xie, a prominent researcher from NYU and Google DeepMind, might be joining you as chief scientist. Any comments?

Saining is a brilliant researcher. I have a lot of admiration for him. I hired him twice already. I hired him at FAIR, and I convinced my colleagues at NYU that we should hire him there. Let’s just say I have a lot of respect for him.

When will you be ready to share more details about AMI Labs, like financial backing or other core members?

Soon—in February, maybe. I’ll let you know.

“Dr. Google” had its issues. Can ChatGPT Health do better?

For the past two decades, there’s been a clear first step for anyone who starts experiencing new medical symptoms: Look them up online. The practice was so common that it gained the pejorative moniker “Dr. Google.” But times are changing, and many medical-information seekers are now using LLMs. According to OpenAI, 230 million people ask ChatGPT health-related queries each week.

That’s the context around the launch of OpenAI’s new ChatGPT Health product, which debuted earlier this month. It landed at an inauspicious time: Two days earlier, the news website SFGate had broken the story of Sam Nelson, a teenager who died of an overdose last year after extensive conversations with ChatGPT about how best to combine various drugs. In the wake of both pieces of news, multiple journalists questioned the wisdom of relying for medical advice on a tool that could cause such extreme harm.

Though ChatGPT Health lives in a separate sidebar tab from the rest of ChatGPT, it isn’t a new model. It’s more like a wrapper that provides one of OpenAI’s preexisting models with guidance and tools it can use to provide health advice—including some that allow it to access a user’s electronic medical records and fitness app data, if granted permission. There’s no doubt that ChatGPT and other large language models can make medical mistakes, and OpenAI emphasizes that ChatGPT Health is intended as an additional support, rather than a replacement for one’s doctor. But when doctors are unavailable or unable to help, people will turn to alternatives.
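
The wrapper pattern itself is simple to sketch. In the toy function below, medical-record context is attached only when the user has opted in; every name here (`fetch_medical_records`, the guidance text, the model id) is a hypothetical stand-in, not OpenAI's implementation.

```python
# Toy sketch of a permission-gated health wrapper around a general chat model.
# All names and prompts are hypothetical -- not OpenAI's implementation.
from openai import OpenAI

client = OpenAI()

HEALTH_GUIDANCE = (
    "Answer health questions carefully, state uncertainty plainly, and "
    "remind the user that you supplement, not replace, their doctor."
)

def fetch_medical_records(user_id: str) -> str:
    """Placeholder for a records integration the user has explicitly linked."""
    return "summary of linked medical records..."  # hypothetical data source

def health_chat(question: str, user_id: str, records_permitted: bool) -> str:
    messages = [{"role": "system", "content": HEALTH_GUIDANCE}]
    if records_permitted:  # attach records only if the user granted permission
        records = fetch_medical_records(user_id)
        messages.append({"role": "system", "content": f"Patient context: {records}"})
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-5",  # placeholder model name for illustration
        messages=messages,
    )
    return response.choices[0].message.content
```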

Some doctors see LLMs as a boon for medical literacy. The average patient might struggle to navigate the vast landscape of online medical information—and, in particular, to distinguish high-quality sources from polished but factually dubious websites—but LLMs can do that job for them, at least in theory. Treating patients who had searched for their symptoms on Google required “a lot of attacking patient anxiety [and] reducing misinformation,” says Marc Succi, an associate professor at Harvard Medical School and a practicing radiologist. But now, he says, “you see patients with a college education, a high school education, asking questions at the level of something an early med student might ask.”

The release of ChatGPT Health, and Anthropic’s subsequent announcement of new health integrations for Claude, indicate that the AI giants are increasingly willing to acknowledge and encourage health-related uses of their models. Such uses certainly come with risks, given LLMs’ well-documented tendencies to agree with users and make up information rather than admit ignorance.

But those risks also have to be weighed against potential benefits. There’s an analogy here to autonomous vehicles: When policymakers consider whether to allow Waymo in their city, the key metric is not whether its cars are ever involved in accidents but whether they cause less harm than the status quo of relying on human drivers. If Dr. ChatGPT is an improvement over Dr. Google—and early evidence suggests it may be—it could potentially lessen the enormous burden of medical misinformation and unnecessary health anxiety that the internet has created.

Pinning down the effectiveness of a chatbot such as ChatGPT or Claude for consumer health, however, is tricky. “It’s exceedingly difficult to evaluate an open-ended chatbot,” says Danielle Bitterman, the clinical lead for data science and AI at the Mass General Brigham health-care system. Large language models score well on medical licensing examinations, but those exams use multiple-choice questions that don’t reflect how people use chatbots to look up medical information.

Sirisha Rambhatla, an assistant professor of management science and engineering at the University of Waterloo, attempted to close that gap by evaluating how GPT-4o responded to licensing exam questions when it did not have access to a list of possible answers. Medical experts who evaluated the responses scored only about half of them as entirely correct. But multiple-choice exam questions are designed to be tricky enough that the answer options don’t give them entirely away, and they’re still a pretty distant approximation for the sort of thing that a user would type into ChatGPT.

A different study, which tested GPT-4o on more realistic prompts submitted by human volunteers, found that it answered medical questions correctly about 85% of the time. When I spoke with Amulya Yadav, an associate professor at Pennsylvania State University who runs the Responsible AI for Social Emancipation Lab and led the study, he made it clear that he wasn’t personally a fan of patient-facing medical LLMs. But he freely admits that, technically speaking, they seem up to the task—after all, he says, human doctors misdiagnose patients 10% to 15% of the time. “If I look at it dispassionately, it seems that the world is gonna change, whether I like it or not,” he says.

For people seeking medical information online, Yadav says, LLMs do seem to be a better choice than Google. Succi, the radiologist, also concluded that LLMs can be a better alternative to web search when he compared GPT-4’s responses to questions about common chronic medical conditions with the information presented in Google’s knowledge panel, the information box that sometimes appears on the right side of the search results.

Since Yadav’s and Succi’s studies appeared online, in the first half of 2025, OpenAI has released multiple new versions of GPT, and it’s reasonable to expect that GPT-5.2 would perform even better than its predecessors. But the studies do have important limitations: They focus on straightforward, factual questions, and they examine only brief interactions between users and chatbots or web search tools. Some of the weaknesses of LLMs—most notably their sycophancy and tendency to hallucinate—might be more likely to rear their heads in more extensive conversations and with people who are dealing with more complex problems. Reeva Lederman, a professor at the University of Melbourne who studies technology and health, notes that patients who don’t like the diagnosis or treatment recommendations that they receive from a doctor might seek out another opinion from an LLM—and the LLM, if it’s sycophantic, might encourage them to reject their doctor’s advice.

Some studies have found that LLMs will hallucinate and exhibit sycophancy in response to health-related prompts. For example, one study showed that GPT-4 and GPT-4o will happily accept and run with incorrect drug information included in a user’s question. In another, GPT-4o frequently concocted definitions for fake syndromes and lab tests mentioned in the user’s prompt. Given the abundance of medically dubious diagnoses and treatments floating around the internet, these patterns of LLM behavior could contribute to the spread of medical misinformation, particularly if people see LLMs as trustworthy.

OpenAI has reported that the GPT-5 series of models is markedly less sycophantic and prone to hallucination than its predecessors, so the results of these studies might not apply to ChatGPT Health. The company also evaluated the model that powers ChatGPT Health on its responses to health-specific questions, using its publicly available HealthBench benchmark. HealthBench rewards models that express uncertainty when appropriate, recommend that users seek medical attention when necessary, and refrain from causing users unnecessary stress by telling them their condition is more serious than it truly is. It’s reasonable to assume that the model underlying ChatGPT Health exhibited those behaviors in testing, though Bitterman notes that some of the prompts in HealthBench were generated by LLMs, not users, which could limit how well the benchmark translates into the real world.

An LLM that avoids alarmism seems like a clear improvement over systems that have people convincing themselves they have cancer after a few minutes of browsing. And as large language models, and the products built around them, continue to develop, whatever advantage Dr. ChatGPT has over Dr. Google will likely grow. The introduction of ChatGPT Health is certainly a move in that direction: By looking through your medical records, ChatGPT can potentially gain far more context about your specific health situation than could be included in any Google search, although numerous experts have cautioned against giving ChatGPT that access for privacy reasons.

Even if ChatGPT Health and other new tools do represent a meaningful improvement over Google searches, they could still conceivably have a negative effect on health overall. Much as autonomous vehicles, even if they are safer than human-driven cars, might still prove a net negative if they encourage people to use public transit less, LLMs could undermine users’ health if they induce people to rely on the internet instead of human doctors, even if they do increase the quality of health information available online.

Lederman says that this outcome is plausible. In her research, she has found that members of online communities centered on health tend to put their trust in users who express themselves well, regardless of the validity of the information they are sharing. Because ChatGPT communicates like an articulate person, some people might trust it too much, potentially to the exclusion of their doctor. But LLMs are certainly no replacement for a human doctor—at least not yet.

All anyone wants to talk about at Davos is AI and Donald Trump

This story first appeared in The Debrief, our subscriber-only newsletter about the biggest news in tech by Mat Honan, Editor in Chief. Subscribe to read the next edition as soon as it lands.

Hello from the World Economic Forum annual meeting in Davos, Switzerland. I’ve been here for two days now, attending meetings, speaking on panels, and basically trying to talk to anyone I can. And as far as I can tell, the only things anyone wants to talk about are AI and Trump.

Davos is physically defined by the Congress Center, where the official WEF sessions take place, and the Promenade, a street running through the center of the town lined with various “houses”—mostly retailers that are temporarily converted into meeting hubs for various corporate or national sponsors. So there is a Ukraine House, a Brazil House, a Saudi House, and yes, a USA House (more on that tomorrow). There are a handful of media houses from the likes of CNBC and the Wall Street Journal. Some houses are devoted to specific topics; for example, there’s one for science and another for AI.

But like everything else in 2026, the Promenade is dominated by tech companies. At one point I realized that literally everything I could see, in a spot where the road bends a bit, was a tech company house. Palantir, Workday, Infosys, Cloudflare, C3.ai. Maybe this should go without saying, but their presence, both in the houses and on the various stages and parties and platforms here at the World Economic Forum, really drove home to me how utterly and completely tech has captured the global economy.

While the houses host events and serve as networking hubs, the big show is inside the Congress Center. On Tuesday morning, I kicked off my official Davos experience there by moderating a panel with the CEOs of Accenture, Aramco, Royal Philips, and Visa. The topic was scaling up AI within organizations. All of these leaders represented companies that have gone from pilot projects to large internal implementations. It was, for me, a fascinating conversation. You can watch the whole thing here, but my takeaway was that while there are plenty of stories about AI being overhyped (including from us), it is certainly having substantive effects at large companies.

Aramco CEO Amin Nasser, for example, described how that company has found $3 billion to $5 billion in cost savings by improving the efficiency of its operations. Royal Philips CEO Roy Jakobs described how AI was allowing health-care practitioners to spend more time with patients by doing things such as automated note-taking. (This really resonated with me, as my wife is a pediatrics nurse, and for decades now I’ve heard her talk about how much of her time is devoted to charting.) And Visa CEO Ryan McInerney talked about his company’s push into agentic commerce and the way that will play out for consumers, small businesses, and the global payments industry.

To elaborate a little on that point, McInerney painted a picture of commerce where agents won’t just shop for things you ask them to, which will be basically step one, but will eventually be able to shop for things based on your preferences and previous spending patterns. This could be your regular grocery shopping, or even a vacation getaway. That’s going to require a lot of trust and authentication to protect both merchants and consumers, but it is clear that the steps into agentic commerce we saw in 2025 were just baby ones. There are much bigger ones coming for 2026. (Coincidentally, I had a discussion with a senior executive from Mastercard on Monday, who made several of the same points.)

But the thing that really resonated with me from the panel was a comment from Accenture CEO Julie Sweet, who has a view not only of her own large org but across a spectrum of companies: “It’s hard to trust something until you understand it.”

I felt that neatly summed up where we are as a society with AI.

Clearly, other people feel the same. Before the official start of the conference I was at AI House for a panel. The place was packed. There was a consistent, massive line to get in, and once inside, I literally had to muscle my way through the crowd. Everyone wanted to get in. Everyone wanted to talk about AI.

(A quick aside on what I was doing there: I sat on a panel called “Creativity and Identity in the Age of Memes and Deepfakes,” led by Atlantic CEO Nicholas Thompson; it featured the artist Emi Kusano, who works with AI, and Duncan Crabtree-Ireland, the chief negotiator for SAG-AFTRA, who has been at the center of a lot of the debates about AI in the film and gaming industries. I’m not going to spend much time describing it because I’m already running long, but it was a rip-roarer of a panel. Check it out.)

And, okay. Sigh. Donald Trump.

The president is due here Wednesday, amid threats of seizing Greenland and fears that he’s about to permanently fracture the NATO alliance. While AI is all over the stages, Trump is dominating all the side conversations. There are lots of little jokes. Nervous laughter. Outright anger. Fear in the eyes. It’s wild.

These conversations are also starting to spill out into the public. Just after my panel on Tuesday, I headed to a pavilion outside the main hall in the Congress Center. I saw someone coming down the stairs with a small entourage, who was suddenly mobbed by cameras and phones.

Moments earlier in the same spot, the press had been surrounding David Beckham, shouting questions at him. So I was primed for it to be another celebrity—after all, captains of industry were everywhere you looked. I mean, I had just bumped into Eric Schmidt, who was literally standing in line in front of me at the coffee bar. Davos is weird.

But in fact, it was Gavin Newsom, the governor of California, who is increasingly seen as the leading voice of the Democratic opposition to President Trump, and a likely contender, or even front-runner, in the race to replace him. Because I live in San Francisco I’ve encountered Newsom many times, dating back to his early days as a city supervisor before he was even mayor. I’ve rarely, rarely, seen him quite so worked up as he was on Tuesday.

Among other things, he called Trump a narcissist who follows “the law of the jungle, the rule of Don” and compared him to a T-Rex, saying, “You mate with him or he devours you.” And he was just as harsh on the world leaders, many of whom are gathered in Davos, calling them “pathetic” and saying he should have brought knee pads for them.

Yikes.

There was more of this sentiment, if in more measured tones, from Canadian prime minister Mark Carney during his address at Davos. While I missed his remarks, they had people talking. “If we’re not at the table, we’re on the menu,” he argued.

    Everyone wants AI sovereignty. No one can truly have it.

    Governments plan to pour $1.3 trillion into AI infrastructure by 2030 to invest in “sovereign AI,” with the premise being that countries should be in control of their own AI capabilities. The funds include financing for domestic data centers, locally trained models, independent supply chains, and national talent pipelines. This is a response to real shocks: covid-era supply chain breakdowns, rising geopolitical tensions, and the war in Ukraine.  

    But the pursuit of absolute autonomy is running into reality. AI supply chains are irreducibly global: Chips are designed in the US and manufactured in East Asia; models are trained on data sets drawn from multiple countries; applications are deployed across dozens of jurisdictions.  

    If sovereignty is to remain meaningful, it must shift from a defensive model of self-reliance to one of orchestration, balancing national autonomy with strategic partnership. 

    Why infrastructure-first strategies hit walls 

    A November survey by Accenture found that 62% of European organizations are now seeking sovereign AI solutions, driven primarily by geopolitical anxiety rather than technical necessity. That figure rises to 80% in Denmark and 72% in Germany. The European Union has appointed its first Commissioner for Tech Sovereignty. 

    This year, $475 billion is flowing into AI data centers globally. In the United States, AI data centers accounted for roughly one-fifth of GDP growth in the second quarter of 2025. But the obstacle for other nations hoping to follow suit isn’t just money. It’s energy and physics. Global data center capacity is projected to hit 130 gigawatts by 2030, and for every $1 billion spent on these facilities, $125 million is needed for electricity networks. More than $750 billion in planned investment is already facing grid delays. 

    And it’s also talent. Researchers and entrepreneurs are mobile, drawn to ecosystems with access to capital, competitive wages, and rapid innovation cycles. Infrastructure alone won’t attract or retain world-class talent.  

    What works: orchestrated sovereignty

    What nations need isn’t sovereignty through isolation but through specialization and orchestration. This means choosing which capabilities you build, which you pursue through partnership, and where you can genuinely lead in shaping the global AI landscape. 

    The most successful AI strategies don’t try to replicate Silicon Valley; they identify specific advantages and build partnerships around them. 

    Singapore offers a model. Rather than seeking to duplicate massive infrastructure, it invested in governance frameworks, digital-identity platforms, and applications of AI in logistics and finance, areas where it can realistically compete. 

    Israel shows a different path. Its strength lies in a dense network of startups and military-adjacent research institutions delivering outsize influence despite the country’s small size. 

    South Korea is instructive too. While it has national champions like Samsung and Naver, these firms still partner with Microsoft and Nvidia on infrastructure. That’s deliberate collaboration reflecting strategic oversight, not dependence.  

    Even China, despite its scale and ambition, cannot secure full-stack autonomy. Its reliance on global research networks, on foreign GPU architectures, and on foreign lithography equipment, such as the extreme ultraviolet systems needed to manufacture advanced chips, shows the limits of techno-nationalism. 

    The pattern is clear: Nations that specialize and partner strategically can outperform those trying to do everything alone. 

    Three ways to align ambition with reality 

    1.  Measure added value, not inputs.  

    Sovereignty isn’t how many petaflops you own. It’s how many lives you improve and how fast the economy grows. Real sovereignty is the ability to innovate in support of national priorities such as productivity, resilience, and sustainability while maintaining freedom to shape governance and standards.  

    Nations should track the use of AI in health care and monitor how the technology’s adoption correlates with manufacturing productivity, patent citations, and international research collaborations. The goal is to ensure that AI ecosystems generate inclusive and lasting economic and social value.  

    2. Cultivate a strong AI innovation ecosystem. 

    Build infrastructure, but also build the ecosystem around it: research institutions, technical education, entrepreneurship support, and public-private talent development. Infrastructure without skilled talent and vibrant networks cannot deliver a lasting competitive advantage.   

    3. Build global partnerships.  

    Strategic partnerships enable nations to pool resources, lower infrastructure costs, and access complementary expertise. Singapore’s work with global cloud providers and the EU’s collaborative research programs show how nations advance capabilities faster through partnership than through isolation. Rather than competing to set dominant standards, nations should collaborate on interoperable frameworks for transparency, safety, and accountability.  

    What’s at stake 

    Overinvesting in independence fragments markets and slows cross-border innovation, which is the foundation of AI progress. When strategies focus too narrowly on control, they sacrifice the agility needed to compete. 

    The cost of getting this wrong isn’t just wasted capital—it’s a decade of falling behind. Nations that double down on infrastructure-first strategies risk ending up with expensive data centers running yesterday’s models, while competitors that choose strategic partnerships iterate faster, attract better talent, and shape the standards that matter. 

    The winners will be those who define sovereignty not as separation, but as participation plus leadership—choosing who they depend on, where they build, and which global rules they shape. Strategic interdependence may feel less satisfying than independence, but it’s real, it’s achievable, and it will separate the leaders from the followers over the next decade. 

    The age of intelligent systems demands intelligent strategies—ones that measure success not by infrastructure owned, but by problems solved. Nations that embrace this shift won’t just participate in the AI economy; they’ll shape it. That’s sovereignty worth pursuing. 

    Cathy Li is head of the Centre for AI Excellence at the World Economic Forum.

    Rethinking AI’s future in an augmented workplace

    There are many paths AI evolution could take. On one end of the spectrum, AI is dismissed as a marginal fad, another bubble fueled by notoriety and misallocated capital. On the other end, it’s cast as a dystopian force, destined to eliminate jobs on a large scale and destabilize economies. Markets oscillate between skepticism and the fear of missing out, while the technology itself evolves quickly and investment dollars flow at a rate not seen in decades. 

    All the while, many of today’s financial and economic thought leaders hold to the consensus that the financial landscape will stay the same as it has been for the last several years. Two years ago, Joseph Davis, global chief economist at Vanguard, and his team shared that view, but they wanted to ground their perspective on AI in a deeper foundation of history and data. Drawing on a proprietary data set covering the last 130 years, Davis and his team developed a new framework, the Vanguard Megatrends Model, whose research suggests a more nuanced path than either extreme: AI has the potential to be a general-purpose technology that lifts productivity, reshapes industries, and augments human work rather than displacing it. In short, AI will be neither marginal nor dystopian. 

    “Our findings suggest that the continuation of the status quo, the basic expectation of most economists, is actually the least likely outcome,” Davis says. “We project that AI will have an even greater effect on productivity than the personal computer did. And we project that a scenario where AI transforms the economy is far more likely than one where AI disappoints and fiscal deficits dominate. The latter would likely lead to slower economic growth, higher inflation, and increased interest rates.”

    Implications for business leaders and workers

    Davis does not sugar-coat it, however. Although AI promises economic growth and productivity, it will be disruptive, especially for business leaders and workers in knowledge sectors. “AI is likely to be the most disruptive technology to alter the nature of our work since the personal computer,” says Davis. “Those of a certain age might recall how the broad availability of PCs remade many jobs. It didn’t eliminate jobs as much as it allowed people to focus on higher value activities.” 

    The team’s framework allowed them to examine AI automation risks across more than 800 occupations. The research indicated that while AI-driven automation creates the potential for job loss in upwards of 20% of occupations, the majority of jobs—likely four out of five—will see a mixture of innovation and automation. Workers’ time will increasingly shift to higher-value and uniquely human tasks. 

    This introduces the idea that AI could serve as a copilot in various roles, performing repetitive tasks and generally assisting with responsibilities. Davis argues that traditional economic models often underestimate the potential of AI because they fail to examine the deeper structural effects of technological change. “Most approaches for thinking about future growth, such as GDP, don’t adequately account for AI,” he explains. “They fail to link short-term variations in productivity with the three dimensions of technological change: automation, augmentation, and the emergence of new industries.” Automation enhances worker productivity by handling routine tasks; augmentation allows technology to act as a copilot, amplifying human skills; and the emergence of new industries opens up fresh sources of growth.

    Implications for the economy 

    Ironically, Davis’s research suggests that a reason for the relatively low productivity growth in recent years may be a lack of automation. Despite a decade of rapid innovation in digital and automation technologies, productivity growth has lagged since the 2008 financial crisis, hitting 50-year lows. This appears to support the view that AI’s impact will be marginal. But Davis believes that automation has been adopted in the wrong places. “What surprised me most was how little automation there has been in services like finance, health care, and education,” he says. “Outside of manufacturing, automation has been very limited. That’s been holding back growth for at least two decades.” The services sector accounts for more than 60% of US GDP and 80% of the workforce and has experienced some of the lowest productivity growth. It is here, Davis argues, that AI will make the biggest difference.

    One of the biggest challenges facing the economy is demographics, as the Baby Boomer generation retires, immigration slows, and birth rates decline. These demographic headwinds reinforce the need for technological acceleration. “There are concerns about AI being dystopian and causing massive job loss, but we’ll soon have too few workers, not too many,” Davis says. “Economies like the US, Japan, China, and those across Europe will need to step up function in automation as their populations age.” 

    For example, consider nursing, a profession in which empathy and human presence are irreplaceable. AI has already shown the potential to augment rather than automate in this field, streamlining data entry in electronic health records and helping nurses reclaim time for patient care. Davis estimates that these tools could increase nursing productivity by as much as 20% by 2035, a crucial gain as health-care systems adapt to aging populations and rising demand. “In our most likely scenario, AI will offset demographic pressures. Within five to seven years, AI’s ability to automate portions of work will be roughly equivalent to adding 16 million to 17 million workers to the US labor force,” Davis says. “That’s essentially the same as if everyone turning 65 over the next five years decided not to retire.” He projects that more than 60% of occupations, including nurses, family physicians, high school teachers, pharmacists, human resource managers, and insurance sales agents, will benefit from AI as an augmentation tool. 

    Implications for all investors 

    As AI technology spreads, the strongest performers in the stock market won’t be its producers, but its users. “That makes sense, because general-purpose technologies enhance productivity, efficiency, and profitability across entire sectors,” says Davis. This adoption of AI is broadening investors’ options, which means diversifying beyond technology stocks may be appropriate, as reflected in Vanguard’s Economic and Market Outlook for 2026. “As that happens, the benefits move beyond places like Silicon Valley or Boston and into industries that apply the technology in transformative ways.” And history shows that early adopters of new technologies reap the greatest productivity rewards. “We’re clearly in the experimentation phase of learning by doing,” says Davis. “Those companies that encourage and reward experimentation will capture the most value from AI.” 

    Looking globally, Davis sees the United States and China as significantly ahead in the AI race. “It’s a virtual dead heat,” he says. “That tells me the competition between the two will remain intense.” But other economies, especially those with low automation rates and large service sectors, like Japan, Europe, and Canada, could also see significant benefits. “If AI is truly going to be transformative, three sectors stand out: health care, education, and finance,” says Davis. “For AI to live up to its potential, it must fundamentally reshape these industries, which face high costs and rising demand for better, faster, more personalized services.”

    Davis says Vanguard is now more bullish on AI’s potential to transform the economy than it was just a year ago, even as that transformation depends on adoption well beyond Silicon Valley. “When I speak to business leaders, I remind them that this transformation hasn’t happened yet,” says Davis. “It’s their investment and innovation that will determine whether it does.”

    This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

    The UK government is backing AI that can run its own lab experiments

    A number of startups and universities that are building “AI scientists” to design and run experiments in the lab, including robot biologists and chemists, have just won extra funding from the UK government agency that funds moonshot R&D. The competition, set up by ARIA (the Advanced Research and Invention Agency), gives a clear sense of how fast this technology is moving: The agency received 245 proposals from research teams that are already building tools capable of automating increasing amounts of lab work.

    ARIA defines an AI scientist as a system that can run an entire scientific workflow, coming up with hypotheses, designing and running experiments to test those hypotheses, and then analyzing the results. In many cases, the system may then feed those results back into itself and run the loop again and again. Human scientists become overseers, coming up with the initial research questions and then letting the AI scientist get on with the grunt work.
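
    To make that loop concrete, here is a minimal sketch in Python of the hypothesize-experiment-analyze cycle ARIA describes. Every component is a toy stub (in a real system the ideation step would be an LLM and the experiment step an automated lab platform), so read it as an illustration of the control flow, not of any funded project's actual design.

```python
# A toy version of the AI-scientist loop: propose hypotheses, run
# experiments, analyze, and feed the results back in. All names and
# logic here are hypothetical stand-ins.
import random

def propose_hypotheses(question, findings):
    # Stand-in for LLM ideation conditioned on prior results.
    return [f"hypothesis {len(findings) + i}" for i in range(2)]

def run_experiment(hypothesis):
    # Stand-in for an automated lab run; returns a toy measurement.
    return {"hypothesis": hypothesis, "effect": random.random()}

def ai_scientist(question, max_cycles=5, threshold=0.9):
    findings = []
    for _ in range(max_cycles):
        for hypothesis in propose_hypotheses(question, findings):
            findings.append(run_experiment(hypothesis))  # analyze and record
        # Hand back to the human overseer once something promising
        # turns up (or the cycle budget runs out).
        if any(f["effect"] > threshold for f in findings):
            break
    return findings

print(ai_scientist("What stabilizes this battery chemistry?"))
```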

    “There are better uses for a PhD student than waiting around in a lab until 3 a.m. to make sure an experiment is run to the end,” says Ant Rowstron, ARIA’s chief technology officer. 

    ARIA picked 12 projects to fund from the 245 proposals, doubling the amount of funding it had intended to allocate because of the large number and high quality of submissions. Half the teams are from the UK; the rest are from the US and Europe. Some of the teams are from universities, some from industry. Each will get around £500,000 (around $675,000) to cover nine months’ work. At the end of that time, they should be able to demonstrate that their AI scientist was able to come up with novel findings.

    Winning teams include Lila Sciences, a US company that is building what it calls an AI nano-scientist—a system that will design and run experiments to discover the best ways to compose and process quantum dots, which are nanometer-scale semiconductor particles used in medical imaging, solar panels, and QLED TVs.

    “We are using the funds and time to prove a point,” says Rafa Gómez-Bombarelli, chief science officer for physical sciences at Lila: “The grant lets us design a real AI robotics loop around a focused scientific problem, generate evidence that it works, and document the playbook so others can reproduce and extend it.”

    Another team, from the University of Liverpool, UK, is building a robot chemist, which runs multiple experiments at once and uses a vision-language model to help troubleshoot when the robot makes an error.

    And a startup based in London, still in stealth mode, is developing an AI scientist called ThetaWorld, which is using LLMs to design experiments on the physical and chemical interactions that are important for the performance of batteries. The experiments will then be run in an automated lab by Sandia National Laboratories in the US.

    Taking the temperature

    Compared with the £5 million projects spanning two or three years that ARIA usually funds, £500,000 is small change. But that was the idea, says Rowstron: It’s an experiment on ARIA’s part too. By funding a range of projects for a short period, the agency is taking the temperature at the cutting edge, gauging how, and how fast, the way science is done is changing. What it learns will become the baseline for funding future large-scale projects.   

    Rowstron acknowledges there’s a lot of hype, especially now that most of the top AI companies have teams focused on science. When results are shared by press release and not peer review, it can be hard to know what the technology can and can’t do. “That’s always a challenge for a research agency trying to fund the frontier,” he says. “To do things at the frontier, we’ve got to know what the frontier is.”

    For now, the cutting edge involves agentic systems calling up other existing tools on the fly. “They’re running things like large language models to do the ideation, and then they use other models to do optimization and run experiments,” says Rowstron. “And then they feed the results back round.”

    Rowstron sees the technology stacked in tiers. At the bottom are AI tools designed by humans for humans, such as AlphaFold. These tools let scientists leapfrog slow and painstaking parts of the scientific pipeline but can still require many months of lab work to verify results. The idea of an AI scientist is to automate that work too.  

    AI scientists sit in a layer above those human-made tools and call on those tools as needed, says Rowstron. “But there’s a point in time—and I don’t think it’s a decade away—where that AI scientist layer says, ‘I need a tool and it doesn’t exist,’ and it will actually create an AlphaFold kind of tool just on the way to figuring out how to solve another problem. That whole bottom zone will just be automated.”

    That’s still some way off, he says. All the projects ARIA is now funding involve systems that call on existing tools rather than spin up new ones.

    There are also unsolved problems with agentic systems in general that limit how long they can run by themselves without going off track or making errors. For example, in a study titled “Why LLMs aren’t scientists yet,” posted online last week, researchers at Lossfunk, an AI lab based in India, report that when they set LLM agents the task of running a scientific workflow to completion, the system failed three out of four times. According to the researchers, the reasons the LLMs broke down included changes in the initial specifications and “overexcitement that declares success despite obvious failures.”

    “Obviously, at the moment these tools are still fairly early in their cycle and these things might plateau,” says Rowstron. “I’m not expecting them to win a Nobel Prize.”

    “But there is a world where some of these tools will force us to operate so much quicker,” he continues. “And if we end up in that world, it’s super important for us to be ready.”

    The era of agentic chaos and how data will save us

    AI agents are moving beyond coding assistants and customer service chatbots into the operational core of the enterprise. The ROI is promising, but autonomy without alignment is a recipe for chaos. Business leaders need to lay the essential foundations now.

    The agent explosion is coming

    Agents are independently handling end-to-end processes across lead generation, supply chain optimization, customer support, and financial reconciliation. A mid-sized organization could easily run 4,000 agents, each making decisions that affect revenue, compliance, and customer experience. 

    The transformation toward an agent-driven enterprise is inevitable. The economic benefits are too significant to ignore, and the potential is becoming a reality faster than most predicted. The problem? Most businesses and their underlying infrastructure are not prepared for this shift. Early adopters have found unlocking AI initiatives at scale to be extremely challenging. 

    The reliability gap that’s holding AI back

    Companies are investing heavily in AI, but the returns aren’t materializing. According to recent research from Boston Consulting Group, 60% of companies report minimal revenue and cost gains despite substantial investment. The leaders, however, reported revenue increases five times larger and cost reductions three times larger than everyone else. Clearly, there is a massive premium on being a leader. 

    What separates the leaders from the pack isn’t how much they’re spending or which models they’re using. Before scaling AI deployment, these “future-built” companies put critical data infrastructure capabilities in place. They invested in the foundational work that enables AI to function reliably. 

    A framework for agent reliability: The four quadrants

    To understand how and where enterprise AI can fail, consider four critical quadrants: models, tools, context, and governance.

    Take a simple example: an agent that orders you pizza. The model interprets your request (“get me a pizza”). The tool executes the action (calling the Domino’s or Pizza Hut API). Context provides personalization (you tend to order pepperoni on Friday nights at 7 p.m.). Governance validates the outcome (did the pizza actually arrive?). 

    Each dimension represents a potential failure point:

    • Models: The underlying AI systems that interpret prompts, generate responses, and make predictions
    • Tools: The integration layer that connects AI to enterprise systems, such as APIs, protocols, and connectors 
    • Context: The information agents need to understand the full business picture before making decisions, including customer histories, product catalogs, and supply chain networks
    • Governance: The policies, controls, and processes that ensure data quality, security, and compliance

    This framework helps diagnose where reliability gaps emerge. When an enterprise agent fails, which quadrant is the problem? Is the model misunderstanding intent? Are the tools unavailable or broken? Is the context incomplete or contradictory? Or is there no mechanism to verify that the agent did what it was supposed to do?
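
    To see how the quadrants fit together in code, here is a toy Python sketch of the pizza example above, with a comment marking where each quadrant can fail. The model, tool, context store, and governance check are all stubs; none of the names reflect a real vendor API.

```python
# Toy agent pipeline annotated with the four quadrants. Everything
# here is a hypothetical stub for illustration.
CONTEXT = {"alice": {"usual_order": "pepperoni"}}  # enterprise data

def model(prompt: str) -> str:
    # Quadrant 1, models: interpret the user's intent.
    return "order_pizza" if "pizza" in prompt.lower() else "unknown"

def tool(action: str, topping: str) -> dict:
    # Quadrant 2, tools: the integration layer (a fake pizza API).
    if action != "order_pizza":
        return {"status": "failed"}
    return {"status": "delivered", "topping": topping}

def handle(user: str, prompt: str) -> dict:
    intent = model(prompt)
    # Quadrant 3, context: personalize from unified enterprise data.
    topping = CONTEXT.get(user, {}).get("usual_order", "plain")
    outcome = tool(intent, topping)
    # Quadrant 4, governance: verify the outcome before declaring success.
    if outcome["status"] != "delivered":
        raise RuntimeError("governance check failed: no pizza arrived")
    return outcome

print(handle("alice", "Get me a pizza"))
```

    A failure at any one stage (a misread intent, a broken API, stale context, or a missing verification step) breaks the whole chain, which is why diagnosing the right quadrant matters.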

    Why this is a data problem, not a model problem

    The temptation is to think that reliability will simply improve as models improve. And model capability is indeed advancing exponentially: The cost of inference has dropped nearly 900-fold in three years, hallucination rates are on the decline, and AI’s capacity to perform long tasks doubles every six months.

    Tooling is also accelerating. Integration frameworks like the Model Context Protocol (MCP) make it dramatically easier to connect agents with enterprise systems and APIs.
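
    As a rough illustration of why MCP lowers the integration burden, here is how a single enterprise capability might be exposed to agents using the protocol's official Python SDK (the `mcp` package); the customer-history function and its data are hypothetical stubs.

```python
# Minimal MCP server exposing one enterprise "tool" to agents,
# using the Model Context Protocol Python SDK (pip install "mcp").
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm-context")  # the server name is arbitrary

@mcp.tool()
def get_customer_history(customer_id: str) -> dict:
    """Return a customer's order history (stubbed for illustration)."""
    # A real deployment would query the CRM or a unified data platform.
    return {"customer_id": customer_id, "recent_orders": ["pepperoni pizza"]}

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so an agent client can call it
```

    Any MCP-compatible agent can then discover and call get_customer_history without custom glue code, which is the sense in which such frameworks make connecting agents to enterprise systems dramatically easier.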

    If models are powerful and tools are maturing, then what is holding back adoption?

    To borrow from James Carville, “It is the data, stupid.” The root cause of most misbehaving agents is misaligned, inconsistent, or incomplete data.

    Enterprises have accumulated data debt over decades. Acquisitions, custom systems, departmental tools, and shadow IT have left data scattered across silos that rarely agree. Support systems do not match what is in marketing systems. Supplier data is duplicated across finance, procurement, and logistics. Locations have multiple representations depending on the source.

    Drop a few agents into this environment, and they will perform wonderfully at first, because each one is given a curated set of systems to call. Add more agents and the cracks grow, as each one builds its own fragment of truth.

    This dynamic has played out before. When business intelligence became self-serve, everyone started creating dashboards. Productivity soared, but reports failed to match. Now imagine that phenomenon playing out not in static dashboards but in AI agents that can take action. With agents, data inconsistency produces real business consequences, not just debates among departments.

    Companies that build unified context and robust governance can deploy thousands of agents with confidence, knowing they’ll work together coherently and comply with business rules. Companies that skip this foundational work will watch their agents produce contradictory results, violate policies, and ultimately erode trust faster than they create value.

    Leverage agentic AI without the chaos 

    The question for enterprises centers on organizational readiness. Will your company prepare the data foundation needed to make agent transformation work? Or will you spend years debugging agents, one issue at a time, forever chasing problems that originate in infrastructure you never built?

    Autonomous agents are already transforming how work gets done. But the enterprise will only experience the upside if those systems operate from the same truth. This ensures that when agents reason, plan, and act, they do so based on accurate, consistent, and up-to-date information. 

    The companies generating value from AI today have built on fit-for-purpose data foundations. They recognized early that in an agentic world, data functions as essential infrastructure. A solid data foundation is what turns experimentation into dependable operations.

    At Reltio, the focus is on building that foundation. The Reltio data management platform unifies core data from across the enterprise, giving every agent immediate access to the same business context. This unified approach enables enterprises to move faster, act smarter, and unlock the full value of AI.

    Agents will define the future of the enterprise. Context intelligence will determine who leads it.

    For leaders navigating this next wave of transformation, see Reltio’s practical guide:
    Unlocking Agentic AI: A Business Playbook for Data Readiness. Get your copy now to learn how real-time context becomes the decisive advantage in the age of intelligence. 

    Going beyond pilots with composable and sovereign AI

    Today marks an inflection point for enterprise AI adoption. Despite billions invested in generative AI, only 5% of integrated pilots deliver measurable business value and nearly one in two companies abandons AI initiatives before reaching production.

    The bottleneck is not the models themselves. What’s holding enterprises back is the surrounding infrastructure: Limited data accessibility, rigid integration, and fragile deployment pathways prevent AI initiatives from scaling beyond early LLM and RAG experiments. In response, enterprises are moving toward composable and sovereign AI architectures that lower costs, preserve data ownership, and adapt to the rapid, unpredictable evolution of AI—a shift IDC expects 75% of global businesses to make by 2027.

    From concept to production reality

    AI pilots almost always work, and that’s the problem. Proofs of concept (PoCs) are meant to validate feasibility, surface use cases, and build confidence for larger investments. But they thrive in conditions that rarely resemble the realities of production.

    [Chart omitted] Source: Compiled by MIT Technology Review Insights with data from Informatica’s CDO Insights 2025 report, 2026

    “PoCs live inside a safe bubble,” observes Cristopher Kuehl, chief data officer at Continent 8 Technologies. Data is carefully curated, integrations are few, and the work is often handled by the most senior and motivated teams.

    The result, according to Gerry Murray, research director at IDC, is not so much pilot failure as structural mis-design: Many AI initiatives are effectively “set up for failure from the start.”
