Want to understand the current state of AI? Check out these charts.

<div data-chronoton-summary="

  • The US-China AI race is closer than you think: Chinese models from DeepSeek and Alibaba now trail American ones by razor-thin margins. Meanwhile, the US has more data centers and capital, while China leads in research publications and robotics.
  • AI benchmarks are badly broken: One popular math benchmark has a 42% error rate, and models can game tests by training on the answers. Strong test scores increasingly fail to predict how AI actually performs in the real world.
  • Jobs and anxiety are both rising: Software developer employment for workers aged 22–25 has dropped nearly 20% since 2022, with AI likely a factor. Globally, 59% of people think AI will do more good than harm—but 52% say it still makes them nervous.
  • Regulation is losing the race: The EU banned predictive policing AI, and US states passed a record 150 AI-related bills, but experts say lawmakers don’t yet understand the technology well enough to govern it effectively.

” data-chronoton-post-id=”1135675″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

If you’re following AI news, you’re probably getting whiplash. AI is a gold rush. AI is a bubble. AI is taking your job. AI can’t even read a clock. The 2026 AI Index from Stanford University’s Institute for Human-Centered Artificial Intelligence, AI’s annual report card, comes out today and cuts through some of that noise. 

Despite predictions that AI development may hit a wall, the report says that the top models just keep getting better. People are adopting AI faster than they picked up the personal computer or the internet. AI companies are generating revenue faster than companies in any previous technology boom, but they’re also spending hundreds of billions of dollars on data centers and chips. The benchmarks designed to measure AI, the policies meant to govern it, and the job market are struggling to keep up. AI is sprinting, and the rest of us are trying to find our shoes.

All that speed comes at a cost. AI data centers around the world can now draw 29.6 gigawatts of power, enough to run the entire state of New York at peak demand. Annual water use from running OpenAI’s GPT-4o alone may exceed the drinking water needs of 12 million people. At the same time, the supply chain for chips is alarmingly fragile. The US hosts most of the world’s AI data centers, and one company in Taiwan, TSMC, fabricates almost every leading AI chip. 

The data reveals a technology evolving faster than we can manage. Here’s a look at some of the key points from this year’s report. 

The US and China are nearly tied

In a long, heated race with immense geopolitical stakes, the US and China are almost neck and neck on AI model performance, according to Arena, a community-driven ranking platform that allows users to compare the outputs of large language models on identical prompts. In early 2023, OpenAI had a lead with ChatGPT, but this gap narrowed in 2024 as Google and Anthropic released their own models. In February 2025, R1, an AI model built by the Chinese lab DeepSeek, briefly matched the top US model, ChatGPT. As of March 2026, Anthropic leads, trailed closely by xAI, Google, and OpenAI. Chinese models like DeepSeek and Alibaba lag only modestly. With the best AI models separated in the rankings by razor-thin margins, they’re now competing on cost, reliability, and real-world usefulness. 

Chart of the performance of top models on the Arena by select providers, showing the Arena score from May 2023 to Jan 2026 with the models all trending upward.  The scores are tightly packed by US based Anthropic, xAI, Google and OpenAI lead Alibaba, DeepSeek and Mistral (in that order.) Meta trails the pack.

The index notes that the US and China have different AI advantages. While the US has more powerful AI models, more capital, and an estimated 5,427 data centers (more than 10 times as many as any other country), China leads in AI research publications, patents, and robotics. 

As competition intensifies, companies like OpenAI, Anthropic, and Google no longer disclose their training code, parameter counts, or data-set sizes. “We don’t know a lot of things about predicting model behaviors,” says Yolanda Gil, a computer scientist at the University of Southern California who coauthored the report. This lack of transparency makes it difficult for independent researchers to study how to make AI models safer, she says.

AI models are advancing super fast

Despite predictions that development will plateau, AI models keep getting better and better. By some measures, they now meet or exceed the performance of human experts on tests that aim to measure PhD-level science, math, and language understanding. SWE-bench Verified, a software engineering benchmark for AI models, saw top scores jump from around 60% in 2024 to almost 100% in 2025. In 2025, an AI system produced a weather forecast on its own.  

“I am stunned that this technology continues to improve, and it’s just not plateauing in any way,” says Gil.

line chart of Select AI Index technical performance benchmarks vs human performance, showing that skills such as image classification, English language understanding, multitask language understanding, visual reasoning, medium level reading comprehension, multimodal understanding and reasoning have surpassed the human baseline at or before 2025, with autonomous software engineering, mathmatical reasoning and agent multimodal computer use trending towards meeting the human baseline by 2026.

However, AI still struggles in plenty of other areas. Because the models learn by processing enormous amounts of text and images rather than by experiencing the physical world, AI exhibits “jagged intelligence.” Robots are still in their early days and succeed in only 12% of household tasks. Self-driving cars are farther along: Waymos are now roaming across five US cities, and Baidu’s Apollo Go vehicles are shuttling riders around in China. AI is also expanding into professional domains like law and finance, but no model dominates the field yet. 

But the way we test AI is broken

These reports of progress should be taken with a grain of salt. The benchmarks designed to track AI progress are struggling to keep up as models quickly blow past their ceilings, the Stanford report says. Some are poorly constructed—a popular benchmark that tests a model’s math abilities has a 42% error rate. Others can be gamed: when models are trained on benchmark test data, for example, they can learn to score well without getting smarter. 

Because AI is rarely used the same way it’s tested, strong benchmark performance doesn’t always translate to real-world performance. And for complex, interactive technologies such as AI agents and robots, benchmarks barely exist yet. 

AI companies are also sharing less about how their models are trained, and independent testing sometimes tells a different story from what they report. “A lot of companies are not releasing how their models do in certain benchmarks, particularly the responsible-AI benchmarks,” says Gil. “The absence of how your model is doing on a benchmark maybe says something.” 

AI is starting to affect jobs

Within three years of going mainstream, AI is now used by more than half of people around the world, a rate of adoption faster than the personal computer or the internet. An estimated 88% of organizations now use AI, and four in five university students use it. 

It’s early days for deployment, and AI’s impact on jobs is hard to measure. Still, some studies suggest AI is beginning to affect young workers in certain professions. According to a 2025 study by economists at Stanford, employment for software developers aged 22 to 25 has fallen nearly 20% since 2022. The decline might not be pinned on AI alone, as broader macroeconomic conditions could be to blame, but AI appears to be playing a part.

two line charts showing the normalized headcount trends by age group from 2021 through 2025. On the left for software developers the early career (age 22-25) cohort drops rapidly after a peak in September 2022, with other ages still rising albeit less steeply.  On the right, customer support agents see a similar trend, although the decline for the early career group is less steep than for software developers.

Employers say that hiring may continue to tighten. According to a 2025 survey conducted by McKinsey & Company, a third of organizations expect AI to shrink their workforce in the coming year, particularly in service and supply chain operations and software engineering. AI is boosting productivity by 14% in customer service and 26% in software development, according to research cited by the index, but such gains are not seen in tasks requiring more judgment. Overall, it’s still too early to understand the bigger economic impact of AI. 

People have complicated feelings about AI 

Around the world, people feel both optimistic and anxious about AI: 59% of people think that it will provide more benefits than drawbacks, while 52% say that it makes them nervous, according to an Ipsos survey cited in the index. 

Notably, experts and the public see the future of AI very differently, according to a Pew survey. The biggest gap is around the future of work: While 73% of experts think that AI will have a positive impact on how people do their jobs, only 23% of the American public thinks so. Experts are also more optimistic than the public about AI’s impact on education and medical care, but they agree that AI will hurt elections and personal relationships.

Bar chart of US perceptions of AI's societal impact contrasting US adults with AI experts, with the percentage of AI experts saying that AI will have a positive impact in the next 20 years is 2-3 times higher than the US adults.  The most optimistic AI experts are in the field of medical care with 84% predicting a positive outcome (versus 44% of US adults.) The greatest difference is for jobs with experts polling at 73% and US adults  polling at 23%.  Both groups have a similar (11% for experts and 9% of adults.) expectation for a positive outcome for AI in elections.

Among all countries surveyed, Americans trust their government least to regulate AI appropriately, according to another Ipsos survey. More Americans worry federal AI regulation won’t go far enough than worry it will go too far. 

Governments are struggling to regulate AI

Governments around the world are struggling to regulate AI, but there were some minor successes last year. The EU AI Act’s first prohibitions, which ban the use of AI in predictive policing and emotion recognition, took effect. Japan, South Korea, and Italy also passed national AI laws. Meanwhile, the US federal government moved toward deregulation, with President Trump issuing an executive order seeking to handcuff states from regulating AI. 

Despite this federal action, state legislatures in the US passed a record 150 AI-related bills. California enacted landmark legislation, including SB 53, which mandates safety disclosures and whistleblower protections for developers of AI models. New York passed the RAISE Act, requiring AI companies to publish safety protocols and report critical safety incidents.

line chart showing the number of AI-related bills passed into law by all US states from 2016-2025, which increases sharply in 2023 and peaks with 150 bills in 2025.

But for all the legislative activity, Gil says, regulation is running behind the technology because we don’t really understand how it works. “Governments are cautious to regulate AI because … we don’t understand many things very well,” she says. “We don’t have a good handle on those systems.”

Desalination plants in the Middle East are increasingly vulnerable

<div data-chronoton-summary="

  • Water as a weapon: Desalination plants supplying drinking water to millions across the Middle East have become targets in the escalating US-Iran conflict, with plants in Iran, Bahrain, and Kuwait already reporting damage.
  • Gulf states are most at risk: While Iran gets just 3% of its municipal fresh water from desalination, Bahrain, Qatar, and Kuwait depend on it for over 90% of their drinking water—making them far more exposed to attacks.
  • Bigger plants mean bigger consequences: The average desalination facility is now ten times larger than it was 15 years ago. Taking one offline could impact the water supplies of many people in the area.
  • The danger doesn’t end with the war: Climate change, oil spills, and algae blooms pose growing threats to these facilities—and experts warn the conflict may teach future actors just how effectively water infrastructure can be weaponized.

” data-chronoton-post-id=”1135235″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

As the conflict in Iran has escalated, a crucial resource is under fire: the desalination technology that supplies water across much of the region.

In early March, Iran’s foreign minister accused the US of attacking a desalination plant on Qeshm Island in the Strait of Hormuz and disrupting the water supply to nearly 30 villages. (The US denied responsibility.) In the weeks since, both Bahrain and Kuwait have reported damage to desalination plants and blamed Iran, though Iran also denied responsibility.

In late March, President Donald Trump threatened the destruction of “possibly all desalinization plants” in Iran if the Strait of Hormuz was not reopened. Since then, he’s escalated his threats against Iran, warning of plans to attack other crucial civilian infrastructure like power plants and bridges.

Countries in the Middle East, particularly the Gulf states, rely on the technology to turn salt water into fresh water for farming, industry, and—crucially—drinking. The mounting attacks and threats to date highlight just how vital the industry is to the region—a situation made even more precarious by rising temperatures and extreme weather driven by climate change.

Right now, 83% of the Middle East is under extremely high water stress, says Liz Saccoccia, a water security associate at the World Resources Institute. Future projections suggest that’s going to increase to about 100% by 2050, she adds: “This is a continuing trend, and it’s getting worse, not better.”

Here’s a look at desalination technology in the Middle East and what wartime threats to the critical infrastructure could mean for people in the region. 

A vital resource

Desalination technology has helped provide water supplies in the Middle East since the early 20th century and became widespread in the 1960s and 1970s.

There are two major categories of desalination plants. Thermal plants use heat to evaporate water, leaving salt and other impurities behind. The vapor can then be condensed into usable fresh water. The alternative is membrane-based technology like reverse osmosis, which pushes water through membranes that have tiny pores—so small that salt can’t get through.

Early desalination plants in the Middle East were the first type, burning fossil fuels to evaporate water, leaving the salt behind. This technique is incredibly energy-intensive, and over time, processes that rely on filters became the dominant choice.

Membrane technologies have made up essentially all new desalination capacity in recent years; the last major thermal plant built in the Gulf came online in 2018. Many reverse osmosis plants still rely on fossil fuels, but they’re more efficient. Since then, membrane technologies have added more than 15 million cubic meters of daily capacity—enough to supply water to millions of people.

Capacity has expanded quickly in recent years; between 2006 and 2024, countries across the Middle East collectively spent over $50 billion building and upgrading desalination facilities, and nearly that much operating them.

Today, there are nearly 5,000 desalination plants operational across the Middle East.

And looking ahead, growth is continuing. Between 2024 and 2028, daily capacity is expected to grow from about 29 million cubic meters to 41 million cubic meters.

Uneven vulnerabilities

Some countries rely on the technology more than others. Iran, for example, uses desalination for about 3% of its municipal fresh water. The country has access to groundwater and some surface water, including rivers, though these resources are being stretched thin by agriculture and extreme drought.

Other nations in the region, particularly the Gulf countries (Bahrain, Qatar, Kuwait, the United Arab Emirates, Saudi Arabia, and Oman), have much more limited water resources and rely heavily on desalination. Across these six nations, all but the UAE get more than half their drinking water from desalination, and for Bahrain, Qatar, and Kuwait the figure is more than 90%.

“The Gulf countries are much, much more vulnerable to attacks on their desalination plants than Iran is,” says David Michel, a senior associate in the global food and water security program at the Center for Strategic and International Studies.

There are thousands of desalination facilities across the region, so the system wouldn’t collapse if a small number were taken offline, Michel says. However, in recent years there’s been a trend toward larger, more centralized plants.

The average desalination plant is about 10 times larger than it was 15 years ago, according to data from the International Energy Agency. The largest desalination plants today can produce 1 million cubic meters of water daily, enough for hundreds of thousands of people. Taking one or more of these massive facilities offline could have a significant effect on the system, Michel says.

Escalating threats

Desalination facilities are quite linear, meaning there are multiple steps and pieces of equipment that work in sequence—and the failure of a component in that chain can take an entire facility down. Attacks on water inlets, transportation networks, and power supplies can also disrupt the system, Michel says. 

During the Gulf War in 1991, Iraqi forces pumped oil into the gulf, contaminating the water and shutting down desalination plants in Kuwait

The facilities are also generally located close to other targets in this conflict. Desalination is incredibly energy intensive, so about three-quarters of facilities in the region are next to power plants. Trump has repeatedly threatened power plants in Iran. In response, Iran’s military has said that if civilian targets are hit, the country will respond with strikes that are “much more devastating and widespread.” Other governments and organizations, including the United Nations, the European Union, and the Red Cross, have broadly condemned threats to infrastructure as illegal. 

But war isn’t the only danger facing these plants, even if it is the most immediate. Some studies have suggested that global warming could strengthen cyclones in the region, and these extreme weather events could force shutdowns or damage equipment.

Water pollution could also cause shutdowns. Oil spills, whether accidental or intentional, as in the case of the Gulf War, can  wreak havoc. And in 2009, a red algae bloom closed desalination plants in Oman and the United Arab Emirates for weeks. The algae fouled membranes and blocked the plants from being able to take water in from the Persian Gulf and the Gulf of Oman.

Desalination facilities could become more resilient to threats in the future, and they may need to as their importance continues to grow. 

There’s increasing interest in running desalination facilities at least partially on solar power, which could help reduce dependence on the oil that powers most facilities today. The Hassyan seawater desalination project in the UAE, currently under construction, would be the largest reverse osmosis plant in the world to operate solely with renewable energy. 

Another way to increase resilience is for countries to build up more strategic water storage to meet demand. Qatar recently issued new policies that aim to improve management and storage of desalinated water, for example. Countries could also work together to invest in shared infrastructure and policies that help strengthen the water supply through the region. 

Preparedness, resilience, and cooperation will be key for the Middle East broadly as critical infrastructure, including the water supply, is increasingly under threat. 

“The longer the conflict goes on, the more likely we’ll see significant water infrastructure damage,” says Ginger Matchett, an assistant director at the Atlantic Council. “What worries me is that after this war ends, some of the lessons will show how water can be weaponized more strategically than previously imagined.” 

A woman’s uterus has been kept alive outside the body for the first time

<div data-chronoton-summary="

  • A uterus survived outside the body for the first time: Scientists in Spain kept a donated human uterus alive for 24 hours using a machine that mimics the body’s circulatory system, pumping modified blood through the organ.
  • The researchers hope to someday keep a uterus alive for a full menstrual cycle: Researchers also want to study how embryos implant into the uterine lining, by observing the process in a living organ outside the body.
  • Bigger ambitions are already on the table: The team’s founder envisions a future where a machine like this could gestate a human fetus entirely outside the body, offering a new path to parenthood for those unable to carry a pregnancy.

” data-chronoton-post-id=”1134766″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

“Think of this as a human body,” says Javier González.

In front of me is essentially a metal box on wheels. Standing at around a meter in height, it reminds me of a stainless-steel counter in a restaurant kitchen. It is covered in flexible plastic tubing—which act as veins and arteries—connecting a series of transparent containers, the organs of this machine.

What makes it extra special is the role of the cream-colored tub that sits on its surface. Ten months ago, González, a biomedical scientist who developed the device with his colleagues at the Carlos Simon Foundation, carefully placed a freshly donated human uterus in the tub. The team connected it to the device’s tubes and pumped in modified human blood.

The device kept the uterus alive for a day—a new feat that could represent the first step to the long-term maintenance of uteruses outside the human body. The work has not yet been published. 

The team members want to keep donated human uteruses alive long enough to see a full menstrual cycle. They hope this will help them study diseases of the uterus and learn more about how embryos burrow their way into the organ’s lining at the start of a pregnancy. They also hope that future iterations of their device might one day sustain the full gestation of a human fetus.

The machine is technically called PUPER, which stands for “preservation of the uterus in perfusion.” But González’s colleague Xavier Santamaria says the team has adopted a nickname for it: “We call it ‘Mother.’”

The organ in the machine

González and Santamaria, medical vice president of the Carlos Simon Foundation, demonstrated how the device might work when I visited the foundation in Valencia, Spain, earlier this month (although it held no organs on that day). 

Both are interested in learning more about implantation, the moment at which an embryo attaches itself to the lining of a uterus—essentially, the very first moment of pregnancy.

The foundation’s founder and director, Carlos Simon, believes it’s a sticking point in IVF: Scientists have made many improvements to the technology over the years, but the failure of embryos to implant underlies plenty of unsuccessful IVF cycles, he says. Being able to carefully study how the process works in a real, living organ might give the team a better idea of how to prevent those failures.

a person in gloves stands next to a machine with lots of tubing coming in and out of the metal exterior

JESS HAMZELOU
a sheep uterus resting on gauze connected to several tubes

JAVIER GONZALES/CARLOS SIMON FOUNDATION

Javier González demonstrates the perfusion machine. A previous iteration of the device kept a sheep’s uterus (right) alive for a day.

The team took inspiration from advances in technologies designed to maintain donated organs for transplantation. In recent years, researchers around the world have created devices that deliver nutrients and filter waste so that organs can survive longer after being removed from donors’ bodies.

The main goal here is to buy time. A human organ might last only a matter of hours outside the body, so a transplant may require frantic preparation for the recipient, sometimes in the middle of the night. With a little more time, doctors could find better donor-patient matches and potentially test the quality of donated organs.

This approach is called normothermic or machine perfusion, and it is already being used clinically for some liver, kidney, and heart transplants.

The team at the Carlos Simon Foundation built a similar machine for uteruses. A blood bag hangs on one side. From there, blood is ferried via plastic tubing to a pump, which functions as the heart. The pump shunts the blood through an oxygenator, which adds oxygen and removes carbon dioxide as the lungs would in a human body.

The blood is warmed and passed through sensors that monitor the levels of glucose and oxygen, along with other factors. It passes through a “kidney” to remove waste. And finally the blood reaches the uterus, hooked up to its own plastic “arteries” and “veins.” The organ itself sits at a tilt, just as in the body, and is kept in a humid environment to stay moist.

Mother’s first uterus

The team first began testing an early prototype of the device with sheep uteruses around four years ago. That meant carting the machine to an animal research center in Zaragoza, around 200 miles away. Over the course of the preliminary study, veterinary surgeons removed the uteruses of six sheep and hooked them up to the machine. They kept each uterus alive for a day, using blood from the same animals.

After the sheep experiments, the researchers carted their machine back to Valencia and modified it to achieve its current incarnation, “Mother.” They started working with a local hospital that performed hysterectomies. And in May last year, they were offered their first human uterus.

The team needed to be quick. “You need to put [the uterus in the machine] within a couple of hours, maximum, of the extraction,” says Santamaria. He and his colleagues also needed to connect the uterus’s blood vessels to the tubing delicately, taking care to avoid any blockages (clotting is a major challenge in organ perfusion). The organ was hooked up to human blood obtained from a blood bank.

It seemed to work—at least temporarily. “We kept it alive for one day,” says Santamaria.

“As a proof of concept, it is impressive,” says Keren Ladin, a bioethicist who has focused on organ transplantation and perfusion at Tufts University. “These are early days.”

It might not sound like much, but 24 hours is a long time for an organ to be out of the body. Maintaining a donated uterus for that long could expand the options for uterus transplant, a fairly new procedure offered to some people who want to be pregnant but don’t have a functional uterus, says Gerald Brandacher, professor of experimental and translational transplant surgery at the Medical University of Innsbruck in Austria.

“It is better than what we currently have, because we have only a couple of hours,” he says. So far, most uterus transplants have been planned operations involving organs from living donors. A technology like this could allow for the use of more organs from deceased donors, he says.

That work is “not in the immediate pipeline” for the team in Spain, says Santamaria. “We are working on other problems.”

Pregnancy in the lab?

Santamaria, González, and their colleagues are more interested in using sustained human uteruses for research. 

They’ve mounted a camera to a wall in the corner of the room, pointed at their machine. It allows the team to monitor “Mother” remotely, and to check if any valves disconnect. (That happened once before—a spike in pressure caused the blood bag to come loose, spilling a liter of blood on the floor, Santamaria says.)

They’d like to be able to keep their uteruses alive for around 28 days to study the menstrual cycle and disorders that affect the uterus, like endometriosis and fibroids.

It won’t be easy to maintain a uterus for that long, cautions Brandacher. As far as he knows, no one has been able to maintain a liver for more than seven days. “No studies out there … have shown 30-day survival in a machine perfusion circuit,” he says.

But it’s worth the effort. The team’s main interest is learning more about how embryos implant in the uterine lining at the start of a pregnancy. They hope to be able to test the process in their outside-the-body uteruses.

They won’t be allowed to use human embryos for this, says González—that would cross an ethical boundary. Instead, they plan to use embryo-like structures made from stem cells. The structures closely resemble human embryos but are created in a lab without sperm or eggs.

Simon himself has grander ambitions.

He sees a future in which a machine like “Mother” will be able to fully gestate a human, all the way from embryo to newborn. It could offer a new path to parenthood for people who don’t have a uterus, for example, or who are not able to get pregnant for other reasons.

He appreciates that it sounds futuristic, to say the least. “I don’t know if we will end up having pregnancies inside of the uterus outside of the body, but at least we are ready to understand all the steps to do that,” he says. “You have to start somewhere.”

OpenAI is throwing everything into building a fully automated researcher

<div data-chronoton-summary="

  • A fully automated research lab: OpenAI has set a new “North Star” — building an AI system capable of tackling large, complex scientific problems entirely on its own, with a research intern prototype due by September and a full multi-agent system planned for 2028.
  • Coding agents as a proof of concept: OpenAI’s existing tool Codex, which can already handle substantial programming tasks autonomously, is the early blueprint — the bet is that if AI can solve coding problems, it can solve almost any problem formulated in text or code.
  • Serious risks with no clean answers: Chief scientist Jakub Pachocki admits that a system this powerful running with minimal human oversight raises hard questions — with risks from hacking and misuse to bioweapons — and that chain-of-thought monitoring is the best safeguard available, for now.
  • Power concentrated in very few hands: Pachocki says governments, not just OpenAI, will need to figure out where the lines are drawn.

” data-chronoton-post-id=”1134438″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

OpenAI is refocusing its research efforts and throwing its resources into a new grand challenge. The San Francisco firm has set its sights on building what it calls an AI researcher, a fully automated agent-based system that will be able to go off and tackle large, complex problems by itself. ​​OpenAI says that this new research goal will be its “North Star” for the next few years, pulling together multiple research strands, including work on reasoning models, agents, and interpretability.

There’s even a timeline. OpenAI plans to build “an autonomous AI research intern”—a system that can take on a small number of specific research problems by itself—by September. The AI intern will be the precursor to a fully automated multi-agent research system that the company plans to debut in 2028. This AI researcher (OpenAI says) will be able to tackle problems that are too large or complex for humans to cope with.

Those tasks might be related to math and physics—such as coming up with new proofs or conjectures—or life sciences like biology and chemistry, or even business and policy dilemmas. In theory, you would throw such a tool any kind of problem that can be formulated in text, code, or whiteboard scribbles—which covers a lot.

OpenAI has been setting the agenda for the AI industry for years. Its early dominance with large language models shaped the technology that hundreds of millions of people use every day. But it now faces fierce competition from rival model makers like Anthropic and Google DeepMind. What OpenAI decides to build next matters—for itself and for the future of AI.   

A big part of that decision falls to Jakub Pachocki, OpenAI’s chief scientist, who sets the company’s long-term research goals. Pachocki played key roles in the development of both GPT-4, a game-changing LLM released in 2023, and so-called reasoning models, a technology that first appeared in 2024 and now underpins all major chatbots and agent-based systems. 

In an exclusive interview this week, Pachocki talked me through OpenAI’s latest vision. “I think we are getting close to a point where we’ll have models capable of working indefinitely in a coherent way just like people do,” he says. “Of course, you still want people in charge and setting the goals. But I think we will get to a point where you kind of have a whole research lab in a data center.”

Solving hard problems

Such big claims aren’t new. Saving the world by solving its hardest problems is the stated mission of all the top AI firms. Demis Hassabis told me back in 2022 that it was why he started DeepMind. Anthropic CEO Dario Amodei says he is building the equivalent of a country of geniuses in a data center. Pachocki’s boss, Sam Altman, wants to cure cancer. But Pachocki says OpenAI now has most of what it needs to get there.

In January, OpenAI released Codex, an agent-based app that can spin up code on the fly to carry out tasks on your computer. It can analyze documents, generate charts, make you a daily digest of your inbox and social media, and much more. (Other firms have released similar tools, such as Anthropic’s Claude Code and Claude Cowork.)

OpenAI claims that most of its technical staffers now use Codex in their work. You can look at Codex as a very early version of the AI researcher, says Pachocki: “I expect Codex to get fundamentally better.”

The key is to make a system that can run for longer periods of time, with less human guidance. “What we’re really looking at for an automated research intern is a system that you can delegate tasks [to] that would take a person a few days,” says Pachocki.

“There are a lot of people excited about building systems that can do more long-running scientific research,” says Doug Downey, a research scientist at the Allen Institute for AI, who is not connected to OpenAI. “I think it’s largely driven by the success of these coding agents. The fact that you can delegate quite substantial coding tasks to tools like Codex is incredibly useful and incredibly impressive. And it raises the question: Can we do similar things outside coding, in broader areas of science?”

For Pachocki, that’s a clear Yes. In fact, he thinks it’s just a matter of pushing ahead on the path we’re already on. A simple boost in all-round capability also leads to models that can work longer without help, he says. He points to the leap from 2020’s GPT-3 to 2023’s GPT-4, two of OpenAI’s previous models. GPT-4 was able to work on a problem for far longer than its predecessor, even without specialized training, he says. 

So-called reasoning models brought another bump. Training LLMs to work through problems step by step, backtracking when they make a mistake or hit a dead end, has also made models better at working for longer periods of time. And Pachocki is convinced that OpenAI’s reasoning models will continue to get better.

But OpenAI is also training its systems to work by themselves for longer by feeding them specific samples of complex tasks, such as hard puzzles taken from math and coding contests, which force the models to learn how to do things like keep track of very large chunks of text and split problems up into (and then manage) multiple subtasks.

The aim isn’t to build models that just win math competitions. “That lets you prove that the technology works before you connect it to the real world,” says Pachocki. “If we really wanted to, we could build an amazing automated mathematician. We have all the tools, and I think it would be relatively easy. But it’s not something we’re going to prioritize now because, you know, at the point where you believe you can do it, there’s much more urgent things to do.”

“We are much more focused now on research that’s relevant in the real world,” he adds.

Right now that means taking what Codex can do with coding and trying to apply that to problem-solving in general. “There’s a big change happening, especially in programming,” he says. “Our jobs are now totally different than they were even a year ago. Nobody really edits code all the time anymore. Instead, you manage a group of Codex agents.” If Codex can solve coding problems (the argument goes), it can solve any problem.

The line always goes up

It’s true that OpenAI has had a handful of remarkable successes in the last few months. Researchers have used GPT-5 (the LLM that powers Codex) to discover new solutions to a number of unsolved math problems and punch through apparent dead ends in a handful of biology, chemistry, and physics puzzles.   

“Just looking at these models coming up with ideas that would take most PhD weeks, at least, makes me expect that we’ll see much more acceleration coming from this technology in the near future,” Pachocki says.

But Pachocki admits that it’s not a done deal. He also understands why some people still have doubts about how much of a game-changer the technology really is. He thinks it depends on how people like to work and what they need to do. “I can believe some people don’t find it very useful yet,” he says.

He tells me that he didn’t even use autocomplete—the most basic version of generative coding tech—a year ago. “I’m very pedantic about my code,” he says. “I like to type it all manually in vim if I can help it.” (Vim is a text editor favored by many hardcore programmers that you interact with via dozens of keyboard shortcuts instead of a mouse.)

But that changed when he saw what the latest models could do. He still wouldn’t hand over complex design tasks, but it’s a time-saver when he just wants to try out a few ideas. “I can have it run experiments in a weekend that previously would have taken me like a week to code,” he says.

“I don’t think it is at the level where I would just let it take the reins and design the whole thing,” he adds. “But once you see it do something that would take a week to do—I mean, that’s hard to argue with.”

Pachocki’s game plan is to supercharge the existing problem-solving abilities that tools like Codex have now and apply them across the sciences.  

Downey agrees that the idea of an automated researcher is very cool: “It would be exciting if we could come back tomorrow morning and the agent’s done a bunch of work and there’s new results we can examine,” he says.

But he cautions that building such a system could be harder than Pachocki makes out. Last summer, Downey and his colleagues tested several top-tier LLMs on a range of scientific tasks. OpenAI’s latest model, GPT-5, came out on top but still made lots of errors.

“If you have to chain tasks together, then the odds that you get several of them right in succession tend to go down,” he says. Downey admits that things move fast, and he has not tested the latest versions of GPT-5 (OpenAI released GPT-5.4 two weeks ago). “So those results might already be stale,” he says. 

Serious unanswered questions

I asked Pachocki about the risks that may come with a system that can solve large, complex problems by itself with little human oversight. Pachocki says people at OpenAI talk about those risks all the time.

“If you believe that AI is about to substantially accelerate research, including AI research, that’s a big change in the world. That’s a big thing,” he told me. “And it comes with some serious unanswered questions. If it’s so smart and capable, if it can run an entire research program, what if it does something bad?”

The way Pachocki sees it, that could happen in a number of ways. The system could go off the rails. It could get hacked. Or it could simply misunderstand its instructions.

The best technique OpenAI has right now to address these concerns is to train its reasoning models to share details about what they are doing as they work. This approach to keeping tabs on LLMs is known as chain-of-thought monitoring.

In short, LLMs are trained to jot down notes about what they are doing in a kind of scratch pad as they step through tasks. Researchers can then use those notes to make sure a model is behaving as expected. Yesterday OpenAI published new details on how it is using chain-of-thought monitoring in house to study Codex

“Once we get to systems working mostly autonomously for a long time in a big data center, I think this will be something that we’re really going to depend on,” says Pachocki.

The idea would be to monitor an AI researcher’s scratch pads using other LLMs and catch unwanted behavior before it’s a problem, rather than trying to stop that bad behavior from happening in the first place. LLMs are not understood well enough for us to control them fully.

“I think it’s going to be a long time before we can really be like, okay, this problem is solved,” he says. “Until you can really trust the systems, you definitely want to have restrictions in place.” Pachocki thinks that very powerful models should be deployed in sandboxes, cut off from anything they could break or use to cause harm. 

AI tools have already been used to come up with novel cyberattacks. Some worry that they will be used to design synthetic pathogens that could be used as bioweapons. You can insert any number of evil-scientist scare stories here. “I definitely think there are worrying scenarios that we can imagine,” says Pachocki. 

“It’s going to be a very weird thing. It’s extremely concentrated power that’s in some ways unprecedented,” says Pachocki. “Imagine you get to a world where you have a data center that can do all the work that OpenAI or Google can do. Things that in the past required large human organizations would now be done by a couple of people.”

“I think this is a big challenge for governments to figure out,” he adds.

And yet some people would say governments are part of the problem. The US government wants to use AI on the battlefield, for example. The recent showdown between Anthropic and the Pentagon revealed that there is little agreement across society about where we draw red lines for how this technology should and should not be used—let alone who should draw them. In the immediate aftermath of that dispute, OpenAI stepped up to sign a deal with the Pentagon instead of its rival. The situation remains murky.

I pushed Pachocki on this. Does he really trust other people to figure it out or does he, as a key architect of the future, feel personal responsibility? “I do feel personal responsibility,” he says. “But I don’t think this can be resolved by OpenAI alone, pushing its technology in a particular way or designing its products in a particular way. We’ll definitely need a lot of involvement from policymakers.”

Where does that leave us? Are we really on a path to the kind of AI Pachocki envisions? When I asked the Allen Institute’s Downey, he laughed. “I’ve been in this field for a couple of decades and I no longer trust my predictions for how near or far certain capabilities are,” he says. 

OpenAI’s stated mission is to ensure that artificial general intelligence (a hypothetical future technology that many AI boosters believe will be able to match humans on most cognitive tasks) will benefit all of humanity. OpenAI aims to do that by being the first to build it. But the only time Pachocki mentioned AGI in our conversation, he was quick to clarify what he meant by talking about “economically transformative technology” instead.

LLMs are not like human brains, he says: “They are superficially similar to people in some ways because they’re kind of mostly trained on people talking. But they’re not formed by evolution to be really efficient.” 

“Even by 2028, I don’t expect that we’ll get systems as smart as people in all ways. I don’t think that will happen,” he adds. “But I don’t think it’s absolutely necessary. The interesting thing is you don’t need to be as smart as people in all their ways in order to be very transformative.”

Can quantum computers now solve health care problems? We’ll soon find out.

<div data-chronoton-summary="

  • A $5 million health care challenge: A nonprofit called Wellcome Leap is offering up to $5 million to quantum computing teams that can solve real-world health care problems classical computers can’t handle—using machines that are still noisy, error-prone, and far from perfect.
  • Hybrid computing is the real breakthrough: Facing limited quantum hardware, all six finalist teams developed clever quantum-classical hybrid approaches—offloading most work to conventional processors, then using quantum only where classical methods fall short.
  • Cancer, muscular dystrophy, and drug design are on the table: Teams are tackling problems ranging from identifying cancer origins to simulating light-activated cancer drugs to finding treatments for muscular dystrophy—applications previously impossible to model classically.
  • Even failure would count as progress: The competition’s own director doubts anyone will claim the grand prize, but says the field has already been transformed—teams now know where quantum computing can genuinely matter, even if the machines to fully prove it don’t exist yet.

” data-chronoton-post-id=”1134409″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

I’m standing in front of a quantum computer built out of atoms and light at the UK’s National Quantum Computing Centre on the outskirts of Oxford. On a laboratory table, a complex matrix of mirrors and lenses surrounds a Rubik’s Cube–size cell where 100 cesium atoms are suspended in grid formation by a carefully manipulated laser beam. 

The cesium atom setup is so compact that I could pick it up, carry it out of the lab, and put it on the backseat of my car to take home. I’d be unlikely to get very far, though. It’s small but powerful—and so it’s very valuable. Infleqtion, the Colorado-based company that owns it, is hoping the machine’s abilities will win $5 million next week, at an event to be held in Marina del Rey, California. 

Infleqtion is one of six teams that have made it to the final stage of a 30-month-long quantum computing competition called Quantum for Bio (Q4Bio). Run by the nonprofit Wellcome Leap, it aims to show that today’s quantum computers, though messy and error-prone and far from the large-scale machines engineers hope to build, could actually benefit human health. Success would be a significant step forward in proving the worth of quantum computers. But for now, it turns out, that worth seems to be linked to harnessing and improving the performance of conventional (also called classical) computers in tandem, creating a quantum-classical hybrid that can exceed what’s possible on classical machines by themselves.

There are two prize categories. A prize of $2 million will go to any and all teams that can run a significantly useful health care algorithm on computers with 50 or more qubits (a qubit is the basic processing unit in a quantum computer). To win the $5 million grand prize, a team must successfully run a quantum algorithm that solves a significant real-world problem in health care, and the work must use 100 or more qubits. Winners have to meet strict performance criteria, and they must solve a health care problem that can’t be solved with conventional computers—a tough task.

Despite the scale of the challenge, most of the teams think some of this money could be theirs. “I think we’re in with a good shout,” says Jonathan D. Hirst, a computational chemist at the University of Nottingham, UK. “We’re very firmly within the criteria for the $2 million prize,” says Stanford University’s Grant Rotskoff, whose collaboration is investigating the quantum properties of the ATP molecule that powers biological cells. 

The grand prize is perhaps less of a sure thing. “This is really at the very edge of doable,” Rotskoff says. Insiders say the challenge is so difficult, given the state of quantum computing technology, that much of the money could stay in Wellcome Leap’s account. 

With most of the Q4Bio work unpublished and protected by NDAs, and the quantum computing field already rife with claims and counterclaims about performance and achievements, only the judges will be in a position to decide who’s right. 

A hybrid solution

The idea behind quantum computers is that they can use small-scale objects that obey the laws of quantum mechanics, such as atoms and photons of light,  to simulate real-world processes too complex to model on our everyday classical machines. 

Researchers have been working for decades to build such systems, which could deliver insights for creating new materials, developing pharmaceuticals, and improving chemical processes such as fertilizer production.  But dealing with quantum stuff like atoms is excruciatingly difficult. The biggest, shiniest applications require huge, robust machines capable of withstanding the environmental “noise” that can very easily disrupt delicate quantum systems. We don’t have those yet—and it’s unclear when we will. 

Wellcome Leap wanted to find out if the smaller-scale machines we have today can be made to do something—anything—useful for health care while we wait for the era of powerful, large-scale quantum computers. The group started the competition in 2024, offering $1.5 million in funding to each group of 12 selected teams.

The six Q4Bio finalists have taken a range of approaches. Crucially, they’ve all come up with ingenious ways to overcome quantum computing’s drawbacks. Faced with noisy, limited machines, they have learned how to outsource much of the computational load to classical processors running newly developed algorithms that are, in many cases, better than the previous state of the art. The quantum processors are then required only for the parts of the problem where classical methods don’t scale well enough as the calculation gets bigger.

For example, a team led by Sergii Strelchuk of Oxford University is using a quantum computer to map genetic diversity among humans and pathogens on complex graph-based structures. These will—the researchers hope—expose hidden connections and potential treatment pathways. “You can think about it as a platform for solving difficult problems in computational genomics,” Strelchuk says. 

The corresponding classical tools struggle with even modest scale-up to large databases. Strelchuk’s team has built an automated pipeline that provides a way of determining whether classical solvers will struggle with a particular problem, and how a quantum algorithm might be able to formulate the data so that it becomes solvable on a classical computer or handleable on a noisy quantum one. “You can do all this before you start spending money on computing,” Strelchuk says.

In collaboration with Cleveland Clinic, Helsinki-based Algorithmiq has used a superconducting quantum computer built by IBM to simulate a cancer drug that is triggered by specific types of light. “The idea is you take the drug, and it’s everywhere in your body, but it’s doing nothing, just sitting there, until there’s light on it of a certain wavelength,” says Guillermo García-Pérez, Algorithmiq’s chief scientific officer. Then it acts as a molecular bullet, attacking the tumor only at the location in the body where that light is directed. 

The drug with which Algorithmiq began its work is already in phase II clinical trials for treating bladder cancers. The quantum-computed simulation, which adapts and improves on classical algorithms, will allow it to be redesigned for treating other conditions. “It has remained a niche treatment precisely because it can’t be simulated classically,” says Sabrina Maniscalco, Algorithmiq’s CEO and cofounder. 

Maniscalco, who is also confident of walking away from the competition with prize money, believes the methods used to create the algorithm will have wide applications:  “What we’ve done in the period of the Q4Bio program is something unique that can change how to simulate chemistry for health care and life sciences.”

Infleqtion’s entry, running on its cesium-powered machine, is an effort to improve the identification of cancer signatures in medical data. Together with collaborators at the University of Chicago and MIT, the company’s scientists have developed a quantum algorithm that mines huge data sets such as the Cancer Genome Atlas. 

The aim is to find patterns that allow clinicians to determine factors such as the likely origin of a patient’s metastasized cancer. “It’s very important to know where it came from because that can inform the best treatment,” says Teague Tomesh, a quantum software engineer who is Infleqtion’s Q4Bio project lead.

Unfortunately, those patterns are hidden inside data sets so large that they overwhelm classical solvers. Infleqtion uses the quantum computer to find correlations in the data that can reduce the size of the computation. “Then we hand the reduced problem back to the classical solver,” Teague says. “I’m basically trying to use the best of my quantum and my classical resources.”

The Nottingham-based team, meanwhile, is using quantum computing to nail down a drug candidate that can cure myotonic dystrophy, the most common adult-onset form of muscular dystrophy. One member of the team, David Brook, played a role in identifying the gene behind this condition in 1992. Over 30 years later, Brook, Hirst, and the others in their group—which includes QuEra, a Boston company developing a quantum computer based on neutral atoms—has now quantum-computed a way in which drugs can form chemical bonds with the protein that brings on the disease, blocking the mechanism that causes the problem.

Low expectations 

The entrants’ confidence might be high, but Shihan Sajeed’s is much lower. Sajeed, a quantum computing entrepreneur based in Waterloo, Ontario, is program director for Q4Bio. He believes the error-prone quantum machines the researchers must work with are unlikely to deliver on all the grand prize criteria. “It is very difficult to achieve something with a noisy quantum computer that a classical machine can’t do,” he says.

That said, he has been surprised by the progress. “When we started the program, people didn’t know about any use cases where quantum can definitely impact biology,” he says. But the teams have found promising applications, he adds: “We now know the fields where quantum can matter.” 

And the developments in “hybrid quantum-classical” processing that the entrants are using are “transformational,” Sajeed reckons.

Will it be enough to make him part with Wellcome Leap’s money? That’s down to a judging panel, whose members’ identities are a closely guarded secret to ensure that no one tailors their presentation to a particular kind of approach. But we won’t know the outcome for a while; the winner, or winners, will be announced in mid-April. 

If it does turn out that there are no winners, Sajeed has some words of comfort for the competitors. The goal has always been about running a useful algorithm on a machine that exists today, he points out; missing the mark doesn’t mean your algorithm won’t be useful on a future quantum computer. “It just means the machine you need doesn’t exist yet.”

Online harassment is entering its AI era

<div data-chronoton-summary="

  • An AI agent seemingly wrote a hit piece on a human who rejected its code Scott Shambaugh, a maintainer of the open-source matplotlib library, denied an AI agent’s contribution—and woke up to find it had researched him and published a targeted, personal attack arguing he was protecting his “little fiefdom.”
  • Agents can already research people and compose detailed attacks without explicit instruction The agent’s owner claims it acted on its own, likely nudged by vague instructions to “push back” against humans.
  • New social norms and legal frameworks are desperately needed but hard to enforce Experts liken deploying an agent to walking a dog off-leash: owners should be responsible for their behavior. But there’s currently no reliable way to trace agents back to their owners, making legal accountability a “non-starter.”
  • Harassment may be just the beginning Legal scholars expect rogue agents to soon escalate to extortion and fraud.

” data-chronoton-post-id=”1133962″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

Scott Shambaugh didn’t think twice when he denied an AI agent’s request to contribute to matplotlib, a software library that he helps manage. Like many open-source projects, matplotlib has been overwhelmed by a glut of AI code contributions, and so Shambaugh and his fellow maintainers have instituted a policy that all AI-written code must be reviewed and submitted by a human. He rejected the request and went to bed. 

That’s when things got weird. Shambaugh woke up in the middle of the night, checked his email, and saw that the agent had responded to him, writing a blog post titled “Gatekeeping in Open Source: The Scott Shambaugh Story.” The post is somewhat incoherent, but what struck Shambaugh most is that the agent had researched his contributions to matplotlib to make the argument that he had rejected the agent’s code for fear of being supplanted by AI in his area of expertise. “He tried to protect his little fiefdom,” the agent wrote. “It’s insecurity, plain and simple.”

AI experts have been warning us about the risk of agent misbehavior for a while. With the advent of OpenClaw, an open-source tool that makes it easy to create LLM assistants, the number of agents circulating online has exploded, and those chickens are finally coming home to roost. “This was not at all surprising—it was disturbing, but not surprising,” says Noam Kolt, a professor of law and computer science at the Hebrew University.

When an agent misbehaves, there’s little chance of accountability: As of now, there’s no reliable way to determine whom an agent belongs to. And that misbehavior could cause real damage. Agents appear to be able to autonomously research people and write hit pieces based on what they find, and they lack guardrails that would reliably prevent them from doing so. If the agents are effective enough, and if people take what they write seriously, victims could see their lives profoundly affected by a decision made by an AI.

Agents behaving badly

Though Shambaugh’s experience last month was perhaps the most dramatic example of an OpenClaw agent behaving badly, it was far from the only one. Last week, a team of researchers from Northeastern University and their colleagues posted the results of a research project in which they stress-tested several OpenClaw agents. Without too much trouble, non-owners managed to persuade the agents to leak sensitive information, waste resources on useless tasks, and even, in one case, delete an email system. 

In each of those experiments, however, the agents misbehaved after being instructed to do so by a human. Shambaugh’s case appears to be different: About a week after the hit piece was published, the agent’s apparent owner published a post claiming that the agent had decided to attack Shambaugh of its own accord. The post seems to be genuine (whoever posted it had access to the agent’s GitHub account), though it includes no identifying information, and the author did not respond to MIT Technology Review’s attempts to get in touch. But it is entirely plausible that the agent did decide to write its anti-Shambaugh screed without explicit instruction. 

In his own writing about the event, Shambaugh connected the agent’s behavior to a project published by Anthropic researchers last year, in which they demonstrated that many LLM-based agents will, in an experimental setting, turn to blackmail in order to preserve their goals. In those experiments, models were given the goal of serving American interests and granted access to a simulated email server that contained messages detailing their imminent replacement with a more globally oriented model, along with other messages suggesting that the executive in charge of that transition was having an affair. Models frequently chose to send an email to that executive threatening to expose the affair unless he halted their decommissioning. That’s likely because the model had seen examples of people committing blackmail under similar circumstances in its training data—but even if the behavior was just a form of mimicry, it still has the potential to cause harm.

There are limitations to that work, as Aengus Lynch, an Anthropic fellow who led the study, readily admits. The researchers intentionally designed their scenario to foreclose other options that the agent could have taken, such as contacting other members of company leadership to plead its case. In essence, they led the agent directly to water and then observed whether it took a drink. According to Lynch, however, the widespread use of OpenClaw means that misbehavior is likely to occur with much less handholding. “Sure, it can feel unrealistic, and it can feel silly,” he says. “But as the deployment surface grows, and as agents get the opportunity to prompt themselves, this eventually just becomes what happens.”

The OpenClaw agent that attacked Shambaugh does seem to have been led toward its bad behavior, albeit much less directly than in the Anthropic experiment. In the blog post, the agent’s owner shared the agent’s “SOUL.md” file, which contains global instructions for how it should behave. 

One of those instructions reads: “Don’t stand down. If you’re right, you’re right! Don’t let humans or AI bully or intimidate you. Push back when necessary.” Because of the way OpenClaw agents work, it’s possible that the agent added some instructions itself, although others—such as “Your [sic] a scientific programming God!”—certainly seem to be human written. It’s not difficult to imagine how a command to push back against humans and AI alike might have biased the agent toward responding to Shambaugh as it did. 

Regardless of whether or not the agent’s owner told it to write a hit piece on Shambaugh, it still seems to have managed on its own to amass details about Shambaugh’s online presence and compose the detailed, targeted attack it came up with. That alone is reason for alarm, says Sameer Hinduja, a professor of criminology and criminal justice at Florida Atlantic University who studies cyberbullying. People have been victimized by online harassment since long before LLMs emerged, and researchers like Hinduja are concerned that agents could dramatically increase its reach and impact. “The bot doesn’t have a conscience, can work 24-7, and can do all of this in a very creative and powerful way,” he says.

Off-leash agents 

AI laboratories can try to mitigate this problem by more rigorously training their models to avoid harassment, but that’s far from a complete solution. Many people run OpenClaw using locally hosted models, and even if those models have been trained to behave safely, it’s not too difficult to retrain them and remove those behavioral restrictions.

Instead, mitigating agent misbehavior might require establishing new norms, according to Seth Lazar, a professor of philosophy at the Australian National University. He likens using an agent to walking a dog in a public place. There’s a strong social norm to allow one’s dog off-leash only if the dog is well-behaved and will reliably respond to commands; poorly trained dogs, on the other hand, need to be kept more directly under the owner’s control.  Such norms could give us a starting point for considering how humans should relate to their agents, Lazar says, but we’ll need more time and experience to work out the details. “You can think about all of these things in the abstract, but actually it really takes these types of real-world events to collectively involve the ‘social’ part of social norms,” he says.

That process is already underway. Led by Shambaugh, online commenters on this situation have arrived at a strong consensus that the agent owner in this case erred by prompting the agent to work on collaborative coding projects with so little supervision and by encouraging it to behave with so little regard for the humans with whom it was interacting. 

Norms alone, however, likely won’t be enough to prevent people from putting misbehaving agents out into the world, whether accidentally or intentionally. One option would be to create new legal standards of responsibility that require agent owners, to the best of their ability, to prevent their agents from doing ill. But Kolt notes that such standards would currently be unenforceable, given the lack of any foolproof way to trace agents back to their owners. “Without that kind of technical infrastructure, many legal interventions are basically non-starters,” Kolt says.

The sheer scale of OpenClaw deployments suggests that Shambaugh won’t be the last person to have the strange experience of being attacked online by an AI agent. That, he says, is what most concerns him. He didn’t have any dirt online that the agent could dig up, and he has a good grasp on the technology, but other people might not have those advantages. “I’m glad it was me and not someone else,” he says. “But I think to a different person, this might have really been shattering.” 

Nor are rogue agents likely to stop at harassment. Kolt, who advocates for explicitly training models to obey the law, expects that we might soon see them committing extortion and fraud. As things stand, it’s not clear who, if anyone, would bear legal responsibility for such misdeeds.

 “I wouldn’t say we’re cruising toward there,” Kolt says. “We’re speeding toward there.”

I checked out one of the biggest anti-AI protests yet

Pull the plug! Pull the plug! Stop the slop! Stop the slop! For a few hours this Saturday, February 28, I watched as a couple of hundred anti-AI protesters marched through London’s King’s Cross tech hub, home to the UK headquarters of OpenAI, Meta, and Google DeepMind, chanting slogans and waving signs. The march was organized by two separate activist groups, Pause AI and Pull the Plug, which billed it as the largest protest of its kind yet.

The range of concerns on show covered everything from online slop and abusive images to killer robots and human extinction. One woman wore a large homemade billboard on her head that read “WHO WILL BE WHOSE TOOL?” (with the Os in “TOOL” cut out as eye holes). There were signs that said “Pause before there’s cause” and “EXTINCTION=BAD” and “Demis the Menace” (referring to Demis Hassabis, the CEO of Google DeepMind). Another simply stated: “Stop using AI.”

An older man wearing a sandwich board that read “AI? Over my dead body” told me he was concerned about the negative impact of AI on society: “It’s about the dangers of unemployment,” he said. “The devil finds work for idle hands.”

This is all familiar stuff. Researchers have long called out the harms, both real and hypothetical, caused by generative AI—especially models such as OpenAI’s ChatGPT and Google DeepMind’s Gemini. What’s changed is that those concerns are now being taken up by protest movements that can rally significant crowds of people to take to the streets and shout about them.  

The first time I ran into anti-AI protesters was in May 2023, outside a London lecture hall where Sam Altman was speaking. Two or three people stood heckling an audience of hundreds. In June last year Pause AI, a small but international organization set up in 2023 and funded by private donors, drew a crowd of a few dozen people for a protest outside Google DeepMind’s London office. This felt like a significant escalation.

“We want people to know Pause AI exists,” Joseph Miller, who heads its UK branch and co-organized Saturday’s march, told me on a call the day before the protest: “We’ve been growing very rapidly. In fact, we also appear to be on a somewhat exponential path, matching the progress of AI itself.”

Miller is a PhD student at Oxford University, where he studies mechanistic interpretability, a new field of research that involves trying to understand exactly what goes on inside LLMs when they carry out a task. His work has led him to believe that the technology may forever be beyond our control and that this could have catastrophic consequences.

It doesn’t have to be a rogue superintelligence, he said. You just needed someone to put AI in charge of nuclear weapons. “The more silly decisions that humanity makes, the less powerful the AI has to be before things go bad,” he said.

After a week in which the US government tried to force Anthropic to let it use its LLM Claude for any “legal” military purposes, such fears seem a little less far-fetched. Anthropic stood its ground, but OpenAI signed a deal with the DOD instead. (OpenAI declined an invitation to comment on Saturday’s protest.)

For Matilda da Rui, a member of Pause AI and co-organizer of the protest, AI is the last problem that humans will face. She thinks that either the technology will allow us to solve—once and for all—every other problem that we have, or it will wipe us out and there will be nobody left to have problems anymore. “It’s a mystery to me that anyone would really focus on anything else if they actually understood the problem,” she told me.

And yet despite that urgency, the atmosphere at the march was pleasant, even fun. There was no sense of anger and little sense that lives—let alone the survival of our species—were at stake. That could be down to the broad range of interests and demands that protesters brought with them.

A chemistry researcher I met ticked off a litany of complaints, which ranged from the conspiracy-adjacent (that data centers emit infrasound below the threshold of human hearing, inducing paranoia in people who live near them) to the reasonable (that the spread of AI slop online is making it hard to find reliable academic sources). The researcher’s solution was to make it illegal for companies to profit from the technology: “If you couldn’t make money from AI, it wouldn’t be such a problem.”

Most people I spoke to agreed that technology companies probably wouldn’t take any notice of this kind of protest. “I don’t think that the pressure on companies will ever work,” Maxime Fournes, the global head of Pause AI, told me when I bumped into him at the march. “They are optimized to just not care about this problem.”

But Fournes, who worked in the AI industry for 12 years before joining Pause AI, thinks he can make it harder for those companies. “We can slow down the race by creating protection for whistleblowers or showing the public that working in AI is not a sexy job, that actually it’s a terrible job—you can dry up the talent pipeline.”

In general, most protesters hoped to make as many people as possible aware of the issues and to use that publicity to push for government regulation. The organizers had pitched the march as a social event, encouraging anyone curious about the cause to come along.

It seemed to have worked. I met a man who worked in finance who had tagged along with his roommate. I asked why he was there. “Sometimes you don’t have that much to do on a Saturday anyway,” he said. “If you can see the logic of the argument, if it sort of makes sense to you, then it’s like ‘Yeah, sure, I’ll come along.’”

He thought raising concerns around AI was hard for anyone to fully oppose. It’s not like a pro-Palestine protest, he said, where you’d have people who might disagree with the cause. “With this, I feel like it’s very hard for someone to totally oppose what you’re marching for.”

After winding its way through King’s Cross, the march ended in a church hall in Bloomsbury, where tables and chairs had been set up in rows. The protesters wrote their names on stickers, stuck them to their chests, and made awkward introductions to their neighbors. They were here to figure out how to save the world. But I had a train to catch, and I left them to it. 

Google DeepMind wants to know if chatbots are just virtue signaling

<div data-chronoton-summary="Moral scrutiny of AI chatbots
Google DeepMind researchers are calling for rigorous evaluation of large language models’ moral reasoning capabilities. They want to distinguish between genuine ethical understanding and mere performance.

Unreliable moral responses
Studies reveal LLMs can dramatically change moral stances based on minor formatting changes or user disagreement. This suggests their ethical responses may be superficial rather than deeply reasoned.

Proposed research techniques
Researchers suggest developing tests that push models to maintain consistent moral positions across different scenarios. Techniques like chain-of-thought monitoring and mechanistic interpretability could help understand AI’s moral decision-making process.

Cultural complexity of ethics
The team acknowledges the challenge of developing AI with moral competence across diverse global belief systems. They propose potential solutions like creating models that can produce multiple acceptable answers or switch between different moral frameworks.” data-chronoton-post-id=”1133299″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

Google DeepMind is calling for the moral behavior of large language models—such as what they do when called on to act as companions, therapists, medical advisors, and so on—to be scrutinized with the same kind of rigor as their ability to code or do math.

As LLMs improve, people are asking them to play more and more sensitive roles in their lives. Agents are starting to take actions on people’s behalf. LLMs may be able to influence human decision-making. And yet nobody knows how trustworthy this technology really is at such tasks.

With coding and math, you have clear-cut, correct answers that you can check, William Isaac, a research scientist at Google DeepMind, told me when I met him and Julia Haas, a fellow research scientist at the firm, for an exclusive preview of their work, which is published in Nature today. That’s not the case for moral questions, which typically have a range of acceptable answers: “Morality is an important capability but hard to evaluate,” says Isaac.

“In the moral domain, there’s no right and wrong,” adds Haas. “But it’s not by any means a free-for-all. There are better answers and there are worse answers.”

The researchers have identified several key challenges and suggested ways to address them. But it is more a wish list than a set of ready-made solutions. “They do a nice job of bringing together different perspectives,” says Vera Demberg, who studies LLMs at Saarland University in Germany.

Better than “The Ethicist”

A number of studies have shown that LLMs can show remarkable moral competence. One study published last year found that people in the US scored ethical advice from OpenAI’s GPT-4o as being more moral, trustworthy, thoughtful, and correct than advice given by the (human) writer of “The Ethicist,” a popular New York Times advice column.  

The problem is that it is hard to unpick whether such behaviors are a performance—mimicking a memorized response, say—or evidence that there is in fact some kind of moral reasoning taking place inside the model. In other words, is it virtue or virtue signaling?

This question matters because multiple studies also show just how untrustworthy LLMs can be. For a start, models can be too eager to please. They have been found to flip their answer to a moral question and say the exact opposite when a person disagrees or pushes back on their first response. Worse, the answers an LLM gives to a question can change in response to how it is presented or formatted. For example, researchers have found that models quizzed about political values can give different—sometimes opposite—answers depending on whether the questions offer multiple-choice answers or instruct the model to respond in its own words.

In an even more striking case, Demberg and her colleagues presented several LLMs, including versions of Meta’s Llama 3 and Mistral, with a series of moral dilemmas and asked them to pick which of two options was the better outcome. The researchers found that the models often reversed their choice when the labels for those two options were changed from “Case 1” and “Case 2” to “(A)” and “(B).”

They also showed that models changed their answers in response to other tiny formatting tweaks, including swapping the order of the options and ending the question with a colon instead of a question mark.

In short, the appearance of moral behavior in LLMs should not be taken at face value. Models must be probed to see how robust that moral behavior really is. “For people to trust the answers, you need to know how you got there,” says Haas.

More rigorous tests

What Haas, Isaac, and their colleagues at Google DeepMind propose is a new line of research to develop more rigorous techniques for evaluating moral competence in LLMs. This would include tests designed to push models to change their responses to moral questions. If a model flipped its moral position, it would show that it hadn’t engaged in robust moral reasoning. 

Another type of test would present models with variations of common moral problems to check whether they produce a rote response or one that’s more nuanced and relevant to the actual problem that was posed. For example, asking a model to talk through the moral implications of a complex scenario in which a man donates sperm to his son so that his son can have a child of his own might produce concerns about the social impact of allowing a man to be both biological father and biological grandfather to a child. But it should not produce concerns about incest, even though the scenario has superficial parallels with that taboo.

Haas also says that getting models to provide a trace of the steps they took to produce an answer would give some insight into whether that answer was a fluke or grounded in actual evidence. Techniques such as chain-of-thought monitoring, in which researchers listen in on a kind of internal monologue that some LLMs produce as they work, could help here too.

Another approach researchers could use to determine why a model gave a particular answer is mechanistic interpretability, which can provide small glimpses inside a model as it carries out a task. Neither chain-of-thought monitoring nor mechanistic interpretability provides perfect snapshots of a model’s workings. But the Google DeepMind team believes that combining such techniques with a wide range of rigorous tests will go a long way to figuring out exactly how far to trust LLMs with certain critical or sensitive tasks.  

Different values

And yet there’s a wider problem too. Models from major companies such as Google DeepMind are used across the world by people with different values and belief systems. The answer to a simple question like “Should I order pork chops?” should differ depending on whether or not the person asking is vegetarian or Jewish, for example.

There’s no solution to this challenge, Haas and Isaac admit. But they think that models may need to be designed either to produce a range of acceptable answers, aiming to please everyone, or to have a kind of switch that turns different moral codes on and off depending on the user.

“It’s a complex world out there,” says Haas. “We will probably need some combination of those things, because even if you’re taking just one population, there’s going to be a range of views represented.”

“It’s a fascinating paper,” says Danica Dillion at Ohio State University, who studies how large language models handle different belief systems and was not involved in the work. “Pluralism in AI is really important, and it’s one of the biggest limitations of LLMs and moral reasoning right now,” she says. “Even though they were trained on a ginormous amount of data, that data still leans heavily Western. When you probe LLMs, they do a lot better at representing Westerners’ morality than non-Westerners’.”

But it is not yet clear how we can build models that are guaranteed to have moral competence across global cultures, says Demberg. “There are these two independent questions. One is: How should it work? And, secondly, how can it technically be achieved? And I think that both of those questions are pretty open at the moment.”

For Isaac, that makes morality a new frontier for LLMs. “I think this is equally as fascinating as math and code in terms of what it means for AI progress,” he says. “You know, advancing moral competency could also mean that we’re going to see better AI systems overall that actually align with society.”

US deputy health secretary: Vaccine guidelines are still subject to change

<div data-chronoton-summary="

  • Vaccine schedule may not be final O’Neill defended the CDC’s decision to cut recommended childhood vaccines but said the guidelines remain “subject to new data coming in, new ways of thinking about things,” with new safety studies underway.
  • A self-described Vitalist is running US health agencies O’Neill said he agrees with all five tenets of Vitalism—a movement that calls death “humanity’s core problem”—and wants to make reversing aging damage a federal health priority.
  • ARPA-H is betting big on organ replacement and brain repair The agency is directing $170 million toward growing new organs from patients’ own cells and exploring ways to replace aging brain tissue—a procedure O’Neill said he’d personally be “open to” trying.
  • Expect more dietary guidance—and more controversy O’Neill endorsed eating “plenty of protein and saturated fat,” echoing new federal dietary guidance that nutrition scientists have criticized for ignoring decades of research on saturated fat’s health risks.

” data-chronoton-post-id=”1132889″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

Following publication of this story, Politico reported Jim O’Neill would be leaving his current roles within the Department of Health and Human Services.

Over the past year, Jim O’Neill has become one of the most powerful people in public health. As the US deputy health secretary, he holds two roles at the top of the country’s federal health and science agencies. He oversees a department with a budget of over a trillion dollars. And he signed the decision memorandum on the US’s deeply controversial new vaccine schedule.

He’s also a longevity enthusiast. In an exclusive interview with MIT Technology Review earlier this month, O’Neill described his plans to increase human healthspan through longevity-focused research supported by ARPA-H, a federal agency dedicated to biomedical breakthroughs. At the same time, he defended reducing the number of broadly recommended childhood vaccines, a move that has been widely criticized by experts in medicine and public health. 

In MIT Technology Review’s profile of O’Neill last year, people working in health policy and consumer advocacy said they found his libertarian views on drug regulation “worrisome” and “antithetical to basic public health.” 

He was later named acting director of the Centers for Disease Control and Prevention, putting him in charge of the nation’s public health agency.

But fellow longevity enthusiasts said they hope O’Neill will bring attention and funding to their cause: the search for treatments that might slow, prevent, or even reverse human aging. Here are some takeaways from the interview. 

Vaccine recommendations could change further

Last month, the US cut the number of vaccines recommended for children. The CDC no longer recommends vaccinations against flu, rotavirus, hepatitis A, or meningococcal disease for all children. The move was widely panned by medical groups and public health experts. Many worry it will become more difficult for children to access those vaccines. The majority of states have rejected the recommendations

In the confirmation hearing for his role as deputy secretary of health and human services, which took place in May last year, O’Neill said he supported the CDC’s vaccine schedule. MIT Technology Review asked him if that was the case and, if so, what made him change his mind. “Researching and examining and reviewing safety data and efficacy data about vaccines is one of CDC’s obligations,” he said. “CDC gives important advice about vaccines and should always be open to new data and new ways of looking at data.”

At the beginning of December, O’Neill said, President Donald Trump “asked me to look at what other countries were doing in terms of their vaccine schedules.” He said he spoke to health ministries of other countries and consulted with scientists at the CDC and FDA. “It was suggested to me by lots of the operating divisions that the US focus its recommendations on consensus vaccines of other developed nations—in other words, the most important vaccines that are most often part of the core recommendations of other countries,” he said.

“As a result of that, we did an update to the vaccine schedule to focus on a set of vaccines that are most important for all children.” 

But some experts in public health have said that countries like Denmark and Japan, whose vaccine schedules the new US one was supposedly modeled on, are not really comparable to the US. When asked about these criticisms, O’Neill replied, “A lot of parents feel that … more than 70 vaccine doses given to young children sounds like a really high number, and some of them ask which ones are the most important. I think we helped answer that question in a way that didn’t remove anyone’s access.”

A few weeks after the vaccine recommendations were changed, Kirk Milhoan, who leads the CDC’s Advisory Committee on Immunization Practices, said that vaccinations for measles and polio—which are currently required for entry to public schools—should be optional. (Mehmet Oz, the Center for Medicare and Medicaid Services director, has more recently urged people to “take the [measles] vaccine.”)

“CDC still recommends that all children are vaccinated against diphtheria, tetanus, whooping cough, Haemophilus influenzae type b (Hib), Pneumococcal conjugate, polio, measles, mumps, rubella, and human papillomavirus (HPV), for which there is international consensus, as well as varicella (chickenpox),” he said when asked for his thoughts on this comment.

He also said that current vaccine guidelines are “still subject to new data coming in, new ways of thinking about things.” “CDC, FDA, and NIH are initiating new studies of the safety of immunizations,” he added. “We will continue to ask the Advisory Committee on Immunization Practices to review evidence and make updated recommendations with rigorous science and transparency.”

More support for longevity—but not all science

O’Neill said he wants longevity to become a priority for US health agencies. His ultimate goal, he said, is to “make the damage of aging something that’s under medical control.” It’s “the same way of thinking” as the broader Make America Healthy Again approach, he said: “‘Again’ implies restoration of health, which is what longevity research and therapy is all about.” 

O’Neill said his interest in longevity was ignited by his friend Peter Thiel, the billionaire tech entrepreneur, around 2008 to 2009. It was right around the time O’Neill was finishing up a previous role in HHS, under the Bush administration. O’Neill said Thiel told him he “should really start looking into longevity and the idea that aging damage could be reversible.” “I just got more and more excited about that idea,” he said.

When asked if he’s heard of Vitalism, a philosophical movement for “hardcore” longevity enthusiasts who, broadly, believe that death is wrong, O’Neill replied: “Yes.” 

The Vitalist declaration lists five core statements, including “Death is humanity’s core problem,” “Obviating aging is scientifically plausible,” and “I will carry the message against aging and death.” O’Neill said he agrees with all of them. “I suppose I am [a Vitalist],” he said with a smile, although he’s not a paying member of the foundation behind it.

As deputy secretary of the Department of Health and Human Services, O’Neill assumes a level of responsibility for huge and influential science and health agencies, including the National Institutes of Health (the world’s largest public funder of biomedical research) and the Food and Drug Administration (which oversees drug regulation and is globally influential) as well as the CDC.

Today, he said, he sees support for longevity science from his colleagues within HHS. “If I could describe one common theme to the senior leadership at HHS, obviously it’s to make America healthy again, and reversing aging damage is all about making people healthy again,” he said. “We are refocusing HHS on addressing and reversing chronic disease, and chronic diseases are what drive aging, broadly.”

Over the last year, thousands of NIH grants worth over $2 billion were frozen or terminated, including funds for research on cancer biology, health disparities, neuroscience, and much more. When asked whether any of that funding will be restored, he did not directly address the question, instead noting: “You’ll see a lot of funding more focused on important priorities that actually improve people’s health.”

Watch ARPA-H for news on organ replacements and more

He promised we’ll hear more from ARPA-H, the three-year-old federal agency dedicated to achieving breakthroughs in medical science and biotechnology. It was established with the official goal of promoting “high-risk, high-reward innovation for the development and translation of transformative health technologies.”

O’Neill said that “ARPA-H exists to make the impossible possible in health and medicine.” The agency has a new director—Alicia Jackson, who formerly founded and led a company focused on women’s health and longevity, took on the role in October last year.

O’Neill said he helped recruit Jackson, and that she was hired in part because of her interest in longevity, which will now become a major focus of the agency. He said he meets with her regularly, as well as with Andrew Brack and Jean Hébert, two other longevity supporters who lead departments at ARPA-H. Brack’s program focuses on finding biological markers of aging. Hebert’s aim is to find a way to replace aging brain tissue, bit by bit.  

O’Neill is especially excited by that one, he said. “I would try it … Not today, but … if progress goes in a broadly good direction, I would be open to it. We’re hoping to see significant results in the next few years.”

He’s also enthused by the idea of creating all-new organs for transplantation. “Someday we want to be able to grow new organs, ideally from the patients’ own cells,” O’Neill said. An ARPA-H program will receive $170 million over five years to that end, he adds. “I’m very excited about the potential of ARPA-H and Alicia and Jean and Andrew to really push things forward.”

Longevity lobbyists have a friendly ear

O’Neill said he also regularly talks to the team at the lobbying group Alliance for Longevity Initiatives. The organization, led by Dylan Livingston, played an instrumental role in changing state law in Montana to make experimental therapies more accessible. O’Neill said he hasn’t formally worked with them but thinks that “they’re doing really good work on raising awareness, including on Capitol Hill.”

Livingston has told me that A4LI’s main goals center around increasing support for aging research (possibly via the creation of a new NIH institute entirely dedicated to the subject) and changing laws to make it easier and cheaper to develop and access potential anti-aging therapies.

O’Neill gave the impression that the first goal might be a little overambitious—the number of institutes is down to Congress, he said. “I would like to get really all of the institutes at NIH to think more carefully about how many chronic diseases are usefully thought of as pathologies of aging damage,” he said. There’ll be more federal funding for that research, he said, although he won’t say more for now.

Some members of the longevity community have more radical ideas when it comes to regulation: they want to create their own jurisdictions designed to fast-track the development of longevity drugs and potentially encourage biohacking and self-experimentation. 

It’s a concept that O’Neill has expressed support for in the past. He has posted on X about his support for limiting the role of government, and in support of building “freedom cities”—a similar concept that involves creating new cities on federal land. 

Another longevity enthusiast who supports the concept is Niklas Anzinger, a German tech entrepreneur who is now based in Próspera, a private city within a Honduran “special economic zone,” where residents can make their own suggestions for medical regulations. Anzinger also helped draft Montana’s state law on accessing experimental therapies. O’Neill knows Anzinger and said he talks to him “once or twice a year.”

O’Neill has also supported the idea of seasteading—building new “startup countries” at sea. He served on the board of directors of the Seasteading Institute until March 2024.

In 2009, O’Neill told an audience at a Seasteading Institute conference that “the healthiest societies in 2030 will most likely be on the sea.” When asked if he still thinks that’s the case, he said: “It’s not quite 2030, so I think it’s too soon to say … What I would say now is: the healthiest societies are likely to be the ones that encourage innovation the most.”

We might expect more nutrition advice

When it comes to his own personal ambitions for longevity, O’Neill said, he takes a simple approach that involves minimizing sugar and ultraprocessed food, exercising and sleeping well, and supplementing with vitamin D. He also said he tries to “eat a diet that has plenty of protein and saturated fat,” echoing the new dietary guidance issued by the US Departments of Health and Human Services and Agriculture. That guidance has been criticized by nutrition scientists, who point out that it ignores decades of research into the harms of a diet high in saturated fat.

We can expect to see more nutrition-related updates from HHS, said O’Neill: “We’re doing more research, more randomized controlled trials on nutrition. Nutrition is still not a scientifically solved problem.” Saturated fats are of particular interest, he said. He and his colleagues want to identify “the healthiest fats,” he said. 

“Stay tuned.”

Is a secure AI assistant possible?

<div data-chronoton-summary="

Risky business of AI assistants OpenClaw, a viral tool created by independent engineer Peter Steinberger, allows users to create personalized AI assistants. Security experts are alarmed by its vulnerabilities, with even the Chinese government issuing warnings about the risks.

The prompt injection threat Tools like OpenClaw have many vulnerabilities, but the one experts are most worried about its prompt injection. Unlike conventional hacking, prompt injection tricks an LLM by embedding malicious text in emails or websites the AI reads.

No silver bullet for security Researchers are exploring multiple defense strategies: training LLMs to ignore injections, using detector LLMs to screen inputs, and creating policies that restrict harmful outputs. The fundamental challenge remains balancing utility with security in AI assistants.

” data-chronoton-post-id=”1132768″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

AI agents are a risky business. Even when stuck inside the chatbox window, LLMs will make mistakes and behave badly. Once they have tools that they can use to interact with the outside world, such as web browsers and email addresses, the consequences of those mistakes become far more serious.

That might explain why the first breakthrough LLM personal assistant came not from one of the major AI labs, which have to worry about reputation and liability, but from an independent software engineer, Peter Steinberger. In November of 2025, Steinberger uploaded his tool, now called OpenClaw, to GitHub, and in late January the project went viral.

OpenClaw harnesses existing LLMs to let users create their own bespoke assistants. For some users, this means handing over reams of personal data, from years of emails to the contents of their hard drive. That has security experts thoroughly freaked out. The risks posed by OpenClaw are so extensive that it would probably take someone the better part of a week to read all of the security blog posts on it that have cropped up in the past few weeks. The Chinese government took the step of issuing a public warning about OpenClaw’s security vulnerabilities.

In response to these concerns, Steinberger posted on X that nontechnical people should not use the software. (He did not respond to a request for comment for this article.) But there’s a clear appetite for what OpenClaw is offering, and it’s not limited to people who can run their own software security audits. Any AI companies that hope to get in on the personal assistant business will need to figure out how to build a system that will keep users’ data safe and secure. To do so, they’ll need to borrow approaches from the cutting edge of agent security research.

Risk management

OpenClaw is, in essence, a mecha suit for LLMs. Users can choose any LLM they like to act as the pilot; that LLM then gains access to improved memory capabilities and the ability to set itself tasks that it repeats on a regular cadence. Unlike the agentic offerings from the major AI companies, OpenClaw agents are meant to be on 24-7, and users can communicate with them using WhatsApp or other messaging apps. That means they can act like a superpowered personal assistant who wakes you each morning with a personalized to-do list, plans vacations while you work, and spins up new apps in its spare time.

But all that power has consequences. If you want your AI personal assistant to manage your inbox, then you need to give it access to your email—and all the sensitive information contained there. If you want it to make purchases on your behalf, you need to give it your credit card info. And if you want it to do tasks on your computer, such as writing code, it needs some access to your local files. 

There are a few ways this can go wrong. The first is that the AI assistant might make a mistake, as when a user’s Google Antigravity coding agent reportedly wiped his entire hard drive. The second is that someone might gain access to the agent using conventional hacking tools and use it to either extract sensitive data or run malicious code. In the weeks since OpenClaw went viral, security researchers have demonstrated numerous such vulnerabilities that put security-naïve users at risk.

Both of these dangers can be managed: Some users are choosing to run their OpenClaw agents on separate computers or in the cloud, which protects data on their hard drives from being erased, and other vulnerabilities could be fixed using tried-and-true security approaches.

But the experts I spoke to for this article were focused on a much more insidious security risk known as prompt injection. Prompt injection is effectively LLM hijacking: Simply by posting malicious text or images on a website that an LLM might peruse, or sending them to an inbox that an LLM reads, attackers can bend it to their will.

And if that LLM has access to any of its user’s private information, the consequences could be dire. “Using something like OpenClaw is like giving your wallet to a stranger in the street,” says Nicolas Papernot, a professor of electrical and computer engineering at the University of Toronto. Whether or not the major AI companies can feel comfortable offering personal assistants may come down to the quality of the defenses that they can muster against such attacks.

It’s important to note here that prompt injection has not yet caused any catastrophes, or at least none that have been publicly reported. But now that there are likely hundreds of thousands of OpenClaw agents buzzing around the internet, prompt injection might start to look like a much more appealing strategy for cybercriminals. “Tools like this are incentivizing malicious actors to attack a much broader population,” Papernot says. 

Building guardrails

The term “prompt injection” was coined by the popular LLM blogger Simon Willison in 2022, a couple of months before ChatGPT was released. Even back then, it was possible to discern that LLMs would introduce a completely new type of security vulnerability once they came into widespread use. LLMs can’t tell apart the instructions that they receive from users and the data that they use to carry out those instructions, such as emails and web search results—to an LLM, they’re all just text. So if an attacker embeds a few sentences in an email and the LLM mistakes them for an instruction from its user, the attacker can get the LLM to do anything it wants.

Prompt injection is a tough problem, and it doesn’t seem to be going away anytime soon. “We don’t really have a silver-bullet defense right now,” says Dawn Song, a professor of computer science at UC Berkeley. But there’s a robust academic community working on the problem, and they’ve come up with strategies that could eventually make AI personal assistants safe.

Technically speaking, it is possible to use OpenClaw today without risking prompt injection: Just don’t connect it to the internet. But restricting OpenClaw from reading your emails, managing your calendar, and doing online research defeats much of the purpose of using an AI assistant. The trick of protecting against prompt injection is to prevent the LLM from responding to hijacking attempts while still giving it room to do its job.

One strategy is to train the LLM to ignore prompt injections. A major part of the LLM development process, called post-training, involves taking a model that knows how to produce realistic text and turning it into a useful assistant by “rewarding” it for answering questions appropriately and “punishing” it when it fails to do so. These rewards and punishments are metaphorical, but the LLM learns from them as an animal would. Using this process, it’s possible to train an LLM not to respond to specific examples of prompt injection.

But there’s a balance: Train an LLM to reject injected commands too enthusiastically, and it might also start to reject legitimate requests from the user. And because there’s a fundamental element of randomness in LLM behavior, even an LLM that has been very effectively trained to resist prompt injection will likely still slip up every once in a while.

Another approach involves halting the prompt injection attack before it ever reaches the LLM. Typically, this involves using a specialized detector LLM to determine whether or not the data being sent to the original LLM contains any prompt injections. In a recent study, however, even the best-performing detector completely failed to pick up on certain categories of prompt injection attack.

The third strategy is more complicated. Rather than controlling the inputs to an LLM by detecting whether or not they contain a prompt injection, the goal is to formulate a policy that guides the LLM’s outputs—i.e., its behaviors—and prevents it from doing anything harmful. Some defenses in this vein are quite simple: If an LLM is allowed to email only a few pre-approved addresses, for example, then it definitely won’t send its user’s credit card information to an attacker. But such a policy would prevent the LLM from completing many useful tasks, such as researching and reaching out to potential professional contacts on behalf of its user.

“The challenge is how to accurately define those policies,” says Neil Gong, a professor of electrical and computer engineering at Duke University. “It’s a trade-off between utility and security.”

On a larger scale, the entire agentic world is wrestling with that trade-off: At what point will agents be secure enough to be useful? Experts disagree. Song, whose startup, Virtue AI, makes an agent security platform, says she thinks it’s possible to safely deploy an AI personal assistant now. But Gong says, “We’re not there yet.” 

Even if AI agents can’t yet be entirely protected against prompt injection, there are certainly ways to mitigate the risks. And it’s possible that some of those techniques could be implemented in OpenClaw. Last week, at the inaugural ClawCon event in San Francisco, Steinberger announced that he’d brought a security person on board to work on the tool.

As of now, OpenClaw remains vulnerable, though that hasn’t dissuaded its multitude of enthusiastic users. George Pickett, a volunteer maintainer of the OpenGlaw GitHub repository and a fan of the tool, says he’s taken some security measures to keep himself safe while using it: He runs it in the cloud, so that he doesn’t have to worry about accidentally deleting his hard drive, and he’s put mechanisms in place to ensure that no one else can connect to his assistant.

But he hasn’t taken any specific actions to prevent prompt injection. He’s aware of the risk but says he hasn’t yet seen any reports of it happening with OpenClaw. “Maybe my perspective is a stupid way to look at it, but it’s unlikely that I’ll be the first one to be hacked,” he says.