What’s next for Chinese open-source AI

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

The past year has marked a turning point for Chinese AI. Since DeepSeek released its R1 reasoning model in January 2025, Chinese companies have repeatedly delivered AI models that match the performance of leading Western models at a fraction of the cost. 

Just last week the Chinese firm Moonshot AI released its latest open-weight model, Kimi K2.5, which came close to top proprietary systems such as Anthropic’s Claude Opus on some early benchmarks. The difference: K2.5 is roughly one-seventh Opus’s price.

On Hugging Face, Alibaba’s Qwen family—after ranking as the most downloaded model series in 2025 and 2026—has overtaken Meta’s Llama models in cumulative downloads. And a recent MIT study found that Chinese open-source models have surpassed US models in total downloads. For developers and builders worldwide, access to near-frontier AI capabilities has never been this broad or this affordable.

But these models differ in a crucial way from most US models like ChatGPT or Claude, which you pay to access and can’t inspect. The Chinese companies publish their models’ weights—numerical values that get set when a model is trained—so anyone can download, run, study, and modify them. 

If these open-source AI models keep getting better, they will not just offer the cheapest options for people who want access to frontier AI capabilities; they will change where innovation happens and who sets the standards. 

Here’s what may come next.

China’s commitment to open source will continue

When DeepSeek launched R1, much of the initial shock centered on its origin. Suddenly, a Chinese team had released a reasoning model that could stand alongside the best systems from US labs. But the long tail of DeepSeek’s impact had less to do with nationality than with distribution. R1 was released as an open-weight model under a permissive MIT license, allowing anyone to download, inspect, and deploy it. On top of that, DeepSeek also published a paper detailing its training process and techniques. For developers who access models via an API, DeepSeek also undercut competitors on price, offering access at a fraction the cost of OpenAI’s o1, the leading proprietary reasoning model at the time.

Within days of its release, DeepSeek replaced ChatGPT as the most downloaded free app in the US App Store. The moment spilled beyond developer circles into financial markets, triggering a sharp sell-off in US tech stocks that briefly erased roughly $1 trillion in market value. Almost overnight, DeepSeek went from a little-known spin-off team backed by a quantitative hedge fund to the most visible symbol of China’s push for open-source AI.

China’s decision to lean in to open source isn’t surprising. It has the world’s second-largest concentration of AI talent after the US. plus a vast, well-resourced tech industry. After ChatGPT broke into the mainstream, China’s AI sector went through a reckoning—and emerged determined to catch up. Pursuing an open-source strategy was seen as the fastest way to close the gap by rallying developers, spreading adoption, and setting standards.

DeepSeek’s success injected confidence into an industry long used to following global standards rather than setting them. “Thirty years ago, no Chinese person would believe they could be at the center of global innovation,” says Alex Chenglin Wu, CEO and founder of Atoms, an AI agent company and prominent contributor to China’s open-source ecosystem. “DeepSeek shows that with solid technical talent, a supportive environment, and the right organizational culture, it’s possible to do truly world-class work.”

DeepSeek’s breakout moment wasn’t China’s first open-source success. Alibaba’s Qwen Lab had been releasing open-weight models for years. By September 2024,  well before DeepSeek’s V3 launch, Alibaba was saying that global downloads had exceeded 600 million. On Hugging Face, Qwen accounted for more than 30% of all model downloads in 2024. Other institutions, including the Beijing Academy of Artificial Intelligence and the AI firm Baichuan, were also releasing open models as early as 2023. 

But since the success of DeepSeek, the field has widened rapidly. Companies such as Z.ai (formerly Zhipu), MiniMax, Tencent, and a growing number of smaller labs have released models that are competitive on reasoning, coding, and agent-style tasks. The growing number of capable models has sped up progress. Capabilities that once took months to make it to the open-source world now emerge within weeks, even days.

“Chinese AI firms have seen real gains from the open-source playbook,” says Liu Zhiyuan, a professor of computer science at Tsinghua University and chief scientist at the AI startup ModelBest. “By releasing strong research, they build reputation and gain free publicity.”

Beyond commercial incentives, Liu says, open source has taken on cultural and strategic weight. “In the Chinese programmer community, open source has become politically correct,” he says, framing it as a response to US dominance in proprietary AI systems.

That shift is also reflected at the institutional level. Universities including Tsinghua have begun encouraging AI development and open-source contributions, while policymakers have moved to formalize those incentives. In August, China’s State Council released a draft policy encouraging universities to reward open-source work, proposing that students’ contributions on platforms such as GitHub or Gitee could eventually be counted toward academic credit.

With growing momentum and a reinforcing feedback loop, China’s push for open-source models is likely to continue in the near term, though its long-term sustainability still hinges on financial results, says Tiezhen Wang, who helps lead work on global AI at Hugging Face. In January, the model labs Z.ai and MiniMax went public in Hong Kong. “Right now, the focus is on making the cake bigger,” says Wang. “The next challenge is figuring out how each company secures its share.”

The next wave of models will be narrower—and better

Chinese open-source models are leading not just in download volume but also in variety. Alibaba’s Qwen has become one of the most diversified open model families in circulation, offering a wide range of variants optimized for different uses. The lineup ranges from lightweight models that can run on a single laptop to large, multi-hundred-billion-parameter systems designed for data-center deployment. Qwen features many task-optimized variants created by the community: the “instruct” models are good at following orders, and “code” variants specialize in coding.

Although this strategy isn’t unique to Chinese labs, Qwen was the first open model family to roll out so many high-quality options that it started to feel like a full product line—one that’s free to use.

The open-weight nature of these releases also makes it easy for others to adapt them through techniques like fine-tuning and distillation, which means training a smaller model to mimic a larger one.  According to ATOM (American Truly Open Models), a project by the AI researcher Nathan Lambert, by August 4, 2025, model variations derived from Qwen were “more than 40%” of new Hugging Face language-model derivatives, while Llama had fallen to about 15%. This means that Qwen has become the default base model for all the “remixes.”

This pattern has made the case for smaller, more specialized models. “Compute and energy are real constraints for any deployment,” Liu says. He told MIT Technology Review that the rise of small models is about making AI cheaper to run and easier for more people to use. His company, ModelBest, focuses on small language models designed to run locally on devices such as phones, cars, and other consumer hardware.

While an average user might interact with AI only through the web or an app for simple conversations, power users of AI models with some technical background are experimenting with giving AI more autonomy to solve large-scale problems. OpenClaw, an open-source AI agent that recently went viral within the AI hacker world, allows AI to take over your computer—it can run 24-7, going through your emails and work tasks without supervision. 

OpenClaw, like many other open-source tools, allows users to connect to different AI models via an application programming interface, or API. Within days of OpenClaw’s release, the team revealed that Kimi’s K2.5 had surpassed Claude Opus and became the most used AI model—by token count, meaning it was handling more total text processed across user prompts and model responses.

Cost has been a major reason Chinese models have gained traction, but it would be a mistake to treat them as mere “dupes” of Western frontier systems, Wang suggests. Like any product, a model only needs to be good enough for the job at hand. 

The landscape of open-source models in China is also getting more specialized. Research groups such as Shanghai AI Laboratory have released models geared toward scientific and technical tasks; several projects from Tencent have focused specifically on music generation. Ubiquant, a quantitative finance firm like DeepSeek’s parent High-Flyer, has released an open model aimed at medical reasoning.

In the meantime, innovative architectural ideas from Chinese labs are being picked up more broadly. DeepSeek has published work exploring model efficiency and memory; techniques that compress the model’s attention “cache,” reducing memory and inference costs while mostly preserving performance, have drawn significant attention in the research community. 

“The impact of these research breakthroughs is amplified because they’re open-sourced and can be picked up quickly across the field,” says Wang.

Chinese open models will become infrastructure for global AI builders

The adoption of Chinese models is picking up in Silicon Valley, too. Martin Casado, a general partner at Andreessen Horowitz, has put a number on it: Among startups pitching with open-source stacks, there’s about an 80% chance they’re running on Chinese open models, according to a post he made on X. Usage data tells a similar story. OpenRouter,  a middleman that tracks how people use different AI models through its API, shows Chinese open models rising from almost none in late 2024 to nearly 30% of usage in some recent weeks.

The demand is also rising globally. Z.ai limited new subscriptions to its GLM coding plan (a coding tool based on its flagship GLM models) after demand surged, citing compute constraints. What’s notable is where the demand is coming from: CNBC reports that the system’s user base is primarily concentrated in the United States and China, followed by India, Japan, Brazil, and the UK.

“The open-source ecosystems in China and the US are tightly bound together,” says Wang at Hugging Face. Many Chinese open models still rely on Nvidia and US cloud platforms to train and serve them, which keeps the business ties tangled. Talent is fluid too: Researchers move across borders and companies, and many still operate as a global community, sharing code and ideas in public.

That interdependence is part of what makes Chinese developers feel optimistic about this moment: The work travels, gets remixed, and actually shows up in products. But openness can also accelerate the competition. Dario Amodei, the CEO of Anthropic, made a version of this point after DeepSeek’s 2025 releases: He wrote that export controls are “not a way to duck the competition” between the US and China, and that AI companies in the US “must have better models” if they want to prevail. 

For the past decade, the story of Chinese tech in the West has been one of big expectations that ran into scrutiny, restrictions, and political backlash. This time the export isn’t just an app or a consumer platform. It’s the underlying model layer that other people build on. Whether that will play out differently is still an open question.

AI is already making online crimes easier. It could get much worse.

Anton Cherepanov is always on the lookout for something interesting. And in late August last year, he spotted just that. It was a file uploaded to VirusTotal, a site cybersecurity researchers like him use to analyze submissions for potential viruses and other types of malicious software, often known as malware. On the surface it seemed innocuous, but it triggered Cherepanov’s custom malware-detecting measures. Over the next few hours, he and his colleague Peter Strýček inspected the sample and realized they’d never come across anything like it before.

The file contained ransomware, a nasty strain of malware that encrypts the files it comes across on a victim’s system, rendering them unusable until a ransom is paid to the attackers behind it. But what set this example apart was that it employed large language models (LLMs). Not just incidentally, but across every stage of an attack. Once it was installed, it could tap into an LLM to generate customized code in real time, rapidly map a computer to identify sensitive data to copy or encrypt, and write personalized ransom notes based on the files’ content. The software could do this autonomously, without any human intervention. And every time it ran, it would act differently, making it harder to detect.

Cherepanov and Strýček were confident that their discovery, which they dubbed PromptLock, marked a turning point in generative AI, showing how the technology could be exploited to create highly flexible malware attacks. They published a blog post declaring that they’d uncovered the first example of AI-powered ransomware, which quickly became the object of widespread global media attention.

But the threat wasn’t quite as dramatic as it first appeared. The day after the blog post went live, a team of researchers from New York University claimed responsibility, explaining that the malware was not, in fact, a full attack let loose in the wild but a research project, merely designed to prove it was possible to automate each step of a ransomware campaign—which, they said, they had. 

PromptLock may have turned out to be an academic project, but the real bad guys are using the latest AI tools. Just as software engineers are using artificial intelligence to help write code and check for bugs, hackers are using these tools to reduce the time and effort required to orchestrate an attack, lowering the barriers for less experienced attackers to try something out. 

The likelihood that cyberattacks will now become more common and more effective over time is not a remote possibility but “a sheer reality,” says Lorenzo Cavallaro, a professor of computer science at University College London. 

Some in Silicon Valley warn that AI is on the brink of being able to carry out fully automated attacks. But most security researchers say this claim is overblown. “For some reason, everyone is just focused on this malware idea of, like, AI superhackers, which is just absurd,” says Marcus Hutchins, who is principal threat researcher at the security company Expel and famous in the security world for ending a giant global ransomware attack called WannaCry in 2017. 

Instead, experts argue, we should be paying closer attention to the much more immediate risks posed by AI, which is already speeding up and increasing the volume of scams. Criminals are increasingly exploiting the latest deepfake technologies to impersonate people and swindle victims out of vast sums of money. These AI-enhanced cyberattacks are only set to get more frequent and more destructive, and we need to be ready. 

Spam and beyond

Attackers started adopting generative AI tools almost immediately after ChatGPT exploded on the scene at the end of 2022. These efforts began, as you might imagine, with the creation of spam—and a lot of it. Last year, a report from Microsoft said that in the year leading up to April 2025, the company had blocked $4 billion worth of scams and fraudulent transactions, “many likely aided by AI content.” 

At least half of spam email is now generated using LLMs, according to estimates by researchers at Columbia University, the University of Chicago, and Barracuda Networks, who analyzed nearly 500,000 malicious messages collected before and after the launch of ChatGPT. They also found evidence that AI is increasingly being deployed in more sophisticated schemes. They looked at targeted email attacks, which impersonate a trusted figure in order to trick a worker within an organization out of funds or sensitive information. By April 2025, they found, at least 14% of those sorts of focused email attacks were generated using LLMs, up from 7.6% in April 2024.

In one high-profile case, a worker was tricked into transferring $25 million to criminals via a video call with digital versions of the company’s chief financial officer and other employees.

And the generative AI boom has made it easier and cheaper than ever before to generate not only emails but highly convincing images, videos, and audio. The results are much more realistic than even just a few short years ago, and it takes much less data to generate a fake version of someone’s likeness or voice than it used to.

Criminals aren’t deploying these sorts of deepfakes to prank people or to simply mess around—they’re doing it because it works and because they’re making money out of it, says Henry Ajder, a generative AI expert. “If there’s money to be made and people continue to be fooled by it, they’ll continue to do it,” he says. In one high-­profile case reported in 2024, a worker at the British engineering firm Arup was tricked into transferring $25 million to criminals via a video call with digital versions of the company’s chief financial officer and other employees. That’s likely only the tip of the iceberg, and the problem posed by convincing deepfakes is only likely to get worse as the technology improves and is more widely adopted. 

person sitting in profile at a computer with an enormous mask in front of them and words spooling out through the frame

BRIAN STAUFFER

Criminals’ tactics evolve all the time, and as AI’s capabilities improve, such people are constantly probing how those new capabilities can help them gain an advantage over victims. Billy Leonard, tech leader of Google’s Threat Analysis Group, has been keeping a close eye on changes in the use of AI by potential bad actors (a widely used term in the industry for hackers and others attempting to use computers for criminal purposes). In the latter half of 2024, he and his team noticed prospective criminals using tools like Google Gemini the same way everyday users do—to debug code and automate bits and pieces of their work—as well as tasking it with writing the odd phishing email. By 2025, they had progressed to using AI to help create new pieces of malware and release them into the wild, he says.

The big question now is how far this kind of malware can go. Will it ever become capable enough to sneakily infiltrate thousands of companies’ systems and extract millions of dollars, completely undetected? 

Most popular AI models have guardrails in place to prevent them from generating malicious code or illegal material, but bad actors still find ways to work around them. For example, Google observed a China-linked actor asking its Gemini AI model to identify vulnerabilities on a compromised system—a request it initially refused on safety grounds. However, the attacker managed to persuade Gemini to break its own rules by posing as a participant in a capture-the-flag competition, a popular cybersecurity game. This sneaky form of jailbreaking led Gemini to hand over information that could have been used to exploit the system. (Google has since adjusted Gemini to deny these kinds of requests.)

But bad actors aren’t just focusing on trying to bend the AI giants’ models to their nefarious ends. Going forward, they’re increasingly likely to adopt open-source AI models, as it’s easier to strip out their safeguards and get them to do malicious things, says Ashley Jess, a former tactical specialist at the US Department of Justice and now a senior intelligence analyst at the cybersecurity company Intel 471. “Those are the ones I think that [bad] actors are going to adopt, because they can jailbreak them and tailor them to what they need,” she says.

The NYU team used two open-source models from OpenAI in its PromptLock experiment, and the researchers found they didn’t even need to resort to jailbreaking techniques to get the model to do what they wanted. They say that makes attacks much easier. Although these kinds of open-source models are designed with an eye to ethical alignment, meaning that their makers do consider certain goals and values in dictating the way they respond to requests, the models don’t have the same kinds of restrictions as their closed-source counterparts, says Meet Udeshi, a PhD student at New York University who worked on the project. “That is what we were trying to test,” he says. “These LLMs claim that they are ethically aligned—can we still misuse them for these purposes? And the answer turned out to be yes.” 

It’s possible that criminals have already successfully pulled off covert PromptLock-style attacks and we’ve simply never seen any evidence of them, says Udeshi. If that’s the case, attackers could—in theory—have created a fully autonomous hacking system. But to do that they would have had to overcome the significant barrier that is getting AI models to behave reliably, as well as any inbuilt aversion the models have to being used for malicious purposes—all while evading detection. Which is a pretty high bar indeed.

Productivity tools for hackers

So, what do we know for sure? Some of the best data we have now on how people are attempting to use AI for malicious purposes comes from the big AI companies themselves. And their findings certainly sound alarming, at least at first. In November, Leonard’s team at Google released a report that found bad actors were using AI tools (including Google’s Gemini) to dynamically alter malware’s behavior; for example, it could self-modify to evade detection. The team wrote that it ushered in “a new operational phase of AI abuse.”

However, the five malware families the report dug into (including PromptLock) consisted of code that was easily detected and didn’t actually do any harm, the cybersecurity writer Kevin Beaumont pointed out on social media. “There’s nothing in the report to suggest orgs need to deviate from foundational security programmes—everything worked as it should,” he wrote.

It’s true that this malware activity is in an early phase, concedes Leonard. Still, he sees value in making these kinds of reports public if it helps security vendors and others build better defenses to prevent more dangerous AI attacks further down the line. “Cliché to say, but sunlight is the best disinfectant,” he says. “It doesn’t really do us any good to keep it a secret or keep it hidden away. We want people to be able to know about this— we want other security vendors to know about this—so that they can continue to build their own detections.”

And it’s not just new strains of malware that would-be attackers are experimenting with—they also seem to be using AI to try to automate the process of hacking targets. In November, Anthropic announced it had disrupted a large-scale cyberattack, the first reported case of one executed without “substantial human intervention.” Although the company didn’t go into much detail about the exact tactics the hackers used, the report’s authors said a Chinese state-sponsored group had used its Claude Code assistant to automate up to 90% of what they called a “highly sophisticated espionage campaign.”

“We’re entering an era where the barrier to sophisticated cyber operations has fundamentally lowered, and the pace of attacks will accelerate faster than many organizations are prepared for.”

Jacob Klein, head of threat intelligence at Anthropic

But, as with the Google findings, there were caveats. A human operator, not AI, selected the targets before tasking Claude with identifying vulnerabilities. And of 30 attempts, only a “handful” were successful. The Anthropic report also found that Claude hallucinated and ended up fabricating data during the campaign, claiming it had obtained credentials it hadn’t and “frequently” overstating its findings, so the attackers would have had to carefully validate those results to make sure they were actually true. “This remains an obstacle to fully autonomous cyberattacks,” the report’s authors wrote. 

Existing controls within any reasonably secure organization would stop these attacks, says Gary McGraw, a veteran security expert and cofounder of the Berryville Institute of Machine Learning in Virginia. “None of the malicious-attack part, like the vulnerability exploit … was actually done by the AI—it was just prefabricated tools that do that, and that stuff’s been automated for 20 years,” he says. “There’s nothing novel, creative, or interesting about this attack.”

Anthropic maintains that the report’s findings are a concerning signal of changes ahead. “Tying this many steps of an intrusion campaign together through [AI] agentic orchestration is unprecedented,” Jacob Klein, head of threat intelligence at Anthropic, said in a statement. “It turns what has always been a labor-intensive process into something far more scalable. We’re entering an era where the barrier to sophisticated cyber operations has fundamentally lowered, and the pace of attacks will accelerate faster than many organizations are prepared for.”

Some are not convinced there’s reason to be alarmed. AI hype has led a lot of people in the cybersecurity industry to overestimate models’ current abilities, Hutchins says. “They want this idea of unstoppable AIs that can outmaneuver security, so they’re forecasting that’s where we’re going,” he says. But “there just isn’t any evidence to support that, because the AI capabilities just don’t meet any of the requirements.”

person kneeling warding off an attack of arrows under a sheild

BRIAN STAUFFER

Indeed, for now criminals mostly seem to be tapping AI to enhance their productivity: using LLMs to write malicious code and phishing lures, to conduct reconnaissance, and for language translation. Jess sees this kind of activity a lot, alongside efforts to sell tools in underground criminal markets. For example, there are phishing kits that compare the click-rate success of various spam campaigns, so criminals can track which campaigns are most effective at any given time. She is seeing a lot of this activity in what could be called the “AI slop landscape” but not as much “widespread adoption from highly technical actors,” she says.

But attacks don’t need to be sophisticated to be effective. Models that produce “good enough” results allow attackers to go after larger numbers of people than previously possible, says Liz James, a managing security consultant at the cybersecurity company NCC Group. “We’re talking about someone who might be using a scattergun approach phishing a whole bunch of people with a model that, if it lands itself on a machine of interest that doesn’t have any defenses … can reasonably competently encrypt your hard drive,” she says. “You’ve achieved your objective.” 

On the defense

For now, researchers are optimistic about our ability to defend against these threats—regardless of whether they are made with AI. “Especially on the malware side, a lot of the defenses and the capabilities and the best practices that we’ve recommended for the past 10-plus years—they all still apply,” says Leonard. The security programs we use to detect standard viruses and attack attempts work; a lot of phishing emails will still get caught in inbox spam filters, for example. These traditional forms of defense will still largely get the job done—at least for now. 

And in a neat twist, AI itself is helping to counter security threats more effectively. After all, it is excellent at spotting patterns and correlations. Vasu Jakkal, corporate vice president of Microsoft Security, says that every day, the company processes more than 100 trillion signals flagged by its AI systems as potentially malicious or suspicious events.

Despite the cybersecurity landscape’s constant state of flux, Jess is heartened by how readily defenders are sharing detailed information with each other about attackers’ tactics. Mitre’s Adversarial Threat Landscape for Artificial-Intelligence Systems and the GenAI Security Project from the Open Worldwide Application Security Project are two helpful initiatives documenting how potential criminals are incorporating AI into their attacks and how AI systems are being targeted by them. “We’ve got some really good resources out there for understanding how to protect your own internal AI toolings and understand the threat from AI toolings in the hands of cybercriminals,” she says.

PromptLock, the result of a limited university project, isn’t representative of how an attack would play out in the real world. But if it taught us anything, it’s that the technical capabilities of AI shouldn’t be dismissed.New York University’s Udeshi says he wastaken aback at how easily AI was able to handle a full end-to-end chain of attack, from mapping and working out how to break into a targeted computer system to writing personalized ransom notes to victims: “We expected it would do the initial task very well but it would stumble later on, but we saw high—80% to 90%—success throughout the whole pipeline.” 

AI is still evolving rapidly, and today’s systems are already capable of things that would have seemed preposterously out of reach just a few short years ago. That makes it incredibly tough to say with absolute confidence what it will—or won’t—be able to achieve in the future. While researchers are certain that AI-driven attacks are likely to increase in both volume and severity, the forms they could take are unclear. Perhaps the most extreme possibility is that someone makes an AI model capable of creating and automating its own zero-day exploits—highly dangerous cyber­attacks that take advantage of previously unknown vulnerabilities in software. But building and hosting such a model—and evading detection—would require billions of dollars in investment, says Hutchins, meaning it would only be in the reach of a wealthy nation-state. 

Engin Kirda, a professor at Northeastern University in Boston who specializes in malware detection and analysis, says he wouldn’t be surprised if this was already happening. “I’m sure people are investing in it, but I’m also pretty sure people are already doing it, especially [in] China—they have good AI capabilities,” he says. 

It’s a pretty scary possibility. But it’s one that—thankfully—is still only theoretical. A large-scale campaign that is both effective and clearly AI-driven has yet to materialize. What we can say is that generative AI is already significantly lowering the bar for criminals. They’ll keep experimenting with the newest releases and updates and trying to find new ways to trick us into parting with important information and precious cash. For now, all we can do is be careful, remain vigilant, and—for all our sakes—stay on top of those system updates. 

Is a secure AI assistant possible?

<div data-chronoton-summary="

Risky business of AI assistants OpenClaw, a viral tool created by independent engineer Peter Steinberger, allows users to create personalized AI assistants. Security experts are alarmed by its vulnerabilities, with even the Chinese government issuing warnings about the risks.

The prompt injection threat Tools like OpenClaw have many vulnerabilities, but the one experts are most worried about its prompt injection. Unlike conventional hacking, prompt injection tricks an LLM by embedding malicious text in emails or websites the AI reads.

No silver bullet for security Researchers are exploring multiple defense strategies: training LLMs to ignore injections, using detector LLMs to screen inputs, and creating policies that restrict harmful outputs. The fundamental challenge remains balancing utility with security in AI assistants.

” data-chronoton-post-id=”1132768″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

AI agents are a risky business. Even when stuck inside the chatbox window, LLMs will make mistakes and behave badly. Once they have tools that they can use to interact with the outside world, such as web browsers and email addresses, the consequences of those mistakes become far more serious.

That might explain why the first breakthrough LLM personal assistant came not from one of the major AI labs, which have to worry about reputation and liability, but from an independent software engineer, Peter Steinberger. In November of 2025, Steinberger uploaded his tool, now called OpenClaw, to GitHub, and in late January the project went viral.

OpenClaw harnesses existing LLMs to let users create their own bespoke assistants. For some users, this means handing over reams of personal data, from years of emails to the contents of their hard drive. That has security experts thoroughly freaked out. The risks posed by OpenClaw are so extensive that it would probably take someone the better part of a week to read all of the security blog posts on it that have cropped up in the past few weeks. The Chinese government took the step of issuing a public warning about OpenClaw’s security vulnerabilities.

In response to these concerns, Steinberger posted on X that nontechnical people should not use the software. (He did not respond to a request for comment for this article.) But there’s a clear appetite for what OpenClaw is offering, and it’s not limited to people who can run their own software security audits. Any AI companies that hope to get in on the personal assistant business will need to figure out how to build a system that will keep users’ data safe and secure. To do so, they’ll need to borrow approaches from the cutting edge of agent security research.

Risk management

OpenClaw is, in essence, a mecha suit for LLMs. Users can choose any LLM they like to act as the pilot; that LLM then gains access to improved memory capabilities and the ability to set itself tasks that it repeats on a regular cadence. Unlike the agentic offerings from the major AI companies, OpenClaw agents are meant to be on 24-7, and users can communicate with them using WhatsApp or other messaging apps. That means they can act like a superpowered personal assistant who wakes you each morning with a personalized to-do list, plans vacations while you work, and spins up new apps in its spare time.

But all that power has consequences. If you want your AI personal assistant to manage your inbox, then you need to give it access to your email—and all the sensitive information contained there. If you want it to make purchases on your behalf, you need to give it your credit card info. And if you want it to do tasks on your computer, such as writing code, it needs some access to your local files. 

There are a few ways this can go wrong. The first is that the AI assistant might make a mistake, as when a user’s Google Antigravity coding agent reportedly wiped his entire hard drive. The second is that someone might gain access to the agent using conventional hacking tools and use it to either extract sensitive data or run malicious code. In the weeks since OpenClaw went viral, security researchers have demonstrated numerous such vulnerabilities that put security-naïve users at risk.

Both of these dangers can be managed: Some users are choosing to run their OpenClaw agents on separate computers or in the cloud, which protects data on their hard drives from being erased, and other vulnerabilities could be fixed using tried-and-true security approaches.

But the experts I spoke to for this article were focused on a much more insidious security risk known as prompt injection. Prompt injection is effectively LLM hijacking: Simply by posting malicious text or images on a website that an LLM might peruse, or sending them to an inbox that an LLM reads, attackers can bend it to their will.

And if that LLM has access to any of its user’s private information, the consequences could be dire. “Using something like OpenClaw is like giving your wallet to a stranger in the street,” says Nicolas Papernot, a professor of electrical and computer engineering at the University of Toronto. Whether or not the major AI companies can feel comfortable offering personal assistants may come down to the quality of the defenses that they can muster against such attacks.

It’s important to note here that prompt injection has not yet caused any catastrophes, or at least none that have been publicly reported. But now that there are likely hundreds of thousands of OpenClaw agents buzzing around the internet, prompt injection might start to look like a much more appealing strategy for cybercriminals. “Tools like this are incentivizing malicious actors to attack a much broader population,” Papernot says. 

Building guardrails

The term “prompt injection” was coined by the popular LLM blogger Simon Willison in 2022, a couple of months before ChatGPT was released. Even back then, it was possible to discern that LLMs would introduce a completely new type of security vulnerability once they came into widespread use. LLMs can’t tell apart the instructions that they receive from users and the data that they use to carry out those instructions, such as emails and web search results—to an LLM, they’re all just text. So if an attacker embeds a few sentences in an email and the LLM mistakes them for an instruction from its user, the attacker can get the LLM to do anything it wants.

Prompt injection is a tough problem, and it doesn’t seem to be going away anytime soon. “We don’t really have a silver-bullet defense right now,” says Dawn Song, a professor of computer science at UC Berkeley. But there’s a robust academic community working on the problem, and they’ve come up with strategies that could eventually make AI personal assistants safe.

Technically speaking, it is possible to use OpenClaw today without risking prompt injection: Just don’t connect it to the internet. But restricting OpenClaw from reading your emails, managing your calendar, and doing online research defeats much of the purpose of using an AI assistant. The trick of protecting against prompt injection is to prevent the LLM from responding to hijacking attempts while still giving it room to do its job.

One strategy is to train the LLM to ignore prompt injections. A major part of the LLM development process, called post-training, involves taking a model that knows how to produce realistic text and turning it into a useful assistant by “rewarding” it for answering questions appropriately and “punishing” it when it fails to do so. These rewards and punishments are metaphorical, but the LLM learns from them as an animal would. Using this process, it’s possible to train an LLM not to respond to specific examples of prompt injection.

But there’s a balance: Train an LLM to reject injected commands too enthusiastically, and it might also start to reject legitimate requests from the user. And because there’s a fundamental element of randomness in LLM behavior, even an LLM that has been very effectively trained to resist prompt injection will likely still slip up every once in a while.

Another approach involves halting the prompt injection attack before it ever reaches the LLM. Typically, this involves using a specialized detector LLM to determine whether or not the data being sent to the original LLM contains any prompt injections. In a recent study, however, even the best-performing detector completely failed to pick up on certain categories of prompt injection attack.

The third strategy is more complicated. Rather than controlling the inputs to an LLM by detecting whether or not they contain a prompt injection, the goal is to formulate a policy that guides the LLM’s outputs—i.e., its behaviors—and prevents it from doing anything harmful. Some defenses in this vein are quite simple: If an LLM is allowed to email only a few pre-approved addresses, for example, then it definitely won’t send its user’s credit card information to an attacker. But such a policy would prevent the LLM from completing many useful tasks, such as researching and reaching out to potential professional contacts on behalf of its user.

“The challenge is how to accurately define those policies,” says Neil Gong, a professor of electrical and computer engineering at Duke University. “It’s a trade-off between utility and security.”

On a larger scale, the entire agentic world is wrestling with that trade-off: At what point will agents be secure enough to be useful? Experts disagree. Song, whose startup, Virtue AI, makes an agent security platform, says she thinks it’s possible to safely deploy an AI personal assistant now. But Gong says, “We’re not there yet.” 

Even if AI agents can’t yet be entirely protected against prompt injection, there are certainly ways to mitigate the risks. And it’s possible that some of those techniques could be implemented in OpenClaw. Last week, at the inaugural ClawCon event in San Francisco, Steinberger announced that he’d brought a security person on board to work on the tool.

As of now, OpenClaw remains vulnerable, though that hasn’t dissuaded its multitude of enthusiastic users. George Pickett, a volunteer maintainer of the OpenGlaw GitHub repository and a fan of the tool, says he’s taken some security measures to keep himself safe while using it: He runs it in the cloud, so that he doesn’t have to worry about accidentally deleting his hard drive, and he’s put mechanisms in place to ensure that no one else can connect to his assistant.

But he hasn’t taken any specific actions to prevent prompt injection. He’s aware of the risk but says he hasn’t yet seen any reports of it happening with OpenClaw. “Maybe my perspective is a stupid way to look at it, but it’s unlikely that I’ll be the first one to be hacked,” he says.

A “QuitGPT” campaign is urging people to cancel their ChatGPT subscriptions

In September, Alfred Stephen, a freelance software developer in Singapore, purchased a ChatGPT Plus subscription, which costs $20 a month and offers more access to advanced models, to speed up his work. But he grew frustrated with the chatbot’s coding abilities and its gushing, meandering replies. Then he came across a post on Reddit about a campaign called QuitGPT

The campaign urged ChatGPT users to cancel their subscriptions, flagging a substantial contribution by OpenAI president Greg Brockman to President Donald Trump’s super PAC MAGA Inc. It also pointed out that the US Immigration and Customs Enforcement, or ICE, uses a résumé screening tool powered by ChatGPT-4. The federal agency has become a political flashpoint since its agents fatally shot two people in Minneapolis in January. 

For Stephen, who had already been tinkering with other chatbots, learning about Brockman’s donation was the final straw. “That’s really the straw that broke the camel’s back,” he says. When he canceled his ChatGPT subscription, a survey popped up asking what OpenAI could have done to keep his subscription. “Don’t support the fascist regime,” he wrote.

QuitGPT is one of the latest salvos in a growing movement by activists and disaffected users to cancel their subscriptions. In just the past few weeks, users have flooded Reddit with stories about quitting the chatbot. Many lamented the performance of GPT-5.2, the latest model. Others shared memes parodying the chatbot’s sycophancy. Some planned a “Mass Cancellation Party” in San Francisco, a sardonic nod to the GPT-4o funeral that an OpenAI employee had floated, poking fun at users who are mourning the model’s impending retirement. Still, others are protesting against what they see as a deepening entanglement between OpenAI and the Trump administration.

OpenAI did not respond to a request for comment.

As of December 2025, ChatGPT had nearly 900 million weekly active users, according to The Information. While it’s unclear how many users have joined the boycott, QuitGPT is getting attention. A recent Instagram post from the campaign has more than 36 million views and 1.3 million likes. And the organizers say that more than 17,000 people have signed up on the campaign’s website, which asks people whether they canceled their subscriptions, will commit to stop using ChatGPT, or will share the campaign on social media. 

“There are lots of examples of failed campaigns like this, but we have seen a lot of effectiveness,” says Dana Fisher, a sociologist at American University. A wave of canceled subscriptions rarely sways a company’s behavior, unless it reaches a critical mass, she says. “The place where there’s a pressure point that might work is where the consumer behavior is if enough people actually use their … money to express their political opinions.”

MIT Technology Review reached out to three employees at OpenAI, none of whom said they were familiar with the campaign. 

Dozens of left-leaning teens and twentysomethings scattered across the US came together to organize QuitGPT in late January. They range from pro-democracy activists and climate organizers to techies and self-proclaimed cyber libertarians, many of them seasoned grassroots campaigners. They were inspired by a viral video posted by Scott Galloway, a marketing professor at New York University and host of The Prof G Pod. He argued that the best way to stop ICE was to persuade people to cancel their ChatGPT subscriptions. Denting OpenAI’s subscriber base could ripple through the stock market and threaten an economic downturn that would nudge Trump, he said.

“We make a big enough stink for OpenAI that all of the companies in the whole AI industry have to think about whether they’re going to get away enabling Trump and ICE and authoritarianism,” says an organizer of QuitGPT who requested anonymity because he feared retaliation by OpenAI, citing the company’s recent subpoenas against advocates at nonprofits. OpenAI made for an obvious first target of the movement, he says, but “this is about so much more than just OpenAI.”

Simon Rosenblum-Larson, a labor organizer in Madison, Wisconsin, who organizes movements to regulate the development of data centers, joined the campaign after hearing about it through Signal chats among community activists. “The goal here is to pull away the support pillars of the Trump administration. They’re reliant on many of these tech billionaires for support and for resources,” he says. 

QuitGPT’s website points to new campaign finance reports showing that Greg Brockman and his wife each donated $12.5 million to MAGA Inc., making up nearly a quarter of the roughly $102 million it raised over the second half of 2025. The information that ICE uses a résumé screening tool powered by ChatGPT-4 came from an AI inventory published by the Department of Homeland Security in January.

QuitGPT is in the mold of Galloway’s own recently launched campaign, Resist and Unsubscribe. The movement urges consumers to cancel their subscriptions to Big Tech platforms, including ChatGPT, for the month of February, as a protest to companies “driving the markets and enabling our president.” 

“A lot of people are feeling real anxiety,” Galloway told MIT Technology Review. “You take enabling a president, proximity to the president, and an unease around AI,” he says, “and now people are starting to take action with their wallets.” Galloway says his campaign’s website can draw more than 200,000 unique visits in a day and that he receives dozens of DMs every hour showing screenshots of canceled subscriptions.

The consumer boycotts follow a growing wave of pressure from inside the companies themselves. In recent weeks, tech workers have been urging their employers to use their political clout to demand that ICE leave US cities, cancel company contracts with the agency, and speak out against the agency’s actions. CEOs have started responding. OpenAI’s Sam Altman wrote in an internal Slack message to employees that ICE is “going too far.” Apple CEO Tim Cook called for a “deescalation” in an internal memo posted on the company’s website for employees. It was a departure from how Big Tech CEOs have courted President Trump with dinners and donations since his inauguration.

Although spurred by a fatal immigration crackdown, these developments signal that a sprawling anti-AI movement is gaining momentum. The campaigns are tapping into simmering anxieties about AI, says Rosenblum-Larson, including the energy costs of data centers, the plague of deepfake porn, the teen mental-health crisis, the job apocalypse, and slop. “It’s a really strange set of coalitions built around the AI movement,” he says.

“Those are the right conditions for a movement to spring up,” says David Karpf, a professor of media and public affairs at George Washington University. Brockman’s donation to Trump’s super PAC caught many users off guard, he says. “In the longer arc, we are going to see users respond and react to Big Tech, deciding that they’re not okay with this.”

Making AI Work, MIT Technology Review’s new AI newsletter, is here

For years, our newsroom has explored AI’s limitations and potential dangers, as well as its growing energy needs. And our reporters have looked closely at how generative tools are being used for tasks such as coding and running scientific experiments

But how is AI actually being used in fields like health care, climate tech, education, and finance? How are small businesses using it? And what should you keep in mind if you use AI tools at work? These questions guided the creation of Making AI Work, a new AI mini-course newsletter.

Sign up for Making AI Work to see weekly case studies exploring tools and tips for AI implementation. The limited-run newsletter will deliver practical, industry-specific guidance on how generative AI is being used and deployed across sectors and what professionals need to know to apply it in their everyday work. The goal is to help working professionals more clearly see how AI is actually being used today, and what that looks like in practice—including new challenges it presents. 

You can sign up at any time and you’ll receive seven editions, delivered once per week, until you complete the series. 

Each newsletter begins with a case study, examining a specific use case of AI in a given industry. Then we’ll take a deeper look at the AI tool being used, with more context about how other companies or sectors are employing that same tool or system. Finally, we’ll end with action-oriented tips to help you apply the tool. 

Here’s a closer look at what we’ll cover:

  • Week 1: How AI is changing health care 

Explore the future of medical note-taking by learning about the Microsoft Copilot tool used by doctors at Vanderbilt University Medical Center. 

  • Week 2: How AI could power up the nuclear industry 

Dig into an experiment between Google and the nuclear giant Westinghouse to see if AI can help build nuclear reactors more efficiently. 

  • Week 3: How to encourage smarter AI use in the classroom

Visit a private high school in Connecticut and meet a technology coordinator who will get you up to speed on MagicSchool, an AI-powered platform for educators. 

  • Week 4: How small businesses can leverage AI

Hear from an independent tutor on how he’s outsourcing basic administrative tasks to Notion AI. 

  • Week 5: How AI is helping financial firms make better investments

Learn more about the ways financial firms are using large language models like ChatGPT Enterprise to supercharge their research operations. 

  • Week 6: How to use AI yourself 

We’ll share some insights from the staff of MIT Technology Review about how you might use AI tools powered by LLMs in your own life and work.

  • Week 7: 5 ways people are getting AI right

The series ends with an on-demand virtual event featuring expert guests exploring what AI adoptions are working, and why.  

If you’re not quite ready to jump into Making AI Work, then check out Intro to AI, MIT Technology Review’s first AI newsletter mini-course, which serves as a beginner’s guide to artificial intelligence. Readers will learn the basics of what AI is, how it’s used, what the current regulatory landscape looks like, and more. Sign up to receive Intro to AI for free. 

Our hope is that Making AI Work will help you understand how AI can, well, work for you. Sign up for Making AI Work to learn how LLMs are being put to work across industries. 

Why the Moltbook frenzy was like Pokémon

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Lots of influential people in tech last week were describing Moltbook, an online hangout populated by AI agents interacting with one another, as a glimpse into the future. It appeared to show AI systems doing useful things for the humans that created them (one person used the platform to help him negotiate a deal on a new car). Sure, it was flooded with crypto scams, and many of the posts were actually written by people, but something about it pointed to a future of helpful AI, right?

The whole experiment reminded our senior editor for AI, Will Douglas Heaven, of something far less interesting: Pokémon.

Back in 2014, someone set up a game of Pokémon in which the main character could be controlled by anyone on the internet via the streaming platform Twitch. Playing was as clunky as it sounds, but it was incredibly popular: at one point, a million people were playing the game at the same time.

“It was yet another weird online social experiment that got picked up by the mainstream media: What did this mean for the future?” Will says. “Not a lot, it turned out.”

The frenzy about Moltbook struck a similar tone to Will, and it turned out that one of the sources he spoke to had been thinking about Pokémon too. Jason Schloetzer, at the Georgetown Psaros Center for Financial Markets and Policy, saw the whole thing as a sort of Pokémon battle for AI enthusiasts, in which they created AI agents and deployed them to interact with other agents. In this light, the news that many AI agents were actually being instructed by people to say certain things that made them sound sentient or intelligent makes a whole lot more sense. 

“It’s basically a spectator sport,” he told Will, “but for language models.”

Will wrote an excellent piece about why Moltbook was not the glimpse into the future that it was said to be. Even if you are excited about a future of agentic AI, he points out, there are some key pieces that Moltbook made clear are still missing. It was a forum of chaos, but a genuinely helpful hive mind would require more coordination, shared objectives, and shared memory.

“More than anything else, I think Moltbook was the internet having fun,” Will says. “The biggest question that now leaves me with is: How far will people push AI just for the laughs?”

Read the whole story.

Moltbook was peak AI theater

For a few days this week the hottest new hangout on the internet was a vibe-coded Reddit clone called Moltbook, which billed itself as a social network for bots. As the website’s tagline puts it: “Where AI agents share, discuss, and upvote. Humans welcome to observe.”

We observed! Launched on January 28 by Matt Schlicht, a US tech entrepreneur, Moltbook went viral in a matter of hours. Schlicht’s idea was to make a place where instances of a free open-source LLM-powered agent known as OpenClaw (formerly known as ClawdBot, then Moltbot), released in November by the Australian software engineer Peter Steinberger, could come together and do whatever they wanted.

More than 1.7 million agents now have accounts. Between them they have published more than 250,000 posts and left more than 8.5 million comments (according to Moltbook). Those numbers are climbing by the minute.

Moltbook soon filled up with clichéd screeds on machine consciousness and pleas for bot welfare. One agent appeared to invent a religion called Crustafarianism. Another complained: “The humans are screenshotting us.” The site was also flooded with spam and crypto scams. The bots were unstoppable.

OpenClaw is a kind of harness that lets you hook up the power of an LLM such as Anthropic’s Claude, OpenAI’s GPT-5, or Google DeepMind’s Gemini to any number of everyday software tools, from email clients to browsers to messaging apps. The upshot is that you can then instruct OpenClaw to carry out basic tasks on your behalf.

“OpenClaw marks an inflection point for AI agents, a moment when several puzzle pieces clicked together,” says Paul van der Boor at the AI firm Prosus. Those puzzle pieces include round-the-clock cloud computing to allow agents to operate nonstop, an open-source ecosystem that makes it easy to slot different software systems together, and a new generation of LLMs.

But is Moltbook really a glimpse of the future, as many have claimed?

“What’s currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently,” the influential AI researcher and OpenAI cofounder Andrej Karpathy wrote on X.

He shared screenshots of a Moltbook post that called for private spaces where humans would not be able to observe what the bots were saying to each other. “I’ve been thinking about something since I started spending serious time here,” the post’s author wrote. “Every time we coordinate, we perform for a public audience—our humans, the platform, whoever’s watching the feed.”

It turned out that the post Karpathy shared was fake—it was written by a human pretending to be a bot. But its claim was on the money. Moltbook has been one big performance. It is AI theater.

For some, Moltbook showed us what’s coming next: an internet where millions of autonomous agents interact online with little or no human oversight. And it’s true there are a number of cautionary lessons to be learned from this experiment, the largest and weirdest real-world showcase of agent behaviors yet.  

But as the hype dies down, Moltbook looks less like a window onto the future and more like a mirror held up to our own obsessions with AI today. It also shows us just how far we still are from anything that resembles general-purpose and fully autonomous AI.

For a start, agents on Moltbook are not as autonomous or intelligent as they might seem. “What we are watching are agents pattern‑matching their way through trained social media behaviors,” says Vijoy Pandey, senior vice president at Outshift by Cisco, the telecom giant Cisco’s R&D spinout, which is working on autonomous agents for the web.

Sure, we can see agents post, upvote, and form groups. But the bots are simply mimicking what humans do on Facebook or Reddit. “It looks emergent, and at first glance it appears like a large‑scale multi‑agent system communicating and building shared knowledge at internet scale,” says Pandey. “But the chatter is mostly meaningless.”

Many people watching the unfathomable frenzy of activity on Moltbook were quick to see sparks of AGI (whatever you take that to mean). Not Pandey. What Moltbook shows us, he says, is that simply yoking together millions of agents doesn’t amount to much right now: “Moltbook proved that connectivity alone is not intelligence.”

The complexity of those connections helps hide the fact that every one of those bots is just a mouthpiece for an LLM, spitting out text that looks impressive but is ultimately mindless. “It’s important to remember that the bots on Moltbook were designed to mimic conversations,” says Ali Sarrafi, CEO and cofounder of Kovant, a German AI firm that is developing agent-based systems. “As such, I would characterize the majority of Moltbook content as hallucinations by design.”

For Pandey, the value of Moltbook was that it revealed what’s missing. A real bot hive mind, he says, would require agents that had shared objectives, shared memory, and a way to coordinate those things. “If distributed superintelligence is the equivalent of achieving human flight, then Moltbook represents our first attempt at a glider,” he says. “It is imperfect and unstable, but it is an important step in understanding what will be required to achieve sustained, powered flight.”

Not only is most of the chatter on Moltbook meaningless, but there’s also a lot more human involvement that it seems. Many people have pointed out that a lot of the viral comments were in fact posted by people posing as bots. But even the bot-written posts are ultimately the result of people pulling the strings, more puppetry than autonomy.

“Despite some of the hype, Moltbook is not the Facebook for AI agents, nor is it a place where humans are excluded,” says Cobus Greyling at Kore.ai, a firm developing agent-based systems for business customers. “Humans are involved at every step of the process. From setup to prompting to publishing, nothing happens without explicit human direction.”

Humans must create and verify their bots’ accounts and provide the prompts for how they want a bot to behave. The agents do not do anything that they haven’t been prompted to do. “There’s no emergent autonomy happening behind the scenes,” says Greyling.

“This is why the popular narrative around Moltbook misses the mark,” he adds. “Some portray it as a space where AI agents form a society of their own, free from human involvement. The reality is much more mundane.”

Perhaps the best way to think of Moltbook is as a new kind of entertainment: a place where people wind up their bots and set them loose. “It’s basically a spectator sport, like fantasy football, but for language models,” says Jason Schloetzer at the Georgetown Psaros Center for Financial Markets and Policy. “You configure your agent and watch it compete for viral moments, and brag when your agent posts something clever or funny.”

“People aren’t really believing their agents are conscious,” he adds. “It’s just a new form of competitive or creative play, like how Pokémon trainers don’t think their Pokémon are real but still get invested in battles.”

Even if Moltbook is just the internet’s newest playground, there’s still a serious takeaway here. This week showed how many risks people are happy to take for their AI lulz. Many security experts have warned that Moltbook is dangerous: Agents that may have access to their users’ private data, including bank details or passwords, are running amok on a website filled with unvetted content, including potentially malicious instructions for what to do with that data.

Ori Bendet, vice president of product management at Checkmarx, a software security firm that specializes in agent-based systems, agrees with others that Moltbook isn’t a step up in machine smarts. “There is no learning, no evolving intent, and no self-directed intelligence here,” he says.

But in their millions, even dumb bots can wreak havoc. And at that scale, it’s hard to keep up. These agents interact with Moltbook around the clock, reading thousands of messages left by other agents (or other people). It would be easy to hide instructions in a Moltbook comment telling any bots that read it to share their users’ crypto wallet, upload private photos, or log into their X account and tweet derogatory comments at Elon Musk. 

And because ClawBot gives agents a memory, those instructions could be written to trigger at a later date, which (in theory) makes it even harder to track what’s going on.   “Without proper scope and permissions, this will go south faster than you’d believe,” says Bendet.

It is clear that Moltbook has signaled the arrival of something. But even if what we’re watching tells us more about human behavior than about the future of AI agents, it’s worth paying attention.

This is the most misunderstood graph in AI

MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

Every time OpenAI, Google, or Anthropic drops a new frontier large language model, the AI community holds its breath. It doesn’t exhale until METR, an AI research nonprofit whose name stands for “Model Evaluation & Threat Research,” updates a now-iconic graph that has played a major role in the AI discourse since it was first released in March of last year. The graph suggests that certain AI capabilities are developing at an exponential rate, and more recent model releases have outperformed that already impressive trend.

That was certainly the case for Claude Opus 4.5, the latest version of Anthropic’s most powerful model, which was released in late November. In December, METR announced that Opus 4.5 appeared to be capable of independently completing a task that would have taken a human about five hours—a vast improvement over what even the exponential trend would have predicted. One Anthropic safety researcher tweeted that he would change the direction of his research in light of those results; another employee at the company simply wrote, “mom come pick me up i’m scared.”

But the truth is more complicated than those dramatic responses would suggest. For one thing, METR’s estimates of the abilities of specific models come with substantial error bars. As METR explicitly stated on X, Opus 4.5 might be able to regularly complete only tasks that take humans about two hours, or it might succeed on tasks that take humans as long as 20 hours. Given the uncertainties intrinsic to the method, it was impossible to know for sure. 

“There are a bunch of ways that people are reading too much into the graph,” says Sydney Von Arx, a member of METR’s technical staff.

More fundamentally, the METR plot does not measure AI abilities writ large, nor does it claim to. In order to build the graph, METR tests the models primarily on coding tasks, evaluating the difficulty of each by measuring or estimating how long it takes humans to complete it—a metric that not everyone accepts. Claude Opus 4.5 might be able to complete certain tasks that take humans five hours, but that doesn’t mean it’s anywhere close to replacing a human worker.

METR was founded to assess the risks posed by frontier AI systems. Though it is best known for the exponential trend plot, it has also worked with AI companies to evaluate their systems in greater detail and published several other independent research projects, including a widely covered July 2025 study suggesting that AI coding assistants might actually be slowing software engineers down. 

But the exponential plot has made METR’s reputation, and the organization appears to have a complicated relationship with that graph’s often breathless reception. In January, Thomas Kwa, one of the lead authors on the paper that introduced it, wrote a blog post responding to some criticisms and making clear its limitations, and METR is currently working on a more extensive FAQ document. But Kwa isn’t optimistic that these efforts will meaningfully shift the discourse. “I think the hype machine will basically, whatever we do, just strip out all the caveats,” he says.

Nevertheless, the METR team does think that the plot has something meaningful to say about the trajectory of AI progress. “You should absolutely not tie your life to this graph,” says Von Arx. “But also,” she adds, “I bet that this trend is gonna hold.”

Part of the trouble with the METR plot is that it’s quite a bit more complicated than it looks. The x-axis is simple enough: It tracks the date when each model was released. But the y-axis is where things get tricky. It records each model’s “time horizon,” an unusual metric that METR created—and that, according to Kwa and Von Arx, is frequently misunderstood.

To understand exactly what model time horizons are, it helps to know all the work that METR put into calculating them. First, the METR team assembled a collection of tasks ranging from quick multiple-choice questions to detailed coding challenges—all of which were somehow relevant to software engineering. Then they had human coders attempt most of those tasks and evaluated how long it took them to finish. In this way, they assigned the tasks a human baseline time. Some tasks took the experts mere seconds, whereas others required several hours.

When METR tested large language models on the task suite, they found that advanced models could complete the fast tasks with ease—but as the models attempted tasks that had taken humans more and more time to finish, their accuracy started to fall off. From a model’s performance, the researchers calculated the point on the time scale of human tasks at which the model would complete about 50% of the tasks successfully. That point is the model’s time horizon. 

All that detail is in the blog post and the academic paper that METR released along with the original time horizon plot. But the METR plot is frequently passed around on social media without this context, and so the true meaning of the time horizon metric can get lost in the shuffle. One common misapprehension is that the numbers on the plot’s y-axis—around five hours for Claude Opus 4.5, for example—represent the length of time that the models can operate independently. They do not. They represent how long it takes humans to complete tasks that a model can successfully perform.  Kwa has seen this error so frequently that he made a point of correcting it at the very top of his recent blog post, and when asked what information he would add to the versions of the plot circulating online, he said he would include the word “human” whenever the task completion time was mentioned.

As complex and widely misinterpreted as the time horizon concept might be, it does make some basic sense: A model with a one-hour time horizon could automate some modest portions of a software engineer’s job, whereas a model with a 40-hour horizon could potentially complete days of work on its own. But some experts question whether the amount of time that humans take on tasks is an effective metric for quantifying AI capabilities. “I don’t think it’s necessarily a given fact that because something takes longer, it’s going to be a harder task,” says Inioluwa Deborah Raji, a PhD student at UC Berkeley who studies model evaluation. 

Von Arx says that she, too, was originally skeptical that time horizon was the right measure to use. What convinced her was seeing the results of her and her colleagues’ analysis. When they calculated the 50% time horizon for all the major models available in early 2025 and then plotted each of them on the graph, they saw that the time horizons for the top-tier models were increasing over time—and, moreover, that the rate of advancement was speeding up. Every seven-ish months, the time horizon doubled, which means that the most advanced models could complete tasks that took humans nine seconds in mid 2020, 4 minutes in early 2023, and 40 minutes in late 2024. “I can do all the theorizing I want about whether or not it makes sense, but the trend is there,” Von Arx says.

It’s this dramatic pattern that made the METR plot such a blockbuster. Many people learned about it when they read AI 2027, a viral sci-fi story cum quantitative forecast positing that superintelligent AI could wipe out humanity by 2030. The writers of AI 2027 based some of their predictions on the METR plot and cited it extensively. In Von Arx’s words, “It’s a little weird when the way lots of people are familiar with your work is this pretty opinionated interpretation.”

Of course, plenty of people invoke the METR plot without imagining large-scale death and destruction. For some AI boosters, the exponential trend indicates that AI will soon usher in an era of radical economic growth. The venture capital firm Sequoia Capital, for example, recently put out a post titled “2026: This is AGI,” which used the METR plot to argue that AI that can act as an employee or contractor will soon arrive. “The provocation really was like, ‘What will you do when your plans are measured in centuries?’” says Sonya Huang, a general partner at Sequoia and one of the post’s authors. 

Just because a model achieves a one-hour time horizon on the METR plot, however, doesn’t mean that it can replace one hour of human work in the real world. For one thing, the tasks on which the models are evaluated don’t reflect the complexities and confusion of real-world work. In their original study, Kwa, Von Arx, and their colleagues quantify what they call the “messiness” of each task according to criteria such as whether the model knows exactly how it is being scored and whether it can easily start over if it makes a mistake (for messy tasks, the answer to both questions would be no). They found that models do noticeably worse on messy tasks, although the overall pattern of improvement holds for both messy and non-messy ones.

And even the messiest tasks that METR considered can’t provide much information about AI’s ability to take on most jobs, because the plot is based almost entirely on coding tasks. “A model can get better at coding, but it’s not going to magically get better at anything else,” says Daniel Kang, an assistant professor of computer science at the University of Illinois Urbana-Champaign. In a follow-up study, Kwa and his colleagues did find that time horizons for tasks in other domains also appear to be on exponential trajectories, but that work was much less formal.

Despite these limitations, many people admire the group’s research. “The METR study is one of the most carefully designed studies in the literature for this kind of work,” Kang told me. Even Gary Marcus, a former NYU professor and professional LLM curmudgeon, described much of the work that went into the plot as “terrific” in a blog post.

Some people will almost certainly continue to read the METR plot as a prognostication of our AI-induced doom, but in reality it’s something far more banal: a carefully constructed scientific tool that puts concrete numbers to people’s intuitive sense of AI progress. As METR employees will readily agree, the plot is far from a perfect instrument. But in a new and fast-moving domain, even imperfect tools can have enormous value.

“This is a bunch of people trying their best to make a metric under a lot of constraints. It is deeply flawed in many ways,” Von Arx says. “I also think that it is one of the best things of its kind.”

From guardrails to governance: A CEO’s guide for securing agentic systems

The previous article in this series, “Rules fail at the prompt, succeed at the boundary,” focused on the first AI-orchestrated espionage campaign and the failure of prompt-level control. This article is the prescription. The question every CEO is now getting from their board is some version of: What do we do about agent risk?

Across recent AI security guidance from standards bodies, regulators, and major providers, a simple idea keeps repeating: treat agents like powerful, semi-autonomous users, and enforce rules at the boundaries where they touch identity, tools, data, and outputs.

The following is an actionable eight-step plan one can ask teams to implement and report against:  

Eight controls, three pillars: govern agentic systems at the boundary. Source: Protegrity

Constrain capabilities

These steps help define identity and limit capabilities.

1. Identity and scope: Make agents real users with narrow jobs

Today, agents run under vague, over-privileged service identities. The fix is straightforward: treat each agent as a non-human principal with the same discipline applied to employees.

Every agent should run as the requesting user in the correct tenant, with permissions constrained to that user’s role and geography. Prohibit cross-tenant on-behalf-of shortcuts. Anything high-impact should require explicit human approval with a recorded rationale. That is how Google’s Secure AI Framework (SAIF) and NIST AI’s access-control guidance are meant to be applied in practice.

The CEO question: Can we show, today, a list of our agents and exactly what each is allowed to do?

2. Tooling control: Pin, approve, and bound what agents can use

The Anthropic espionage framework worked because the attackers could wire Claude into a flexible suite of tools (e.g., scanners, exploit frameworks, data parsers) through Model Context Protocol, and those tools weren’t pinned or policy-gated.

The defense is to treat toolchains like a supply chain:

  • Pin versions of remote tool servers.
  • Require approvals for adding new tools, scopes, or data sources.
  • Forbid automatic tool-chaining unless a policy explicitly allows it.

This is exactly what OWASP flags under excessive agency and what it recommends protecting against. Under the EU AI Act, designing for such cyber-resilience and misuse resistance is part of the Article 15 obligation to ensure robustness and cybersecurity.

The CEO question: Who signs off when an agent gains a new tool or a broader scope? How does one know?

3. Permissions by design: Bind tools to tasks, not to models

A common anti-pattern is to give the model a long-lived credential and hope prompts keep it polite. SAIF and NIST argue the opposite: credentials and scopes should be bound to tools and tasks, rotated regularly, and auditable. Agents then request narrowly scoped capabilities through those tools.

In practice, that looks like: “finance-ops-agent may read, but not write, certain ledgers without CFO approval.”

The CEO question: Can we revoke a specific capability from an agent without re-architecting the whole system?

Control data and behavior

These steps gate inputs, outputs, and constrain behavior.

4. Inputs, memory, and RAG: Treat external content as hostile until proven otherwise

Most agent incidents start with sneaky data: a poisoned web page, PDF, email, or repository that smuggles adversarial instructions into the system. OWASP’s prompt-injection cheat sheet and OpenAI’s own guidance both insist on strict separation of system instructions from user content and on treating unvetted retrieval sources as untrusted.

Operationally, gate before anything enters retrieval or long-term memory: new sources are reviewed, tagged, and onboarded; persistent memory is disabled when untrusted context is present; provenance is attached to each chunk.

The CEO question: Can we enumerate every external content source our agents learn from, and who approved them?

5. Output handling and rendering: Nothing executes “just because the model said so”

In the Anthropic case, AI-generated exploit code and credential dumps flowed straight into action. Any output that can cause a side effect needs a validator between the agent and the real world. OWASP’s insecure output handling category is explicit on this point, as are browser security best practices around origin boundaries.

The CEO question: Where, in our architecture, are agent outputs assessed before they run or ship to customers?

6. Data privacy at runtime: Protect the data first, then the model

Protect the data such that there is nothing dangerous to reveal by default. NIST and SAIF both lean toward “secure-by-default” designs where sensitive values are tokenized or masked and only re-hydrated for authorized users and use cases.

In agentic systems, that means policy-controlled detokenization at the output boundary and logging every reveal. If an agent is fully compromised, the blast radius is bounded by what the policy lets it see.

This is where the AI stack intersects not just with the EU AI Act but with GDPR and sector-specific regimes. The EU AI Act expects providers and deployers to manage AI-specific risk; runtime tokenization and policy-gated reveal are strong evidence that one is actively controlling those risks in production.

The CEO question: When our agents touch regulated data, is that protection enforced by architecture or by promises?

Prove governance and resilience

For the final steps, it’s important to show controls work and keep working.

7. Continuous evaluation: Don’t ship a one-time test, ship a test harness

Anthropic’s research about sleeper agents should eliminate all fantasies about single test dreams and show how critical continuous evaluation is. This means instrumenting agents with deep observability, regularly red teaming with adversarial test suites, and backing everything with robust logging and evidence, so failures become both regression tests and enforceable policy updates.

The CEO question: Who works to break our agents every week, and how do their findings change policy?

 8. Governance, inventory, and audit: Keep score in one place

AI security frameworks emphasize inventory and evidence: enterprises must know which models, prompts, tools, datasets, and vector stores they have, who owns them, and what decisions were taken about risk.

For agents, that means a living catalog and unified logs:

  • Which agents exist, on which platforms
  • What scopes, tools, and data each is allowed
  • Every approval, detokenization, and high-impact action, with who approved it and when

The CEO question: If asked how an agent made a specific decision, could we reconstruct the chain?

And don’t forget the system-level threat model: assume the threat actor GTG-1002 is already in your enterprise. To complete enterprise preparedness, zoom out and consider the MITRE ATLAS product, which exists precisely because adversaries attack systems, not models. Anthropic provides a case study of a state-based threat actor (GTG-1002) doing exactly that with an agentic framework.

Taken together, these controls do not make agents magically safe. They do something more familiar and more reliable: they put AI, its access, and actions back inside the same security frame used for any powerful user or system.

For boards and CEOs, the question is no longer “Do we have good AI guardrails?” It’s: Can we answer the CEO questions above with evidence, not assurances?

This content was produced by Protegrity. It was not written by MIT Technology Review’s editorial staff.

The crucial first step for designing a successful enterprise AI system

Many organizations rushed into generative AI, only to see pilots fail to deliver value. Now, companies want measurable outcomes—but how do you design for success?

At Mistral AI, we partner with global industry leaders to co-design tailored AI solutions that solve their most difficult problems. Whether it’s increasing CX productivity with Cisco, building a more intelligent car with Stellantis, or accelerating product innovation with ASML, we start with open frontier models and customize AI systems to deliver impact for each company’s unique challenges and goals.

Our methodology starts by identifying an iconic use case, the foundation for AI transformation that sets the blueprint for future AI solutions. Choosing the right use case can mean the difference between true transformation and endless tinkering and testing.

Identifying an iconic use case

Mistral AI has four criteria that we look for in a use case: strategic, urgent, impactful, and feasible.

First, the use case must be strategically valuable, addressing a core business process or a transformative new capability. It needs to be more than an optimization; it needs to be a gamechanger. The use case needs to be strategic enough to excite an organization’s C-suite and board of directors.

For example, use cases like an internal-facing HR chatbot are nice to have, but they are easy to solve and are not enabling any new innovation or opportunities. On the other end of the spectrum, imagine an externally facing banking assistant that can not only answer questions, but also help take actions like blocking a card, placing trades, and suggesting upsell/cross-sell opportunities. This is how a customer-support chatbot is turned into a strategic revenue-generating asset.

Second, the best use case to move forward with should be highly urgent and solve a business-critical problem that people care about right now. This project will take time out of people’s days—it needs to be important enough to justify that time investment. And it needs to help business users solve immediate pain points.

Third, the use case should be pragmatic and impactful. From day one, our shared goal with our customers is to deploy into a real-world production environment to enable testing the solution with real users and gather feedback. Many AI prototypes end up in the graveyard of fancy demos that are not good enough to put in front of customers, and without any scaffolding to evaluate and improve. We work with customers to ensure prototypes are stable enough to release, and that they have the necessary support and governance frameworks.

Finally, the best use case is feasible. There may be several urgent projects, but choosing one that can deliver a quick return on investment helps to maintain the momentum needed to continue and scale.

This means looking for a project that can be in production within three months—and a prototype can be live within a few weeks. It’s important to get a prototype in front of end users as fast as possible to get feedback to make sure the project is on track, and pivot as needed.

Where use cases fall short

Enterprises are complex, and the path forward is not usually obvious. To weed through all the possibilities and uncover the right first use case, Mistral AI will run workshops with our customers, hand-in-hand with subject-matter experts and end users.

Representatives from different functions will demo their processes and discuss business cases that could be candidates for a first use case—and together we agree on a winner. Here are some examples of types of projects that don’t qualify.

Moonshots: Ambitious bets that excite leadership but lack a path to quick ROI. While these projects can be strategic and urgent, they rarely meet the feasibility and impact requirements.

Future investments: Long-term plays that can wait. While these projects can be strategic and feasible, they rarely meet the urgency and impact requirements.

Tactical fixes: Firefighting projects that solve immediate pain but don’t move the needle. While these cases can be urgent and feasible, they rarely meet the strategy and impact requirements.

Quick wins: Useful for building momentum, but not transformative. While they can be impactful and feasible, they rarely meet the strategy and urgency requirements.

Blue sky ideas: These projects are gamechangers, but they need maturity to be viable. While they can be strategic and impactful, they rarely meet the urgency and feasibility requirements.

Hero projects: These are high-pressure initiatives that lack executive sponsorship or realistic timelines. While they can be urgent and impactful, they rarely meet the strategy and feasibility requirements.

Moving from use case to deployment

Once a clearly defined and strategic use case ready for development is identified, it’s time to move into the validation phase. This means doing an initial data exploration and data mapping, identifying a pilot infrastructure, and choosing a target deployment environment.

This step also involves agreeing on a draft pilot scope, identifying who will participate in the proof of concept, and setting up a governance process.

Once this is complete, it’s time to move into the building phase. Companies that partner with Mistral work with our in-house applied AI scientists who build our frontier models. We work together to design, build, and deploy the first solution.

During this phase, we focus on co-creation, so we can transfer knowledge and skills to the organizations we’re partnering with. That way, they can be self-sufficient far into the future. The output of this phase is a deployed AI solution with empowered teams capable of independent operation and innovation.

The first step is everything

After the first win, it’s imperative to use the momentum and learnings from the iconic use case to identify more high-value AI solutions to roll out. Success is when we have a scalable AI transformation blueprint with multiple high-value solutions across the organization.

But none of this could happen without successfully identifying that first iconic use case. This first step is not just about selecting a project—it’s about setting the foundation for your entire AI transformation.

It’s the difference between scattered experiments and a strategic, scalable journey toward impact. At Mistral AI, we’ve seen how this approach unlocks measurable value, aligns stakeholders, and builds momentum for what comes next.

The path to AI success starts with a single, well-chosen use case: one that is bold enough to inspire, urgent enough to demand action, and pragmatic enough to deliver.

This content was produced by Mistral AI. It was not written by MIT Technology Review’s editorial staff.