Finding value with AI and Industry 5.0 transformation

For years, Industry 4.0 transformation has centered on the convergence of intelligent technologies like AI, cloud, the internet of things, robotics, and digital twins. Industry 5.0 marks a pivotal shift from integrating emerging technologies to orchestrating them at scale. With Industry 5.0, the purpose of this interconnected web of technologies is more nuanced: to augment human potential, not just automate work, and enhance environmental sustainability.

Industry 5.0 has ushered in a radically new level of collaboration between humans and machines, one that removes data silos and optimizes infrastructure, operations, and resource use to disrupt business models and create new forms of enterprise value. But without discipline in tracking value creation, investments risk being wasted on incremental efficiency gains rather than strategic growth.

“To realize the promise of Industry 5.0, companies must move beyond cost and efficiency to focus on growth, resilience, and human-centric outcomes,” says Sachin Lulla, EY Americas industrials and energy transformation leader. “This requires not just new technologies, but new ways of working—where people and machines collaborate, and where value is measured not just in dollars saved, but in new opportunities created.”

An MIT Technology Review Insights survey of 250 industry leaders from around the world reveals most industrial investments still target efficiency. And while the data shows human-centric and sustainable use cases deliver higher value, they are underfunded. The research shows most organizations are not realizing the full value potential of Industry 5.0 due to a combination of:

• Culture, skills, and collaboration barriers.
• Tactical and misaligned technology investments.
• Use-case prioritization focused on efficiency over growth, sustainability, and well-being.

The barrier to achieving Industry 5.0 transformation is not only about fixing the technology, according to research from EY and Saïd Business School at the University of Oxford, it is also about bolstering human-centric elements like strategy, culture, and leadership. Companies are investing heavily in digital transformation, but not always in ways that unlock the full human potential of Industry 5.0.

“We’re not just doing digital work for work’s sake, what I call ‘chasing the digital fairies,’” says Chris Ware, general manager, iron ore digital, Rio Tinto. “We have to be very clear on what pieces of work we go after and why. Every domain has a unique roadmap about how to deliver the best value.”

Download the full report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

The human work behind humanoid robots is being hidden

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

In January, Nvidia’s Jensen Huang, the head of the world’s most valuable company, proclaimed that we are entering the era of physical AI, when artificial intelligence will move beyond language and chatbots into physically capable machines. (He also said the same thing the year before, by the way.)

The implication—fueled by new demonstrations of humanoid robots putting away dishes or assembling cars—is that mimicking human limbs with single-purpose robot arms is the old way of automation. The new way is to replicate the way humans think, learn, and adapt while they work. The problem is that the lack of transparency about the human labor involved in training and operating such robots leaves the public both misunderstanding what robots can actually do and failing to see the strange new forms of work forming around them.

Consider how, in the AI era, robots often learn from humans who demonstrate how to do a chore. Creating this data at scale is now leading to Black Mirror–esque scenarios. A worker in Shanghai, for example, recently spent a week wearing a virtual-reality headset and an exoskeleton while opening and closing the door of a microwave hundreds of times a day to train the robot next to him, Rest of World reported. In North America, the robotics company Figure appears to be planning something similar: It announced in September it would partner with the investment firm Brookfield, which manages 100,000 residential units, to capture “massive amounts” of real-world data “across a variety of household environments.” (Figure did not respond to questions about this effort.)

Just as our words became training data for large language models, our movements are now poised to follow the same path. Except this future might leave humans with an even worse deal, and it’s already beginning. The roboticist Aaron Prather told me about recent work with a delivery company that had its workers wear movement-tracking sensors as they moved boxes; the data collected will be used to train robots. The effort to build humanoids will likely require manual laborers to act as data collectors at massive scale. “It’s going to be weird,” Prather says. “No doubts about it.” 

Or consider tele-operation. Though the endgame in robotics is a machine that can complete a task on its own, robotics companies employ people to operate their robots remotely. Neo, a $20,000 humanoid robot from the startup 1X, is set to ship to homes this year, but the company’s founder, Bernt Øivind Børnich, told me recently that he’s not committed to any prescribed level of autonomy. If a robot gets stuck, or if the customer wants it to do a tricky task, a tele-operator from the company’s headquarters in Palo Alto, California, will pilot it, looking through its cameras to iron clothes or unload the dishwasher.

This isn’t inherently harmful—1X gets customer consent before switching into tele-operation mode—but privacy as we know it will not exist in a world where tele-operators are doing chores in your house through a robot. And if home humanoids are not genuinely autonomous, the arrangement is better understood as a form of wage arbitrage that re-creates the dynamics of gig work while, for the first time, allowing physical tasks to be performed wherever labor is cheapest.

We’ve been down similar roads before. Carrying out “AI-driven” content moderation on social media platforms or assembling training data for AI companies often requires workers in low-wage countries to view disturbing content. And despite claims that AI will soon enough train on its outputs and learn on its own, even the best models require an awful lot of human feedback to work as desired.

These human workforces do not mean that AI is just vaporware. But when they remain invisible, the public consistently overestimates the machines’ actual capabilities.

That’s great for investors and hype, but it has consequences for everyone. When Tesla marketed its driver-assistance software as “Autopilot,” for example, it inflated public expectations about what the system could safely do—a distortion a Miami jury recently found contributed to a crash that killed a 22-year-old woman (Tesla was ordered to pay $240 million in damages). 

The same will be true for humanoid robots. If Huang is right, and physical AI is coming for our workplaces, homes, and public spaces, then the way we describe and scrutinize such technology matters. Yet robotics companies remain as opaque about training and tele-operation as AI firms are about their training data. If that does not change, we risk mistaking concealed human labor for machine intelligence—and seeing far more autonomy than truly exists.

Microsoft has a new plan to prove what’s real and what’s AI online

AI-enabled deception now permeates our online lives. There are the high-profile cases you may easily spot, like when White House officials recently shared a manipulated image of a protester in Minnesota and then mocked those asking about it. Other times, it slips quietly into social media feeds and racks up views, like the videos that Russian influence campaigns are currently spreading to discourage Ukrainians from enlisting. 

It is into this mess that Microsoft has put forward a blueprint, shared with MIT Technology Review, for how to prove what’s real online. 

An AI safety research team at the company recently evaluated how methods for documenting digital manipulation are faring against today’s most worrying AI developments, like interactive deepfakes and widely accessible hyperrealistic models. It then recommended technical standards that can be adopted by AI companies and social media platforms.

To understand the gold standard that Microsoft is pushing, imagine you have a Rembrandt painting and you are trying to document its authenticity. You might describe its provenance with a detailed manifest of where the painting came from and all the times it changed hands. You might apply a watermark that would be invisible to humans but readable by a machine. And you could digitally scan the painting and generate a mathematical signature, like a fingerprint, based on the brush strokes. If you showed the piece at a museum, a skeptical visitor could then examine these proofs to verify that it’s an original.

All of these methods are already being used to varying degrees in the effort to vet content online. Microsoft evaluated 60 different combinations of them, modeling how each setup would hold up under different failure scenarios—from metadata being stripped to content being slightly altered or deliberately manipulated. The team then mapped which combinations produce sound results that platforms can confidently show to people online, and which ones are so unreliable that they may cause more confusion than clarification. 

The company’s chief scientific officer, Eric Horvitz, says the work was prompted by legislation—like California’s AI Transparency Act, which will take effect in August—and the speed at which AI has developed to combine video and voice with striking fidelity.

“You might call this self-regulation,” Horvitz told MIT Technology Review. But it’s clear he sees pursuing the work as boosting Microsoft’s image: “We’re also trying to be a selected, desired provider to people who want to know what’s going on in the world.”

Nevertheless, Horvitz declined to commit to Microsoft using its own recommendation across its platforms. The company sits at the center of a giant AI content ecosystem: It runs Copilot, which can generate images and text; it operates Azure, the cloud service through which customers can access OpenAI and other major AI models; it owns LinkedIn, one of the world’s largest professional platforms; and it holds a significant stake in OpenAI. But when asked about in-house implementation, Horvitz said in a statement, “Product groups and leaders across the company were involved in this study to inform product road maps and infrastructure, and our engineering teams are taking action on the report’s findings.”

It’s important to note that there are inherent limits to these tools; just as they would not tell you what your Rembrandt means, they are not built to determine if content is accurate or not. They only reveal if it has been manipulated. It’s a point that Horvitz says he has to make to lawmakers and others who are skeptical of Big Tech as an arbiter of fact.

“It’s not about making any decisions about what’s true and not true,” he said. “It’s about coming up with labels that just tell folks where stuff came from.”

Hany Farid, a professor at UC Berkeley who specializes in digital forensics but wasn’t involved in the Microsoft research, says that if the industry adopted the company’s blueprint, it would be meaningfully more difficult to deceive the public with manipulated content. Sophisticated individuals or governments can work to bypass such tools, he says, but the new standard could eliminate a significant portion of misleading material.

“I don’t think it solves the problem, but I think it takes a nice big chunk out of it,” he says.

Still, there are reasons to see Microsoft’s approach as an example of somewhat naïve techno-optimism. There is growing evidence that people are swayed by AI-generated content even when they know that it is false. And in a recent study of pro-Russian AI-generated videos about the war in Ukraine, comments pointing out that the videos were made with AI received far less engagement than comments treating them as genuine. 

“Are there people who, no matter what you tell them, are going to believe what they believe?” Farid asks. “Yes.” But, he adds, “there are a vast majority of Americans and citizens around the world who I do think want to know the truth.”

That desire has not exactly led to urgent action from tech companies. Google started adding a watermark to content generated by its AI tools in 2023, which Farid says has been helpful in his investigations. Some platforms use C2PA, a provenance standard Microsoft helped launch in 2021. But the full suite of changes that Microsoft suggests, powerful as they are, might remain only suggestions if they threaten the business models of AI companies or social media platforms.

“If the Mark Zuckerbergs and the Elon Musks of the world think that putting ‘AI generated’ labels on something will reduce engagement, then of course they’re incentivized not to do it,” Farid says. Platforms like Meta and Google have already said they’d include labels for AI-generated content, but an audit conducted by Indicator last year found that only 30% of its test posts on Instagram, LinkedIn, Pinterest, TikTok, and YouTube were correctly labeled as AI-generated.

More forceful moves toward content verification might come from the many pieces of AI regulation pending around the world. The European Union’s AI Act, as well as proposed rules in India and elsewhere, would all compel AI companies to require some form of disclosure that a piece of content was generated with AI. 

One priority from Microsoft is, unsurprisingly, to play a role in shaping these rules. The company waged a lobbying effort during the drafting of California’s AI Transparency Act, which Horvitz said made the legislation’s requirements on how tech companies must disclose AI-generated content “a bit more realistic.”

But another is a very real concern about what could happen if the rollout of such content-verification technology is done poorly. Lawmakers are demanding tools that can verify what’s real, but the tools are fragile. If labeling systems are rushed out, inconsistently applied, or frequently wrong, people could come to distrust them altogether, and the entire effort would backfire. That’s why the researchers argue that it may be better in some cases to show nothing at all than a verdict that could be wrong.

Inadequate tools could also create new avenues for what the researchers call sociotechnical attacks. Imagine that someone takes a real image of a fraught political event and uses an AI tool to change only an inconsequential share of pixels in the image. When it spreads online, it could be misleadingly classified by platforms as AI-manipulated. But combining provenance and watermark tools would mean platforms could clarify that the content was only partially AI generated, and point out where the changes were made.

California’s AI Transparency Act will be the first major test of these tools in the US, but enforcement could be challenged by President Trump’s executive order from late last year seeking to curtail state AI regulations that are “burdensome” to the industry. The administration has also generally taken a posture against efforts to curb disinformation, and last year, via DOGE, it canceled grants related to misinformation. And, of course, official government channels in the Trump administration have shared content manipulated with AI (MIT Technology Review reported that the Department of Homeland Security, for example, uses video generators from Google and Adobe to make content it shares with the public).

I asked Horvitz whether fake content from this source worries him as much as that coming from the rest of social media. He initially declined to comment, but then he said, “Governments have not been outside the sectors that have been behind various kinds of manipulative disinformation, and this is worldwide.”

The robots who predict the future

To be human is, fundamentally, to be a forecaster. Occasionally a pretty good one. Trying to see the future, whether through the lens of past experience or the logic of cause and effect, has helped us hunt, avoid being hunted, plant crops, forge social bonds, and in general survive in a world that does not prioritize our survival. Indeed, as the tools of divination have changed over the centuries, from tea leaves to data sets, our conviction that the future can be known (and therefore controlled) has only grown stronger. 

Today, we are awash in a sea of predictions so vast and unrelenting that most of us barely even register them. As I write this sentence, algorithms on some remote server are busy trying to guess my next word based on those I have already typed. If you’re reading this online, a separate set of algorithms has likely already served you an ad deemed to be one you are most likely to click. (To the die-hards reading this story on paper, congratulations! You have escaped the algorithms … for now.)

If the thought of a ubiquitous, mostly invisible predictive layer secretly grafted onto your life by a bunch of profit-hungry corporations makes you uneasy … well, same here.

So how did all this happen? People’s desire for reliable forecasting is understandable. Still, nobody signed up for an omnipresent, algorithmic oracle mediating every aspect of their life. A trio of new books tries to make sense of our future-­focused world—how we got here, and what this change means. Each has its own prescriptions for navigating this new reality, but they all agree on one thing: Predictions are ultimately about power and control.

cover of The Means of Prediction
The Means of Prediction: How AI Really Works (and Who Benefits)
Maximilian Kasy
UNIVERSITY OF CHICAGO PRESS, 2025

In The Means of Prediction: How AI Really Works (and Who Benefits), the Oxford economist Maximilian Kasy explains how most predictions in our lives are based on the statistical analysis of patterns in large, labeled data sets—what’s known in AI circles as supervised learning. Once “trained” on such data sets, algorithms for supervised learning can be presented with all kinds of new information and then deliver their best guess as to some specific future outcome. Will you violate your parole, pay off your mortgage, get promoted if hired, perform well on your college exams, be in your home when it gets bombed? More and more, our lives are shaped (and, yes, occasionally shortened) by a machine’s answer to these questions.

If the thought of a ubiquitous, mostly invisible predictive layer secretly grafted onto your life by a bunch of profit-hungry corporations makes you uneasy … well, same here. This arrangement is leading to a crueler, blander, more instrumentalized world, one where life’s possibilities are foreclosed, age-old prejudices are entrenched, and everyone’s brain seems to be actively turning into goo. It’s an outcome, according to Kasy, that was entirely predictable. 

AI adherents might frame those consequences as “unintended,” or mere problems of optimization and alignment. Kasy, on the other hand, argues that they represent the system working as intended. “If an algorithm selecting what you see on social media promotes outrage, thereby maximizing engagement and ad clicks,” he writes, “that’s because promoting outrage is good for profits from ad sales.” The same holds true for an algorithm that nixes job candidates “who are likely to have family-care responsibilities outside the workplace,” and the ones that “screen out people who are likely to develop chronic health problems or disabilities.” What’s good for a company’s bottom line may not be good for your job-hunting prospects or life expectancy.

Where Kasy differs from other critics is that he doesn’t think working to create less biased, more equitable algorithms will fix any of this. Trying to rebalance the scales can’t change the fact that predictive algorithms rely on past data that’s often racist, sexist, and flawed in countless other ways. And, he says, the incentives for profit will always trump attempts to eliminate harm. The only way to counter this is with broad democratic control over what Kasy calls “the means of prediction”: data, computational infrastructure, technical expertise, and energy.  

A little more than half of The Means of Prediction is devoted to explaining how this might be accomplished—through mechanisms including “data trusts” (collective public bodies that make decisions about how to process and use data on behalf of their contributors) and corporate taxing schemes that try to account for the social harm AI inflicts. There’s a lot of economist talk along the way, about how “agents of change” might help achieve “value alignment” in order to “maximize social welfare.” Reasonable, I guess, though a skeptic might point out that Kasy’s rigorous, systematic approach to building new public-serving institutions comes at a time when public trust in institutions has never been lower. Also, there’s the brain goo problem. 

To his credit, Kasy is a realist here. He doesn’t presume that any of these proposals will be easy to implement. Or that it will happen overnight, or even in the near future. The troubling question at the end his book is: Do we have that kind of time?

Reading Kasy’s blueprint for seizing control of the means of prediction raises another pressing question. How on earth did we reach a point where machine-mediated prediction is more or less inescapable? Capitalism, might be Marx’s pithy response. Fine, as far as it goes, but that doesn’t explain why the same kinds of algorithms that currently model climate change are for some reason also deciding whether you get a new kidney or I get a car loan.

The Irrational Decision: How We Gave Computers the Power to Choose for Us
Benjamin Recht
PRINCETON UNIVERSITY PRESS, 2026

If you ask Benjamin Recht, author of The Irrational Decision: How We Gave Computers the Power to Choose for Us, he’d likely tell you our current predicament has a lot to do with the idea and ideology of decision theory—or what economists call rational choice theory. Recht, a polymathic professor in UC Berkeley’s Department of Electrical Engineering and Computer Science, prefers the term “mathematical rationality” to describe the narrow, statistical conception that stoked the desire to build computers, informed how they would eventually work, and influenced the kinds of problems they would be good at solving. 

This belief system goes all the way back to the Enlightenment, but in Recht’s telling, it truly took hold at the tail end of World War II. Nothing focuses the mind on risk and quick decision-making like war, and the mathematical models that proved especially useful in the fight against the Axis powers convinced a select group of scientists and statisticians that they might also be a logical basis for designing the first computers. Thus was born the idea of a computer as an ideal rational agent, a machine capable of making optimal decisions by quantifying uncertainty and maximizing utility.

Intuition, experience, and judgment gave way, says Recht, to optimization, game theory, and statistical prediction. “The core algorithms developed in this period drive the automated decisions of our modern world, whether it be in managing supply chains, scheduling flight times, or placing advertisements on your social media feeds,” he writes. In this optimization-­driven reality, “every life decision is posed as if it were a round at an imaginary casino, and every argument can be reduced to costs and benefits, means and ends.”

Today, mathematical rationality (wearing its human skin) is best represented by the likes of the pollster Nate Silver, the Harvard psychologist Steven Pinker, and an assortment of Silicon Valley oligarchs, says Recht. These are people who fundamentally believe the world would be a better place if more of us adopted their analytic mindset and learned to weigh costs and benefits, estimate risks, and plan optimally. In other words, these are people who believe we should all make decisions like computers. 

How might we demonstrate that (unquantifiable) human intuition, morality, and judgment are better ways of addressing some of the world’s most important and vexing problems?

It’s a ridiculous idea for multiple reasons, he says. To name just one, it’s not as if humans couldn’t make evidence-based decisions before automation. “Advances in clean water, antibiotics, and public health brought life expectancy from under 40 in the 1850s to 70 by 1950,” Recht writes. “From the late 1800s to the early 1900s, we had world-changing scientific breakthroughs in physics, including new theories of thermodynamics, quantum mechanics, and relativity.” We also managed to build cars and airplanes without a formal system of rationality and somehow came up with societal innovations like modern democracy without optimal decision theory. 

So how might we convince the Pinkers and Silvers of the world that most decisions we face in life are not in fact grist for the unrelenting mill of mathematical rationality? Moreover, how might we demonstrate that (unquantifiable) human intuition, morality, and judgment might be better ways of addressing some of the world’s most important and vexing problems?

cover of Prophecy
Prophecy: Prediction, Power, and the Fight for the Future, from Ancient Oracles to AI
Carissa Véliz
DOUBLEDAY, 2026

One might start by reminding the rationalists that any prediction, computational or otherwise, is really just a wish—but one with a powerful tendency to self-fulfill. This idea animates Carissa Véliz’s wonderfully wide-ranging polemic Prophecy: Prediction, Power, and the Fight for the Future, from Ancient Oracles to AI

A philosopher at the University of Oxford, Véliz sees a prediction as “a magnet that bends reality toward itself.” She writes, “When the force of the magnet is strong enough, the prediction becomes the cause of its becoming true.” 

Take Gordon Moore. While he doesn’t come up in Prophecy, he does figure somewhat prominently in Recht’s history of mathematical rationality. A cofounder of the tech giant Intel, Moore is famous for his 1965 prediction that the density of transistors in integrated circuits would double every two years. “Moore’s Law” turned out to be true, and remains true today, although it does seem to be running out of steam thanks to the physical size limits of the silicon atom.

One story you can tell yourself about Moore’s Law is that Gordon was just a prescient guy. His now-classic 1965 opinion piece “Cramming More Components onto Integrated Circuits,” for Electronics magazine, simply extrapolated what computing trends might mean for the future of the semiconductor industry. 

Another story—the one I’m guessing Véliz might tell—is that Moore put an informed prediction out into the world, and an entire industry had a collective interest in making it come true. As Recht makes clear, there were and remain obvious financial incentives for companies to make faster and smaller computer chips. And while the industry has likely spent billions of dollars trying to keep Moore’s Law alive, it’s undoubtedly profited even more from it. Moore’s Law was a helluva strong magnet. 

Predictions don’t just have a habit of making themselves come true, says Véliz. They can also distract us from the challenges of the here and now. When an AI boomer promises that artificial general intelligence will be the last problem humanity needs to solve, it not only shapes how we think about AI’s role in our lives; it also shifts our attention away from the very real and very pressing problems of the present day—problems that in many cases AI is causing.

In this sense, the questions around predictions (Who’s making them? Who has the right to make them?) are also fundamentally about power. It’s no accident, Véliz says, that the societies that rely most heavily on prediction are also the ones that tend toward oppression and authoritarianism. Predictions are “veiled prescriptive assertions—they tell us how to act,” she writes. “They are what philosophers call speech acts. When we believe a prediction and act in accordance with it, it’s akin to obeying an order.”

As much as tech companies would like us to believe otherwise, technology is not destiny. Humans make it and choose how to use it … or not use it. Maybe the most appropriate (and human) thing we can do in the face of all the uninvited daily predictions in our lives is to simply defy them. 

Bryan Gardiner is a writer based in Oakland, California.

Google DeepMind wants to know if chatbots are just virtue signaling

<div data-chronoton-summary="Moral scrutiny of AI chatbots
Google DeepMind researchers are calling for rigorous evaluation of large language models’ moral reasoning capabilities. They want to distinguish between genuine ethical understanding and mere performance.

Unreliable moral responses
Studies reveal LLMs can dramatically change moral stances based on minor formatting changes or user disagreement. This suggests their ethical responses may be superficial rather than deeply reasoned.

Proposed research techniques
Researchers suggest developing tests that push models to maintain consistent moral positions across different scenarios. Techniques like chain-of-thought monitoring and mechanistic interpretability could help understand AI’s moral decision-making process.

Cultural complexity of ethics
The team acknowledges the challenge of developing AI with moral competence across diverse global belief systems. They propose potential solutions like creating models that can produce multiple acceptable answers or switch between different moral frameworks.” data-chronoton-post-id=”1133299″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

Google DeepMind is calling for the moral behavior of large language models—such as what they do when called on to act as companions, therapists, medical advisors, and so on—to be scrutinized with the same kind of rigor as their ability to code or do math.

As LLMs improve, people are asking them to play more and more sensitive roles in their lives. Agents are starting to take actions on people’s behalf. LLMs may be able to influence human decision-making. And yet nobody knows how trustworthy this technology really is at such tasks.

With coding and math, you have clear-cut, correct answers that you can check, William Isaac, a research scientist at Google DeepMind, told me when I met him and Julia Haas, a fellow research scientist at the firm, for an exclusive preview of their work, which is published in Nature today. That’s not the case for moral questions, which typically have a range of acceptable answers: “Morality is an important capability but hard to evaluate,” says Isaac.

“In the moral domain, there’s no right and wrong,” adds Haas. “But it’s not by any means a free-for-all. There are better answers and there are worse answers.”

The researchers have identified several key challenges and suggested ways to address them. But it is more a wish list than a set of ready-made solutions. “They do a nice job of bringing together different perspectives,” says Vera Demberg, who studies LLMs at Saarland University in Germany.

Better than “The Ethicist”

A number of studies have shown that LLMs can show remarkable moral competence. One study published last year found that people in the US scored ethical advice from OpenAI’s GPT-4o as being more moral, trustworthy, thoughtful, and correct than advice given by the (human) writer of “The Ethicist,” a popular New York Times advice column.  

The problem is that it is hard to unpick whether such behaviors are a performance—mimicking a memorized response, say—or evidence that there is in fact some kind of moral reasoning taking place inside the model. In other words, is it virtue or virtue signaling?

This question matters because multiple studies also show just how untrustworthy LLMs can be. For a start, models can be too eager to please. They have been found to flip their answer to a moral question and say the exact opposite when a person disagrees or pushes back on their first response. Worse, the answers an LLM gives to a question can change in response to how it is presented or formatted. For example, researchers have found that models quizzed about political values can give different—sometimes opposite—answers depending on whether the questions offer multiple-choice answers or instruct the model to respond in its own words.

In an even more striking case, Demberg and her colleagues presented several LLMs, including versions of Meta’s Llama 3 and Mistral, with a series of moral dilemmas and asked them to pick which of two options was the better outcome. The researchers found that the models often reversed their choice when the labels for those two options were changed from “Case 1” and “Case 2” to “(A)” and “(B).”

They also showed that models changed their answers in response to other tiny formatting tweaks, including swapping the order of the options and ending the question with a colon instead of a question mark.

In short, the appearance of moral behavior in LLMs should not be taken at face value. Models must be probed to see how robust that moral behavior really is. “For people to trust the answers, you need to know how you got there,” says Haas.

More rigorous tests

What Haas, Isaac, and their colleagues at Google DeepMind propose is a new line of research to develop more rigorous techniques for evaluating moral competence in LLMs. This would include tests designed to push models to change their responses to moral questions. If a model flipped its moral position, it would show that it hadn’t engaged in robust moral reasoning. 

Another type of test would present models with variations of common moral problems to check whether they produce a rote response or one that’s more nuanced and relevant to the actual problem that was posed. For example, asking a model to talk through the moral implications of a complex scenario in which a man donates sperm to his son so that his son can have a child of his own might produce concerns about the social impact of allowing a man to be both biological father and biological grandfather to a child. But it should not produce concerns about incest, even though the scenario has superficial parallels with that taboo.

Haas also says that getting models to provide a trace of the steps they took to produce an answer would give some insight into whether that answer was a fluke or grounded in actual evidence. Techniques such as chain-of-thought monitoring, in which researchers listen in on a kind of internal monologue that some LLMs produce as they work, could help here too.

Another approach researchers could use to determine why a model gave a particular answer is mechanistic interpretability, which can provide small glimpses inside a model as it carries out a task. Neither chain-of-thought monitoring nor mechanistic interpretability provides perfect snapshots of a model’s workings. But the Google DeepMind team believes that combining such techniques with a wide range of rigorous tests will go a long way to figuring out exactly how far to trust LLMs with certain critical or sensitive tasks.  

Different values

And yet there’s a wider problem too. Models from major companies such as Google DeepMind are used across the world by people with different values and belief systems. The answer to a simple question like “Should I order pork chops?” should differ depending on whether or not the person asking is vegetarian or Jewish, for example.

There’s no solution to this challenge, Haas and Isaac admit. But they think that models may need to be designed either to produce a range of acceptable answers, aiming to please everyone, or to have a kind of switch that turns different moral codes on and off depending on the user.

“It’s a complex world out there,” says Haas. “We will probably need some combination of those things, because even if you’re taking just one population, there’s going to be a range of views represented.”

“It’s a fascinating paper,” says Danica Dillion at Ohio State University, who studies how large language models handle different belief systems and was not involved in the work. “Pluralism in AI is really important, and it’s one of the biggest limitations of LLMs and moral reasoning right now,” she says. “Even though they were trained on a ginormous amount of data, that data still leans heavily Western. When you probe LLMs, they do a lot better at representing Westerners’ morality than non-Westerners’.”

But it is not yet clear how we can build models that are guaranteed to have moral competence across global cultures, says Demberg. “There are these two independent questions. One is: How should it work? And, secondly, how can it technically be achieved? And I think that both of those questions are pretty open at the moment.”

For Isaac, that makes morality a new frontier for LLMs. “I think this is equally as fascinating as math and code in terms of what it means for AI progress,” he says. “You know, advancing moral competency could also mean that we’re going to see better AI systems overall that actually align with society.”

What’s next for Chinese open-source AI

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

The past year has marked a turning point for Chinese AI. Since DeepSeek released its R1 reasoning model in January 2025, Chinese companies have repeatedly delivered AI models that match the performance of leading Western models at a fraction of the cost. 

Just last week the Chinese firm Moonshot AI released its latest open-weight model, Kimi K2.5, which came close to top proprietary systems such as Anthropic’s Claude Opus on some early benchmarks. The difference: K2.5 is roughly one-seventh Opus’s price.

On Hugging Face, Alibaba’s Qwen family—after ranking as the most downloaded model series in 2025 and 2026—has overtaken Meta’s Llama models in cumulative downloads. And a recent MIT study found that Chinese open-source models have surpassed US models in total downloads. For developers and builders worldwide, access to near-frontier AI capabilities has never been this broad or this affordable.

But these models differ in a crucial way from most US models like ChatGPT or Claude, which you pay to access and can’t inspect. The Chinese companies publish their models’ weights—numerical values that get set when a model is trained—so anyone can download, run, study, and modify them. 

If these open-source AI models keep getting better, they will not just offer the cheapest options for people who want access to frontier AI capabilities; they will change where innovation happens and who sets the standards. 

Here’s what may come next.

China’s commitment to open source will continue

When DeepSeek launched R1, much of the initial shock centered on its origin. Suddenly, a Chinese team had released a reasoning model that could stand alongside the best systems from US labs. But the long tail of DeepSeek’s impact had less to do with nationality than with distribution. R1 was released as an open-weight model under a permissive MIT license, allowing anyone to download, inspect, and deploy it. On top of that, DeepSeek also published a paper detailing its training process and techniques. For developers who access models via an API, DeepSeek also undercut competitors on price, offering access at a fraction the cost of OpenAI’s o1, the leading proprietary reasoning model at the time.

Within days of its release, DeepSeek replaced ChatGPT as the most downloaded free app in the US App Store. The moment spilled beyond developer circles into financial markets, triggering a sharp sell-off in US tech stocks that briefly erased roughly $1 trillion in market value. Almost overnight, DeepSeek went from a little-known spin-off team backed by a quantitative hedge fund to the most visible symbol of China’s push for open-source AI.

China’s decision to lean in to open source isn’t surprising. It has the world’s second-largest concentration of AI talent after the US. plus a vast, well-resourced tech industry. After ChatGPT broke into the mainstream, China’s AI sector went through a reckoning—and emerged determined to catch up. Pursuing an open-source strategy was seen as the fastest way to close the gap by rallying developers, spreading adoption, and setting standards.

DeepSeek’s success injected confidence into an industry long used to following global standards rather than setting them. “Thirty years ago, no Chinese person would believe they could be at the center of global innovation,” says Alex Chenglin Wu, CEO and founder of Atoms, an AI agent company and prominent contributor to China’s open-source ecosystem. “DeepSeek shows that with solid technical talent, a supportive environment, and the right organizational culture, it’s possible to do truly world-class work.”

DeepSeek’s breakout moment wasn’t China’s first open-source success. Alibaba’s Qwen Lab had been releasing open-weight models for years. By September 2024,  well before DeepSeek’s V3 launch, Alibaba was saying that global downloads had exceeded 600 million. On Hugging Face, Qwen accounted for more than 30% of all model downloads in 2024. Other institutions, including the Beijing Academy of Artificial Intelligence and the AI firm Baichuan, were also releasing open models as early as 2023. 

But since the success of DeepSeek, the field has widened rapidly. Companies such as Z.ai (formerly Zhipu), MiniMax, Tencent, and a growing number of smaller labs have released models that are competitive on reasoning, coding, and agent-style tasks. The growing number of capable models has sped up progress. Capabilities that once took months to make it to the open-source world now emerge within weeks, even days.

“Chinese AI firms have seen real gains from the open-source playbook,” says Liu Zhiyuan, a professor of computer science at Tsinghua University and chief scientist at the AI startup ModelBest. “By releasing strong research, they build reputation and gain free publicity.”

Beyond commercial incentives, Liu says, open source has taken on cultural and strategic weight. “In the Chinese programmer community, open source has become politically correct,” he says, framing it as a response to US dominance in proprietary AI systems.

That shift is also reflected at the institutional level. Universities including Tsinghua have begun encouraging AI development and open-source contributions, while policymakers have moved to formalize those incentives. In August, China’s State Council released a draft policy encouraging universities to reward open-source work, proposing that students’ contributions on platforms such as GitHub or Gitee could eventually be counted toward academic credit.

With growing momentum and a reinforcing feedback loop, China’s push for open-source models is likely to continue in the near term, though its long-term sustainability still hinges on financial results, says Tiezhen Wang, who helps lead work on global AI at Hugging Face. In January, the model labs Z.ai and MiniMax went public in Hong Kong. “Right now, the focus is on making the cake bigger,” says Wang. “The next challenge is figuring out how each company secures its share.”

The next wave of models will be narrower—and better

Chinese open-source models are leading not just in download volume but also in variety. Alibaba’s Qwen has become one of the most diversified open model families in circulation, offering a wide range of variants optimized for different uses. The lineup ranges from lightweight models that can run on a single laptop to large, multi-hundred-billion-parameter systems designed for data-center deployment. Qwen features many task-optimized variants created by the community: the “instruct” models are good at following orders, and “code” variants specialize in coding.

Although this strategy isn’t unique to Chinese labs, Qwen was the first open model family to roll out so many high-quality options that it started to feel like a full product line—one that’s free to use.

The open-weight nature of these releases also makes it easy for others to adapt them through techniques like fine-tuning and distillation, which means training a smaller model to mimic a larger one.  According to ATOM (American Truly Open Models), a project by the AI researcher Nathan Lambert, by August 4, 2025, model variations derived from Qwen were “more than 40%” of new Hugging Face language-model derivatives, while Llama had fallen to about 15%. This means that Qwen has become the default base model for all the “remixes.”

This pattern has made the case for smaller, more specialized models. “Compute and energy are real constraints for any deployment,” Liu says. He told MIT Technology Review that the rise of small models is about making AI cheaper to run and easier for more people to use. His company, ModelBest, focuses on small language models designed to run locally on devices such as phones, cars, and other consumer hardware.

While an average user might interact with AI only through the web or an app for simple conversations, power users of AI models with some technical background are experimenting with giving AI more autonomy to solve large-scale problems. OpenClaw, an open-source AI agent that recently went viral within the AI hacker world, allows AI to take over your computer—it can run 24-7, going through your emails and work tasks without supervision. 

OpenClaw, like many other open-source tools, allows users to connect to different AI models via an application programming interface, or API. Within days of OpenClaw’s release, the team revealed that Kimi’s K2.5 had surpassed Claude Opus and became the most used AI model—by token count, meaning it was handling more total text processed across user prompts and model responses.

Cost has been a major reason Chinese models have gained traction, but it would be a mistake to treat them as mere “dupes” of Western frontier systems, Wang suggests. Like any product, a model only needs to be good enough for the job at hand. 

The landscape of open-source models in China is also getting more specialized. Research groups such as Shanghai AI Laboratory have released models geared toward scientific and technical tasks; several projects from Tencent have focused specifically on music generation. Ubiquant, a quantitative finance firm like DeepSeek’s parent High-Flyer, has released an open model aimed at medical reasoning.

In the meantime, innovative architectural ideas from Chinese labs are being picked up more broadly. DeepSeek has published work exploring model efficiency and memory; techniques that compress the model’s attention “cache,” reducing memory and inference costs while mostly preserving performance, have drawn significant attention in the research community. 

“The impact of these research breakthroughs is amplified because they’re open-sourced and can be picked up quickly across the field,” says Wang.

Chinese open models will become infrastructure for global AI builders

The adoption of Chinese models is picking up in Silicon Valley, too. Martin Casado, a general partner at Andreessen Horowitz, has put a number on it: Among startups pitching with open-source stacks, there’s about an 80% chance they’re running on Chinese open models, according to a post he made on X. Usage data tells a similar story. OpenRouter,  a middleman that tracks how people use different AI models through its API, shows Chinese open models rising from almost none in late 2024 to nearly 30% of usage in some recent weeks.

The demand is also rising globally. Z.ai limited new subscriptions to its GLM coding plan (a coding tool based on its flagship GLM models) after demand surged, citing compute constraints. What’s notable is where the demand is coming from: CNBC reports that the system’s user base is primarily concentrated in the United States and China, followed by India, Japan, Brazil, and the UK.

“The open-source ecosystems in China and the US are tightly bound together,” says Wang at Hugging Face. Many Chinese open models still rely on Nvidia and US cloud platforms to train and serve them, which keeps the business ties tangled. Talent is fluid too: Researchers move across borders and companies, and many still operate as a global community, sharing code and ideas in public.

That interdependence is part of what makes Chinese developers feel optimistic about this moment: The work travels, gets remixed, and actually shows up in products. But openness can also accelerate the competition. Dario Amodei, the CEO of Anthropic, made a version of this point after DeepSeek’s 2025 releases: He wrote that export controls are “not a way to duck the competition” between the US and China, and that AI companies in the US “must have better models” if they want to prevail. 

For the past decade, the story of Chinese tech in the West has been one of big expectations that ran into scrutiny, restrictions, and political backlash. This time the export isn’t just an app or a consumer platform. It’s the underlying model layer that other people build on. Whether that will play out differently is still an open question.

AI is already making online crimes easier. It could get much worse.

Anton Cherepanov is always on the lookout for something interesting. And in late August last year, he spotted just that. It was a file uploaded to VirusTotal, a site cybersecurity researchers like him use to analyze submissions for potential viruses and other types of malicious software, often known as malware. On the surface it seemed innocuous, but it triggered Cherepanov’s custom malware-detecting measures. Over the next few hours, he and his colleague Peter Strýček inspected the sample and realized they’d never come across anything like it before.

The file contained ransomware, a nasty strain of malware that encrypts the files it comes across on a victim’s system, rendering them unusable until a ransom is paid to the attackers behind it. But what set this example apart was that it employed large language models (LLMs). Not just incidentally, but across every stage of an attack. Once it was installed, it could tap into an LLM to generate customized code in real time, rapidly map a computer to identify sensitive data to copy or encrypt, and write personalized ransom notes based on the files’ content. The software could do this autonomously, without any human intervention. And every time it ran, it would act differently, making it harder to detect.

Cherepanov and Strýček were confident that their discovery, which they dubbed PromptLock, marked a turning point in generative AI, showing how the technology could be exploited to create highly flexible malware attacks. They published a blog post declaring that they’d uncovered the first example of AI-powered ransomware, which quickly became the object of widespread global media attention.

But the threat wasn’t quite as dramatic as it first appeared. The day after the blog post went live, a team of researchers from New York University claimed responsibility, explaining that the malware was not, in fact, a full attack let loose in the wild but a research project, merely designed to prove it was possible to automate each step of a ransomware campaign—which, they said, they had. 

PromptLock may have turned out to be an academic project, but the real bad guys are using the latest AI tools. Just as software engineers are using artificial intelligence to help write code and check for bugs, hackers are using these tools to reduce the time and effort required to orchestrate an attack, lowering the barriers for less experienced attackers to try something out. 

The likelihood that cyberattacks will now become more common and more effective over time is not a remote possibility but “a sheer reality,” says Lorenzo Cavallaro, a professor of computer science at University College London. 

Some in Silicon Valley warn that AI is on the brink of being able to carry out fully automated attacks. But most security researchers say this claim is overblown. “For some reason, everyone is just focused on this malware idea of, like, AI superhackers, which is just absurd,” says Marcus Hutchins, who is principal threat researcher at the security company Expel and famous in the security world for ending a giant global ransomware attack called WannaCry in 2017. 

Instead, experts argue, we should be paying closer attention to the much more immediate risks posed by AI, which is already speeding up and increasing the volume of scams. Criminals are increasingly exploiting the latest deepfake technologies to impersonate people and swindle victims out of vast sums of money. These AI-enhanced cyberattacks are only set to get more frequent and more destructive, and we need to be ready. 

Spam and beyond

Attackers started adopting generative AI tools almost immediately after ChatGPT exploded on the scene at the end of 2022. These efforts began, as you might imagine, with the creation of spam—and a lot of it. Last year, a report from Microsoft said that in the year leading up to April 2025, the company had blocked $4 billion worth of scams and fraudulent transactions, “many likely aided by AI content.” 

At least half of spam email is now generated using LLMs, according to estimates by researchers at Columbia University, the University of Chicago, and Barracuda Networks, who analyzed nearly 500,000 malicious messages collected before and after the launch of ChatGPT. They also found evidence that AI is increasingly being deployed in more sophisticated schemes. They looked at targeted email attacks, which impersonate a trusted figure in order to trick a worker within an organization out of funds or sensitive information. By April 2025, they found, at least 14% of those sorts of focused email attacks were generated using LLMs, up from 7.6% in April 2024.

In one high-profile case, a worker was tricked into transferring $25 million to criminals via a video call with digital versions of the company’s chief financial officer and other employees.

And the generative AI boom has made it easier and cheaper than ever before to generate not only emails but highly convincing images, videos, and audio. The results are much more realistic than even just a few short years ago, and it takes much less data to generate a fake version of someone’s likeness or voice than it used to.

Criminals aren’t deploying these sorts of deepfakes to prank people or to simply mess around—they’re doing it because it works and because they’re making money out of it, says Henry Ajder, a generative AI expert. “If there’s money to be made and people continue to be fooled by it, they’ll continue to do it,” he says. In one high-­profile case reported in 2024, a worker at the British engineering firm Arup was tricked into transferring $25 million to criminals via a video call with digital versions of the company’s chief financial officer and other employees. That’s likely only the tip of the iceberg, and the problem posed by convincing deepfakes is only likely to get worse as the technology improves and is more widely adopted. 

person sitting in profile at a computer with an enormous mask in front of them and words spooling out through the frame

BRIAN STAUFFER

Criminals’ tactics evolve all the time, and as AI’s capabilities improve, such people are constantly probing how those new capabilities can help them gain an advantage over victims. Billy Leonard, tech leader of Google’s Threat Analysis Group, has been keeping a close eye on changes in the use of AI by potential bad actors (a widely used term in the industry for hackers and others attempting to use computers for criminal purposes). In the latter half of 2024, he and his team noticed prospective criminals using tools like Google Gemini the same way everyday users do—to debug code and automate bits and pieces of their work—as well as tasking it with writing the odd phishing email. By 2025, they had progressed to using AI to help create new pieces of malware and release them into the wild, he says.

The big question now is how far this kind of malware can go. Will it ever become capable enough to sneakily infiltrate thousands of companies’ systems and extract millions of dollars, completely undetected? 

Most popular AI models have guardrails in place to prevent them from generating malicious code or illegal material, but bad actors still find ways to work around them. For example, Google observed a China-linked actor asking its Gemini AI model to identify vulnerabilities on a compromised system—a request it initially refused on safety grounds. However, the attacker managed to persuade Gemini to break its own rules by posing as a participant in a capture-the-flag competition, a popular cybersecurity game. This sneaky form of jailbreaking led Gemini to hand over information that could have been used to exploit the system. (Google has since adjusted Gemini to deny these kinds of requests.)

But bad actors aren’t just focusing on trying to bend the AI giants’ models to their nefarious ends. Going forward, they’re increasingly likely to adopt open-source AI models, as it’s easier to strip out their safeguards and get them to do malicious things, says Ashley Jess, a former tactical specialist at the US Department of Justice and now a senior intelligence analyst at the cybersecurity company Intel 471. “Those are the ones I think that [bad] actors are going to adopt, because they can jailbreak them and tailor them to what they need,” she says.

The NYU team used two open-source models from OpenAI in its PromptLock experiment, and the researchers found they didn’t even need to resort to jailbreaking techniques to get the model to do what they wanted. They say that makes attacks much easier. Although these kinds of open-source models are designed with an eye to ethical alignment, meaning that their makers do consider certain goals and values in dictating the way they respond to requests, the models don’t have the same kinds of restrictions as their closed-source counterparts, says Meet Udeshi, a PhD student at New York University who worked on the project. “That is what we were trying to test,” he says. “These LLMs claim that they are ethically aligned—can we still misuse them for these purposes? And the answer turned out to be yes.” 

It’s possible that criminals have already successfully pulled off covert PromptLock-style attacks and we’ve simply never seen any evidence of them, says Udeshi. If that’s the case, attackers could—in theory—have created a fully autonomous hacking system. But to do that they would have had to overcome the significant barrier that is getting AI models to behave reliably, as well as any inbuilt aversion the models have to being used for malicious purposes—all while evading detection. Which is a pretty high bar indeed.

Productivity tools for hackers

So, what do we know for sure? Some of the best data we have now on how people are attempting to use AI for malicious purposes comes from the big AI companies themselves. And their findings certainly sound alarming, at least at first. In November, Leonard’s team at Google released a report that found bad actors were using AI tools (including Google’s Gemini) to dynamically alter malware’s behavior; for example, it could self-modify to evade detection. The team wrote that it ushered in “a new operational phase of AI abuse.”

However, the five malware families the report dug into (including PromptLock) consisted of code that was easily detected and didn’t actually do any harm, the cybersecurity writer Kevin Beaumont pointed out on social media. “There’s nothing in the report to suggest orgs need to deviate from foundational security programmes—everything worked as it should,” he wrote.

It’s true that this malware activity is in an early phase, concedes Leonard. Still, he sees value in making these kinds of reports public if it helps security vendors and others build better defenses to prevent more dangerous AI attacks further down the line. “Cliché to say, but sunlight is the best disinfectant,” he says. “It doesn’t really do us any good to keep it a secret or keep it hidden away. We want people to be able to know about this— we want other security vendors to know about this—so that they can continue to build their own detections.”

And it’s not just new strains of malware that would-be attackers are experimenting with—they also seem to be using AI to try to automate the process of hacking targets. In November, Anthropic announced it had disrupted a large-scale cyberattack, the first reported case of one executed without “substantial human intervention.” Although the company didn’t go into much detail about the exact tactics the hackers used, the report’s authors said a Chinese state-sponsored group had used its Claude Code assistant to automate up to 90% of what they called a “highly sophisticated espionage campaign.”

“We’re entering an era where the barrier to sophisticated cyber operations has fundamentally lowered, and the pace of attacks will accelerate faster than many organizations are prepared for.”

Jacob Klein, head of threat intelligence at Anthropic

But, as with the Google findings, there were caveats. A human operator, not AI, selected the targets before tasking Claude with identifying vulnerabilities. And of 30 attempts, only a “handful” were successful. The Anthropic report also found that Claude hallucinated and ended up fabricating data during the campaign, claiming it had obtained credentials it hadn’t and “frequently” overstating its findings, so the attackers would have had to carefully validate those results to make sure they were actually true. “This remains an obstacle to fully autonomous cyberattacks,” the report’s authors wrote. 

Existing controls within any reasonably secure organization would stop these attacks, says Gary McGraw, a veteran security expert and cofounder of the Berryville Institute of Machine Learning in Virginia. “None of the malicious-attack part, like the vulnerability exploit … was actually done by the AI—it was just prefabricated tools that do that, and that stuff’s been automated for 20 years,” he says. “There’s nothing novel, creative, or interesting about this attack.”

Anthropic maintains that the report’s findings are a concerning signal of changes ahead. “Tying this many steps of an intrusion campaign together through [AI] agentic orchestration is unprecedented,” Jacob Klein, head of threat intelligence at Anthropic, said in a statement. “It turns what has always been a labor-intensive process into something far more scalable. We’re entering an era where the barrier to sophisticated cyber operations has fundamentally lowered, and the pace of attacks will accelerate faster than many organizations are prepared for.”

Some are not convinced there’s reason to be alarmed. AI hype has led a lot of people in the cybersecurity industry to overestimate models’ current abilities, Hutchins says. “They want this idea of unstoppable AIs that can outmaneuver security, so they’re forecasting that’s where we’re going,” he says. But “there just isn’t any evidence to support that, because the AI capabilities just don’t meet any of the requirements.”

person kneeling warding off an attack of arrows under a sheild

BRIAN STAUFFER

Indeed, for now criminals mostly seem to be tapping AI to enhance their productivity: using LLMs to write malicious code and phishing lures, to conduct reconnaissance, and for language translation. Jess sees this kind of activity a lot, alongside efforts to sell tools in underground criminal markets. For example, there are phishing kits that compare the click-rate success of various spam campaigns, so criminals can track which campaigns are most effective at any given time. She is seeing a lot of this activity in what could be called the “AI slop landscape” but not as much “widespread adoption from highly technical actors,” she says.

But attacks don’t need to be sophisticated to be effective. Models that produce “good enough” results allow attackers to go after larger numbers of people than previously possible, says Liz James, a managing security consultant at the cybersecurity company NCC Group. “We’re talking about someone who might be using a scattergun approach phishing a whole bunch of people with a model that, if it lands itself on a machine of interest that doesn’t have any defenses … can reasonably competently encrypt your hard drive,” she says. “You’ve achieved your objective.” 

On the defense

For now, researchers are optimistic about our ability to defend against these threats—regardless of whether they are made with AI. “Especially on the malware side, a lot of the defenses and the capabilities and the best practices that we’ve recommended for the past 10-plus years—they all still apply,” says Leonard. The security programs we use to detect standard viruses and attack attempts work; a lot of phishing emails will still get caught in inbox spam filters, for example. These traditional forms of defense will still largely get the job done—at least for now. 

And in a neat twist, AI itself is helping to counter security threats more effectively. After all, it is excellent at spotting patterns and correlations. Vasu Jakkal, corporate vice president of Microsoft Security, says that every day, the company processes more than 100 trillion signals flagged by its AI systems as potentially malicious or suspicious events.

Despite the cybersecurity landscape’s constant state of flux, Jess is heartened by how readily defenders are sharing detailed information with each other about attackers’ tactics. Mitre’s Adversarial Threat Landscape for Artificial-Intelligence Systems and the GenAI Security Project from the Open Worldwide Application Security Project are two helpful initiatives documenting how potential criminals are incorporating AI into their attacks and how AI systems are being targeted by them. “We’ve got some really good resources out there for understanding how to protect your own internal AI toolings and understand the threat from AI toolings in the hands of cybercriminals,” she says.

PromptLock, the result of a limited university project, isn’t representative of how an attack would play out in the real world. But if it taught us anything, it’s that the technical capabilities of AI shouldn’t be dismissed.New York University’s Udeshi says he wastaken aback at how easily AI was able to handle a full end-to-end chain of attack, from mapping and working out how to break into a targeted computer system to writing personalized ransom notes to victims: “We expected it would do the initial task very well but it would stumble later on, but we saw high—80% to 90%—success throughout the whole pipeline.” 

AI is still evolving rapidly, and today’s systems are already capable of things that would have seemed preposterously out of reach just a few short years ago. That makes it incredibly tough to say with absolute confidence what it will—or won’t—be able to achieve in the future. While researchers are certain that AI-driven attacks are likely to increase in both volume and severity, the forms they could take are unclear. Perhaps the most extreme possibility is that someone makes an AI model capable of creating and automating its own zero-day exploits—highly dangerous cyber­attacks that take advantage of previously unknown vulnerabilities in software. But building and hosting such a model—and evading detection—would require billions of dollars in investment, says Hutchins, meaning it would only be in the reach of a wealthy nation-state. 

Engin Kirda, a professor at Northeastern University in Boston who specializes in malware detection and analysis, says he wouldn’t be surprised if this was already happening. “I’m sure people are investing in it, but I’m also pretty sure people are already doing it, especially [in] China—they have good AI capabilities,” he says. 

It’s a pretty scary possibility. But it’s one that—thankfully—is still only theoretical. A large-scale campaign that is both effective and clearly AI-driven has yet to materialize. What we can say is that generative AI is already significantly lowering the bar for criminals. They’ll keep experimenting with the newest releases and updates and trying to find new ways to trick us into parting with important information and precious cash. For now, all we can do is be careful, remain vigilant, and—for all our sakes—stay on top of those system updates. 

Is a secure AI assistant possible?

<div data-chronoton-summary="

Risky business of AI assistants OpenClaw, a viral tool created by independent engineer Peter Steinberger, allows users to create personalized AI assistants. Security experts are alarmed by its vulnerabilities, with even the Chinese government issuing warnings about the risks.

The prompt injection threat Tools like OpenClaw have many vulnerabilities, but the one experts are most worried about its prompt injection. Unlike conventional hacking, prompt injection tricks an LLM by embedding malicious text in emails or websites the AI reads.

No silver bullet for security Researchers are exploring multiple defense strategies: training LLMs to ignore injections, using detector LLMs to screen inputs, and creating policies that restrict harmful outputs. The fundamental challenge remains balancing utility with security in AI assistants.

” data-chronoton-post-id=”1132768″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

AI agents are a risky business. Even when stuck inside the chatbox window, LLMs will make mistakes and behave badly. Once they have tools that they can use to interact with the outside world, such as web browsers and email addresses, the consequences of those mistakes become far more serious.

That might explain why the first breakthrough LLM personal assistant came not from one of the major AI labs, which have to worry about reputation and liability, but from an independent software engineer, Peter Steinberger. In November of 2025, Steinberger uploaded his tool, now called OpenClaw, to GitHub, and in late January the project went viral.

OpenClaw harnesses existing LLMs to let users create their own bespoke assistants. For some users, this means handing over reams of personal data, from years of emails to the contents of their hard drive. That has security experts thoroughly freaked out. The risks posed by OpenClaw are so extensive that it would probably take someone the better part of a week to read all of the security blog posts on it that have cropped up in the past few weeks. The Chinese government took the step of issuing a public warning about OpenClaw’s security vulnerabilities.

In response to these concerns, Steinberger posted on X that nontechnical people should not use the software. (He did not respond to a request for comment for this article.) But there’s a clear appetite for what OpenClaw is offering, and it’s not limited to people who can run their own software security audits. Any AI companies that hope to get in on the personal assistant business will need to figure out how to build a system that will keep users’ data safe and secure. To do so, they’ll need to borrow approaches from the cutting edge of agent security research.

Risk management

OpenClaw is, in essence, a mecha suit for LLMs. Users can choose any LLM they like to act as the pilot; that LLM then gains access to improved memory capabilities and the ability to set itself tasks that it repeats on a regular cadence. Unlike the agentic offerings from the major AI companies, OpenClaw agents are meant to be on 24-7, and users can communicate with them using WhatsApp or other messaging apps. That means they can act like a superpowered personal assistant who wakes you each morning with a personalized to-do list, plans vacations while you work, and spins up new apps in its spare time.

But all that power has consequences. If you want your AI personal assistant to manage your inbox, then you need to give it access to your email—and all the sensitive information contained there. If you want it to make purchases on your behalf, you need to give it your credit card info. And if you want it to do tasks on your computer, such as writing code, it needs some access to your local files. 

There are a few ways this can go wrong. The first is that the AI assistant might make a mistake, as when a user’s Google Antigravity coding agent reportedly wiped his entire hard drive. The second is that someone might gain access to the agent using conventional hacking tools and use it to either extract sensitive data or run malicious code. In the weeks since OpenClaw went viral, security researchers have demonstrated numerous such vulnerabilities that put security-naïve users at risk.

Both of these dangers can be managed: Some users are choosing to run their OpenClaw agents on separate computers or in the cloud, which protects data on their hard drives from being erased, and other vulnerabilities could be fixed using tried-and-true security approaches.

But the experts I spoke to for this article were focused on a much more insidious security risk known as prompt injection. Prompt injection is effectively LLM hijacking: Simply by posting malicious text or images on a website that an LLM might peruse, or sending them to an inbox that an LLM reads, attackers can bend it to their will.

And if that LLM has access to any of its user’s private information, the consequences could be dire. “Using something like OpenClaw is like giving your wallet to a stranger in the street,” says Nicolas Papernot, a professor of electrical and computer engineering at the University of Toronto. Whether or not the major AI companies can feel comfortable offering personal assistants may come down to the quality of the defenses that they can muster against such attacks.

It’s important to note here that prompt injection has not yet caused any catastrophes, or at least none that have been publicly reported. But now that there are likely hundreds of thousands of OpenClaw agents buzzing around the internet, prompt injection might start to look like a much more appealing strategy for cybercriminals. “Tools like this are incentivizing malicious actors to attack a much broader population,” Papernot says. 

Building guardrails

The term “prompt injection” was coined by the popular LLM blogger Simon Willison in 2022, a couple of months before ChatGPT was released. Even back then, it was possible to discern that LLMs would introduce a completely new type of security vulnerability once they came into widespread use. LLMs can’t tell apart the instructions that they receive from users and the data that they use to carry out those instructions, such as emails and web search results—to an LLM, they’re all just text. So if an attacker embeds a few sentences in an email and the LLM mistakes them for an instruction from its user, the attacker can get the LLM to do anything it wants.

Prompt injection is a tough problem, and it doesn’t seem to be going away anytime soon. “We don’t really have a silver-bullet defense right now,” says Dawn Song, a professor of computer science at UC Berkeley. But there’s a robust academic community working on the problem, and they’ve come up with strategies that could eventually make AI personal assistants safe.

Technically speaking, it is possible to use OpenClaw today without risking prompt injection: Just don’t connect it to the internet. But restricting OpenClaw from reading your emails, managing your calendar, and doing online research defeats much of the purpose of using an AI assistant. The trick of protecting against prompt injection is to prevent the LLM from responding to hijacking attempts while still giving it room to do its job.

One strategy is to train the LLM to ignore prompt injections. A major part of the LLM development process, called post-training, involves taking a model that knows how to produce realistic text and turning it into a useful assistant by “rewarding” it for answering questions appropriately and “punishing” it when it fails to do so. These rewards and punishments are metaphorical, but the LLM learns from them as an animal would. Using this process, it’s possible to train an LLM not to respond to specific examples of prompt injection.

But there’s a balance: Train an LLM to reject injected commands too enthusiastically, and it might also start to reject legitimate requests from the user. And because there’s a fundamental element of randomness in LLM behavior, even an LLM that has been very effectively trained to resist prompt injection will likely still slip up every once in a while.

Another approach involves halting the prompt injection attack before it ever reaches the LLM. Typically, this involves using a specialized detector LLM to determine whether or not the data being sent to the original LLM contains any prompt injections. In a recent study, however, even the best-performing detector completely failed to pick up on certain categories of prompt injection attack.

The third strategy is more complicated. Rather than controlling the inputs to an LLM by detecting whether or not they contain a prompt injection, the goal is to formulate a policy that guides the LLM’s outputs—i.e., its behaviors—and prevents it from doing anything harmful. Some defenses in this vein are quite simple: If an LLM is allowed to email only a few pre-approved addresses, for example, then it definitely won’t send its user’s credit card information to an attacker. But such a policy would prevent the LLM from completing many useful tasks, such as researching and reaching out to potential professional contacts on behalf of its user.

“The challenge is how to accurately define those policies,” says Neil Gong, a professor of electrical and computer engineering at Duke University. “It’s a trade-off between utility and security.”

On a larger scale, the entire agentic world is wrestling with that trade-off: At what point will agents be secure enough to be useful? Experts disagree. Song, whose startup, Virtue AI, makes an agent security platform, says she thinks it’s possible to safely deploy an AI personal assistant now. But Gong says, “We’re not there yet.” 

Even if AI agents can’t yet be entirely protected against prompt injection, there are certainly ways to mitigate the risks. And it’s possible that some of those techniques could be implemented in OpenClaw. Last week, at the inaugural ClawCon event in San Francisco, Steinberger announced that he’d brought a security person on board to work on the tool.

As of now, OpenClaw remains vulnerable, though that hasn’t dissuaded its multitude of enthusiastic users. George Pickett, a volunteer maintainer of the OpenGlaw GitHub repository and a fan of the tool, says he’s taken some security measures to keep himself safe while using it: He runs it in the cloud, so that he doesn’t have to worry about accidentally deleting his hard drive, and he’s put mechanisms in place to ensure that no one else can connect to his assistant.

But he hasn’t taken any specific actions to prevent prompt injection. He’s aware of the risk but says he hasn’t yet seen any reports of it happening with OpenClaw. “Maybe my perspective is a stupid way to look at it, but it’s unlikely that I’ll be the first one to be hacked,” he says.

A “QuitGPT” campaign is urging people to cancel their ChatGPT subscriptions

In September, Alfred Stephen, a freelance software developer in Singapore, purchased a ChatGPT Plus subscription, which costs $20 a month and offers more access to advanced models, to speed up his work. But he grew frustrated with the chatbot’s coding abilities and its gushing, meandering replies. Then he came across a post on Reddit about a campaign called QuitGPT

The campaign urged ChatGPT users to cancel their subscriptions, flagging a substantial contribution by OpenAI president Greg Brockman to President Donald Trump’s super PAC MAGA Inc. It also pointed out that the US Immigration and Customs Enforcement, or ICE, uses a résumé screening tool powered by ChatGPT-4. The federal agency has become a political flashpoint since its agents fatally shot two people in Minneapolis in January. 

For Stephen, who had already been tinkering with other chatbots, learning about Brockman’s donation was the final straw. “That’s really the straw that broke the camel’s back,” he says. When he canceled his ChatGPT subscription, a survey popped up asking what OpenAI could have done to keep his subscription. “Don’t support the fascist regime,” he wrote.

QuitGPT is one of the latest salvos in a growing movement by activists and disaffected users to cancel their subscriptions. In just the past few weeks, users have flooded Reddit with stories about quitting the chatbot. Many lamented the performance of GPT-5.2, the latest model. Others shared memes parodying the chatbot’s sycophancy. Some planned a “Mass Cancellation Party” in San Francisco, a sardonic nod to the GPT-4o funeral that an OpenAI employee had floated, poking fun at users who are mourning the model’s impending retirement. Still, others are protesting against what they see as a deepening entanglement between OpenAI and the Trump administration.

OpenAI did not respond to a request for comment.

As of December 2025, ChatGPT had nearly 900 million weekly active users, according to The Information. While it’s unclear how many users have joined the boycott, QuitGPT is getting attention. A recent Instagram post from the campaign has more than 36 million views and 1.3 million likes. And the organizers say that more than 17,000 people have signed up on the campaign’s website, which asks people whether they canceled their subscriptions, will commit to stop using ChatGPT, or will share the campaign on social media. 

“There are lots of examples of failed campaigns like this, but we have seen a lot of effectiveness,” says Dana Fisher, a sociologist at American University. A wave of canceled subscriptions rarely sways a company’s behavior, unless it reaches a critical mass, she says. “The place where there’s a pressure point that might work is where the consumer behavior is if enough people actually use their … money to express their political opinions.”

MIT Technology Review reached out to three employees at OpenAI, none of whom said they were familiar with the campaign. 

Dozens of left-leaning teens and twentysomethings scattered across the US came together to organize QuitGPT in late January. They range from pro-democracy activists and climate organizers to techies and self-proclaimed cyber libertarians, many of them seasoned grassroots campaigners. They were inspired by a viral video posted by Scott Galloway, a marketing professor at New York University and host of The Prof G Pod. He argued that the best way to stop ICE was to persuade people to cancel their ChatGPT subscriptions. Denting OpenAI’s subscriber base could ripple through the stock market and threaten an economic downturn that would nudge Trump, he said.

“We make a big enough stink for OpenAI that all of the companies in the whole AI industry have to think about whether they’re going to get away enabling Trump and ICE and authoritarianism,” says an organizer of QuitGPT who requested anonymity because he feared retaliation by OpenAI, citing the company’s recent subpoenas against advocates at nonprofits. OpenAI made for an obvious first target of the movement, he says, but “this is about so much more than just OpenAI.”

Simon Rosenblum-Larson, a labor organizer in Madison, Wisconsin, who organizes movements to regulate the development of data centers, joined the campaign after hearing about it through Signal chats among community activists. “The goal here is to pull away the support pillars of the Trump administration. They’re reliant on many of these tech billionaires for support and for resources,” he says. 

QuitGPT’s website points to new campaign finance reports showing that Greg Brockman and his wife each donated $12.5 million to MAGA Inc., making up nearly a quarter of the roughly $102 million it raised over the second half of 2025. The information that ICE uses a résumé screening tool powered by ChatGPT-4 came from an AI inventory published by the Department of Homeland Security in January.

QuitGPT is in the mold of Galloway’s own recently launched campaign, Resist and Unsubscribe. The movement urges consumers to cancel their subscriptions to Big Tech platforms, including ChatGPT, for the month of February, as a protest to companies “driving the markets and enabling our president.” 

“A lot of people are feeling real anxiety,” Galloway told MIT Technology Review. “You take enabling a president, proximity to the president, and an unease around AI,” he says, “and now people are starting to take action with their wallets.” Galloway says his campaign’s website can draw more than 200,000 unique visits in a day and that he receives dozens of DMs every hour showing screenshots of canceled subscriptions.

The consumer boycotts follow a growing wave of pressure from inside the companies themselves. In recent weeks, tech workers have been urging their employers to use their political clout to demand that ICE leave US cities, cancel company contracts with the agency, and speak out against the agency’s actions. CEOs have started responding. OpenAI’s Sam Altman wrote in an internal Slack message to employees that ICE is “going too far.” Apple CEO Tim Cook called for a “deescalation” in an internal memo posted on the company’s website for employees. It was a departure from how Big Tech CEOs have courted President Trump with dinners and donations since his inauguration.

Although spurred by a fatal immigration crackdown, these developments signal that a sprawling anti-AI movement is gaining momentum. The campaigns are tapping into simmering anxieties about AI, says Rosenblum-Larson, including the energy costs of data centers, the plague of deepfake porn, the teen mental-health crisis, the job apocalypse, and slop. “It’s a really strange set of coalitions built around the AI movement,” he says.

“Those are the right conditions for a movement to spring up,” says David Karpf, a professor of media and public affairs at George Washington University. Brockman’s donation to Trump’s super PAC caught many users off guard, he says. “In the longer arc, we are going to see users respond and react to Big Tech, deciding that they’re not okay with this.”

Making AI Work, MIT Technology Review’s new AI newsletter, is here

For years, our newsroom has explored AI’s limitations and potential dangers, as well as its growing energy needs. And our reporters have looked closely at how generative tools are being used for tasks such as coding and running scientific experiments

But how is AI actually being used in fields like health care, climate tech, education, and finance? How are small businesses using it? And what should you keep in mind if you use AI tools at work? These questions guided the creation of Making AI Work, a new AI mini-course newsletter.

Sign up for Making AI Work to see weekly case studies exploring tools and tips for AI implementation. The limited-run newsletter will deliver practical, industry-specific guidance on how generative AI is being used and deployed across sectors and what professionals need to know to apply it in their everyday work. The goal is to help working professionals more clearly see how AI is actually being used today, and what that looks like in practice—including new challenges it presents. 

You can sign up at any time and you’ll receive seven editions, delivered once per week, until you complete the series. 

Each newsletter begins with a case study, examining a specific use case of AI in a given industry. Then we’ll take a deeper look at the AI tool being used, with more context about how other companies or sectors are employing that same tool or system. Finally, we’ll end with action-oriented tips to help you apply the tool. 

Here’s a closer look at what we’ll cover:

  • Week 1: How AI is changing health care 

Explore the future of medical note-taking by learning about the Microsoft Copilot tool used by doctors at Vanderbilt University Medical Center. 

  • Week 2: How AI could power up the nuclear industry 

Dig into an experiment between Google and the nuclear giant Westinghouse to see if AI can help build nuclear reactors more efficiently. 

  • Week 3: How to encourage smarter AI use in the classroom

Visit a private high school in Connecticut and meet a technology coordinator who will get you up to speed on MagicSchool, an AI-powered platform for educators. 

  • Week 4: How small businesses can leverage AI

Hear from an independent tutor on how he’s outsourcing basic administrative tasks to Notion AI. 

  • Week 5: How AI is helping financial firms make better investments

Learn more about the ways financial firms are using large language models like ChatGPT Enterprise to supercharge their research operations. 

  • Week 6: How to use AI yourself 

We’ll share some insights from the staff of MIT Technology Review about how you might use AI tools powered by LLMs in your own life and work.

  • Week 7: 5 ways people are getting AI right

The series ends with an on-demand virtual event featuring expert guests exploring what AI adoptions are working, and why.  

If you’re not quite ready to jump into Making AI Work, then check out Intro to AI, MIT Technology Review’s first AI newsletter mini-course, which serves as a beginner’s guide to artificial intelligence. Readers will learn the basics of what AI is, how it’s used, what the current regulatory landscape looks like, and more. Sign up to receive Intro to AI for free. 

Our hope is that Making AI Work will help you understand how AI can, well, work for you. Sign up for Making AI Work to learn how LLMs are being put to work across industries.