OpenAI is throwing everything into building a fully automated researcher

<div data-chronoton-summary="

  • A fully automated research lab: OpenAI has set a new “North Star” — building an AI system capable of tackling large, complex scientific problems entirely on its own, with a research intern prototype due by September and a full multi-agent system planned for 2028.
  • Coding agents as a proof of concept: OpenAI’s existing tool Codex, which can already handle substantial programming tasks autonomously, is the early blueprint — the bet is that if AI can solve coding problems, it can solve almost any problem formulated in text or code.
  • Serious risks with no clean answers: Chief scientist Jakub Pachocki admits that a system this powerful running with minimal human oversight raises hard questions — with risks from hacking and misuse to bioweapons — and that chain-of-thought monitoring is the best safeguard available, for now.
  • Power concentrated in very few hands: Pachocki says governments, not just OpenAI, will need to figure out where the lines are drawn.

” data-chronoton-post-id=”1134438″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

OpenAI is refocusing its research efforts and throwing its resources into a new grand challenge. The San Francisco firm has set its sights on building what it calls an AI researcher, a fully automated agent-based system that will be able to go off and tackle large, complex problems by itself. ​​OpenAI says that this new research goal will be its “North Star” for the next few years, pulling together multiple research strands, including work on reasoning models, agents, and interpretability.

There’s even a timeline. OpenAI plans to build “an autonomous AI research intern”—a system that can take on a small number of specific research problems by itself—by September. The AI intern will be the precursor to a fully automated multi-agent research system that the company plans to debut in 2028. This AI researcher (OpenAI says) will be able to tackle problems that are too large or complex for humans to cope with.

Those tasks might be related to math and physics—such as coming up with new proofs or conjectures—or life sciences like biology and chemistry, or even business and policy dilemmas. In theory, you would throw such a tool any kind of problem that can be formulated in text, code, or whiteboard scribbles—which covers a lot.

OpenAI has been setting the agenda for the AI industry for years. Its early dominance with large language models shaped the technology that hundreds of millions of people use every day. But it now faces fierce competition from rival model makers like Anthropic and Google DeepMind. What OpenAI decides to build next matters—for itself and for the future of AI.   

A big part of that decision falls to Jakub Pachocki, OpenAI’s chief scientist, who sets the company’s long-term research goals. Pachocki played key roles in the development of both GPT-4, a game-changing LLM released in 2023, and so-called reasoning models, a technology that first appeared in 2024 and now underpins all major chatbots and agent-based systems. 

In an exclusive interview this week, Pachocki talked me through OpenAI’s latest vision. “I think we are getting close to a point where we’ll have models capable of working indefinitely in a coherent way just like people do,” he says. “Of course, you still want people in charge and setting the goals. But I think we will get to a point where you kind of have a whole research lab in a data center.”

Solving hard problems

Such big claims aren’t new. Saving the world by solving its hardest problems is the stated mission of all the top AI firms. Demis Hassabis told me back in 2022 that it was why he started DeepMind. Anthropic CEO Dario Amodei says he is building the equivalent of a country of geniuses in a data center. Pachocki’s boss, Sam Altman, wants to cure cancer. But Pachocki says OpenAI now has most of what it needs to get there.

In January, OpenAI released Codex, an agent-based app that can spin up code on the fly to carry out tasks on your computer. It can analyze documents, generate charts, make you a daily digest of your inbox and social media, and much more. (Other firms have released similar tools, such as Anthropic’s Claude Code and Claude Cowork.)

OpenAI claims that most of its technical staffers now use Codex in their work. You can look at Codex as a very early version of the AI researcher, says Pachocki: “I expect Codex to get fundamentally better.”

The key is to make a system that can run for longer periods of time, with less human guidance. “What we’re really looking at for an automated research intern is a system that you can delegate tasks [to] that would take a person a few days,” says Pachocki.

“There are a lot of people excited about building systems that can do more long-running scientific research,” says Doug Downey, a research scientist at the Allen Institute for AI, who is not connected to OpenAI. “I think it’s largely driven by the success of these coding agents. The fact that you can delegate quite substantial coding tasks to tools like Codex is incredibly useful and incredibly impressive. And it raises the question: Can we do similar things outside coding, in broader areas of science?”

For Pachocki, that’s a clear Yes. In fact, he thinks it’s just a matter of pushing ahead on the path we’re already on. A simple boost in all-round capability also leads to models that can work longer without help, he says. He points to the leap from 2020’s GPT-3 to 2023’s GPT-4, two of OpenAI’s previous models. GPT-4 was able to work on a problem for far longer than its predecessor, even without specialized training, he says. 

So-called reasoning models brought another bump. Training LLMs to work through problems step by step, backtracking when they make a mistake or hit a dead end, has also made models better at working for longer periods of time. And Pachocki is convinced that OpenAI’s reasoning models will continue to get better.

But OpenAI is also training its systems to work by themselves for longer by feeding them specific samples of complex tasks, such as hard puzzles taken from math and coding contests, which force the models to learn how to do things like keep track of very large chunks of text and split problems up into (and then manage) multiple subtasks.

The aim isn’t to build models that just win math competitions. “That lets you prove that the technology works before you connect it to the real world,” says Pachocki. “If we really wanted to, we could build an amazing automated mathematician. We have all the tools, and I think it would be relatively easy. But it’s not something we’re going to prioritize now because, you know, at the point where you believe you can do it, there’s much more urgent things to do.”

“We are much more focused now on research that’s relevant in the real world,” he adds.

Right now that means taking what Codex can do with coding and trying to apply that to problem-solving in general. “There’s a big change happening, especially in programming,” he says. “Our jobs are now totally different than they were even a year ago. Nobody really edits code all the time anymore. Instead, you manage a group of Codex agents.” If Codex can solve coding problems (the argument goes), it can solve any problem.

The line always goes up

It’s true that OpenAI has had a handful of remarkable successes in the last few months. Researchers have used GPT-5 (the LLM that powers Codex) to discover new solutions to a number of unsolved math problems and punch through apparent dead ends in a handful of biology, chemistry, and physics puzzles.   

“Just looking at these models coming up with ideas that would take most PhD weeks, at least, makes me expect that we’ll see much more acceleration coming from this technology in the near future,” Pachocki says.

But Pachocki admits that it’s not a done deal. He also understands why some people still have doubts about how much of a game-changer the technology really is. He thinks it depends on how people like to work and what they need to do. “I can believe some people don’t find it very useful yet,” he says.

He tells me that he didn’t even use autocomplete—the most basic version of generative coding tech—a year ago. “I’m very pedantic about my code,” he says. “I like to type it all manually in vim if I can help it.” (Vim is a text editor favored by many hardcore programmers that you interact with via dozens of keyboard shortcuts instead of a mouse.)

But that changed when he saw what the latest models could do. He still wouldn’t hand over complex design tasks, but it’s a time-saver when he just wants to try out a few ideas. “I can have it run experiments in a weekend that previously would have taken me like a week to code,” he says.

“I don’t think it is at the level where I would just let it take the reins and design the whole thing,” he adds. “But once you see it do something that would take a week to do—I mean, that’s hard to argue with.”

Pachocki’s game plan is to supercharge the existing problem-solving abilities that tools like Codex have now and apply them across the sciences.  

Downey agrees that the idea of an automated researcher is very cool: “It would be exciting if we could come back tomorrow morning and the agent’s done a bunch of work and there’s new results we can examine,” he says.

But he cautions that building such a system could be harder than Pachocki makes out. Last summer, Downey and his colleagues tested several top-tier LLMs on a range of scientific tasks. OpenAI’s latest model, GPT-5, came out on top but still made lots of errors.

“If you have to chain tasks together, then the odds that you get several of them right in succession tend to go down,” he says. Downey admits that things move fast, and he has not tested the latest versions of GPT-5 (OpenAI released GPT-5.4 two weeks ago). “So those results might already be stale,” he says. 

Serious unanswered questions

I asked Pachocki about the risks that may come with a system that can solve large, complex problems by itself with little human oversight. Pachocki says people at OpenAI talk about those risks all the time.

“If you believe that AI is about to substantially accelerate research, including AI research, that’s a big change in the world. That’s a big thing,” he told me. “And it comes with some serious unanswered questions. If it’s so smart and capable, if it can run an entire research program, what if it does something bad?”

The way Pachocki sees it, that could happen in a number of ways. The system could go off the rails. It could get hacked. Or it could simply misunderstand its instructions.

The best technique OpenAI has right now to address these concerns is to train its reasoning models to share details about what they are doing as they work. This approach to keeping tabs on LLMs is known as chain-of-thought monitoring.

In short, LLMs are trained to jot down notes about what they are doing in a kind of scratch pad as they step through tasks. Researchers can then use those notes to make sure a model is behaving as expected. Yesterday OpenAI published new details on how it is using chain-of-thought monitoring in house to study Codex

“Once we get to systems working mostly autonomously for a long time in a big data center, I think this will be something that we’re really going to depend on,” says Pachocki.

The idea would be to monitor an AI researcher’s scratch pads using other LLMs and catch unwanted behavior before it’s a problem, rather than trying to stop that bad behavior from happening in the first place. LLMs are not understood well enough for us to control them fully.

“I think it’s going to be a long time before we can really be like, okay, this problem is solved,” he says. “Until you can really trust the systems, you definitely want to have restrictions in place.” Pachocki thinks that very powerful models should be deployed in sandboxes, cut off from anything they could break or use to cause harm. 

AI tools have already been used to come up with novel cyberattacks. Some worry that they will be used to design synthetic pathogens that could be used as bioweapons. You can insert any number of evil-scientist scare stories here. “I definitely think there are worrying scenarios that we can imagine,” says Pachocki. 

“It’s going to be a very weird thing. It’s extremely concentrated power that’s in some ways unprecedented,” says Pachocki. “Imagine you get to a world where you have a data center that can do all the work that OpenAI or Google can do. Things that in the past required large human organizations would now be done by a couple of people.”

“I think this is a big challenge for governments to figure out,” he adds.

And yet some people would say governments are part of the problem. The US government wants to use AI on the battlefield, for example. The recent showdown between Anthropic and the Pentagon revealed that there is little agreement across society about where we draw red lines for how this technology should and should not be used—let alone who should draw them. In the immediate aftermath of that dispute, OpenAI stepped up to sign a deal with the Pentagon instead of its rival. The situation remains murky.

I pushed Pachocki on this. Does he really trust other people to figure it out or does he, as a key architect of the future, feel personal responsibility? “I do feel personal responsibility,” he says. “But I don’t think this can be resolved by OpenAI alone, pushing its technology in a particular way or designing its products in a particular way. We’ll definitely need a lot of involvement from policymakers.”

Where does that leave us? Are we really on a path to the kind of AI Pachocki envisions? When I asked the Allen Institute’s Downey, he laughed. “I’ve been in this field for a couple of decades and I no longer trust my predictions for how near or far certain capabilities are,” he says. 

OpenAI’s stated mission is to ensure that artificial general intelligence (a hypothetical future technology that many AI boosters believe will be able to match humans on most cognitive tasks) will benefit all of humanity. OpenAI aims to do that by being the first to build it. But the only time Pachocki mentioned AGI in our conversation, he was quick to clarify what he meant by talking about “economically transformative technology” instead.

LLMs are not like human brains, he says: “They are superficially similar to people in some ways because they’re kind of mostly trained on people talking. But they’re not formed by evolution to be really efficient.” 

“Even by 2028, I don’t expect that we’ll get systems as smart as people in all ways. I don’t think that will happen,” he adds. “But I don’t think it’s absolutely necessary. The interesting thing is you don’t need to be as smart as people in all their ways in order to be very transformative.”

The Pentagon is planning for AI companies to train on classified data, defense official says

The Pentagon is discussing plans to set up secure environments for generative AI companies to train military-specific versions of their models on classified data, MIT Technology Review has learned. 

AI models like Anthropic’s Claude are already used to answer questions in classified settings; applications include analyzing targets in Iran. But allowing models to train on and learn from classified data would be a new development that presents unique security risks. It would mean sensitive intelligence like surveillance reports or battlefield assessments could become embedded into the models themselves, and it would bring AI firms into closer contact with classified data than before. 

Training versions of AI models on classified data is expected to make them more accurate and effective in certain tasks, according to a US defense official who spoke on background with MIT Technology Review. The news comes as demand for more powerful models is high: The Pentagon has reached agreements with OpenAI and Elon Musk’s xAI to operate their models in classified settings and is implementing a new agenda to become an “an ‘AI-first’ warfighting force” as the conflict with Iran escalates. (The Pentagon did not comment on its AI training plans as of publication time.)

Training would be done in a secure data center that’s accredited to host classified government projects, and where a copy of an AI model is paired with classified data, according to two people familiar with how such operations work. Though the Department of Defense would remain the owner of the data, personnel from AI companies might in rare cases access the data if they have appropriate security clearance, the official said. 

Before allowing this new training, though, the official said, the Pentagon intends to evaluate how accurate and effective models are when trained on nonclassified data, like commercially available satellite imagery. 

The military has long used computer vision models, an older form of AI, to identify objects in images and footage it collects from drones and airplanes, and federal agencies have awarded contracts to companies to train AI models on such content. And AI companies building large language models (LLMs) and chatbots have created versions of their models fine-tuned for government work, like Anthropic’s Claude Gov, which are designed to operate across more languages and in secure environments. But the official’s comments are the first indication that AI companies building LLMs, like OpenAI and xAI, could train government-specific versions of their models directly on classified data.

Aalok Mehta, who directs the Wadhwani AI Center at the Center for Strategic and International Studies and previously led AI policy efforts at Google and OpenAI, says training on classified data, as opposed to just answering questions about it, would present new risks. 

The biggest of these, he says, is that classified information these models train on could be resurfaced to anyone using the model. That would be a problem if lots of different military departments, all with different classification levels and needs for information, were to share the same AI. 

“You can imagine, for example, a model that has access to some sort of sensitive human intelligence—like the name of an operative—leaking that information to a part of the Defense Department that isn’t supposed to have access to that information,” Mehta says. That could create a security risk for the operative, one that’s difficult to perfectly mitigate if a particular model is used by more than one group within the military.

However, Mehta says, it’s not as hard to keep information contained from the broader world: “If you set this up right, you will have very little risk of that data being surfaced on the general internet or back to OpenAI.” The government has some of the infrastructure for this already; the security giant Palantir has won sizable contracts for building a secure environment through which officials can ask AI models about classified topics without sending the information back to AI companies. But using these systems for training is still a new challenge. 

The Pentagon, spurred by a memo from Defense Secretary Pete Hegseth in January, has been racing to incorporate more AI. It has been used in combat, where generative AI has ranked lists of targets and recommended which to strike first, and in more administrative roles, like drafting contracts and reports.

There are lots of tasks currently handled by human analysts that the military might want to train leading AI models to perform and would require access to classified data, Mehta says. That could include learning to identify subtle clues in an image the way an analyst does, or connecting new information with historical context. The classified data could be pulled from the unfathomable amounts of text, audio, images, and video, in many languages, that intelligence services collect. 

It’s really hard to say which specific military tasks would require AI models to train on such data, Mehta cautions, “because obviously the Defense Department has lots of incentives to keep that information confidential, and they don’t want other countries to know what kind of capabilities we have exactly in that space.”

If you have information about the military’s use of AI, you can share it securely via Signal (username jamesodonnell.22).

Nurturing agentic AI beyond the toddler stage

Parents of young children face a lot of fears about developmental milestones, from infancy through adulthood. The number of months it takes a baby to learn to talk or walk is often used as a benchmark for wellness, or an indicator of additional tests needed to properly diagnose a potential health condition. A parent rejoices over the child’s first steps and then realizes how much has changed when the child can quickly walk outside, instead of slowly crawling in a safe area inside. Suddenly safety, including childproofing, takes a completely different lens and approach.

Generative AI hit toddlerhood between December 2025 and January 2026 with the introduction of no code tools from multiple vendors and the debut of OpenClaw, an open source personal agent posted on GitHub. No more crawling on the carpet—the generative AI tech baby broke into a sprint, and very few governance principles were operationally prepared.

The accountability challenge: It’s not them, it’s you

Until now, governance has been focused on model output risks with humans in the loop before consequential decisions were made—such as with loan approvals or job applications. Model behavior, including drift, alignment, data exfiltration, and poisoning, was the focus. The pace was set by a human prompting a model in a chatbot format with plenty of back and forth interactions between machine and human.

Today, with autonomous agents operating in complex workflows, the vision and the benefits of applied AI require significantly fewer humans in the loop. The point is to operate a business at machine pace by automating manual tasks that have clear architecture and decision rules. The goal, from a liability standpoint, is no reduction in enterprise or business risk between a machine operating a workflow and a human operating a workflow. CX Today summarizes the situation succinctly: “AI does the work, humans own the risk,” and   California state law (AB 316), went into effect January 1, 2026, which removes the “AI did it; I didn’t approve it” excuse.  This is similar to parenting when an adult is held responsible for a child’s actions that negatively impacts the larger community.

The challenge is that without building in code that enforces operational governance aligned to different levels of risk and liability along the entire workflow, the benefit of autonomous AI agents is negated. In the past, governance had been static and aligned to the pace of interaction typical for a chatbot. However, autonomous AI by design removes humans from many decisions, which can affect governance.  

Considering permissions

Much like handing a three-year-old child a video game console that remotely controls an Abrams tank or an armed drone, leaving a probabilistic system operating without real-time guardrails that can change critical enterprise data carries significant risks.  For instance, agents that integrate and chain actions across multiple corporate systems can drift beyond privileges that a single human user would be granted. To move forward successfully, governance must shift beyond policy set by committees to operational code built into the workflows from the start.  

A humorous meme around the behavior of toddlers with toys starts with all the reasons that whatever toy you have is mine and ends with a broken toy that is definitely yours.  For example, OpenClaw delivered a user experience closer to working with a human assistant;, but the excitement shifted as security experts realized inexperienced users could be easily compromised by using it.

For decades, enterprise IT has lived with shadow IT and the reality that skilled technical teams must take over and clean up assets they did not architect or install, much like the toddler giving back a broken toy. With autonomous agents, the risks are larger: persistent service account credentials, long-lived API tokens, and permissions to make decisions over core file systems. To meet this challenge, it’s imperative to allocate upfront appropriate IT budget and labor to sustain central discovery, oversight, and remediation for the thousands of employee or department-created agents.

Having a retirement plan

Recently, an acquaintance mentioned that she saved a client hundreds of thousands of dollars by identifying and then ending a “zombie project” —a neglected or failed AI pilot left running on a GPU cloud instance. There are potentially thousands of agents that risk becoming a zombie fleet inside a business. Today, many executives encourage employees to use AI—or else—and employees are told to create their own AI-first workflows or AI assistants. With the utility of something like OpenClaw and top-down directives, it is easy to project that the number of build-my-own agents coming to the office with their human employee will explode. Since an AI agent is a program that would fall under the definition of company-owned IP, as a employee changes departments or companies, those agents may be orphaned. There needs to be proactive policy and governance to decommission and retire any agents linked to a specific employee ID and permissions.

Financial optimization is governance out of the gate

While for some executives, autonomous AI sounds like a way to improve their operating margins by limiting human capital, many are finding that the ROI for human labor replacement is the wrong angle to take. Adding AI capabilities to the enterprise does not mean purchasing a new software tool with predictable instance-per-hour or per-seat pricing. A December 2025 IDC survey sponsored by Data Robot indicated that 96% of organizations deploying generative AI and 92% of those implementing agentic AI reported costs were higher or much higher than expected.

The survey separates the concepts of governance and ROI, but as AI systems scale across large enterprises, financial and liability governance should be architected into the workflows from the beginning. Part of enterprise class governance stems from predicting and adhering to allocated budgeting. Unlike the software financial models of per-seat costs with support and maintenance fees, use of AI is consumption and usage costs scale as the workflow scales across the enterprise: the more users, the more tokens or the more compute time, and the higher the bill. Think of it as a tab left open, or an online retailer’s digital shopping cart button unlocked on a toddler’s electronic game device.

Cloud FinOps was deterministic, but generative AI and agentic AI systems built on generative AI are probabilistic. Some AI-first founders are realizing that a single agents’ token costs can be as high as $100,000 per session. Without guardrails built in from the start, chaining complex autonomous agents that run unsupervised for long periods of time can easily blow past the budget for hiring a junior developer.

Keeping humans in the loop remains critical

The promise of autonomous agentic AI is acceleration of business operations, product introductions, customer experience, and customer retention. Shifting to machine-speed decisions without humans in and or on the loop for these key functions significantly changes the governance landscape. While many of the principles around proactive permissions, discovery, audit, remediation, and financial operations/optimizations are the same, how they are executed has to shift to keep pace with autonomous agentic AI.

This content was produced by Intel. It was not written by MIT Technology Review’s editorial staff.

Where OpenAI’s technology could show up in Iran

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

It’s been just over two weeks since OpenAI reached a controversial agreement to allow the Pentagon to use its AI in classified environments. There are still pressing questions about what exactly OpenAI’s agreement allows for; Sam Altman said the military can’t use his company’s technology to build autonomous weapons, but the agreement really just demands that the military follow its own (quite permissive) guidelines about such weapons. OpenAI’s other main claim, that the agreement will prevent use of its technology for domestic surveillance, appears equally dubious.

It’s unclear what OpenAI’s motivations are. It’s not the first tech giant to embrace military contracts it had once vowed never to enter into, but the speed of the pivot was notable. Perhaps it’s just about money; OpenAI is spending lots on AI training and is on the hunt for more revenue (from sources including ads). Or perhaps Altman truly believes the ideological framing he often invokes: that liberal democracies (and their militaries) must have access to the most powerful AI to compete with China.

The more consequential question is what happens next. OpenAI has decided it is comfortable operating right in the messy heart of combat, just as the US escalates its strikes against Iran (with AI playing a larger role in that than ever before). So where exactly could OpenAI’s tech show up in this fight? And which applications will its customers (and employees) tolerate?

Targets and strikes

Though its Pentagon agreement is in place, it’s unclear when OpenAI’s technology will be ready for classified environments, since it must be integrated with other tools the military uses (Elon Musk’s xAI, which recently struck its own deal with the Pentagon, is expected to go through the same process with its AI model Grok). But there’s pressure to do this quickly because of controversy around the technology in use to date: After Anthropic refused to allow its AI to be used for “any lawful use,” President Trump ordered the military to stop using it, and Anthropic was designated a supply chain risk by the Pentagon. (Anthropic is fighting the designation in court.)

If the Iran conflict is still underway by the time OpenAI’s tech is in the system, what could it be used for? A recent conversation I had with a defense official suggests it might look something like this: A human analyst could put a list of potential targets into the AI model and ask it to analyze the information and prioritize which to strike first. The model could account for logistics information, like where particular planes or supplies are located. It could analyze lots of different inputs in the form of text, image, and video. 

A human would then be responsible for manually checking these outputs, the official said. But that raises an obvious question: If a person is truly double-checking AI’s outputs, how is it speeding up targeting and strike decisions?

For years the military has been using another AI system, called Maven, which can handle things like automatically analyzing drone footage to identify possible targets. It’s likely that OpenAI’s models, like Anthropic’s Claude, will offer a conversational interface on top of that, allowing users to ask for interpretations of intelligence and recommendations for which targets to strike first. 

It’s hard to overstate how new this is: AI has long done analysis for the military, drawing insights out of oceans of data. But using generative AI’s advice about which actions to take in the field is being tested in earnest for the first time in Iran.

Drone defense

At the end of 2024, OpenAI announced a partnership with Anduril, which makes both drones and counter-drone technologies for the military. The agreement said OpenAI would work with Anduril to do time-sensitive analysis of drones attacking US forces and help take them down. An OpenAI spokesperson told me at the time that this didn’t violate the company’s policies, which prohibited “systems designed to harm others,” because the technology was being used to target drones and not people. 

Anduril provides a suite of counter-drone technologies to military bases around the world (though the company declined to tell me whether its systems are deployed near Iran). Neither company has provided updates on how the project has developed since it was announced. However, Anduril has long trained its own AI models to analyze camera footage and sensor data to identify threats; what it focuses less on are conversational AI systems that allow soldiers to query those systems directly or receive guidance in natural language—an area where OpenAI’s models may fit.

The stakes are high. Six US service members were killed in Kuwait on March 1 following an Iranian drone attack that was not intercepted by US air defenses. 

Anduril’s interface, called Lattice, is where soldiers can control everything from drone defenses to missiles and autonomous submarines. And the company is winning massive contracts—$20 billion from the US Army just last week—to connect its systems with legacy military equipment and layer AI on them. If OpenAI’s models prove useful to Anduril, Lattice is designed to incorporate them quickly across this broader warfare stack. 

Back-office AI

In December, Defense Secretary Pete Hegseth started encouraging millions of people in more administrative roles in the military—contracts, logistics, purchasing—to use a new AI tool. Called GenAI.mil, it provided a way for personnel to securely access commercial AI models and use them for the same sorts of things as anyone in the business world. 

Google Gemini was one of the first to be available. In January, the Pentagon announced that xAI’s Grok was going to be added to the GenAI.mil platform as well, despite incidents in which the model had spread antisemitic content and created nonconsensual deepfakes. OpenAI followed in February, with the company announcing that its models would be used for drafting policy documents and contracts and assisting with administrative support of missions.

Anyone using ChatGPT for unclassified tasks on this platform is unlikely to have much sway over sensitive decisions in Iran, but the prospect of OpenAI deploying on the platform is important in another way. It serves the all-in attitude toward AI that Hegseth has been pushing relentlessly across the Pentagon (even if many early users aren’t entirely sure what they’re supposed to use it for). The message is that AI is transforming every aspect of how the US fights, from targeting decisions down to paperwork. And OpenAI is increasingly winning a piece of it all.

Why physical AI is becoming manufacturing’s next advantage

For decades, manufacturers have pursued automation to drive efficiency, reduce costs, and stabilize operations. That approach delivered meaningful gains, but it is no longer enough.

Today’s manufacturing leaders face a different challenge: how to grow amid labor constraints, rising complexity, and increasing pressure to innovate faster without sacrificing safety, quality, or trust. The next phase of transformation will not be defined by isolated AI tools or individual robots, but by intelligence that can operate reliably in the physical world.

This is where physical AI—intelligence that can sense, reason, and act in the real world—marks a decisive shift. And it is why Microsoft and NVIDIA are working together to help manufacturers move from experimentation to production at industrial scale.

The industrial frontier: Intelligence and trust, not just automation

Most early AI adoption focused on narrow optimization: automating tasks, improving utilization, and cutting costs. While valuable, that phase often created new friction, including skills gaps, governance concerns, and uncertainty about long‑term impact. Furthermore, the use cases were plentiful but not as strategic.

The industrial frontier represents a different approach. Rather than asking how much work machines can replace, frontier manufacturers ask how AI can expand human capability, accelerate innovation, and unlock new forms of value while remaining trustworthy and controllable.

Across industries, companies that successfully move into this frontier phase share two non‑negotiables:

  • Intelligence: AI systems must understand how the business actually handles its data, workflows, and institutional knowledge.
  • Trust: As AI begins to act in high‑stakes environments, organizations must retain security, governance, and observability at every layer.

Without intelligence, AI becomes generic. Without trust, adoption stalls.

Why manufacturing is the proving ground for physical AI

Manufacturing is uniquely positioned at the center of this shift.

AI is no longer confined to planning or analytics. It is moving into physical execution: coordinating machines, adapting to real‑world variability, and working alongside people on the factory floor. Robotics, autonomous systems, and AI agents must now perceive, reason, and act in dynamic environments.

This transition exposes a critical gap. Traditional automation excels at repetition but struggles with adaptability. Human workers bring judgment and context but are constrained by scale. Physical AI closes that gap by enabling human‑led, AI‑operated systems, where people set intent and intelligent systems execute, learn, and improve over time. Humans are essential for scaled success.

Microsoft and NVIDIA: Accelerating physical AI at scale

Physical AI cannot be delivered through point solutions. It requires agentic-driven, enterprise-grade development, deployment, and operations toolchains and workflows that connect simulation, data, AI models, robotics, and governance into a coherent system.

NVIDIA is building the AI infrastructure that makes physical AI possible, including accelerated computing, open models, simulation libraries, and robotics frameworks and blueprints that enable the ecosystem to build autonomous robotics systems that can perceive, reason, plan, and take action in the physical world. Microsoft complements this with a cloud and data platform designed to operate physical AI securely, at scale, and across the enterprise.

Together, Microsoft and NVIDIA are enabling manufacturers to move beyond pilots toward production‑ready physical AI systems that can be developed, tested, deployed, and continuously improved across heterogeneous environments spanning the product lifecycle, factory operations, and supply chain.

From intelligence to action: Human-agent teams in the factory

At the industrial frontier, AI is not a standalone system, but a digital teammate.

When AI agents are grounded in the proper operational data, embedded in human workflows, and governed end to end, they can assist with tasks such as:

  • Optimizing production lines in real time
  • Coordinating maintenance and quality decisions
  • Adapting operations to supply or demand disruptions
  • Accelerating engineering and product lifecycle decisions

For example, manufacturers are beginning to use simulation‑grounded AI agents to evaluate production changes virtually before deploying them on the factory floor, reducing risk while accelerating decision‑making.

Crucially, frontier manufacturers design these systems so humans remain in control. AI executes, monitors, and recommends, while people provide intent, oversight, and judgment. This balance allows organizations to move faster without losing confidence or control.

The role of trust in scaling physical AI

As physical AI systems scale, trust becomes the limiting factor.

Manufacturers must ensure that AI systems are secure, observable, and operating within policy, especially when they influence safety‑critical or mission‑critical processes. Governance cannot be an afterthought; It must be engineered into the platform itself.

This is why frontier manufacturers treat trust as a first‑class requirement, pairing innovation with visibility, compliance, and accountability. Only then can physical AI move from promising demonstrations to enterprise‑wide deployment.

Why this moment matters—and what’s next

The convergence of AI agents, robotics, simulation, and real‑time data marks an inflection point for manufacturing. What was once experimental is becoming operational. What was once siloed is becoming connected.

At NVIDIA GTC 2026, Microsoft and NVIDIA will demonstrate how this collaboration supports physical AI systems that manufacturers can deploy today and scale responsibly tomorrow. From simulation‑driven development to real‑world execution, the focus is on helping manufacturers cross the industrial frontier with confidence.

For manufacturing leaders, the question is no longer whether physical AI will reshape operations, but how quickly they can adopt it responsibly, at scale, and with trust built in from the start.

Discover more with Microsoft at NVIDIA GTC 2026.

This content was produced by Microsoft. It was not written by MIT Technology Review’s editorial staff.

Building a strong data infrastructure for AI agent success

In the race to adopt and show value from AI, enterprises are moving faster than ever to deploy agentic AI as copilots, assistants, and autonomous task-runners. In late 2025, nearly two-thirds of companies were experimenting with AI agents, while 88% were using AI in at least one business function, up from 78% in 2024, according to McKinsey’s annual AI report. Yet, while early pilots often succeed, only one in 10 companies actually scaled their AI agents.

One major issue: AI agents are only as effective as the data foundation supporting them. Experts argue that most companies are seeing delays in implementing AI, not because of shortcomings in the models, but because they lack data architectures that deliver business context to be reliably used by humans and agents.

Companies need to be ready with the right data architecture, and the next few months — years, at most — will be critical, says Irfan Khan, president and chief product officer of SAP Data & Analytics.

“The only prediction anybody can reliably make is that we don’t know what’s going to happen in the years, months — or even weeks — ahead with AI,” he says. “To be able to get quick wins right now, you need to adopt an AI mindset and … ground your AI models with reliable data.”

While data has always been important for business, it will be even more so in the age of AI. The capabilities of agentic AI will be set more by the soundness of enterprise data architecture and governance, and less by the evolution of the models. To scale the technology, businesses need to adopt a modern data infrastructure that delivers context along with the data.

More business context, not necessarily more data

Traditional views often conflate structured data with high value, and unstructured data with less value. However, AI complicates that distinction. High-value data for agents is defined less by format and more by business context. Data for critical business functions — such as supply-chain operations and financial planning — is context dependent. While fine-grained, high-volume data, such as IoT, logs, and telemetry, can yield value, but only when delivered with business context.

For that reason, the real risk for agentic AI is not lack of data, but lack of grounding, says Khan.

“Anything that is business contextual will, by definition, give you greater value and greater levels of reliability of the business outcome,” he says. “It’s not as simple as saying high-value data is structured data and low-value data is where you have lots of repetition — both can have huge value in the right hands, and that’s what’s different about AI.”

Context can be derived through integration with software, on-site analysis and enrichment, or through the governance pipeline. Data lacking those qualities will likely be untrusted — one reason why two-thirds of business leaders do not fully trust their data, according to the Institute for Data and Enterprise AI (IDEA). The resulting “trust debt” has held back businesses in their quest for AI readiness. Overcoming that lack of trust requires shared definitions, semantic consistency, and reliable operational context to align data with business meaning.

Data sprawl demands a semantic, business-aware layer

Over the past decade, the most important shift in enterprise data architecture has been the separation of compute and storage, cloud-scale flexibility, says Khan. Yet, that separation and move to cloud also created sprawl, with data housed in multiple clouds, data lakes, warehouses, and a multitude of SaaS applications.

As companies move to AI, that sprawl does not go away. In fact, the problem is growing with more than two-thirds of companies citing data siloes as a top challenge in adopting AI, with more than half of enterprises struggling with 1,000 data sources or more. While the last era was about laying the foundation on which to build software-as-a-service — separating compute and storage and building lakes — the next era is about delivering the right data to autonomous AI agents tasked with various business functions.

“Probably the biggest innovation that occurred in data management was the separation of compute and store,” Khan says. “But what’s really making a distinction now is the way that we harmonize the data and harvest the value of the data across multiple sources of content.”

To do that requires a semantic or knowledge layer that supports multiple platforms, encodes business rules and relationships, provides a business-contextual and governed view of data, and allows humans and agents to access the data in the appropriate ways. But legacy data architectures cannot power the autonomous AI systems of the future, consultancy Deloitte stated in its State of AI in the Enterprise report. Only four in 10 companies believe their data management process is ready for AI, and that’s down from 43% the previous year, suggesting that as companies explore AI deployment, they are realizing their infrastructure’s shortcomings.

Agentic AI does not replace SaaS

Some investors and technologists speculate that AI agents will make SaaS applications obsolete. Khan strongly disagrees. Over the past 15 years, value has steadily moved up the stack, from on-premises infrastructure to infrastructure as a service (IaaS) to platform as a service (PaaS) to SaaS. Agentic AI is simply the next layer. Agentic AI will have its own layer to access the data and interact with the business logic. The value rises up the stack, but nothing below disappears, he says.

“SaaS doesn’t go away,” he says. “It just means SaaS and these agents will cooperate with one another. Companies are not going to throw away their entire general ledger and replace it with an agent. What’s the agent going to do? It doesn’t know anything without business context and business processing.”

In this emerging model, the software stack is being reshaped so that applications and data provide governed context within which AI can act effectively. SaaS applications remain the systems of record, while the semantic layer becomes the business-context source of truth. AI agents become a new engagement layer, orchestrating across systems, and both humans and agents become “first-class citizens” in how they access business logic, he says.

Critically, agents cannot directly connect to every operational system. “If we’re saying agents are going to take over the world … you can’t have an agent talking to every operational backend system,” Khan warns. “It just doesn’t work that way.”

This further elevates the importance of a semantic or business-fabric layer.

Where to start

Most enterprises need to begin where their data already lives — in platforms like Snowflake, Databricks, Google BigQuery, or an existing SAP environment. Khan says that’s normal, but warns against rebuilding old patterns of vendor lock-in.

He suggests that companies prioritize the data that matters most by focusing on preserving and providing business context to operational and application data. Companies should also invest early in governance and semantics by defining shared policies, access rules, and semantic models before scaling pilots. Finally, businesses should prioritize openness and fabric-style interoperability rather than forcing all data into one stack.

Khan cautions against aiming for full automation too early. “There is a new brave opportunity to really engage in the agentic and AI world,” Khan says, “Fully automating [critical business processes] is maybe a stretch, because there’s going to be a lot of extra oversight necessary.” Early wins will likely come from less-critical processes and from agents that work off fresh, stateful data rather than stale dashboards, he adds. As AI begins to deliver value and adoption increases, leaders must decide how to reinvest those gains to drive top-line efficiency or enter new markets.

Register for “The Fabric of Data & AI” virtual event on March 24, 2026. Hear insights from executives and thought leaders who are shaping the future of data and AI.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Pragmatic by design: Engineering AI for the real world

The impact of artificial intelligence extends far beyond the digital world and into our everyday lives, across the cars we drive, the appliances in our homes, and medical devices that keep people alive. More and more, product engineers are turning to AI to enhance, validate, and streamline the design of the items that furnish our worlds.

The use of AI in product engineering follows a disciplined and pragmatic trajectory. A significant majority of engineering organizations are increasing their AI investment, according to our survey, but they are doing so in a measured way. This approach reflects the priorities typical of product engineers. Errors have concrete consequences beyond abstract fears, ranging from structural failures to safety recalls and even potentially putting lives at risk. The central challenge is realizing AI’s value without compromising product integrity.

Drawing on data from a survey of 300 respondents and in-depth interviews with senior technology executives and other experts, this report examines how product engineering teams are scaling AI, what is limiting broader adoption, and which specific capabilities are shaping adoption today and, in the future, with actual or potential measurable outcomes.

Key findings from the research include:

Verification, governance, and explicit human accountability are mandatory in an environment where the outputs are physical—and the risk high. Where product engineers are using AI to directly inform physical designs, embedded systems, and manufacturing decisions that are fixed at release, product failures can lead to real-world risks that cannot be rolled back. Product engineers are therefore adopting layered AI systems with distinct trust thresholds instead of general-purpose deployments.

Predictive analytics and AI-powered simulation and validation are the top near-term investment priorities for product engineering leaders. These capabilities—selected by a majority of survey respondents—offer clear feedback loops, allowing companies to audit performance, attain regulatory approval, and prove return on investment (ROI). Building gradual trust in AI tools is imperative.

Nine in ten product engineering leaders plan to increase investment in AI in the next one to two years, but the growth is modest. The highest proportion of respondents (45%) plan to increase investment by up to 25%, while nearly a third favor a 26% to 50% boost. And just 15% plan a bigger step change—between 51% and 100%. The focus for product engineers is on optimization over innovation, with scalable proof points and near-term ROI the dominant approach to AI adoption, as opposed to multi-year transformation.

Sustainability and product quality are top measurable outcomes for AI in product engineering. These outcomes, visible to customers, regulators, and investors, are prioritized over competitive metrics like time to-market and innovation—rated of medium importance—and internal operational gains like cost reduction and workforce satisfaction, at the bottom. What matters most are real-world signals like defect rates and emissions profiles rather than internal engineering dashboards.

Download the report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

A defense official reveals how AI chatbots could be used for targeting decisions

The US military might use generative AI systems to rank lists of targets and make recommendations—which would be vetted by humans—about which to strike first, according to a Defense Department official with knowledge of the matter. The disclosure about how the military may use AI chatbots comes as the Pentagon faces scrutiny over a strike on an Iranian school, which it is still investigating.  

A list of possible targets might be fed into a generative AI system that the Pentagon is fielding for classified settings. Then, said the official, who requested to speak on background with MIT Technology Review to discuss sensitive topics, humans might ask the system to analyze the information and prioritize the targets while accounting for factors like where aircraft are currently located. Humans would then be responsible for checking and evaluating the results and recommendations. OpenAI’s ChatGPT and xAI’s Grok could, in theory, be the models used for this type of scenario in the future, as both companies recently reached agreements for their models to be used by the Pentagon in classified settings.

The official described this as an example of how things might work but would not confirm or deny whether it represents how AI systems are currently being used.

Other outlets have reported that Anthropic’s Claude has been integrated into existing military AI systems and used in operations in Iran and Venezuela, but the official’s comments add insight into the specific role chatbots may play, particularly in accelerating the search for targets. They also shed light on the way the military is deploying two different AI technologies, each with distinct limitations.

Since at least 2017, the US military has been working on a “big data” initiative called Maven. It uses older types of AI, particularly computer vision, to analyze the oceans of data and imagery collected by the Pentagon. Maven might take thousands of hours of aerial drone footage, for example, and algorithmically identify targets. A 2024 report from Georgetown University showed soldiers using the system to select targets and vet them, which sped up the process to get approval for these targets. Soldiers interacted with Maven through an interface with a battlefield map and dashboard, which might highlight potential targets in one color and friendly forces in another.

The official’s comments suggest that generative AI is now being added as a conversational chatbot layer—one the military may use to find and analyze data more quickly as it makes decisions like which targets to prioritize. 

Generative AI systems, like those that underpin ChatGPT, Claude, and Grok, are a fundamentally different technology from the AI that has primarily powered Maven. Built on large language models, they are much less battle-tested. And while Maven’s interface forced users to directly inspect and interpret data on the map, the outputs produced by generative AI models are easier to access but harder to verify. 

The use of generative AI for such decisions is reducing the time required in the targeting process, added the official, who did not provide details when asked how much additional speed is possible if humans are required to spend time double-checking a model’s outputs.

The use of military AI systems is under increased public scrutiny following the recent strike on a girls’ school in Iran in which more than 100 children died. Multiple news outlets have reported that the strike was from a US missile, though the Pentagon has said it is still under investigation. And while the Washington Post has reported that Claude and Maven have been involved in targeting decisions in Iran, there is no evidence yet to explain what role generative AI systems played, if any. The New York Times reported on Wednesday that a preliminary investigation found outdated targeting data to be partly responsible for the strike. 

The Pentagon has been ramping up its use of AI across operations in recent months. It started offering nonclassified use of generative AI models, for tasks like analyzing contracts or writing presentations, to millions of service members back in December through an effort called GenAI.mil. But only a few generative AI models have been approved by the Pentagon for classified use. 

The first was Anthropic’s Claude, which in addition to its use in Iran was reportedly used in the operations to capture Venezuelan leader Nicolas Maduro in January. But following recent disagreements between the Pentagon and Anthropic over whether Anthropic could restrict the military’s use of its AI, the Defense Department designated the company a supply chain risk and President Trump demanded on social media that the government stop using its AI products within six months. Anthropic is fighting the designation in court. 

OpenAI announced an agreement on February 28 for the military to use its technologies in classified settings. Elon Musk’s company xAI has also reached a deal for the Pentagon to use its model Grok in such settings. OpenAI has said its agreement with the Pentagon came with limitations, though the practical effectiveness of those limitations is not clear. 

If you have information about the military’s use of AI, you can share it securely via Signal (username jamesodonnell.22).

Hustlers are cashing in on China’s OpenClaw AI craze

Feng Qingyang had always hoped to launch his own company, but he never thought this would be how—or that the day would come this fast. 

Feng, a 27-year-old software engineer based in Beijing, started tinkering with OpenClaw, a popular new open-source AI tool that can take over a device and autonomously complete tasks for a user,  in January. He was immediately hooked, and before long he was helping other curious tech workers with less technical proficiency install the AI agent.

Feng soon realized this could be a lucrative opportunity. By the end of January, he had set up a page on Xianyu, a secondhand shopping site, advertising “OpenClaw installation support.” “No need to know coding or complex terms. Fully remote,” reads the posting. “Anyone can quickly own an AI assistant, available within 30 minutes.” 

At the same time, the broader Chinese public was beginning to catch on—and the tool, which had begun as a niche interest among tech workers, started to evolve into a popular sensation.

Feng quickly became inundated with requests, and he started chatting with customers and managing orders late into the night. At the end of February, he quit his job. Now his side gig has now grown into a full-fledged professional operation with over 100 employees. So far, the store has handled 7,000 orders, each worth about 248 RMB or approximately $34. 

“Opportunities are always fleeting,” says Feng. “As programmers, we are the first to feel the winds shift.”

Feng is among a small cohort of savvy early adopters turning China’s OpenClaw craze into cash. As users with little technical background want in, a cottage industry of people offering installation services and preconfigured hardware has sprung up to meet them. The sudden rise of these tinkerers and impromptu consultants shows just how eager the general public in China is to adopt cutting-edge AI—even when there are huge security risks

A “lobster craze”

“Have you raised a lobster yet?” 

Xie Manrui, a 36-year-old software engineer in Shenzhen, says he has heard this question nonstop over the past month. “Lobster” is the nickname Chinese users have given to OpenClaw—a reference to its logo.

Xie, like Feng, has been experimenting with OpenClaw since January. He’s built new open-source tools on top of the ecosystem, including one that visualizes the agent’s progress as an animated little desktop worker and another that lets users voice-chat with it. 

“I’ve met so many new people through ‘lobster raising,’” says Xie. “Many are lawyers or doctors, with little technical background, but all dedicated to learning new things.”

Lobsters are indeed popping up everywhere in China right now—on and offline. In February, for instance, the entrepreneur and tech influencer Fu Sheng hosted a livestream showing off OpenClaw’s capabilities that got 20,000 views. And just last weekend, Xie attended three different OpenClaw events in Shenzhen, each drawing more than 500 people. These self-organized, unofficial gatherings feature power users, influencers, and sometimes venture capitalists as speakers. The biggest event Xie attended, on March 7, drew more than 1,000 people; in the packed venue, he says, people were shoulder to shoulder, with many attendees unable to even get a seat.

Now China’s AI giants are starting to piggyback on the trend too, promoting their models, APIs,  and cloud services (which can be used with OpenClaw), as well as their own OpenClaw-like agents. Earlier this month, Tencent held a public event offering free installation support for OpenClaw, drawing long lines of people waiting for help, including elderly users and children.

This sudden burst in popularity has even prompted local governments to get involved. Earlier this month the government of Longgang, a district in Shenzhen, released several policies to support OpenClaw-related ventures, including free computing credits and cash rewards for standout projects. Other cities, including Wuxi, have begun rolling out similar measures.

These policies only catalyze what’s already in the air. “It was not until my father, who is 77, asked me to help install a ‘lobster’ for him that I realized this thing is truly viral,” says Henry Li, a software engineer based in Beijing. 

A programmer gold rush

What’s making this moment particularly lucrative for people with technical skills, like Feng, is that so many people want OpenClaw, but not nearly as many have the capabilities to access it. Setting it up requires a level of technical knowledge most people do not possess, from typing commands into a black terminal window to navigating unfamiliar developer platforms. On the hardware side, an older or budget laptop may struggle to run it smoothly. And if the tool is not installed on a device separate from someone’s everyday computer, or if the data accessible to OpenClaw is not properly partitioned, the user’s privacy could be at risk—opening the door to data leaks and even malicious attacks. 

Chris Zhao, known as “Qi Shifu” online, organizes OpenClaw social media groups and events in Beijing. On apps like Rednote and Jike, Zhao routinely shares his thoughts on AI, and he asks other interested users to leave their WeChat ID so he can invite them to a semi-private group chat. The proof required to join is a screenshot that shows your “lobster” up and running. Zhao says that even in group chats for experienced users, hardware and cloud setup remain a constant topic of discussion.

The relatively high bar for setting up OpenClaw has generated a sense of exclusivity, creating a natural opening for a service industry to start unfolding around it. On Chinese e-commerce platforms like Taobao and JD, a simple search for “OpenClaw” now returns hundreds of listings, most of them installation guides and technical support packages aimed at nontechnical users, priced anywhere from 100 to 700 RMB (approximately $15 to $100). At the higher end, many vendors offer to come to help you in person. 

Like Feng, most providers of these services are early adopters with some technical ability who are looking for a side gig. But as demand has surged, some have found themselves overwhelmed. Xie, the developer in Shenzhen who created tools to layer on OpenClaw, was asked by a friend who runs one such business to help out over the weekend; the friend had a customer who worked in e-commerce and had little technical experience, so Xie had to show up in person to get it done. He walked away with 600 RMB ($87) for the afternoon.

The growing demand has also pushed vendors like Feng to expand quickly. He has now standardized his operation into tiers: a basic installation, a custom package where users can make specific requests like configuring a preferred chat app, and an ongoing tutoring service for those who want a hand to hold as they find their footing with the technology.

Other vendors in China are making money combining OpenClaw with hardware. Li Gong, a Shenzhen-based seller of refurbished Mac computers, was among the first online sellers to do this—offering Mac minis and MacBooks with OpenClaw preinstalled. Because OpenClaw is designed to operate with deep access to a hard drive and can run continuously in the background unattended, many users prefer to install it on a separate device rather than on the one they use every day. This would help prevent bad actors from infiltrating the program and immediately gaining access to a wide swathe of someone’s personal information. Many turn to secondhand or refurbished options to keep the cost down. Li says that in the last two weeks, orders have increased eightfold.

Though OpenClaw itself is a new technology, the general practice of buying software bundles, downloading third-party packages, and seeking out modified devices is nothing new for many Chinese internet users, says Tianyu Fang, a PhD candidate studying the history of technology at Harvard University. Many users pay for one-off IT support services for tasks from installing Adobe software to jailbreaking a Kindle.

Still, not everyone is getting swept up. Jiang Yunhui, a tech worker based in Ningbo, worries that ordinary users who struggle with setup may not be the right audience for a technology that is still effectively in testing. 

“The hype in first-tier cities can be a little overblown,” he says. “The agent is still a proof of concept, and I doubt it would be of any life-changing use to the average person for now.” He argues that using it safely and getting anything meaningful out of it requires a level of technical fluency and independent judgment that most new users simply don’t have yet.

He’s not alone in his concerns. On March 10, the Chinese cybersecurity regulator CNCERT issued a warning about the security and data risks tied to OpenClaw, saying it heightens users’ exposure to data breaches.

Despite the potential pitfalls, though, China’s enthusiasm for OpenClaw doesn’t seem to be slowing.

Feng, now flush with the earnings from his operation, wants to use the momentum—and the capital—to keep building out his own venture with AI tools at the center of it.

“With OpenClaw and other AI agents, I want to see if I can run a one-person company,” he says. “I’m giving myself one year.”