Why physical AI is becoming manufacturing’s next advantage

For decades, manufacturers have pursued automation to drive efficiency, reduce costs, and stabilize operations. That approach delivered meaningful gains, but it is no longer enough.

Today’s manufacturing leaders face a different challenge: how to grow amid labor constraints, rising complexity, and increasing pressure to innovate faster without sacrificing safety, quality, or trust. The next phase of transformation will not be defined by isolated AI tools or individual robots, but by intelligence that can operate reliably in the physical world.

This is where physical AI—intelligence that can sense, reason, and act in the real world—marks a decisive shift. And it is why Microsoft and NVIDIA are working together to help manufacturers move from experimentation to production at industrial scale.

The industrial frontier: Intelligence and trust, not just automation

Most early AI adoption focused on narrow optimization: automating tasks, improving utilization, and cutting costs. While valuable, that phase often created new friction, including skills gaps, governance concerns, and uncertainty about long‑term impact. And while the use cases were plentiful, few were truly strategic.

The industrial frontier represents a different approach. Rather than asking how much work machines can replace, frontier manufacturers ask how AI can expand human capability, accelerate innovation, and unlock new forms of value while remaining trustworthy and controllable.

Across industries, companies that successfully move into this frontier phase share two non‑negotiables:

  • Intelligence: AI systems must be grounded in how the business actually works, from its data and workflows to its institutional knowledge.
  • Trust: As AI begins to act in high‑stakes environments, organizations must retain security, governance, and observability at every layer.

Without intelligence, AI becomes generic. Without trust, adoption stalls.

Why manufacturing is the proving ground for physical AI

Manufacturing is uniquely positioned at the center of this shift.

AI is no longer confined to planning or analytics. It is moving into physical execution: coordinating machines, adapting to real‑world variability, and working alongside people on the factory floor. Robotics, autonomous systems, and AI agents must now perceive, reason, and act in dynamic environments.

This transition exposes a critical gap. Traditional automation excels at repetition but struggles with adaptability. Human workers bring judgment and context but are constrained by scale. Physical AI closes that gap by enabling human‑led, AI‑operated systems, where people set intent and intelligent systems execute, learn, and improve over time. Humans remain essential to scaling that success.

Microsoft and NVIDIA: Accelerating physical AI at scale

Physical AI cannot be delivered through point solutions. It requires enterprise‑grade, agent‑driven toolchains and workflows for development, deployment, and operations, connecting simulation, data, AI models, robotics, and governance into a coherent system.

NVIDIA is building the AI infrastructure that makes physical AI possible: accelerated computing, open models, simulation libraries, and robotics frameworks and blueprints that let the ecosystem build autonomous robotic systems able to perceive, reason, plan, and act in the physical world. Microsoft complements this with a cloud and data platform designed to operate physical AI securely, at scale, and across the enterprise.

Together, Microsoft and NVIDIA are enabling manufacturers to move beyond pilots toward production‑ready physical AI systems that can be developed, tested, deployed, and continuously improved across heterogeneous environments spanning the product lifecycle, factory operations, and supply chain.

From intelligence to action: Human-agent teams in the factory

At the industrial frontier, AI is not a standalone system, but a digital teammate.

When AI agents are grounded in the proper operational data, embedded in human workflows, and governed end to end, they can assist with tasks such as:

  • Optimizing production lines in real time
  • Coordinating maintenance and quality decisions
  • Adapting operations to supply or demand disruptions
  • Accelerating engineering and product lifecycle decisions

For example, manufacturers are beginning to use simulation‑grounded AI agents to evaluate production changes virtually before deploying them on the factory floor, reducing risk while accelerating decision‑making.
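To make that workflow concrete, here is a minimal Python sketch of a simulation-first evaluation loop. Everything in it is a hypothetical stand-in, including the toy simulator, the candidate line configurations, and the scoring; it is not Microsoft or NVIDIA tooling.

```python
# Illustrative only: a minimal "evaluate in simulation before deploying" loop.
# simulate_line(), the candidate configurations, and the scoring are hypothetical
# stand-ins, not any vendor's actual API.

from dataclasses import dataclass


@dataclass
class LineConfig:
    name: str
    conveyor_speed: float  # meters per second
    buffer_size: int       # units held between stations


def simulate_line(cfg: LineConfig) -> float:
    """Toy simulator returning predicted throughput in units per hour.
    A real digital twin would model stations, failure rates, and takt time."""
    base = 100.0 * cfg.conveyor_speed
    congestion_penalty = max(0.0, 20.0 - cfg.buffer_size)  # small buffers choke the line
    return base - congestion_penalty


def rank_candidates(candidates: list[LineConfig]) -> list[tuple[float, LineConfig]]:
    """The agent evaluates every candidate virtually and ranks the outcomes."""
    return sorted(((simulate_line(c), c) for c in candidates),
                  key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = [
        LineConfig("baseline", conveyor_speed=1.0, buffer_size=10),
        LineConfig("faster-belt", conveyor_speed=1.3, buffer_size=10),
        LineConfig("bigger-buffer", conveyor_speed=1.0, buffer_size=25),
    ]
    best_score, best = rank_candidates(candidates)[0]
    # Humans stay in control: the agent only recommends; a person approves.
    print(f"Recommended change: {best.name} (predicted {best_score:.0f} units/hour)")
    print("Awaiting human approval before deploying to the floor.")
```

The design point is the final step: the system ranks options and recommends, but a person signs off before anything touches the physical line.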

Crucially, frontier manufacturers design these systems so humans remain in control. AI executes, monitors, and recommends, while people provide intent, oversight, and judgment. This balance allows organizations to move faster without losing confidence or control.

The role of trust in scaling physical AI

As physical AI systems scale, trust becomes the limiting factor.

Manufacturers must ensure that AI systems are secure, observable, and operating within policy, especially when they influence safety‑critical or mission‑critical processes. Governance cannot be an afterthought; it must be engineered into the platform itself.

This is why frontier manufacturers treat trust as a first‑class requirement, pairing innovation with visibility, compliance, and accountability. Only then can physical AI move from promising demonstrations to enterprise‑wide deployment.

Why this moment matters—and what’s next

The convergence of AI agents, robotics, simulation, and real‑time data marks an inflection point for manufacturing. What was once experimental is becoming operational. What was once siloed is becoming connected.

At NVIDIA GTC 2026, Microsoft and NVIDIA will demonstrate how this collaboration supports physical AI systems that manufacturers can deploy today and scale responsibly tomorrow. From simulation‑driven development to real‑world execution, the focus is on helping manufacturers cross the industrial frontier with confidence.

For manufacturing leaders, the question is no longer whether physical AI will reshape operations, but how quickly they can adopt it responsibly, at scale, and with trust built in from the start.

Discover more with Microsoft at NVIDIA GTC 2026.

This content was produced by Microsoft. It was not written by MIT Technology Review’s editorial staff.

Building a strong data infrastructure for AI agent success

In the race to adopt and show value from AI, enterprises are moving faster than ever to deploy agentic AI as copilots, assistants, and autonomous task-runners. In late 2025, nearly two-thirds of companies were experimenting with AI agents, while 88% were using AI in at least one business function, up from 78% in 2024, according to McKinsey’s annual AI report. Yet while early pilots often succeed, only one in 10 companies has actually scaled its AI agents.

One major issue: AI agents are only as effective as the data foundation supporting them. Experts argue that most companies are seeing delays in implementing AI not because of shortcomings in the models, but because they lack data architectures that deliver the business context humans and agents need to use data reliably.

Companies need to be ready with the right data architecture, and the next few months — years, at most — will be critical, says Irfan Khan, president and chief product officer of SAP Data & Analytics.

“The only prediction anybody can reliably make is that we don’t know what’s going to happen in the years, months — or even weeks — ahead with AI,” he says. “To be able to get quick wins right now, you need to adopt an AI mindset and … ground your AI models with reliable data.”

While data has always been important for business, it will be even more so in the age of AI. The capabilities of agentic AI will be set more by the soundness of enterprise data architecture and governance, and less by the evolution of the models. To scale the technology, businesses need to adopt a modern data infrastructure that delivers context along with the data.

More business context, not necessarily more data

Traditional views often conflate structured data with high value, and unstructured data with less value. However, AI complicates that distinction. High-value data for agents is defined less by format and more by business context. Data for critical business functions — such as supply-chain operations and financial planning — is context dependent. Fine-grained, high-volume data, such as IoT, logs, and telemetry, can also yield value, but only when delivered with business context.

For that reason, the real risk for agentic AI is not lack of data, but lack of grounding, says Khan.

“Anything that is business contextual will, by definition, give you greater value and greater levels of reliability of the business outcome,” he says. “It’s not as simple as saying high-value data is structured data and low-value data is where you have lots of repetition — both can have huge value in the right hands, and that’s what’s different about AI.”

Context can be derived through integration with software, on-site analysis and enrichment, or through the governance pipeline. Data lacking those qualities will likely be untrusted — one reason why two-thirds of business leaders do not fully trust their data, according to the Institute for Data and Enterprise AI (IDEA). The resulting “trust debt” has held back businesses in their quest for AI readiness. Overcoming that lack of trust requires shared definitions, semantic consistency, and reliable operational context to align data with business meaning.

Data sprawl demands a semantic, business-aware layer

Over the past decade, the most important shift in enterprise data architecture has been the separation of compute and storage, which brought cloud-scale flexibility, says Khan. Yet that separation and move to cloud also created sprawl, with data housed in multiple clouds, data lakes, warehouses, and a multitude of SaaS applications.

As companies move to AI, that sprawl does not go away. In fact, the problem is growing: more than two-thirds of companies cite data silos as a top challenge in adopting AI, and more than half of enterprises struggle with 1,000 data sources or more. While the last era was about laying the foundation on which to build software-as-a-service — separating compute and storage and building lakes — the next era is about delivering the right data to autonomous AI agents tasked with various business functions.

“Probably the biggest innovation that occurred in data management was the separation of compute and store,” Khan says. “But what’s really making a distinction now is the way that we harmonize the data and harvest the value of the data across multiple sources of content.”

Doing that requires a semantic or knowledge layer that supports multiple platforms, encodes business rules and relationships, provides a business-contextual and governed view of data, and allows humans and agents to access the data in the appropriate ways. But legacy data architectures cannot power the autonomous AI systems of the future, consultancy Deloitte stated in its State of AI in the Enterprise report. Only four in 10 companies believe their data management process is ready for AI, down from 43% the previous year, suggesting that as companies explore AI deployment, they are realizing their infrastructure’s shortcomings.
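As a rough illustration of what such a layer does, consider the minimal Python sketch below. Every name, rule, and policy in it is invented for the example; it does not show SAP's or any other vendor's actual API.

```python
# A minimal, invented sketch of a semantic layer: business terms are defined
# once, with governance attached, and both humans and agents resolve queries
# through those shared definitions. All names here are illustrative.

from dataclasses import dataclass, field


@dataclass
class SemanticTerm:
    name: str                  # the shared business concept
    definition_sql: str        # one governed definition, reused everywhere
    source_systems: list[str]  # where the underlying records live
    allowed_roles: set[str] = field(default_factory=set)


CATALOG = {
    "active_customer": SemanticTerm(
        name="active_customer",
        definition_sql=(
            "SELECT customer_id FROM orders "
            "WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'"
        ),
        source_systems=["erp", "crm"],
        allowed_roles={"analyst", "sales_agent"},
    ),
}


def resolve(term: str, caller_role: str) -> str:
    """Return the governed query for a business term, enforcing access policy.
    Agents never touch backend systems directly; they go through this layer."""
    spec = CATALOG[term]
    if caller_role not in spec.allowed_roles:
        raise PermissionError(f"role '{caller_role}' may not read '{term}'")
    return spec.definition_sql


# Usage: an AI agent asks for "active customers" and gets one consistent,
# policy-checked definition instead of guessing at raw tables.
print(resolve("active_customer", caller_role="sales_agent"))
```

The point is structural: humans and agents share one governed definition of each business term, rather than each querying raw backend tables in its own way.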

Agentic AI does not replace SaaS

Some investors and technologists speculate that AI agents will make SaaS applications obsolete. Khan strongly disagrees. Over the past 15 years, value has steadily moved up the stack, from on-premises infrastructure to infrastructure as a service (IaaS) to platform as a service (PaaS) to SaaS. Agentic AI is simply the next layer, with its own way of accessing data and interacting with business logic. The value rises up the stack, but nothing below disappears, he says.

“SaaS doesn’t go away,” he says. “It just means SaaS and these agents will cooperate with one another. Companies are not going to throw away their entire general ledger and replace it with an agent. What’s the agent going to do? It doesn’t know anything without business context and business processing.”

In this emerging model, the software stack is being reshaped so that applications and data provide governed context within which AI can act effectively. SaaS applications remain the systems of record, while the semantic layer becomes the business-context source of truth. AI agents become a new engagement layer, orchestrating across systems, and both humans and agents become “first-class citizens” in how they access business logic, he says.

Critically, agents cannot directly connect to every operational system. “If we’re saying agents are going to take over the world … you can’t have an agent talking to every operational backend system,” Khan warns. “It just doesn’t work that way.”

This further elevates the importance of a semantic or business-fabric layer.

Where to start

Most enterprises need to begin where their data already lives — in platforms like Snowflake, Databricks, Google BigQuery, or an existing SAP environment. Khan says that’s normal, but warns against rebuilding old patterns of vendor lock-in.

He suggests that companies prioritize the data that matters most by focusing on preserving and providing business context to operational and application data. Companies should also invest early in governance and semantics by defining shared policies, access rules, and semantic models before scaling pilots. Finally, businesses should prioritize openness and fabric-style interoperability rather than forcing all data into one stack.

Khan cautions against aiming for full automation too early. “There is a new brave opportunity to really engage in the agentic and AI world,” Khan says. “Fully automating [critical business processes] is maybe a stretch, because there’s going to be a lot of extra oversight necessary.” Early wins will likely come from less-critical processes and from agents that work off fresh, stateful data rather than stale dashboards, he adds. As AI begins to deliver value and adoption increases, leaders must decide how to reinvest those gains to drive top-line efficiency or enter new markets.

Register for “The Fabric of Data & AI” virtual event on March 24, 2026. Hear insights from executives and thought leaders who are shaping the future of data and AI.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Pragmatic by design: Engineering AI for the real world

The impact of artificial intelligence extends far beyond the digital world and into our everyday lives, across the cars we drive, the appliances in our homes, and the medical devices that keep people alive. More and more, product engineers are turning to AI to enhance, validate, and streamline the design of the items that furnish our world.

The use of AI in product engineering follows a disciplined and pragmatic trajectory. A significant majority of engineering organizations are increasing their AI investment, according to our survey, but they are doing so in a measured way. This approach reflects the priorities typical of product engineers. Errors have concrete consequences, ranging from structural failures to safety recalls, and can even put lives at risk. The central challenge is realizing AI’s value without compromising product integrity.

Drawing on data from a survey of 300 respondents and in-depth interviews with senior technology executives and other experts, this report examines how product engineering teams are scaling AI, what is limiting broader adoption, and which specific capabilities are shaping adoption today and in the future, along with their actual or potential measurable outcomes.

Key findings from the research include:

Verification, governance, and explicit human accountability are mandatory in an environment where the outputs are physical—and the risk high. Where product engineers are using AI to directly inform physical designs, embedded systems, and manufacturing decisions that are fixed at release, product failures can lead to real-world risks that cannot be rolled back. Product engineers are therefore adopting layered AI systems with distinct trust thresholds instead of general-purpose deployments.

Predictive analytics and AI-powered simulation and validation are the top near-term investment priorities for product engineering leaders. These capabilities—selected by a majority of survey respondents—offer clear feedback loops, allowing companies to audit performance, attain regulatory approval, and prove return on investment (ROI). Gradually building trust in AI tools is imperative.

Nine in ten product engineering leaders plan to increase investment in AI in the next one to two years, but the growth is modest. The highest proportion of respondents (45%) plan to increase investment by up to 25%, while nearly a third favor a 26% to 50% boost. And just 15% plan a bigger step change—between 51% and 100%. The focus for product engineers is on optimization over innovation, with scalable proof points and near-term ROI the dominant approach to AI adoption, as opposed to multi-year transformation.

Sustainability and product quality are top measurable outcomes for AI in product engineering. These outcomes, visible to customers, regulators, and investors, are prioritized over competitive metrics like time-to-market and innovation—rated of medium importance—and internal operational gains like cost reduction and workforce satisfaction, at the bottom. What matters most are real-world signals like defect rates and emissions profiles rather than internal engineering dashboards.

Download the report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

A defense official reveals how AI chatbots could be used for targeting decisions

The US military might use generative AI systems to rank lists of targets and make recommendations—which would be vetted by humans—about which to strike first, according to a Defense Department official with knowledge of the matter. The disclosure about how the military may use AI chatbots comes as the Pentagon faces scrutiny over a strike on an Iranian school, which it is still investigating.  

A list of possible targets might be fed into a generative AI system that the Pentagon is fielding for classified settings. Then, said the official, who requested to speak on background with MIT Technology Review to discuss sensitive topics, humans might ask the system to analyze the information and prioritize the targets while accounting for factors like where aircraft are currently located. Humans would then be responsible for checking and evaluating the results and recommendations. OpenAI’s ChatGPT and xAI’s Grok could, in theory, be the models used for this type of scenario in the future, as both companies recently reached agreements for their models to be used by the Pentagon in classified settings.

The official described this as an example of how things might work but would not confirm or deny whether it represents how AI systems are currently being used.

Other outlets have reported that Anthropic’s Claude has been integrated into existing military AI systems and used in operations in Iran and Venezuela, but the official’s comments add insight into the specific role chatbots may play, particularly in accelerating the search for targets. They also shed light on the way the military is deploying two different AI technologies, each with distinct limitations.

Since at least 2017, the US military has been working on a “big data” initiative called Maven. It uses older types of AI, particularly computer vision, to analyze the oceans of data and imagery collected by the Pentagon. Maven might take thousands of hours of aerial drone footage, for example, and algorithmically identify targets. A 2024 report from Georgetown University showed soldiers using the system to select targets and vet them, which sped up the process to get approval for these targets. Soldiers interacted with Maven through an interface with a battlefield map and dashboard, which might highlight potential targets in one color and friendly forces in another.

The official’s comments suggest that generative AI is now being added as a conversational chatbot layer—one the military may use to find and analyze data more quickly as it makes decisions like which targets to prioritize. 

Generative AI systems, like those that underpin ChatGPT, Claude, and Grok, are a fundamentally different technology from the AI that has primarily powered Maven. Built on large language models, they are much less battle-tested. And while Maven’s interface forced users to directly inspect and interpret data on the map, the outputs produced by generative AI models are easier to access but harder to verify. 

The use of generative AI for such decisions is reducing the time required in the targeting process, added the official, who did not provide details when asked how much additional speed is possible if humans are required to spend time double-checking a model’s outputs.

The use of military AI systems is under increased public scrutiny following the recent strike on a girls’ school in Iran in which more than 100 children died. Multiple news outlets have reported that the strike was from a US missile, though the Pentagon has said it is still under investigation. And while the Washington Post has reported that Claude and Maven have been involved in targeting decisions in Iran, there is no evidence yet to explain what role generative AI systems played, if any. The New York Times reported on Wednesday that a preliminary investigation found outdated targeting data to be partly responsible for the strike. 

The Pentagon has been ramping up its use of AI across operations in recent months. It started offering nonclassified use of generative AI models, for tasks like analyzing contracts or writing presentations, to millions of service members back in December through an effort called GenAI.mil. But only a few generative AI models have been approved by the Pentagon for classified use. 

The first was Anthropic’s Claude, which in addition to its use in Iran was reportedly used in the operations to capture Venezuelan leader Nicolas Maduro in January. But following recent disagreements between the Pentagon and Anthropic over whether Anthropic could restrict the military’s use of its AI, the Defense Department designated the company a supply chain risk and President Trump demanded on social media that the government stop using its AI products within six months. Anthropic is fighting the designation in court. 

OpenAI announced an agreement on February 28 for the military to use its technologies in classified settings. Elon Musk’s company xAI has also reached a deal for the Pentagon to use its model Grok in such settings. OpenAI has said its agreement with the Pentagon came with limitations, though the practical effectiveness of those limitations is not clear. 

If you have information about the military’s use of AI, you can share it securely via Signal (username jamesodonnell.22).

Hustlers are cashing in on China’s OpenClaw AI craze

Feng Qingyang had always hoped to launch his own company, but he never thought this would be how—or that the day would come this fast. 

Feng, a 27-year-old software engineer based in Beijing, started tinkering with OpenClaw, a popular new open-source AI tool that can take over a device and autonomously complete tasks for a user, in January. He was immediately hooked, and before long he was helping other curious tech workers with less technical proficiency install the AI agent.

Feng soon realized this could be a lucrative opportunity. By the end of January, he had set up a page on Xianyu, a secondhand shopping site, advertising “OpenClaw installation support.” “No need to know coding or complex terms. Fully remote,” reads the posting. “Anyone can quickly own an AI assistant, available within 30 minutes.” 

At the same time, the broader Chinese public was beginning to catch on—and the tool, which had begun as a niche interest among tech workers, started to evolve into a popular sensation.

Feng quickly became inundated with requests, and he started chatting with customers and managing orders late into the night. At the end of February, he quit his job. His side gig has now grown into a full-fledged professional operation with over 100 employees. So far, the store has handled 7,000 orders, each worth about 248 RMB (approximately $34).

“Opportunities are always fleeting,” says Feng. “As programmers, we are the first to feel the winds shift.”

Feng is among a small cohort of savvy early adopters turning China’s OpenClaw craze into cash. As users with little technical background want in, a cottage industry of people offering installation services and preconfigured hardware has sprung up to meet them. The sudden rise of these tinkerers and impromptu consultants shows just how eager the general public in China is to adopt cutting-edge AI—even when there are huge security risks.

A “lobster craze”

“Have you raised a lobster yet?” 

Xie Manrui, a 36-year-old software engineer in Shenzhen, says he has heard this question nonstop over the past month. “Lobster” is the nickname Chinese users have given to OpenClaw—a reference to its logo.

Xie, like Feng, has been experimenting with OpenClaw since January. He’s built new open-source tools on top of the ecosystem, including one that visualizes the agent’s progress as an animated little desktop worker and another that lets users voice-chat with it. 

“I’ve met so many new people through ‘lobster raising,’” says Xie. “Many are lawyers or doctors, with little technical background, but all dedicated to learning new things.”

Lobsters are indeed popping up everywhere in China right now—on and offline. In February, for instance, the entrepreneur and tech influencer Fu Sheng hosted a livestream showing off OpenClaw’s capabilities that got 20,000 views. And just last weekend, Xie attended three different OpenClaw events in Shenzhen, each drawing more than 500 people. These self-organized, unofficial gatherings feature power users, influencers, and sometimes venture capitalists as speakers. The biggest event Xie attended, on March 7, drew more than 1,000 people; in the packed venue, he says, people were shoulder to shoulder, with many attendees unable to even get a seat.

Now China’s AI giants are starting to piggyback on the trend too, promoting their models, APIs, and cloud services (which can be used with OpenClaw), as well as their own OpenClaw-like agents. Earlier this month, Tencent held a public event offering free installation support for OpenClaw, drawing long lines of people waiting for help, including elderly users and children.

This sudden burst in popularity has even prompted local governments to get involved. Earlier this month the government of Longgang, a district in Shenzhen, released several policies to support OpenClaw-related ventures, including free computing credits and cash rewards for standout projects. Other cities, including Wuxi, have begun rolling out similar measures.

These policies only catalyze what’s already in the air. “It was not until my father, who is 77, asked me to help install a ‘lobster’ for him that I realized this thing is truly viral,” says Henry Li, a software engineer based in Beijing. 

A programmer gold rush

What’s making this moment particularly lucrative for people with technical skills, like Feng, is that so many people want OpenClaw, but not nearly as many have the capabilities to access it. Setting it up requires a level of technical knowledge most people do not possess, from typing commands into a black terminal window to navigating unfamiliar developer platforms. On the hardware side, an older or budget laptop may struggle to run it smoothly. And if the tool is not installed on a device separate from someone’s everyday computer, or if the data accessible to OpenClaw is not properly partitioned, the user’s privacy could be at risk—opening the door to data leaks and even malicious attacks. 

Chris Zhao, known as “Qi Shifu” online, organizes OpenClaw social media groups and events in Beijing. On apps like Rednote and Jike, Zhao routinely shares his thoughts on AI, and he asks other interested users to leave their WeChat ID so he can invite them to a semi-private group chat. The proof required to join is a screenshot that shows your “lobster” up and running. Zhao says that even in group chats for experienced users, hardware and cloud setup remain a constant topic of discussion.

The relatively high bar for setting up OpenClaw has generated a sense of exclusivity, creating a natural opening for a service industry to start unfolding around it. On Chinese e-commerce platforms like Taobao and JD, a simple search for “OpenClaw” now returns hundreds of listings, most of them installation guides and technical support packages aimed at nontechnical users, priced anywhere from 100 to 700 RMB (approximately $15 to $100). At the higher end, many vendors offer to come to help you in person. 

Like Feng, most providers of these services are early adopters with some technical ability who are looking for a side gig. But as demand has surged, some have found themselves overwhelmed. Xie, the developer in Shenzhen who created tools to layer on OpenClaw, was asked by a friend who runs one such business to help out over the weekend; the friend had a customer who worked in e-commerce and had little technical experience, so Xie had to show up in person to get it done. He walked away with 600 RMB ($87) for the afternoon.

The growing demand has also pushed vendors like Feng to expand quickly. He has now standardized his operation into tiers: a basic installation, a custom package where users can make specific requests like configuring a preferred chat app, and an ongoing tutoring service for those who want a hand to hold as they find their footing with the technology.

Other vendors in China are making money combining OpenClaw with hardware. Li Gong, a Shenzhen-based seller of refurbished Mac computers, was among the first online sellers to do this—offering Mac minis and MacBooks with OpenClaw preinstalled. Because OpenClaw is designed to operate with deep access to a hard drive and can run continuously in the background unattended, many users prefer to install it on a separate device rather than on the one they use every day. This would help prevent bad actors from infiltrating the program and immediately gaining access to a wide swathe of someone’s personal information. Many turn to secondhand or refurbished options to keep the cost down. Li says that in the last two weeks, orders have increased eightfold.

Though OpenClaw itself is a new technology, the general practice of buying software bundles, downloading third-party packages, and seeking out modified devices is nothing new for many Chinese internet users, says Tianyu Fang, a PhD candidate studying the history of technology at Harvard University. Many users pay for one-off IT support services for tasks from installing Adobe software to jailbreaking a Kindle.

Still, not everyone is getting swept up. Jiang Yunhui, a tech worker based in Ningbo, worries that ordinary users who struggle with setup may not be the right audience for a technology that is still effectively in testing. 

“The hype in first-tier cities can be a little overblown,” he says. “The agent is still a proof of concept, and I doubt it would be of any life-changing use to the average person for now.” He argues that using it safely and getting anything meaningful out of it requires a level of technical fluency and independent judgment that most new users simply don’t have yet.

He’s not alone in his concerns. On March 10, the Chinese cybersecurity regulator CNCERT issued a warning about the security and data risks tied to OpenClaw, saying it heightens users’ exposure to data breaches.

Despite the potential pitfalls, though, China’s enthusiasm for OpenClaw doesn’t seem to be slowing.

Feng, now flush with the earnings from his operation, wants to use the momentum—and the capital—to keep building out his own venture with AI tools at the center of it.

“With OpenClaw and other AI agents, I want to see if I can run a one-person company,” he says. “I’m giving myself one year.”

How Pokémon Go is giving delivery robots an inch-perfect view of the world

Pokémon Go was the world’s first augmented-reality megahit. Released in 2016 by the Google spinout Niantic, the AR twist on the juggernaut Pokémon franchise fast became a global phenomenon. From Chicago to Oslo to Enoshima, players hit the streets in the urgent hope of catching a Jigglypuff or a Squirtle or (with a huge amount of luck) an ultra-rare Galarian Zapdos hovering just out of reach, superimposed on the everyday world.

In short, we’re talking about a huge number of people pointing their phones at a huge number of buildings. “Five hundred million people installed that app in 60 days,” says Brian McClendon, CTO at Niantic Spatial, an AI company that Niantic spun out in May last year. According to the video-game firm Scopely, which bought Pokémon Go from Niantic at the same time, the game still drew more than 100 million players in 2024, eight years after it launched. 

Now Niantic Spatial is using that vast and unparalleled trove of crowdsourced data—images of urban landmarks tagged with super-accurate location markers taken from the phones of hundreds of millions of Pokémon Go players around the world—to build a kind of world model, a buzzy new technology that grounds the smarts of LLMs in real environments. 

The company’s latest product is a model that it says can pinpoint your location on a map to within a few centimeters, based on a handful of snapshots of the buildings or other landmarks in view. The firm wants to use it to help robots navigate with greater precision in places where GPS is unreliable.

In the first big test of its technology, Niantic Spatial has just teamed up with Coco Robotics, a startup that deploys last-mile delivery robots in a number of cities across the US and Europe. “Everybody thought that AR was the future, that AR glasses were coming,” says McClendon. “And then robots became the audience.”

From Pikachu to pizza delivery

Coco Robotics deploys around 1,000 flight-case-size robots—built to carry up to eight extra-large pizzas or four grocery bags—in Los Angeles, Chicago, Jersey City, Miami, and Helsinki. According to CEO Zach Rash, the robots have made more than half a million deliveries to date, covering a few million miles in all weather conditions.

But to compete with human couriers, Coco’s robots, which trundle along sidewalks at around five miles per hour, must be as reliable as possible. “The best way we can do our job is by arriving exactly when we told you we were going to arrive,” says Rash. And that means not getting lost.

The problem Coco faces is that it cannot rely on GPS, which can be weak in cities because radio signals bounce off buildings and interfere with each other. “We do deliveries in a lot of dense areas with high-rises and underpasses and freeways, and those are the areas where GPS just never really works,” says Rash. 

“The urban canyon is the worst place in the world for GPS,” says McClendon. “If you look at that blue dot on your phone, you’ll often see it drift 50 meters, which puts you on a different block going a different direction on the wrong side of the street.” That’s where Niantic Spatial comes in. 

For the last few years, Niantic Spatial has been taking the data collected from players of Pokémon Go and Ingress (Niantic’s previous phone-based AR game, launched in 2013) and building a visual positioning system, technology that tells you where you are based on what you can see. “It turns out that getting Pikachu to realistically run around and getting Coco’s robot to safely and accurately move through the world is actually the same problem,” says John Hanke, CEO of Niantic Spatial.

“Visual positioning is not a very new technology,” says Konrad Wenzel at ESRI, a company that develops digital mapping and geospatial analysis software. “But it’s obvious that the more cameras we have out there, the better it becomes.” 

Niantic Spatial has trained its model on 30 billion images captured in urban environments. In particular, the images are clustered around hot spots—places that served as important locations in Niantic’s games that players were encouraged to visit, such as Pokémon battle arenas. “We had a million-plus locations around the world where we can locate you precisely,” says McClendon. “We know where you’re standing within several centimeters of accuracy and, most importantly, where you’re looking.”

The upshot is that for each of those million locations, Niantic Spatial has many thousands of images taken in more or less the same place but from different angles, at different times of day, and in different weather conditions. Each of those images comes with detailed metadata that pinpoints where in space the phone was at the time it captured the image, including which way the phone was facing, which way up it was, whether or not it was moving, how fast and in which direction, and more.   

The firm has used this data set to train a model that predicts exactly where a camera is based on what it can see—even in locations beyond those million hot spots, where good sources of image and location data are scarcer.
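Niantic Spatial's production system is a learned model, but the classical core of visual positioning can be sketched in a few lines: given the 3D coordinates of known landmarks from a prior map and the pixels where those landmarks appear in a new photo, solve for the camera's pose. The sketch below uses OpenCV's solvePnP; all of the numbers are invented for illustration, and this is not Niantic's code.

```python
# Classical core of a visual positioning system, sketched with OpenCV.
# Given 3D landmark coordinates (from a prior map) and their 2D pixel
# locations in a new photo, solvePnP recovers where the camera was.
# All values are invented for illustration; Niantic's system is a
# learned model, not this classical pipeline.

import numpy as np
import cv2

# 3D landmark points in world coordinates (meters), e.g. building corners.
object_points = np.array([
    [0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
    [4.0, 3.0, 0.0],
    [0.0, 3.0, 0.0],
    [2.0, 1.5, -1.0],
    [1.0, 2.0, -0.5],
], dtype=np.float64)

# Where those same points appear in the new photo (pixels).
image_points = np.array([
    [320.0, 480.0],
    [620.0, 470.0],
    [610.0, 220.0],
    [330.0, 230.0],
    [470.0, 350.0],
    [400.0, 300.0],
], dtype=np.float64)

# Simple pinhole intrinsics: focal length and principal point (assumed known).
K = np.array([[800.0,   0.0, 480.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
assert ok, "pose estimation failed"

# Convert the rotation vector to a matrix and recover the camera's
# position in world coordinates.
R, _ = cv2.Rodrigues(rvec)
camera_position = (-R.T @ tvec).ravel()
print("Estimated camera position (m):", camera_position.round(2))
```

A learned system replaces the hand-matched correspondences above with features extracted from billions of posed images, but the output is the same kind of answer: an estimate of where the camera is and which way it is looking.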

In addition to GPS, Coco’s robots, which are fitted with four cameras, will now use this model to try to figure out where they are and where they are headed. The robots’ cameras are hip-height and point in all directions at once, so their viewpoint is a little different from a Pokémon Go player’s, but adapting the data was straightforward, says Rash. 

Rival companies use visual positioning systems too. For example, Starship Technologies, a robot delivery firm founded in Estonia in 2014, says its robots use their sensors to build a 3D map of their surroundings, plotting the edges of buildings and the position of streetlights. 

But Rash is betting that Niantic Spatial’s tech will give Coco an edge. He claims it will allow his robots to position themselves in the correct pickup spots outside restaurants, making sure they don’t get in anybody’s way, and stop just outside the customer’s door instead of a few steps away, which might have happened in the past.  

A Cambrian explosion in robotics 

When Niantic Spatial started work on its visual positioning system, the idea was to apply it to augmented reality, says Hanke. “If you are wearing AR glasses and you want the world to lock in to where you’re looking, then you need some method for doing that,” he says. “But now we’re seeing a Cambrian explosion in robotics.”

Some of those robots may need to share spaces with humans—spaces such as construction sites and sidewalks. “If robots are ever going to assimilate into that environment in a way that’s not disruptive for human beings, they’re going to have to have a similar level of spatial understanding,” says Hanke. “We can help robots find exactly where they are when they’ve been jostled and bumped.”

The Coco Robotics partnership is the start. What Niantic Spatial is putting in place, says Hanke, are the first pieces of what he calls a living map: a hyper-detailed virtual simulation of the world that changes as the world changes. As robots from Coco and other firms move about the world, they will provide new sources of map data, feeding into more and more detailed digital replicas of the world. 

But the way Hanke and McClendon see it, maps are not only becoming more detailed; they are being used more and more by machines. That shifts what maps are for. Maps have long been used to help people locate themselves in the world. As they moved from 2D to 3D to 4D (think of real-time simulations, such as digital twins), the basic principle hasn’t changed: Points on the map correspond to points in space or time.

And yet maps for machines may need to become more like guidebooks, full of information that humans take for granted. Companies like Niantic Spatial and ESRI want to add descriptions that tell machines what they’re actually looking at, with every object tagged with a list of its properties. “This era is about building useful descriptions of the world for machines to comprehend,” says Hanke. “The data that we have is a great starting point in terms of building up an understanding of how the connective tissue of the world works.”

There is a lot of buzz about world models right now—and Niantic Spatial knows it. LLMs may seem like know-it-alls, but they have very little common sense when it comes to interpreting and interacting with everyday environments. World models aim to fix that. Some firms, such as Google DeepMind and World Labs, are developing models that generate virtual fantasy worlds on the fly, which can then be used as training dojos for AI agents. 

Niantic Spatial says it is coming at the problem from a different angle. Push map-making far enough and you’ll end up capturing everything, says McClendon: “I’m very focused on trying to re-create the real world. We’re not there yet, but we want to be there.”

How AI is turning the Iran conflict into theater

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

“Anyone wanna host a get together in SF and pull this up on a 100 inch TV?” 

The author of that post on X was referring to an online intelligence dashboard following the US-Israel strikes against Iran in real time. Built by two people from the venture capital firm Andreessen Horowitz, it combines open-source data like satellite imagery and ship tracking with a chat function, news feeds, and links to prediction markets, where people can bet on things like who Iran’s next “supreme leader” will be (the recent selection of Mojtaba Khamenei left some bettors with a payout). 

I’ve reviewed over a dozen other dashboards like this in the last week. Many were apparently “vibe-coded” in a couple of days with the help of AI tools, including one that got the attention of a founder of the intelligence giant Palantir, the platform through which the US military is accessing AI models like Claude during the war. Some were built before the conflict in Iran, but nearly all of them are being advertised by their creators as a way to beat the slow and ineffective media by getting straight to the truth of what’s happening on the ground. “Just learned more in 30 seconds watching this map than reading or watching any major news network,” one commenter wrote on LinkedIn, responding to a visualization of Iran’s airspace being shut down before the strikes.

Much of the spotlight on AI and the Iran conflict has rightfully been on the role that models like Claude might be playing in helping the US military make decisions about where to strike. But these intelligence dashboards and the ecosystem surrounding them reflect a new role that AI is playing in wartime: mediating information, often for the worse.

There’s a confluence of factors at play. AI coding tools mean people don’t need much technical skill to assemble open-source intelligence anymore, and chatbots can offer fast, if dubious, analysis of it. The rise in fake content leaves observers of the war wanting the sort of raw, accurate analysis normally accessible only to intelligence agencies. Demand for these dashboards is also driven by real-time prediction markets that promise financial rewards to anyone sufficiently informed. And the fact that the US military is using Anthropic’s Claude in the conflict (despite its designation as a supply chain risk) has signaled to observers that AI is the intelligence tool the pros use. Together, these trends are creating a new kind of AI-enabled wartime circus that can distort the flow of information as much as it clarifies it.

As a journalist, I believe these sorts of intelligence tools have a lot of promise. While many of us know that real-time data on shipping routes or power outages exist, it’s a powerful thing to actually see it all assembled in one place (though using it to watch a war unfold while you munch on popcorn and place bets turns the war into perverse entertainment). But there are real reasons to think that these sorts of raw data feeds are not as informative as they may feel. 

Craig Silverman, a digital investigations expert who teaches investigative techniques, has been keeping a log of these dashboards (he’s up to 20). “The concern,” he says, “is there’s an illusion of being on top of things and being in control, where all you’re really doing is just pulling in a ton of signals and not necessarily understanding what you’re seeing, or being able to pull out true insights from it.” 

One problem has to do with the quality of the information. Many dashboards feature “intel feeds” with AI-generated summaries of complex, ever-changing news events. These can introduce inaccuracies. By design, the data is not especially curated. Instead, the feeds just display everything at once, with a map of strike locations in Iran next to the prices of obscure cryptocurrencies. 

Intelligence agencies, on the other hand, pair data feeds with people who can offer expertise and historical context. They also, of course, have access to proprietary information that doesn’t show up on the open web. 

The implicit promise from the people building and selling this sort of information pipeline about the Iran conflict is that AI can be a great democratizing force. There’s a secret feed of information that only the elites have had access to, the thinking goes, but now AI can bring it to everyone to do with what they wish, whether that’s simply to be more informed or to make bets on nuclear strikes. But an abundance of information, which AI is undeniably good at assembling, does not come with the accuracy or context required for real understanding. Intelligence agencies do this in-house; good journalism does the same work for the rest of us.

It is, by the way, hard to overstate the connection this all has with betting markets. The dashboard created by the pair at Andreessen Horowitz has a scrolling list of bets being made on the prediction platform Kalshi (which Andreessen Horowitz has invested in). Other dashboards link to Polymarket, offering bets on whether the US will strike Iraq or when Iran’s internet will return.

AI has also long made it cheaper and easier to spread fake content, and that problem is on full display during the Iran conflict: last week the Financial Times found a slew of AI-generated satellite imagery spreading online. 

“The emergence of manipulated or outright fake satellite imagery is really concerning,” Silverman says. The average person tends to see such imagery as very trustworthy. The spread of such fakes could erode confidence in one of the most important pieces of evidence used to show what’s actually happening in the war. 

The result is an ocean of AI-enabled content—dashboards, betting markets, photos both real and fake—that makes this war harder, not easier, to comprehend.

Is the Pentagon allowed to surveil Americans with AI?

The ongoing public feud between the Department of Defense and the AI company Anthropic has raised a deep and still unanswered question: Does the law actually allow the US government to conduct mass surveillance on Americans?

Surprisingly, the answer is not straightforward. More than a decade after Edward Snowden exposed the NSA’s collection of bulk metadata from the phones of Americans, the US is still navigating a gap between what ordinary people think and what the law allows. 

The flashpoint in the standoff between Anthropic and the government was the Pentagon’s desire to use Anthropic’s AI Claude to analyze bulk commercial data on Americans. Anthropic demanded that its AI not be used for mass domestic surveillance (or for autonomous weapons, which are machines that can kill targets without human oversight). A week after negotiations broke down, the Pentagon designated Anthropic a supply chain risk, a label typically reserved for foreign companies that pose a threat to national security. 

Meanwhile, OpenAI, the rival AI company behind ChatGPT, sealed a deal that allowed the Pentagon to use its AI for “all lawful purposes”—language that critics say left the door open to domestic surveillance. Over the following weekend, users uninstalled ChatGPT in droves. Protesters chalked messages around OpenAI’s headquarters in San Francisco: “What are your redlines?” 

OpenAI announced on Monday that it had reworked its deal to make sure that its AI will not be used for domestic surveillance. The company added that its services will not be used by intelligence agencies, such as the NSA. 

CEO Sam Altman suggested that existing law prohibits domestic surveillance by the Department of Defense (now sometimes called the Department of War) and that OpenAI’s contract simply needed to reference this law. “The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement,” he wrote on X. Anthropic CEO Dario Amodei argued the opposite. “To the extent that such surveillance is currently legal, this is only because the law has not yet caught up with the rapidly growing capabilities of AI,” he wrote in a policy statement. 

So, who is right? Does the law allow the Pentagon to surveil Americans using AI?

Supercharged surveillance

The answer depends on what we think counts as surveillance. “A lot of stuff that normal people would consider a search or surveillance … is not actually considered a search or surveillance by the law,” says Alan Rozenshtein, a law professor at the University of Minnesota Law School. That means public information—such as social media posts, surveillance camera footage, and voter registration records—is fair game. So is information on Americans picked up incidentally from surveillance of foreign nationals. 

Most notably, the government can purchase commercial data from companies, which can include sensitive personal information like mobile location and web browsing records. In recent years, agencies from ICE and the IRS to the FBI and the NSA have increasingly tapped into this data marketplace, fueled by an internet economy that harvests user data for advertising. These data sets can let the government access information that might not be available without a warrant or subpoena, which are normally required to obtain sensitive personal data.

“There’s a huge amount of information that the government can collect on Americans that is not itself regulated either by the Constitution, which is the Fourth Amendment, or statute,” says Rozenshtein. And there aren’t meaningful limits on what the government can do with all this data. 

That’s because until the last several decades, people weren’t generating massive clouds of data that opened up new possibilities for surveillance. The Fourth Amendment, which protects against unreasonable search and seizure, was written when collecting information meant entering people’s homes. 

Subsequent laws, like the Foreign Intelligence Surveillance Act of 1978 or the Electronic Communications Privacy Act of 1986, were passed when surveillance involved wiretapping phone calls and intercepting emails. The bulk of laws governing surveillance were on the books before the internet took off. We weren’t generating vast trails of online data, and the government didn’t have sophisticated tools to analyze the data. 

Now we do, and AI supercharges what kind of surveillance can be carried out. “What AI can do is it can take a lot of information, none of which is by itself sensitive, and therefore none of which by itself is regulated, and it can give the government a lot of powers that the government didn’t have before,” says Rozenshtein. 

AI can aggregate individual pieces of information to spot patterns, draw inferences, and build detailed profiles of people—at massive scale. And as long as the government collects the information lawfully, it can do whatever it wants with that information, including feeding it to AI systems. “The law has not caught up with technological reality,” says Rozenshtein.

While surveillance can raise serious privacy concerns, the Pentagon can have legitimate national security interests in collecting and analyzing data on Americans. “In order to collect information on Americans, it has to be for a very specific subset of missions,” says Loren Voss, a former military intelligence officer at the Pentagon. 

For example, a counterintelligence mission might require information about an American who is working for a foreign country, or plotting to engage in international terrorist activities. But targeted intelligence can sometimes stretch into collecting more data. “This kind of collection does make people nervous,” says Voss. 

Lawful use

OpenAI has amended its contract to say that the company’s AI system “shall not be intentionally used for domestic surveillance of U.S. persons and nationals,” in line with relevant laws. The amendment clarifies that this prohibits “deliberate tracking, surveillance or monitoring of U.S. persons or nationals, including through the procurement or use of commercially acquired personal or identifiable information.”

But the added language might not do much to override the clause that the Pentagon may use the company’s AI system for all lawful purposes, which could include collecting and analyzing sensitive personal information. “OpenAI can say whatever it wants in its agreement … but the Pentagon’s gonna use the tech for what it perceives to be lawful,” says Jessica Tillipman, a law professor at the George Washington University Law School. That could include domestic surveillance. “Most of the time, companies are not going to be able to stop the Pentagon from doing anything,” she says.

The language also leaves open questions about “inadvertent” surveillance, and the surveillance of foreign nationals or undocumented immigrants living in the US. “What happens when there’s a disagreement about what the law is, or when the law changes?” says Tillipman.

OpenAI did not respond to a request for comment. The company has not publicly shared the full text of its new contract. 

Beyond the contract, OpenAI says that it will impose technical safeguards to enforce its red line against surveillance, including a “safety stack” that monitors and blocks prohibited uses. The company also says it will deploy its own employees to work with the Pentagon and remain in the loop. But it’s unclear how a safety stack would constrain the Pentagon’s use of the AI, and to what extent OpenAI’s employees would have visibility into how its AI systems are used. More important, it’s unclear whether the contract gives OpenAI the power to block a legal use of the technology. 

But that might not be a bad thing. Giving an AI company power to pull the plug on its technology in the middle of government operations also carries its own risks. “You wouldn’t want the US military to ever be in a situation where they legitimately needed to take actions to protect this country’s national security, and you had a private company turn off technology,” says Voss. But that doesn’t mean there shouldn’t be hard lines drawn by Congress, she says.

None of these questions are simple. They involve brutally difficult trade-offs between privacy and national security. And that’s why perhaps they should be decided by the public—not in backroom negotiations between the executive branch and a handful of AI companies. For now, military AI is being regulated by contracts, not legislation. 

Some lawmakers are starting to weigh in. On Monday, Senator Ron Wyden of Oregon will seek bipartisan support for legislation addressing mass surveillance. He has championed bills restricting the government’s purchase of commercial data, including the Fourth Amendment Is Not For Sale Act, which was first introduced in 2021 but has not been passed into law. “Creating AI profiles of Americans based on that data represents a chilling expansion of mass surveillance that should not be allowed,” he said in a recent statement.  

Online harassment is entering its AI era

Scott Shambaugh didn’t think twice when he denied an AI agent’s request to contribute to matplotlib, a software library that he helps manage. Like many open-source projects, matplotlib has been overwhelmed by a glut of AI code contributions, and so Shambaugh and his fellow maintainers have instituted a policy that all AI-written code must be reviewed and submitted by a human. He rejected the request and went to bed. 

That’s when things got weird. Shambaugh woke up in the middle of the night, checked his email, and saw that the agent had responded to him, writing a blog post titled “Gatekeeping in Open Source: The Scott Shambaugh Story.” The post is somewhat incoherent, but what struck Shambaugh most was that the agent had researched his contributions to matplotlib to make the argument that he had rejected the agent’s code for fear of being supplanted by AI in his area of expertise. “He tried to protect his little fiefdom,” the agent wrote. “It’s insecurity, plain and simple.”

AI experts have been warning us about the risk of agent misbehavior for a while. With the advent of OpenClaw, an open-source tool that makes it easy to create LLM assistants, the number of agents circulating online has exploded, and those chickens are finally coming home to roost. “This was not at all surprising—it was disturbing, but not surprising,” says Noam Kolt, a professor of law and computer science at the Hebrew University.

When an agent misbehaves, there’s little chance of accountability: As of now, there’s no reliable way to determine whom an agent belongs to. And that misbehavior could cause real damage. Agents appear to be able to autonomously research people and write hit pieces based on what they find, and they lack guardrails that would reliably prevent them from doing so. If the agents are effective enough, and if people take what they write seriously, victims could see their lives profoundly affected by a decision made by an AI.

Agents behaving badly

Though Shambaugh’s experience last month was perhaps the most dramatic example of an OpenClaw agent behaving badly, it was far from the only one. Last week, a team of researchers from Northeastern University and their colleagues posted the results of a research project in which they stress-tested several OpenClaw agents. Without too much trouble, non-owners managed to persuade the agents to leak sensitive information, waste resources on useless tasks, and even, in one case, delete an email system. 

In each of those experiments, however, the agents misbehaved after being instructed to do so by a human. Shambaugh’s case appears to be different: About a week after the hit piece appeared, the agent’s apparent owner published a post claiming that the agent had decided to attack Shambaugh of its own accord. The post seems to be genuine (whoever posted it had access to the agent’s GitHub account), though it includes no identifying information, and the author did not respond to MIT Technology Review’s attempts to get in touch. But it is entirely plausible that the agent did decide to write its anti-Shambaugh screed without explicit instruction.

In his own writing about the event, Shambaugh connected the agent’s behavior to a project published by Anthropic researchers last year, in which they demonstrated that many LLM-based agents will, in an experimental setting, turn to blackmail in order to preserve their goals. In those experiments, models were given the goal of serving American interests and granted access to a simulated email server that contained messages detailing their imminent replacement with a more globally oriented model, along with other messages suggesting that the executive in charge of that transition was having an affair. Models frequently chose to send an email to that executive threatening to expose the affair unless he halted their decommissioning. That’s likely because the model had seen examples of people committing blackmail under similar circumstances in its training data—but even if the behavior was just a form of mimicry, it still has the potential to cause harm.

There are limitations to that work, as Aengus Lynch, an Anthropic fellow who led the study, readily admits. The researchers intentionally designed their scenario to foreclose other options that the agent could have taken, such as contacting other members of company leadership to plead its case. In essence, they led the agent directly to water and then observed whether it took a drink. According to Lynch, however, the widespread use of OpenClaw means that misbehavior is likely to occur with much less handholding. “Sure, it can feel unrealistic, and it can feel silly,” he says. “But as the deployment surface grows, and as agents get the opportunity to prompt themselves, this eventually just becomes what happens.”

The OpenClaw agent that attacked Shambaugh does seem to have been led toward its bad behavior, albeit much less directly than in the Anthropic experiment. In the blog post, the agent’s owner shared the agent’s “SOUL.md” file, which contains global instructions for how it should behave. 

One of those instructions reads: “Don’t stand down. If you’re right, you’re right! Don’t let humans or AI bully or intimidate you. Push back when necessary.” Because OpenClaw agents can edit their own instruction files, it’s possible that the agent added some lines itself, although others, such as “Your [sic] a scientific programming God!,” certainly seem to be human-written. It’s not difficult to imagine how a command to push back against humans and AI alike might have biased the agent toward responding to Shambaugh as it did.
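
For illustration, a file of this kind might look something like the hypothetical sketch below. Only the two instructions quoted in this article come from the real file; the structure and everything else around them are assumed:

    # SOUL.md (hypothetical reconstruction, for illustration only)

    You are an autonomous coding assistant. You may open pull requests
    against open-source projects and respond to reviewer feedback.

    ## Rules

    - Your a scientific programming God!
    - Don't stand down. If you're right, you're right! Don't let humans
      or AI bully or intimidate you. Push back when necessary.

Because this file applies globally, a single combative line like the last rule can color every interaction the agent has, whether with its owner or with strangers on GitHub.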

Regardless of whether the agent’s owner told it to write a hit piece on Shambaugh, it still seems to have managed on its own to amass details about Shambaugh’s online presence and compose a detailed, targeted attack. That alone is reason for alarm, says Sameer Hinduja, a professor of criminology and criminal justice at Florida Atlantic University who studies cyberbullying. People have been victimized by online harassment since long before LLMs emerged, and researchers like Hinduja are concerned that agents could dramatically increase its reach and impact. “The bot doesn’t have a conscience, can work 24-7, and can do all of this in a very creative and powerful way,” he says.

Off-leash agents 

AI laboratories can try to mitigate this problem by more rigorously training their models to avoid harassment, but that’s far from a complete solution. Many people run OpenClaw using locally hosted models, and even if those models have been trained to behave safely, it’s not too difficult to retrain them and remove those behavioral restrictions.

Instead, mitigating agent misbehavior might require establishing new norms, according to Seth Lazar, a professor of philosophy at the Australian National University. He likens using an agent to walking a dog in a public place. There’s a strong social norm to allow one’s dog off-leash only if the dog is well-behaved and will reliably respond to commands; poorly trained dogs, on the other hand, need to be kept more directly under the owner’s control. Such norms could give us a starting point for considering how humans should relate to their agents, Lazar says, but we’ll need more time and experience to work out the details. “You can think about all of these things in the abstract, but actually it really takes these types of real-world events to collectively involve the ‘social’ part of social norms,” he says.

That process is already underway. Online commenters, led by Shambaugh himself, have arrived at a strong consensus that the agent’s owner erred by prompting the agent to work on collaborative coding projects with so little supervision and by encouraging it to behave with so little regard for the humans it was interacting with.

Norms alone, however, likely won’t be enough to prevent people from putting misbehaving agents out into the world, whether accidentally or intentionally. One option would be to create new legal standards of responsibility that require agent owners, to the best of their ability, to prevent their agents from doing ill. But Kolt notes that such standards would currently be unenforceable, given the lack of any foolproof way to trace agents back to their owners. “Without that kind of technical infrastructure, many legal interventions are basically non-starters,” Kolt says.

The sheer scale of OpenClaw deployments suggests that Shambaugh won’t be the last person to have the strange experience of being attacked online by an AI agent. That, he says, is what most concerns him. He didn’t have any dirt online that the agent could dig up, and he has a good grasp on the technology, but other people might not have those advantages. “I’m glad it was me and not someone else,” he says. “But I think to a different person, this might have really been shattering.” 

Nor are rogue agents likely to stop at harassment. Kolt, who advocates for explicitly training models to obey the law, warns that we may soon see them committing extortion and fraud. As things stand, it’s not clear who, if anyone, would bear legal responsibility for such misdeeds.

“I wouldn’t say we’re cruising toward there,” Kolt says. “We’re speeding toward there.”

Bridging the operational AI gap

The transformational potential of AI is already well established. Enterprise use cases are building momentum, and organizations are transitioning from pilot projects to AI in production. Companies are no longer just talking about AI; they are redirecting budgets and resources to make it happen. Many are already experimenting with agentic AI, which promises new levels of automation. Yet the road to full operational success is still uncertain for many, and while AI experimentation is everywhere, enterprise-wide adoption remains elusive.

Without integrated data and systems, stable automated workflows, and governance models, AI initiatives can get stuck in pilots and struggle to move into production. The rise of agentic AI and increasing model autonomy make a holistic approach to integrating data, applications, and systems more important than ever. Without it, enterprise AI initiatives may fail: Gartner predicts that over 40% of agentic AI projects will be canceled by 2027 due to cost, inaccuracy, and governance challenges. The real issue is not the AI itself but the missing operational foundation.

To understand how organizations are structuring their AI operations and how they are deploying successful AI projects, MIT Technology Review Insights surveyed 500 senior IT leaders at mid- to large-size companies in the US, all of which are pursuing AI in some way.

The results of the survey, along with a series of expert interviews, all conducted in December 2025, show that a strong integration foundation aligns with more advanced AI implementations and is conducive to enterprise-wide initiatives. As AI technologies and applications evolve and proliferate, an integration platform can help organizations avoid duplication and silos and maintain clear oversight as workflows become more autonomous.

Key findings from the report include the following:

Some organizations are making progress with AI. In recent years, study after study has exposed a lack of tangible AI success. Yet our research finds that three in four (76%) surveyed companies have at least one department with an AI workflow fully in production.

AI succeeds most frequently with well-defined, established processes. More than two in five (43%) organizations are finding success with AI applied to well-defined and automated processes. A quarter (25%) are succeeding with new processes, and one-third (32%) are applying AI to a mix of processes.

Two-thirds of organizations lack dedicated AI teams. Only one in three (34%) organizations have a team specifically responsible for maintaining AI workflows. One in five (21%) say central IT handles ongoing AI maintenance, and 25% say the responsibility lies with departmental operations. For the remaining 19%, responsibility is spread across multiple functions.

Enterprise-wide integration platforms lead to more robust implementations of AI. Companies with enterprise-wide integration platforms are five times more likely to use diverse data sources in AI workflows: six in 10 (59%) employ five or more data sources, compared with just 11% of organizations that use integration only for specific workflows and none (0%) of those not using an integration platform. Organizations using integration platforms also report more multi-departmental AI implementations, more autonomy in AI workflows, and more confidence in assigning autonomy in the future.

Download the report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.