Elon Musk and Sam Altman are going to court over OpenAI’s future

After a yearslong legal feud, Elon Musk and OpenAI CEO Sam Altman are heading to trial this week in Northern California in a case that could have sweeping consequences. Ahead of OpenAI’s highly anticipated IPO, the court could rule on whether the company is allowed to exist as a for-profit enterprise and might even oust its current executive leadership, including Altman.

Musk is suing OpenAI, alleging that Altman and OpenAI president Greg Brockman deceived him into bankrolling the company in its early days by promising to maintain it as a nonprofit dedicated to developing AI that benefits humanity, only to later restructure the company to operate a for-profit subsidiary. Musk cofounded OpenAI with Altman and others in 2015, but he left in 2018 after a bitter power struggle. 

Musk is seeking as much as $134 billion in damages from OpenAI and Microsoft, one of OpenAI’s biggest financial backers. He is also asking the court to remove Altman and Brockman from their roles and to restore OpenAI as a nonprofit. Musk has asked the court to award any damages to OpenAI’s nonprofit rather than to him personally. 

Nine jurors will deliver an advisory verdict, a non-binding recommendation, to guide the judge in deciding Musk’s claims against Altman. Musk, Altman, and Brockman will take the stand. Former OpenAI chief scientist Ilya Sutskever, former OpenAI CTO Mira Murati, and Microsoft CEO Satya Nadella are also expected to testify. Cringey texts, raw diary entries, and endless scheming behind the founding and growth of OpenAI are expected to come to light.

In an industry enveloped in secrecy, the trial will be a rare opportunity for the public to look behind the curtain and find out what’s going on in the companies creating the most transformative technology ever built. 

What are they fighting about?

When OpenAI was originally founded as a nonprofit, backed by a $38 million donation from Musk, the company vowed to create open-source technology for the public’s benefit, unconstrained by a need to generate financial returns. But over the years, the company began to claim that intensifying competition could make it dangerous to share how it develops its AI models and that a nonprofit structure could not raise enough money to keep building AI. (MIT Technology Review was first to report on OpenAI’s internal conflicts around its mission.)

The court has already found that in 2017 Altman and Brockman wanted to establish a for-profit arm, while Musk proposed merging OpenAI with his electric-car company, Tesla. When Musk threatened to stop funding, Altman and Brockman told him that they were committed to keeping the company a nonprofit. Musk alleges that they pursued plans to pivot to a for-profit without informing him. According to OpenAI, Musk agreed that the company needed a for-profit entity and even wanted to be its CEO. 

But even if Musk proves he was duped by Altman and Brockman, he may not have standing in the first place to sue them for restructuring the company to operate a for-profit subsidiary. Some legal scholars are puzzled over why the judge allowed him to bring this claim. “The idea that Elon Musk can sue because he was a donor or used to be on the board is pretty puzzling,” says Jill Horwitz, a law professor who studies nonprofit law at Northwestern University. “Typically, it’s up to the attorneys general to bring such a claim to enforce the charitable purposes. And that’s already happened.” 

In October 2025, the state attorneys general of California, where OpenAI is headquartered, and Delaware, where OpenAI is incorporated, struck a deal with OpenAI approving its new corporate structure subject to a series of conditions. For example, a safety and security committee at the nonprofit would review safety-related decisions made by the for-profit subsidiary. Critics of the restructuring, including Musk, AI safety advocates, and civil society groups, have tried to stop it.

California’s attorney general has declined to join Musk’s lawsuit, saying that the office did not see how his action serves the public interest.

Still, whether the deal holds OpenAI to its nonprofit mission is an open question. “Elon Musk should have to show … what the deficiencies are in what’s been agreed to by OpenAI with the attorneys general,” says Rose Chan Loui, the director of the UCLA School of Law’s philanthropy and nonprofit program. Even with the terms in place, holding OpenAI to them depends on “how much they can enforce it and how much transparency they get into OpenAI’s work.”

More importantly, legal experts say the case is being considered under the wrong body of law. Musk argues that Altman and Brockman breached OpenAI’s charitable trust by creating a closed-source, for-profit subsidiary. As a result, the court has been analyzing the claim under the law of trusts. “But OpenAI is not a trust. OpenAI is a corporation. And so really they should be looking at … the law of charitable nonprofit organizations,” says Chan Loui.

What’s on the line?

Despite all the legal muddiness, the outcome of the trial could upend the AI race. Any one of the remedies that Musk seeks could cripple OpenAI as it races to go public by the end of the year. OpenAI, which is valued at over $850 billion, has described the litigation with Musk as a potential risk to its business. Musk's rival company xAI, which makes the chatbot Grok, is expected to go public as part of his rocket company SpaceX as early as June. If Musk prevails, xAI, which in combination with SpaceX is valued at $1.25 trillion, could gain a big advantage in the AI race.

And the trial has helped expose the bitter schism between Musk and the company he once helped to found. An OpenAI spokesperson referred MIT Technology Review to a post on X: “This lawsuit has always been a baseless and jealous bid to derail a competitor.” Although Musk’s lawyers did not immediately respond to a request for comment, he has posted on X that “Scam Altman lies as easily as he breathes.”  

MIT Technology Review will have ongoing coverage of Musk v. Altman until its conclusion. Follow @techreview or @michelletomkim on X for up-to-the-minute reporting. 

Three reasons why DeepSeek’s new model matters

On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model. Notably, the model can process much longer prompts than its last generation, thanks to a new design that helps it handle large amounts of text more efficiently. Like DeepSeek’s previous models, V4 is open source, meaning it is available for anyone to download, use, and modify.

V4 marks DeepSeek’s most significant release since R1, the reasoning model it launched in January 2025. R1, which was trained on limited computing resources, stunned the global AI industry with its strong performance and efficiency, turning DeepSeek from a little-known research team into China’s best-known AI company almost overnight. It also helped set off a wave of open-weight model releases from other Chinese AI firms. 

DeepSeek has kept a relatively low profile since then—but earlier this month, it effectively teased V4’s release when it added “expert” and “flash” modes to the online version of its model, prompting speculation that the updates were tied to a bigger upcoming release.

While the company has become a powerful symbol of China's AI ambitions, its return to frontier models comes after a turbulent stretch that included major personnel departures, delays to previous model launches, and growing scrutiny from both the US and Chinese governments.

So, will V4 shake the AI field the way R1 did? Almost certainly not, but here are three big reasons why this release matters.

1. It breaks new ground for an open-source model.

As with R1 before it, DeepSeek claims that V4’s performance rivals the best models available at a fraction of the price. This is great news for developers and for companies using the tech, because it means they can access frontier AI capabilities on their own terms, and without worrying about skyrocketing costs.

The new model comes in two versions, both of which are available on DeepSeek’s website and in its app, with API access also open to developers. V4-Pro is a larger model built for coding and complex agent tasks, and V4-Flash is a smaller version designed to be faster and cheaper to run. Both versions offer reasoning modes, in which the model can carefully parse a user’s prompt and show each step as it works through the problem.

For V4-Pro, DeepSeek charges $1.74 per million input tokens and $3.48 per million output tokens, a fraction of the cost of comparable models from OpenAI and Anthropic. V4-Flash is even cheaper, at about $0.14 per million input tokens and about $0.28 per million output tokens, making it one of the cheapest top-tier models available. Prices that low make V4 an appealing base for building applications.
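To make those prices concrete, here is a rough cost estimator. It is a sketch that assumes flat per-token list prices (the figures quoted above, with V4-Flash's rounded "about" values) and ignores any caching discounts or tiering that real billing may apply; the workload size is a hypothetical example.

```python
# Per-million-token list prices quoted above, in USD.
# Assumption: flat rates, no cache discounts or volume tiers.
PRICES = {
    "V4-Pro": {"input": 1.74, "output": 3.48},
    "V4-Flash": {"input": 0.14, "output": 0.28},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one workload at list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 10 million input tokens, 2 million output tokens.
for model in PRICES:
    print(model, round(job_cost(model, 10_000_000, 2_000_000), 2))
```

At these rates the hypothetical workload costs roughly $24.36 on V4-Pro and $1.96 on V4-Flash.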

In terms of performance, V4 is, perhaps unsurprisingly, a huge jump from R1—and it seems to be a strong alternative to just about all the latest big AI models. On the major benchmarks, according to results shared by the company, DeepSeek V4-Pro competes with leading closed-source models, matching the performance of Anthropic’s Claude-Opus-4.6, OpenAI’s GPT-5.4, and Google’s Gemini-3.1. And compared to other open-source models, such as Alibaba’s Qwen-3.5 or Z.ai’s GLM-5.1, DeepSeek V4 exceeds them all on coding, math, and STEM problems, making it one of the strongest open-source models ever released. 

DeepSeek also says that V4-Pro now ranks among the strongest open-source models on benchmarks for agentic coding tasks and performs well on other tests that measure its ability to work through multistep problems. Its writing ability and world knowledge also lead the field, according to benchmarking results shared by the company.

In a technical report released alongside the model, DeepSeek shared results from an internal survey of 85 experienced developers: More than 90% included V4-Pro among their top model choices for coding tasks.

DeepSeek says it has specifically optimized V4 for popular agent frameworks such as Claude Code, OpenClaw, and CodeBuddy.

2. It delivers on a new approach to memory efficiency.

One of the key innovations of V4 is its long context window—the amount of text the model can process at once. Both versions can handle 1 million tokens, which is large enough to fit all three volumes of The Lord of the Rings and The Hobbit combined. The company says this context window size is now the default across all DeepSeek services and it matches what is offered by cutting-edge versions of models like Gemini and Claude. 

But it's important to know not just that DeepSeek made this leap, but how. V4 departs significantly from the architecture of the company's earlier models, especially in the attention mechanism, the feature of AI models that helps them understand each part of a prompt in relation to the rest. Because standard attention compares every token with every other token, the number of comparisons grows roughly with the square of the prompt's length, making attention one of the main bottlenecks for long-context models.

DeepSeek’s innovation was to make the model more selective about what it pays attention to. Instead of treating all earlier text as equally important, V4 compresses older information and focuses on the parts most likely to matter in the present moment, while still keeping nearby text in full so it does not miss important details. 
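The general idea of keeping nearby text in full while compressing older text can be illustrated with a toy example. This is a generic sketch, not DeepSeek's actual mechanism, which its technical report describes in far more detail: here older tokens are compressed by simply averaging fixed-size blocks, and the function names, window size, and block size are all hypothetical choices.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def block_means(vectors, block):
    """Compress a run of vectors by averaging each group of `block` of them."""
    out = []
    for i in range(0, len(vectors), block):
        group = vectors[i:i + block]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def compressed_attention(query, keys, values, window=4, block=4):
    """Hypothetical toy: full attention over the last `window` tokens plus
    attention over block-averaged summaries of all older tokens.
    Returns (output_vector, number_of_attended_slots)."""
    split = max(len(keys) - window, 0)
    k = block_means(keys[:split], block) + keys[split:]
    v = block_means(values[:split], block) + values[split:]
    scale = 1 / math.sqrt(len(query))
    weights = softmax([dot(query, key) * scale for key in k])
    output = [sum(w * val[j] for w, val in zip(weights, v)) for j in range(len(v[0]))]
    return output, len(k)

n = 1000
keys = [[math.sin(i), math.cos(i)] for i in range(n)]
values = [[float(i), 1.0] for i in range(n)]
out, slots = compressed_attention([1.0, 0.0], keys, values)
print(slots)  # 996 older tokens become 249 summaries, plus 4 recent tokens
```

The query here looks at 253 slots instead of 1,000 tokens. A real system would learn how to compress and which summaries to keep, rather than averaging blindly.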

DeepSeek says this sharply reduces the cost of using long context. In a 1-million-token context, V4-Pro uses only 27% of the computing power required by its previous model, V3.2, while cutting memory use to 10%. The reduction in V4-Flash is even larger, using just 10% of the computing power and 7% of the memory. In practice, this could make it cheaper to build tools that need to work across huge amounts of material, such as an AI coding assistant that can read an entire codebase or a research agent that can analyze a long archive of documents without constantly forgetting what came before.

DeepSeek’s interest in long context windows didn’t start with V4. Over the past year and a half, the company has quietly published a series of papers on how AI models “remember” information, experimenting with compression and mathematical techniques to extend what AI models could realistically handle.

3. It marks the first steps on the hard road away from Nvidia.

V4 is DeepSeek’s first model optimized for domestic Chinese chips, such as Huawei’s Ascend—a move that has turned the launch into something of a test of whether China’s homegrown AI industry can begin to loosen its dependence on US chip giant Nvidia. 

This was largely expected: The Information reported earlier this month that DeepSeek did not give American chipmakers such as Nvidia and AMD early access to V4, even though chipmakers commonly receive prerelease access so they can optimize support for a new model ahead of launch. Instead, the company reportedly gave early access only to Chinese chipmakers.

On Friday, Huawei said its Ascend supernode products, based on the Ascend 950 series, would support DeepSeek V4. This means that companies and individuals who want to run their own modified versions of DeepSeek V4 should be able to do so on Huawei chips more easily.

Reuters previously reported that Chinese government officials recommended that DeepSeek integrate Huawei chips in its training process. And this pressure fits a broader pattern in China's industrial policy: Strategic sectors are often pushed, and sometimes effectively required, to align with national self-reliance goals. But there's a particular urgency when it comes to AI. Since 2022, US export controls have cut Chinese firms off from Nvidia's most powerful chips, and later rules also restricted access to the downgraded versions Nvidia made for the Chinese market. Beijing's response has been to accelerate the push for a domestic AI stack, from chips to software frameworks to data centers.

Chinese authorities have reportedly been pushing data centers and public computing projects to use more domestic chips, including through reported bans on foreign-made chips, sourcing quotas, and requirements to pair Nvidia chips with Chinese alternatives from companies such as Huawei and Cambricon. 

Still, replacing Nvidia is not as simple as swapping one chip for another. Nvidia’s advantage lies not only in its chips, but in the software ecosystem developers have spent years building around them. Moving to Huawei’s Ascend chips means adapting model code, rebuilding tools, and proving that systems built around those chips are stable enough for serious use.

To be clear, DeepSeek does not appear to have fully moved beyond Nvidia. The company's technical report reveals that it is using Chinese chips to run the model for inference, the stage at which a trained model responds when someone asks it to complete a task. But Liu Zhiyuan, a computer science professor at Tsinghua University, told MIT Technology Review that DeepSeek appears to have adapted only part of V4's training process for Chinese chips. The report does not say whether some key long-context features were adapted to domestic chips, so Liu says V4 may still have been trained mainly on Nvidia chips. Multiple sources who spoke on the condition of anonymity, due to political sensitivity around these issues, told MIT Technology Review that Chinese chips still don't perform as well as Nvidia chips but are better suited for inference than training.

DeepSeek is also tying the future costs of V4 to this hardware shift. The company says V4-Pro prices could fall significantly after Huawei’s Ascend 950 supernodes begin shipping at scale in the second half of this year. 

If that works, V4 could be an early sign that China is successfully building a parallel AI infrastructure.

AI needs a strong data fabric to deliver business value

Artificial intelligence is moving quickly in the enterprise, from experimentation to everyday use. Organizations are deploying copilots, agents, and predictive systems across finance, supply chains, human resources, and customer operations. By the end of 2025, half of companies used AI in at least three business functions, according to a recent survey.

But as AI becomes embedded in core workflows, business leaders are discovering that the biggest obstacle is not model performance or computing power but the quality and the context of the data on which those systems rely. AI essentially introduces a new requirement: Systems must not only access data — they must understand the business context behind it. 

Without that context, AI can generate answers quickly but still make the wrong decision, says Irfan Khan, president and chief product officer of SAP Data & Analytics. 

“AI is incredibly good at producing results,” he says. “It moves fast, but without context it can’t exercise good judgment, and good judgment is what creates a return on investment for the business. Speed without judgment doesn’t help. It can actually hurt us.”

In the emerging era of autonomous systems and intelligent applications, that context layer is becoming essential. To provide context, companies need a well-designed data fabric that does more than just integrate data, Khan says. The right data fabric allows organizations to scale AI safely, coordinate decisions across systems and agents, and ensure that automation reflects real business priorities rather than making decisions in isolation. 

Recognizing this, many organizations are rethinking their data architecture. Instead of simply moving data into a single repository, they are looking for ways to connect information across applications, clouds, and operational systems while preserving the semantics that describe how the business works. That shift is driving growing interest in data fabric as a foundation for AI infrastructure.

Losing context is a critical AI problem

Traditional data strategies have largely focused on aggregation. Over the past two decades, organizations have invested heavily in extracting information from operational systems and loading it into centralized warehouses, lakes, and dashboards. This approach makes it easier to run reports, monitor performance, and generate insights across the business, but in the process, much of the meaning attached to that data — how it relates to policies, processes, and real-world decisions — is lost. 

Take two companies using AI to manage supply-chain disruptions. If one uses raw signals such as inventory levels, lead times, and supply scores, while the other adds context across business processes, policies, and metadata, both systems will rapidly analyze the data but likely come up with different conclusions. 

Information such as which customers are strategic accounts, what tradeoffs are acceptable during shortages, and the status of extended supply chains will allow one AI system to make strategic decisions, while the other will not have the proper context, Khan says. 

“Both systems move very quickly, but only one moves in the right direction,” he says. “This is the context premium and the advantage you gain when your data foundation preserves context across processes, policies and data by design.”

In the past, companies compensated for missing context implicitly, because human experts supplied it. With AI, that safety net disappears, and the gap creates serious limitations. AI systems do not just display information; they act on it. If the data foundation does not capture why data matters, an AI model may optimize for the wrong outcome. Inventory numbers, payment histories, or demand signals might be accurate, but they do not necessarily reveal which customers must be prioritized, which contractual obligations apply, or which products are strategically important. As a result, the system can produce answers that are technically correct but operationally flawed.

This realization is changing how companies think about AI readiness. Most acknowledge that they do not have the mature data processes and infrastructure in place to trust their data and their AI systems. Only one in five organizations consider their approach to data to be highly mature, and only 9% feel fully prepared to integrate and interoperate with their data systems.

Don’t consolidate, integrate

The emerging solution is a data fabric: An abstraction layer that spans infrastructure, architecture, and logical organization. For agentic AI, the fabric becomes the primary interface, allowing agents to interact with business knowledge rather than raw storage systems. Knowledge graphs play a central role, enabling agents to query enterprise data using natural language and business logic.

The value of the data fabric relies on three components: intelligent compute to provide speed, a knowledge pool to provide business understanding and context, and agents to provide autonomous action grounded in that understanding. What makes this powerful is how these capabilities work together, says Khan.

The technology provides the architecture, a foundation that makes agent-to-agent communication and coordination possible. The process defines how business and IT teams share ownership and establish governance. And the culture has to build enough trust that people actually adopt the system. Technology, process, and culture must all work together for a business data fabric to succeed.

“It empowers confident, consistent decisions, and when these elements all come together, AI just doesn’t analyze and interpret the data — it drives smarter, faster decisions that really create business impact,” he says. “This is the promise of a thoughtfully designed business data fabric, where every part reinforces the other, and every insight is grounded in trust and clarity.”

Technically, building a data-fabric layer requires several capabilities. Data must be accessible across multiple environments through federation rather than forced consolidation. A semantic or knowledge layer is needed to harmonize meaning across systems, often supported by knowledge graphs and catalog-driven metadata. Governance and policy enforcement must also operate across the fabric so that AI systems can access data securely and consistently.
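A toy version of those three capabilities might look like the sketch below. It is purely illustrative and is not SAP's product or any real API: the two in-memory "source systems" stand in for an ERP and a CRM, and the field names, business rule, and roles are all hypothetical. Data is joined at query time (federation), a business term is defined once in a semantic layer, and a policy decides which fields each agent may see.

```python
# Hypothetical source systems, queried where they live rather than copied.
ERP_ORDERS = [
    {"cust_id": "C1", "open_qty": 120},
    {"cust_id": "C2", "open_qty": 40},
]
CRM_ACCOUNTS = {
    "C1": {"tier": "platinum", "credit_limit": 50_000},
    "C2": {"tier": "standard", "credit_limit": 5_000},
}

# Semantic layer: a business term defined once, in terms of source fields.
SEMANTICS = {
    "strategic_account": lambda account: account["tier"] == "platinum",
}

# Governance: fields each agent role is allowed to see.
POLICY = {
    "supply_agent": {"cust_id", "open_qty", "strategic_account"},
    "finance_agent": {"cust_id", "open_qty", "strategic_account", "credit_limit"},
}

def federated_view(role):
    """Join ERP and CRM at query time, attach business meaning, enforce policy."""
    allowed = POLICY[role]
    rows = []
    for order in ERP_ORDERS:
        account = CRM_ACCOUNTS[order["cust_id"]]
        row = dict(order)
        row["credit_limit"] = account["credit_limit"]
        for term, rule in SEMANTICS.items():
            row[term] = rule(account)
        rows.append({k: v for k, v in row.items() if k in allowed})
    return rows

# A supply-chain agent ranks orders by business priority, not just quantity.
queue = sorted(federated_view("supply_agent"), key=lambda r: not r["strategic_account"])
print(queue)
```

Because both agents read the same definition of "strategic account," they share one understanding of business priorities, while the policy keeps credit data away from the supply-chain agent.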

Together, these elements create a foundation where AI interacts with business knowledge instead of raw storage systems — an essential step for moving from experimentation to real enterprise automation.

Beyond data isolation and dashboards

In the emerging era of agentic AI, the responsibility for monitoring, analyzing, and making decisions based on data increasingly shifts to software. AI agents can monitor events, trigger workflows, and make decisions in real time, often without direct human intervention. That speed creates new opportunities, but it also raises the stakes. When multiple agents operate across finance, supply chain, procurement, or customer operations, they must be guided by the same understanding of business priorities.

Without a common knowledge layer connecting disparate data together, coordination between systems quickly breaks down. One system might optimize for margin, another for liquidity, and another for compliance, each working from a different slice of data. 

Importantly, most enterprises already possess much of the knowledge needed to make this work, says Khan. Years of operational data, master data, workflows, and policy logic already exist across business applications; companies just need to make it accessible. Companies that deploy data fabrics gain greater trust in their data, with more than two-thirds of enterprises reporting improved data accessibility and visibility and greater control over their data.

“The opportunity isn’t just inventing context from scratch, it’s activating and connecting the context across your business that already exists,” he continues, adding that a data fabric is the “architecture that ensures data semantics, business processes and policies are connected as a unified system across all the clouds.”

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Chinese tech workers are starting to train their AI doubles, and pushing back

Tech workers in China are being instructed by their bosses to train AI agents to replace them—and it’s prompting a wave of soul-searching among otherwise enthusiastic early adopters. 

Earlier this month a GitHub project called Colleague Skill, which claimed workers could use it to “distill” their colleagues’ skills and personality traits and replicate them with an AI agent, went viral on Chinese social media. Though the project was created as a spoof, it struck a nerve among tech workers, a number of whom told MIT Technology Review that their bosses are encouraging them to document their workflows in order to automate specific tasks and processes using AI agent tools like OpenClaw or Claude Code. 

To set up Colleague Skill, a user names the coworker whose tasks they want to replicate and adds basic profile details. The tool then automatically imports chat history and files from Lark and DingTalk, both popular workplace apps in China, and generates reusable manuals describing that coworker’s duties—and even their unique quirks—for an AI agent to replicate. 

Colleague Skill was created by Tianyi Zhou, who works as an engineer at the Shanghai Artificial Intelligence Laboratory. Earlier this week he told Chinese outlet Southern Metropolis Daily that the project was started as a stunt, prompted by AI-related layoffs and by the growing tendency of companies to ask employees to automate themselves. He didn’t respond to requests for further comment.

Internet users have found humor in the idea behind the tool, joking about automating their coworkers before themselves. However, Colleague Skill’s virality has sparked a lot of debate about workers’ dignity and individuality in the age of AI.

After seeing Colleague Skill on social media, Amber Li, 27, a tech worker in Shanghai, used it to recreate a former coworker as a personal experiment. Within minutes, the tool created a file detailing how that person did their job. “It is surprisingly good,” Li says. “It even captures the person’s little quirks, like how they react and their punctuation habits.” With this skill, Li can use an AI agent as a new “coworker” that helps debug her code and replies instantly. It felt uncanny and uncomfortable, Li says. 

Even so, replacing coworkers with agents could become the norm. Since OpenClaw became a national craze, bosses in China have been pushing tech workers to experiment with agents.

Although AI agents can take control of your computer, read and summarize news, reply to emails, and book restaurant reservations for you, tech workers on the ground say their utility has so far proven to be limited in business contexts. Asking employees to make manuals describing the minutiae of their day-to-day jobs the way Colleague Skill does is one way to help bridge that gap. 

Hancheng Cao, an assistant professor at Emory University who studies AI and work, believes that companies have good reasons to push employees to create work blueprints like these, beyond simply following a trend. “Firms gain not only internal experience with the tools, but also richer data on employee know-how, workflows, and decision patterns. That helps companies see which parts of work can be standardized or codified into systems, and which still depend on human judgment,” he says.

To employees, though, making agents or even blueprints for them can feel strange and alienating. One software engineer, who spoke with MIT Technology Review anonymously because of concerns about their job security, trained an AI (not Colleague Skill) on their workflow and found that the process felt reductive—as if their work had been flattened into modules in a way that made them easier to replace. On social media, workers have turned to bleak humor to express similar feelings. In one comment on Rednote, a user wrote that “a cold farewell can be turned into warm tokens,” quipping that if they use Colleague Skill to distill their coworkers into tasks first, they themselves might survive a little longer.

The push for creating agents has also spurred clever countermeasures. Irritated by the idea of reducing a person to a skill, Koki Xu, 26, an AI product manager in Beijing, published an "anti-distillation" skill on GitHub on April 4. The tool, which took Xu about an hour to build, is designed to sabotage the process of creating workflows for agents. Users can choose between light, medium, and heavy sabotage modes depending on how closely their boss is observing the process, and the agent rewrites the material into generic, non-actionable language that would produce a less useful AI stand-in. A video Xu posted about the project went viral, drawing more than 5 million likes across platforms.

Xu told MIT Technology Review that she has been following the Colleague Skill trend from the start and that it has made her think about alienation, disempowerment, and broader implications for labor. “I originally wanted to write an op-ed, but decided it would be more useful to make something that pushes back against it,” she says.

Xu, who has undergraduate and master’s degrees in law, said the trend also raises legal questions. While a company may be able to argue that work chat histories and materials created on a work laptop are corporate property, a skill like this can also capture elements of personality, tone, and judgment, making ownership much less clear. She said she hopes Colleague Skill prompts more discussion about how to protect workers’ dignity and identity in the age of AI. “I believe it’s important to keep up with these trends so we (employees) can participate in shaping how they are used,” she says. Xu herself is an avid AI adopter, with seven OpenClaw agents set up across her personal and work devices.

Li, the tech worker in Shanghai, says her company has not yet found a way to replace actual workers with AI tools, largely because they remain unreliable and require constant supervision. “I don’t feel like my job is immediately at risk,” she says. “But I do feel that my value is being cheapened, and I don’t know what to do about it.”

How robots learn: A brief, contemporary history

Roboticists used to dream big but build small. They'd hope to match or exceed the extraordinary complexity of the human body, and then they'd spend their career refining robotic arms for auto plants. Aim for C-3PO; end up with the Roomba.

The real ambition for many of these researchers was the robot of science fiction—one that could move through the world, adapt to different environments, and interact safely and helpfully with people. For the socially minded, such a machine could help those with mobility issues, ease loneliness, or do work too dangerous for humans. For the more financially inclined, it would mean a bottomless source of wage-free labor. Either way, a long history of failure left most of Silicon Valley hesitant to bet on helpful robots.

That has changed. The machines are yet unbuilt, but the money is flowing: Companies and investors put $6.1 billion into humanoid robots in 2025 alone, four times what was invested in 2024. 

What happened? A revolution in how machines have learned to interact with the world. 

Imagine you’d like a pair of robot arms installed in your home purely to do one thing: fold clothes. How would it learn to do that? You could start by writing rules. Check the fabric to figure out how much deformation it can tolerate before tearing. Identify a shirt’s collar. Move the gripper to the left sleeve, lift it, and fold it inward by exactly this distance. Repeat for the right sleeve. If the shirt is rotated, turn the plan accordingly. If the sleeve is twisted, correct it. Very quickly the number of rules explodes, but a complete accounting of them could produce reliable results. This was the original craft of robotics: anticipating every possibility and encoding it in advance.
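Written as code, the rule-based approach looks something like the sketch below. Everything here is hypothetical: the perception inputs, action names, and fold distance are invented for illustration. The point is that every new situation the programmer anticipates becomes another branch.

```python
FOLD_DISTANCE_CM = 12.0  # hypothetical constant a programmer would have to pick

def fold_plan(shirt):
    """Hand-written rules for folding one shirt.

    `shirt` holds hypothetical perception results, e.g.
    {"rotation_deg": 0.0, "twisted": set()}. Each unanticipated case
    (inside-out shirt, slippery fabric, a sleeve tucked under) needs another rule.
    """
    plan = []
    if shirt["rotation_deg"] != 0:
        plan.append(("rotate_plan", -shirt["rotation_deg"]))
    plan.append(("locate", "collar"))
    for side in ("left", "right"):
        if side in shirt["twisted"]:
            plan.append(("untwist", f"{side}_sleeve"))
        plan.append(("grasp", f"{side}_sleeve"))
        plan.append(("lift", f"{side}_sleeve"))
        plan.append(("fold_inward", f"{side}_sleeve", FOLD_DISTANCE_CM))
    return plan

for step in fold_plan({"rotation_deg": 30.0, "twisted": {"left"}}):
    print(step)
```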

Around 2015, the cutting edge started to do things differently: Build a digital simulation of the robotic arms and the clothes, and give the program a reward signal every time it folds successfully and a ding every time it fails. This way, it gets better by trying all sorts of techniques through trial and error, with millions of iterations—the same way AI got good at playing games.
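The trial-and-error loop described above can be sketched in a few lines. This is a minimal sketch, not how any lab actually trained folding robots: it reduces the problem to an epsilon-greedy multi-armed bandit, in which each hypothetical folding "technique" succeeds with a fixed probability the learner does not know, and a random draw stands in for the physics simulation. The technique names, success rates, and exploration rate `epsilon` are invented for illustration.

```python
import random

# Hypothetical folding techniques and their true success rates. The learner never
# reads these directly; in a real setup, outcomes would come from a physics simulation.
TRUE_SUCCESS_RATE = {"pinch_collar": 0.2, "flatten_then_fold": 0.7, "sleeve_first": 0.5}


def simulated_fold(technique: str, rng: random.Random) -> float:
    """Return +1 (reward) for a successful fold, -1 (the 'ding') for a failure."""
    return 1.0 if rng.random() < TRUE_SUCCESS_RATE[technique] else -1.0


def learn_by_trial_and_error(episodes: int = 20_000, epsilon: float = 0.1, seed: int = 0) -> dict:
    """Estimate each technique's average reward by repeatedly trying them."""
    rng = random.Random(seed)
    techniques = list(TRUE_SUCCESS_RATE)
    value = {t: 0.0 for t in techniques}  # running estimate of average reward
    counts = {t: 0 for t in techniques}
    for _ in range(episodes):
        # Occasionally explore a random technique; otherwise exploit the best so far.
        if rng.random() < epsilon:
            choice = rng.choice(techniques)
        else:
            choice = max(techniques, key=value.get)
        reward = simulated_fold(choice, rng)
        counts[choice] += 1
        # Incremental mean: nudge the estimate toward the reward just observed.
        value[choice] += (reward - value[choice]) / counts[choice]
    return value


if __name__ == "__main__":
    estimates = learn_by_trial_and_error()
    print(max(estimates, key=estimates.get), estimates)
```

Real systems learn a policy over states and continuous motor commands rather than choosing among three labeled options, but the core loop is the same: act, receive a reward or a ding, and update.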

The arrival of ChatGPT in 2022 catalyzed the current boom. Trained on vast amounts of text, large language models work not through trial and error but by learning to predict what word should come next in a sentence. Similar models adapted to robotics were soon able to absorb pictures, sensor readings, and the position of a robot’s joints and predict the next action the machine should take, issuing dozens of motor commands every second.

This conceptual shift—to reliance on AI models that ingest large amounts of data—seems to work whether that helpful robot is supposed to talk to people, move through an environment, or even do complicated tasks. And it was paired with other ideas about how to accomplish this new way of learning, like deploying robots even if they aren’t yet perfect so they can learn from the environment they’re meant to work in. Today, Silicon Valley roboticists are dreaming big again. Here’s how that happened. 


Jibo

A movable social robot carried out conversations long before the age of LLMs.

An MIT robotics researcher named Cynthia Breazeal introduced an armless, legless, faceless robot called Jibo to the world in 2014. It looked, in fact, like a lamp. Breazeal’s aim was to create a social robot for families, and the idea pulled in $3.7 million in a crowdsourced funding campaign. Early preorders cost $749.

The early Jibo could introduce itself and dance to entertain kids, but that was about it. The vision was always for it to become a sort of embodied assistant that could handle everything from scheduling and emails to telling stories. It earned a number of devoted users, but ultimately the company shut down in 2019.

A crowdfunding campaign started in 2014 and drew 4,800 Jibo preorders.
COURTESY OF MIT MEDIA LAB

In retrospect, one thing that Jibo really needed was better language capabilities. It was competing against Apple’s Siri and Amazon’s Alexa, and all those technologies at the time relied on heavy scripting. In broad terms, when you spoke to them, software would translate your speech into text, analyze what you wanted, and create a response pulled from preapproved snippets. Those snippets could be charming, but they were also repetitive, boring, and downright robotic. That was especially a challenge for a robot that was supposed to be social and family oriented. 

What has happened since, of course, is a revolution in how machines can generate language. Voice mode from any leading AI provider is now engaging and impressive, and multiple hardware startups are trying (and failing) to build products that take advantage of it. 

But that comes with a new risk: While scripted conversations can’t really go off the rails, ones generated by AI certainly can. Some popular AI toys have, for example, talked to kids about how to find matches and knives. 


Dactyl

A robot hand trained with simulations tries to model the unpredictability and variation of the real world.

By 2018, every leading robotics lab was trying to scrap the old scripted rules and train robots through trial and error. OpenAI tried to train its robotic hand, Dactyl, virtually, with digital models of the hand and of the palm-size cubes Dactyl was supposed to manipulate. The cubes had letters and numbers on their faces; the model might set a task like “Rotate the cube so the red side with the letter O faces upward.”

Here’s the problem: A robotic hand might get really good at doing this in its simulated world, but when you take that program and ask it to work on a real version in the real world, the slight differences between the two can cause things to go awry. Colors might be slightly different, or the deformable rubber in the robot’s fingertips could turn out to be stretchier than it was in simulation.

Dactyl, part of OpenAI’s first attempt at robotics, was trained in simulation to solve Rubik’s Cubes.
COURTESY OF OPENAI

The solution is called domain randomization. You essentially create millions of simulated worlds that all vary slightly and randomly from one another. In each one the friction might be lower, the lighting harsher, or the colors darker. Exposure to enough of this variation means the robots will be better able to manipulate the cube in the real world. The approach worked on Dactyl, and one year later OpenAI used the same core techniques to do something harder: solving Rubik’s Cubes (though the hand succeeded only 60% of the time, and just 20% when the scrambles were particularly hard). 
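Domain randomization can be sketched as a sampler that draws a fresh set of physics and rendering parameters for every training episode. This is a minimal sketch under stated assumptions: the parameter names echo the examples in the text (friction, lighting, color, fingertip stretchiness), but the numeric ranges and the `SimParams`, `sample_world`, and `randomized_worlds` names are hypothetical, not OpenAI’s configuration.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Settings for one simulated world (illustrative fields, not a real simulator API)."""
    friction: float
    light_intensity: float
    color_shift: float
    fingertip_stiffness: float


# Hypothetical (low, high) ranges around nominal values of 1.0 (0.0 for color_shift).
RANGES = {
    "friction": (0.5, 1.5),
    "light_intensity": (0.3, 2.0),
    "color_shift": (-0.2, 0.2),
    "fingertip_stiffness": (0.7, 1.3),
}


def sample_world(rng: random.Random) -> SimParams:
    """Draw one randomized world, sampling each parameter uniformly from its range."""
    return SimParams(**{name: rng.uniform(low, high) for name, (low, high) in RANGES.items()})


def randomized_worlds(n: int, seed: int = 0) -> list:
    """Generate n slightly different worlds; a policy would train across all of them."""
    rng = random.Random(seed)
    return [sample_world(rng) for _ in range(n)]


if __name__ == "__main__":
    for world in randomized_worlds(3):
        print(world)
```

In a training loop, each episode would configure the simulator from one sampled `SimParams` before running the policy, so the policy never trains on exactly the same world twice. The ranges are the key design choice: too narrow and the real world falls outside them; too wide and the task becomes needlessly hard to learn.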

Still, the limits of simulation mean that this technique plays a far smaller role today than it did in 2018. OpenAI shuttered its robotics effort in 2021 but has recently started the division up again, reportedly focusing on humanoids. 


RT-2

Training on images from across the internet helps robots translate language into action.

Around 2022, Google’s robotics team was up to some strange things. It spent 17 months handing people robot controllers and filming them doing everything from picking up bags of chips to opening jars. The team ended up cataloguing 700 different tasks.

The point was to build and test one of the first large-scale foundation models for robotics. As with large language models, the idea was to input lots of text, tokenize it into a format an algorithm could work with, and then generate an output. Google’s RT-1 received input about what the robot was looking at and how the many parts of the robotic arm were positioned; then it took an instruction and translated it into motor commands to move the robot. When it had seen tasks before, it carried out 97% of them successfully; it succeeded at 76% of the instructions it hadn’t seen before. 
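A model that predicts tokens needs motor commands expressed as tokens too. One common approach, sketched here, is to discretize each continuous command into a fixed number of bins so that every bin index acts like a word in the vocabulary. The bin count of 256, the normalized range of [-1, 1], and the function names are assumptions for illustration, not a description of RT-1’s exact scheme.

```python
NUM_BINS = 256          # assumed number of tokens per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalized range of each motor command


def action_to_tokens(action):
    """Map each continuous motor command to an integer bin index (its 'token')."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)        # keep commands inside the range
        fraction = (clipped - LOW) / (HIGH - LOW)   # rescale to 0.0 .. 1.0
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def tokens_to_action(tokens):
    """Invert the mapping by returning the center of each token's bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    command = [0.12, -0.5, 0.98]  # hypothetical joint commands
    tokens = action_to_tokens(command)
    print(tokens, tokens_to_action(tokens))
```

Decoding back to a bin center loses at most half a bin width of precision, which is the trade-off for letting a language-model-style architecture output actions one token at a time.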

The model RT-2, for Robotic Transformer 2, incorporated internet data to help robots process what they were seeing.
COURTESY OF GOOGLE DEEPMIND

The second iteration, RT-2, came out the following year and went even further. Instead of training on data specific to robotics, it went broad: It trained on more general images from across the internet, like the vision-language models lots of researchers were working on at the time. That allowed the robot to interpret where certain objects were in the scene.

“All these other things were unlocked,” says Kanishka Rao, a roboticist at Google DeepMind who led work on both iterations. “We could do things now like ‘Put the Coke can near the picture of Taylor Swift.’” 

In 2025, Google DeepMind further fused the worlds of large language models and robotics, releasing a Gemini Robotics model with improved ability to understand commands in natural language. 


RFM-1

An AI model that allows robotic arms to act like coworkers.

In 2017, before OpenAI shuttered its first robotics team, a group of its engineers spun out a project called Covariant, aiming to build not sci-fi humanoids but the most pragmatic of all robots: an arm that could pick up and move things in warehouses. After building a system based on foundation models similar to Google’s, Covariant deployed this platform in warehouses like those operated by Crate & Barrel and treated it as a data collection pipeline. 

By 2024, Covariant had released a robotics model, RFM-1, that you could interact with like a coworker. If you showed an arm many sleeves of tennis balls, for example, you could then instruct it to move each sleeve to a separate area. And the robot could respond, perhaps predicting that it wouldn’t be able to get a good grip on the item and then asking for advice on which particular suction cups it should use. 

This sort of thing had been done in experiments, but Covariant was launching it at significant scale. The company now had cameras and data collection machines in every customer location, feeding back even more data for the model to train on.

A Covariant robot demonstrates “induction”—the common warehouse task of placing objects on sorters or conveyors.
COURTESY OF COVARIANT

It wasn’t perfect. In a demo in March 2024 with an array of kitchen items, the robot struggled when it was asked to “return the banana” to its original location. It picked up a sponge, then an apple, then a host of other items before it finally accomplished the task. 

It “doesn’t understand the new concept” of retracing its steps, cofounder Peter Chen told me at the time. “But it’s a good example: it might not work well yet in the places where you don’t have good training data.”

Chen and fellow founder Pieter Abbeel were soon hired by Amazon, which is currently licensing Covariant’s robotics model (Amazon did not respond to questions about how it’s being used, but the company runs an estimated 1,300 warehouses in the US alone). 


Digit

Companies are putting this humanoid to the test in real-world settings.

The new investment dollars flowing to robotics startups are aimed largely at robots shaped not like lamps or arms but like people. Humanoid robots are supposed to be able to seamlessly enter the spaces and jobs where humans currently work, avoiding the need to retool assembly lines to accommodate new shapes such as giant arms. 

It’s easier said than done. In the rare cases where humanoids appear in real warehouses, they’re often confined to test zones and pilot programs. 

Amazon and other companies are using Digit to help move shipping totes.
COURTESY OF AGILITY ROBOTICS

That said, Agility’s humanoid Digit appears to be doing some real work. The design, with exposed joints and a distinctly unhuman head, is driven more by function than by sci-fi aesthetics. Amazon, Toyota, and GXO (a logistics giant with customers like Apple and Nike) have all deployed it, making it one of the first examples of a humanoid robot that companies see as providing actual cost savings rather than novelty. Their Digits spend their days picking up, moving, and stacking shipping totes.

The current Digit is still a long way from the humanlike helper Silicon Valley is betting on, though. It can lift only 35 pounds, for example, and every time Agility makes Digit stronger, its battery gets heavier and it has to recharge more often. And standards organizations say humanoids need stricter safety rules than most industrial robots, because they’re designed to be mobile and spend time in proximity to people. 

But Digit shows that this revolution in robot training isn’t converging on a single method. Agility relies on simulation techniques like those OpenAI used to train its hand, and the company has worked with Google’s Gemini models to help its robots adapt to new environments. That’s where more than a decade of experiments have gotten the industry: Now it’s building big.

Why having “humans in the loop” in an AI war is an illusion

The availability of artificial intelligence for use in warfare is at the center of a legal battle between Anthropic and the Pentagon. This debate has become urgent, with AI playing a bigger role than ever before in the current conflict with Iran. AI is no longer just helping humans analyze intelligence. It is now an active player—generating targets in real time, controlling and coordinating missile interceptions, and guiding lethal swarms of autonomous drones.

Most of the public conversation regarding the use of AI-driven autonomous lethal weapons centers on how much humans should remain “in the loop.” Under the Pentagon’s current guidelines, human oversight supposedly provides accountability, context, and nuance while reducing the risk of hacking.

AI systems are opaque “black boxes”

But the debate over “humans in the loop” is a comforting distraction. The immediate danger is not that machines will act without human oversight; it is that human overseers have no idea what the machines are actually “thinking.” The Pentagon’s guidelines are fundamentally flawed because they rest on the dangerous assumption that humans understand how AI systems work.

Having studied intentions in the human brain for decades and in AI systems more recently, I can attest that state-of-the-art AI systems are essentially “black boxes.” We know the inputs and outputs, but the artificial “brain” processing them remains opaque. Even their creators cannot fully interpret them or understand how they work. And when AIs do provide reasons, they are not always trustworthy.

The illusion of human oversight in autonomous systems

In the debate over human oversight, a fundamental question is going unasked: Can we understand what an AI system intends to do before it acts?

Imagine an autonomous drone tasked with destroying an enemy munitions factory. The automated command and control system determines that the optimal target is a munitions storage building. It reports a 92% probability of mission success because secondary explosions of the munitions in the building will thoroughly destroy the facility. A human operator reviews the legitimate military objective, sees the high success rate, and approves the strike.

But what the operator does not know is that the AI system’s calculation included a hidden factor: Beyond devastating the munitions factory, the secondary explosions would also severely damage a nearby children’s hospital. The emergency response would then focus on the hospital, ensuring the factory burns down. To the AI, maximizing disruption in this way meets its given objective. But to a human, it is potentially committing a war crime by violating the rules regarding civilian life. 

Keeping a human in the loop may not provide the safeguard people imagine, because the human cannot know the AI’s intention before it acts. Advanced AI systems do not simply execute instructions; they interpret them. If operators fail to define their objectives carefully enough—a highly likely scenario in high-pressure situations—the “black box” system could be doing exactly what it was told and still not acting as humans intended.

This “intention gap” between AI systems and human operators is precisely why we hesitate to deploy frontier black-box AI in civilian health care or air traffic control, and why its integration into the workplace remains fraught—yet we are rushing to deploy it on the battlefield.

To make matters worse, if one side in a conflict deploys fully autonomous weapons, which operate at machine speed and scale, the pressure to remain competitive would push the other side to rely on such weapons too. This means the use of increasingly autonomous—and opaque—AI decision-making in war is only likely to grow.

The solution: Advance the science of AI intentions

The science of AI must comprise both building highly capable AI technology and understanding how this technology works. Huge advances have been made in developing and building more capable models, driven by record investments—forecast by Gartner to grow to around $2.5 trillion in 2026 alone. In contrast, the investment in understanding how the technology works has been minuscule.

We need a massive paradigm shift. Engineers are building increasingly capable systems. But understanding how these systems work is not just an engineering problem—it requires an interdisciplinary effort. We must build the tools to characterize, measure, and intervene in the intentions of AI agents before they act. We need to map the internal pathways of the neural networks that drive these agents so that we can build a true causal understanding of their decision-making, moving beyond merely observing inputs and outputs. 

A promising way forward is to combine techniques from mechanistic interpretability (breaking neural networks down into human-understandable components) with insights, tools, and models from the neuroscience of intentions. Another idea is to develop transparent, interpretable “auditor” AIs designed to monitor the behavior and emergent goals of more capable black-box systems in real time.  

Developing a better understanding of how AI functions will enable us to rely on AI systems for mission-critical applications. It will also make it easier to build more efficient, more capable, and safer systems.

Colleagues and I are exploring how ideas from neuroscience, cognitive science, and philosophy—fields that study how intentions arise in human decision-making—might help us understand the intentions of artificial systems. We must prioritize these kinds of interdisciplinary efforts, including collaborations between academia, government, and industry.

However, we need more than just academic exploration. The tech industry—and the philanthropists funding AI alignment, which strives to encode human values and goals into these models—must direct substantial investments toward interdisciplinary interpretability research. Furthermore, as the Pentagon pursues increasingly autonomous systems, Congress must mandate rigorous testing of AI systems’ intentions, not just their performance.

Until we achieve that, human oversight over AI may be more illusion than safeguard.

Uri Maoz is a cognitive and computational neuroscientist specializing in how the brain transforms intentions into actions. A professor at Chapman University with appointments at UCLA and Caltech, he leads an interdisciplinary initiative focused on understanding and measuring intentions in artificial intelligence systems (ai-intentions.org).

Making AI operational in constrained public sector environments

The AI boom has hit across industries, and public sector organizations are facing pressure to accelerate adoption. At the same time, government institutions face distinct constraints around security, governance, and operations that set them apart from their business counterparts. For this reason, purpose-built small language models (SLMs) offer a promising path to operationalize AI in these environments.  

A Capgemini study found that 79 percent of public sector executives globally are wary about AI’s data security, an understandable figure given the heightened sensitivity of government data and the legal obligations surrounding its use. As Han Xiao, vice president of AI at Elastic, says, “Government agencies must be very restricted about what kind of data they send to the network. This sets a lot of boundaries on how they think about and manage their data.”

The fundamental need for control over sensitive information is one of many factors complicating AI deployment, particularly when compared against the private sector’s standard operational assumptions.

Unique operational challenges

When private-sector entities expand AI, they typically assume certain conditions will be in place, including continuous connectivity to the cloud, reliance on centralized infrastructure, acceptance of incomplete model transparency, and limited restrictions on data movement. For many state institutions, however, accepting these conditions could be anything from dangerous to impossible. 

Government agencies must ensure that their data stays under their control, that information can be checked and verified, and that operational disruptions are kept to an absolute minimum. At the same time, they often have to run their systems in environments where internet connectivity is limited, unreliable, or unavailable. These complexities prevent many promising public sector AI pilots from moving beyond experimentation. “Many people undervalue the operating challenge of AI,” Xiao says. “The public sector needs AI to perform reliably on all kinds of data, and then to be able to grow without breaking. Continuity of operations is often underestimated.” An Elastic survey of public sector leaders found that 65 percent struggle to use data continuously in real time and at scale. 

Infrastructure constraints compound the problem. Government organizations may also struggle to obtain the graphics processing units (GPUs) used to train and access complex AI models. As Xiao points out, “Government doesn’t often purchase GPUs, unlike the private sector—they’re not used to managing GPU infrastructure. So accessing a GPU to run the model is a bottleneck for much of the public sector.” 

A smaller, more practical model

The many nonnegotiable requirements in the public sector make large language models (LLMs) untenable. But SLMs can be housed locally, offering greater security and control. SLMs are specialized AI models that typically use billions rather than hundreds of billions of parameters, making them far less computationally demanding than the largest LLMs.
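The gap between “billions” and “hundreds of billions” of parameters translates directly into hardware. A rough rule of thumb is that storing the weights alone takes the parameter count times the bytes per parameter (2 bytes at 16-bit precision, 0.5 at 4-bit). The sketch below applies that arithmetic to two hypothetical model sizes, 7 billion and 400 billion parameters; it ignores activations, caches, and other runtime memory, so real requirements are higher.

```python
# Approximate storage per parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Hypothetical model sizes: one "small" and one "large" model.
    for label, params in [("7B-parameter model", 7e9), ("400B-parameter model", 400e9)]:
        print(f"{label}: {weight_memory_gb(params):.0f} GB at fp16, "
              f"{weight_memory_gb(params, 'int4'):.1f} GB at int4")
```

Under these assumptions the 7-billion-parameter model needs about 14 GB at 16-bit precision, within reach of a single workstation GPU, while the 400-billion-parameter model needs about 800 GB, which calls for a multi-GPU server.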

The public sector does not need to build ever-larger models housed in offsite, centralized locations. An empirical study found that SLMs performed as well as or better than LLMs. SLMs allow sensitive information to be used effectively and efficiently while avoiding the operational complexity of maintaining large models. Xiao puts it this way: “It is easy to use ChatGPT to do proofreading. It’s very difficult to run your own large language models just as smoothly in an environment with no network access.” 

SLMs are purpose-built for the needs of the department or agency that will use them. The data is stored securely outside the model, and is only accessed when queried. Carefully engineered prompts ensure that only the most relevant information is retrieved, providing more accurate responses. Using methods such as smart retrieval, vector search, and verifiable source grounding, AI systems can be built that cater to public sector needs. 
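The retrieval pattern described here (data kept outside the model, fetched by vector search, and cited in the answer) can be sketched without any external services. This is a minimal sketch assuming toy three-dimensional embeddings and hypothetical document names and contents; a real deployment would compute embeddings with a model and store them in a local vector index.

```python
import math

# Hypothetical documents with toy 3-dimensional embeddings. A real system would
# compute embeddings with a model and keep them in a local vector index.
DOCUMENTS = {
    "procurement-2023.pdf": ("Road resurfacing tender, awarded in March.", [0.9, 0.1, 0.0]),
    "council-minutes-may.txt": ("Council discussed library opening hours.", [0.1, 0.8, 0.2]),
    "invoice-4471.csv": ("Invoice for asphalt delivered to the roads department.", [0.7, 0.0, 0.6]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, k=2):
    """Return ids of the k documents whose embeddings are closest to the query."""
    return sorted(DOCUMENTS, key=lambda d: cosine(query_vec, DOCUMENTS[d][1]), reverse=True)[:k]


def grounded_prompt(question, query_vec, k=2):
    """Build a prompt that restricts the model to the retrieved, named sources."""
    context = "\n".join(f"[{d}] {DOCUMENTS[d][0]}" for d in retrieve(query_vec, k))
    return ("Answer using only the sources below and cite them by name. "
            "If they do not contain the answer, say so.\n"
            f"{context}\n\nQuestion: {question}")


if __name__ == "__main__":
    # In practice the query vector would come from embedding the question text.
    print(grounded_prompt("When was the road resurfacing tender awarded?", [1.0, 0.0, 0.1]))
```

Because the prompt carries the source names, the model’s answer can be checked against the documents it was told to use, which is the kind of verifiable source grounding the text describes.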

Thus, the next phase of AI adoption in the public sector may be to bring the AI tool to the data, rather than sending the data out into the cloud. Gartner predicts that by 2027, small, specialized AI models will be used three times more than LLMs.

Superior search capabilities

“When people in the public sector hear AI, they probably think about ChatGPT. But we can be much more ambitious,” says Xiao. “AI can revolutionize how the government searches and manages the large amounts of data they have.”

Looking beyond chatbots reveals one of AI’s most immediate opportunities: dramatically improved search. Like many organizations, the public sector has mountains of unstructured data—including technical reports, procurement documents, minutes, and invoices. Today’s AI, however, can deliver results sourced from mixed media, like readable PDFs, scans, images, spreadsheets, and recordings, and in multiple languages. All of this can be indexed by SLM-powered systems to provide tailored responses and to draft complex texts in any language, while ensuring outputs are legally compliant. “The public sector has a lot of data, and they don’t always know how to use this data. They don’t know what the possibilities are,” says Xiao.

Even more powerful, AI can help government employees interpret the data they access. “Today’s AI can provide you with a completely new view of how to harness that data,” says Xiao. A well-trained SLM can interpret legal norms, extract insights from public consultations, support data-driven executive decision-making, and improve public access to services and administrative information. This can contribute to dramatic improvements in how the public sector conducts its operations.

The small-language promise

Focusing on SLMs shifts the conversation from how comprehensive the model can be to how efficient it is. LLMs incur significant performance and computational costs and require specialized hardware that many public entities cannot afford. Despite requiring some capital expenses, SLMs are less resource-intensive than LLMs, so they tend to be cheaper and reduce environmental impact. 

Public sector agencies often face stringent audit requirements, and SLM algorithms can be documented and certified as transparent. Some countries, particularly in Europe, also have privacy regulations such as GDPR that SLMs can be designed to meet.

Tailored training data produces more targeted results, reducing errors, bias, and hallucinations that AI is prone to. As Xiao puts it, “Large language models generate text based on what they were trained on, so there is a cut-off date when they were trained. If you ask about anything after that, it will hallucinate. We can solve this by forcing the model to work from verified sources.”

Risks are also minimized by keeping data on local servers, or even on a specific device. This isn’t about isolation but about strategic autonomy to enable trust, resilience, and relevance.

By prioritizing task-specific models designed for environments that process data locally, and by continuously monitoring performance and impact, public sector organizations can build lasting AI capabilities that support real-world decisions. “Do not start with a chatbot; start with search,” Xiao advises. “Much of what we think of as AI intelligence is really about finding the right information.”

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Treating enterprise AI as an operating layer

There’s a fault line running through enterprise AI, and it’s not the one getting the most attention. The public conversation still tracks foundation models and benchmarks—GPT versus Gemini, reasoning scores, and marginal capability gains. But in practice, the more durable advantage is structural: who owns the operating layer where intelligence is applied, governed, and improved. One model treats AI as an on-demand utility; the other embeds it as an operating layer—the combination of operational software, data capture, feedback loops, and governance that sits between models and real work—that compounds with use.

Model providers like OpenAI and Anthropic sell intelligence as a service: you have a problem, you call an API, you get an answer. That intelligence is general-purpose, largely stateless, and only loosely connected to the day-to-day operations where decisions are made. It’s highly capable and increasingly interchangeable. The distinction that matters is whether intelligence resets on every prompt or accumulates over time.

Incumbent organizations, by contrast, can treat AI as an operating layer: instrumentation across operations, feedback loops from human decisions, and governance that turns individual tasks into reusable policy. In that setup, every exception, correction, and approval becomes a chance to learn—and intelligence can improve as the platform absorbs more of the organization’s work. The organizations most likely to shape the enterprise AI era are those that can embed intelligence directly into operational platforms and instrument those platforms so work generates usable signals.

The prevailing narrative says nimble startups will out-innovate incumbents by building AI-native from scratch. If AI is primarily a model problem, that story holds. But in many enterprise domains, AI is a systems problem—integrations, permissions, evaluation, and change management—where advantage accrues to whoever already sits inside high-volume, high-stakes operations and converts that position into learning and automation.

The inversion: AI executes, humans adjudicate

Traditional services organizations are built on a simple architecture: humans use software to do expert work. Operators log into systems, navigate operations, make decisions, and process cases. Technology is the medium. Human judgment is the product.

An AI-native platform inverts this. It ingests a problem, applies accumulated domain knowledge, executes autonomously what it can with high confidence, and routes targeted sub-tasks to human experts when the situation demands judgment that the system can’t yet reliably provide.
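The “execute what it can, route the rest” step can be sketched as a confidence threshold. This is a minimal illustration assuming the system attaches a calibrated confidence score to each proposed action; the `Result` type, the case ids, and the 0.9 cutoff are hypothetical, not Ensemble’s design.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.9  # hypothetical cutoff; tuned per task and risk level in practice


@dataclass
class Result:
    case_id: str
    action: str
    confidence: float  # system's estimated probability that the action is correct


def route(results):
    """Split proposed actions into auto-executed ones and ones needing human judgment."""
    automated, needs_human = [], []
    for result in results:
        if result.confidence >= CONFIDENCE_THRESHOLD:
            automated.append(result)
        else:
            needs_human.append(result)
    return automated, needs_human


if __name__ == "__main__":
    batch = [Result("A-1", "approve", 0.97), Result("A-2", "deny", 0.62), Result("A-3", "approve", 0.91)]
    done, escalate = route(batch)
    print([r.case_id for r in done], [r.case_id for r in escalate])
```

Where to set the threshold is a policy decision: raising it sends more work to human experts and, provided the confidence scores are well calibrated, lowers the error rate of the actions taken automatically.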

But inverting human-AI interaction isn’t just a UI redesign—it requires raw material. It’s only possible when the platform is built on a foundation of domain expertise, behavioral data, and operational knowledge accumulated over years.

The three compounding assets incumbents already own

AI-native startups begin with a clean architectural slate and can move quickly. What they can’t easily manufacture is the raw material that makes domain AI defensible at scale:

  • Proprietary operational data
  • A large workforce of domain experts whose day-to-day decisions generate training signals
  • Accumulated tacit knowledge about how complex work actually gets done

Services companies already have all three. But these ingredients aren’t moats on their own. They become an advantage only when a company can systematically convert messy operations into AI-ready signals and institutional knowledge—then feed the results back into operations so the system keeps improving.

Codifying expertise into reusable signals

In most services organizations, expertise is tacit and perishable. The best operators know things they cannot easily articulate: heuristics developed over the years, edge-case intuitions, and pattern recognition that operate below the level of conscious reasoning.

At Ensemble, the strategy for addressing this challenge is knowledge distillation: the systematic conversion of expert judgment and operational decisions into machine-readable training signals.

In health-care revenue cycle management, for example, systems can be seeded with explicit domain knowledge and then deepen their coverage through structured daily interaction with operators. In Ensemble’s implementation, the system identifies gaps, formulates targeted questions, and cross-checks answers across multiple experts to capture both consensus and edge-case nuance. It then synthesizes these inputs into a living knowledge base that reflects the situational reasoning behind expert-level performance.
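Cross-checking answers across several experts can be sketched as an agreement check that records both the majority view and the dissent. This sketch is an assumption about how such a step might work, not Ensemble’s implementation; the function name, the expert ids, and the 75% agreement cutoff are hypothetical.

```python
from collections import Counter


def consolidate(answers, min_agreement=0.75):
    """Combine several experts' answers to the same question.

    answers maps expert id -> answer. Returns (consensus, dissenters): consensus is
    the majority answer if enough experts agree, else None; dissenters lists the
    experts whose answers should be reviewed as possible edge cases.
    """
    counts = Counter(answers.values())
    top_answer, top_count = counts.most_common(1)[0]
    if top_count / len(answers) >= min_agreement:
        return top_answer, sorted(e for e, a in answers.items() if a != top_answer)
    return None, sorted(answers)  # no consensus: every answer needs follow-up


if __name__ == "__main__":
    print(consolidate({"e1": "resubmit", "e2": "resubmit", "e3": "resubmit", "e4": "appeal"}))
```

Keeping the dissenting experts rather than discarding them is what preserves the edge-case nuance: a minority answer often points to a situation the majority rule does not cover.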

Turning decisions into a learning flywheel

Once a system is constrained enough to be trusted, the next question is how it gets better without waiting for annual model upgrades. Every time a skilled operator makes a decision, they generate more than a completed task. They generate a potential labeled example—context paired with an expert action (and sometimes an outcome). At scale, across thousands of operators and millions of decisions, that stream can power supervised learning, evaluation, and targeted forms of reinforcement—teaching systems to behave more like experts in real conditions.

For example, if an organization processes 50,000 cases a week and captures just three high-quality decision points per case, that’s 150,000 labeled examples every week without creating a separate data-collection program.
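The arithmetic in the example is simply cases per week times decision points per case. The sketch below pairs it with a minimal record for one captured decision (context, expert action, and an optional outcome, as described above); the type, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabeledExample:
    """One captured decision: what the operator saw and what they did."""
    context: dict
    expert_action: str
    outcome: Optional[str] = None  # often unknown when the decision is captured


def weekly_examples(cases_per_week: int, decision_points_per_case: int) -> int:
    """Labeled examples produced per week as a by-product of normal work."""
    return cases_per_week * decision_points_per_case


if __name__ == "__main__":
    example = LabeledExample(context={"denial_reason": "missing_records"},
                             expert_action="resubmit_with_records")
    print(example)
    print(weekly_examples(50_000, 3))  # 150000
```

Over a 52-week year, the same rate yields 7.8 million examples.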

A more advanced human-in-the-loop design places experts inside the decision process, so systems learn not just what the right answer was, but how ambiguity gets resolved. Practically, humans intervene at branch points—selecting from AI-generated options, correcting assumptions, and redirecting operations. Each intervention becomes a high-value training signal. When the platform detects an edge case or a deviation from the expected process, it can prompt for a brief, structured rationale, capturing decision factors without requiring lengthy free-form reasoning logs.
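Capturing a brief, structured rationale can be sketched as a record that requires at least one factor from a fixed list whenever the operator departs from the AI’s suggestion. The factor names, the `Intervention` type, and the rule that only overrides need a rationale are assumptions for illustration; detecting other kinds of edge cases is not modeled here.

```python
from dataclasses import dataclass, field

# Hypothetical menu of decision factors an operator can pick from.
DECISION_FACTORS = {"missing_documentation", "policy_exception", "data_entry_error", "other"}


@dataclass
class Intervention:
    case_id: str
    ai_suggestion: str
    human_choice: str
    factors: list = field(default_factory=list)
    note: str = ""  # optional one-line comment, never required

    @property
    def is_deviation(self) -> bool:
        return self.human_choice != self.ai_suggestion


def record_intervention(case_id, ai_suggestion, human_choice, factors=(), note=""):
    """Store an intervention, requiring a structured rationale only for overrides."""
    unknown = [f for f in factors if f not in DECISION_FACTORS]
    if unknown:
        raise ValueError(f"unknown decision factors: {unknown}")
    item = Intervention(case_id, ai_suggestion, human_choice, list(factors), note)
    if item.is_deviation and not item.factors:
        raise ValueError("override recorded without a structured rationale")
    return item
```

A fixed menu of factors keeps each rationale short and machine-readable, so interventions can be aggregated and used as training signals without parsing free-form notes.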

Building toward expertise amplification

The goal is to permanently embed the accumulated expertise of thousands of domain experts—their knowledge, decisions, and reasoning—into an AI platform that amplifies what every operator can accomplish. Done well, this produces a quality of execution that neither humans nor AI achieve independently: higher consistency, improved throughput, and measurable operational gains. Operators can focus on more consequential work, supported by an AI that has already completed the analytical groundwork across thousands of analogous prior cases.

The broader implication for enterprise leaders is straightforward. Advantage in AI won’t be determined by access to general-purpose models alone. It will come from an organization’s ability to capture, refine, and compound what it knows (its data, decisions, and operational judgment) while building the controls required for high-stakes environments. As AI shifts from experimentation to infrastructure, the most durable edge may belong to the companies that understand the work well enough to instrument it and can turn that understanding into systems that improve with use.

This content was produced by Ensemble. It was not written by MIT Technology Review’s editorial staff.

Building trust in the AI era with privacy-led UX

The practice of privacy-led user experience (UX) is a design philosophy that treats transparency around data collection and usage as an integral part of the customer relationship. An undertapped opportunity in digital marketing, privacy-led UX treats user consent not as a tick-box compliance exercise, but rather as the first overture in an ongoing customer relationship. For the companies that get it right, the payoff can bring something more intangible, valuable, and durable than simple consent rates: consumer trust.

The opportunities of privacy-led UX have only recently come into focus. Adelina Peltea, the chief marketing officer at Usercentrics, has seen enterprise sentiment shift: “Even just a few years ago, this space was viewed more as a trade-off between growth and compliance,” she says. “But as the market has matured, there’s been a greater focus on how to tie well-designed privacy experiences to business growth.”

And it turns out that well-designed, value-forward consent experiences routinely outperform initial estimates.

Touchpoints for privacy-led UX often include consent management platforms, terms and conditions, privacy policies, data subject access request (DSAR) tools, and, increasingly, AI data use disclosures.

This report examines how data transparency builds trust with customers; how this, in turn, can support business performance; and how organizations can maintain this trust even as AI systems add complexity to consent processes.

Key findings include the following:

  • Privacy is evolving from a one-time consent transaction into an ongoing data relationship. Rather than asking users for broad permissions up front, leading organizations are introducing data-sharing decisions gradually, matching the depth of the ask to the stage of the customer relationship. Companies that take this tack tend to gather both a larger quantity and higher quality of consumer data, the value of which often compounds over time.
  • Privacy-led UX is a prerequisite for AI growth. The consumer data that organizations gather is rapidly becoming a core foundation upon which AI-powered personalization is built. Organizations that establish clear, enforceable privacy and data transparency policies now are better positioned to deploy AI responsibly and at scale in the future. This starts with correctly configuring consent mode across ad platforms.
  • Agentic AI introduces new levels of both complexity and opportunity. As AI systems begin acting on users’ behalf, the traditional consent moment may never occur. Governing agent-generated data flows requires privacy infrastructure that goes well beyond the cookie banner.
  • Realizing the advantages of privacy-led UX requires cross-functional collaboration and clear leadership. Privacy-led UX touches marketing, product, legal, and data teams, but someone must own the strategy and weave the threads together. Chief marketing officers (CMOs) are often best positioned for that role, given their visibility across brand, data, and customer experience.
  • A practical framework can support businesses in getting it right. Organizations must define their data collection and usage strategies and ensure their UX incorporates data consent, including a focus on banner design. Following a blueprint for evaluating and improving privacy-led UX supports consistency at every consent touchpoint.


This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Coming soon: 10 Things That Matter in AI Right Now

Each year we compile our 10 Breakthrough Technologies list, featuring our educated predictions for which technologies will have the biggest impact on how we live and work.

This year, however, we had a dilemma. While our final picks encompass all our core coverage areas (energy, AI, and biotech, plus a few more), our 2026 list was harder to wrangle than normal. Why? We had so many worthy AI candidates we couldn’t fit them all in! (The ones that made it were AI companions, mechanistic interpretability, generative coding, and hyperscale data centers.) Many great ideas fell by the wayside to keep the list as wide-ranging as possible.

Well, that got us thinking: What if we made an entirely new list that was all about AI? We got excited about that idea, and before we knew it we had the beginnings of what we’re calling 10 Things That Matter in AI Right Now. It’s a new annual list that we’re proud to be publishing for the first time on April 21, 2026. We’ll unveil it on stage for attendees at our signature AI conference, EmTech AI, held on MIT’s campus (it’s not too late to get tickets), and then publish the list online later that day.

The process for coming up with the list was similar to the way we pick our 10 Breakthrough Technologies. We petitioned our AI team of reporters and editors to propose ideas, put them all in a document, and engaged in some robust discussion. Eventually, we voted for our favorites and whittled the long list down to a final 10.

But there’s a slight difference between this list and our 10 Breakthrough Technologies. AI is already such a big part of our lives that we didn’t want to restrict ourselves to nominating only technologies. Instead, we wanted to put together a definitive annual list that highlights what we believe are the biggest ideas, topics, and research directions in AI right now. So yes, it will include cutting-edge AI technologies, but it will also feature other trends and developments in AI that we want to bring to our subscribers’ attention.

Think of it as a sneak peek inside the collective brain of our crack AI reporting team: These are the things that our reporters will be watching this year. We intend to follow the items on this list really closely, and you will see that reflected in the news and feature stories we publish in 2026.

For us, 10 Things That Matter in AI Right Now is a guide to how we view the current AI landscape. It will be a source of discussion, debate, and maybe some arguments! We are so excited to share it with you on April 21. If you want to be among the first to see it, join us at EmTech AI or become a subscriber to livestream the announcement.