OpenAI’s “compromise” with the Pentagon is what Anthropic feared

On February 28, OpenAI announced it had reached a deal that will allow the US military to use its technologies in classified settings. CEO Sam Altman said the negotiations, which the company began pursuing only after the Pentagon’s public reprimand of Anthropic, were “definitely rushed.”

In its announcements, OpenAI took great pains to say that it had not caved to allow the Pentagon to do whatever it wanted with its technology. The company published a blog post explaining that its agreement protected against use for autonomous weapons and mass domestic surveillance, and Altman said the company did not simply accept the same terms that Anthropic refused. 

You could read this to say that OpenAI won both the contract and the moral high ground, but reading between the lines and the legalese makes something else clear: Anthropic pursued a moral approach that won it many supporters but failed, while OpenAI pursued a pragmatic and legal approach that is ultimately softer on the Pentagon. 

It’s not yet clear if OpenAI can build in the safety precautions it promises as the military rushes out a politicized AI strategy during strikes on Iran, or if the deal will be seen as good enough by employees who wanted the company to take a harder line. Walking that tightrope will be tricky. (OpenAI did not immediately respond to requests for additional information about its agreement.)

But the devil is also in the details. The reason OpenAI was able to make a deal when Anthropic could not was less about boundaries, Altman said, but about approach. “Anthropic seemed more focused on specific prohibitions in the contract, rather than citing applicable laws, which we felt comfortable with,” he wrote

OpenAI says one basis for its willingness to work with the Pentagon is simply an assumption that the government won’t break the law. The company, which has shared a limited excerpt of its contract, cites a number of laws and policies related to autonomous weapons and surveillance. They are as specific as a 2023 directive from the Pentagon on autonomous weapons (which does not prohibit them but issues guidelines for their design and testing) and as broad as the Fourth Amendment, which has supported protections for Americans against mass surveillance. 

However, the published excerpt “does not give OpenAI an Anthropic-style, free-standing right to prohibit otherwise-lawful government use,” wrote Jessica Tillipman, associate dean for government procurement law studies at George Washington University’s law school. It simply states that the Pentagon can’t use OpenAI’s tech to break any of those laws and policies as they’re stated today.

The whole reason Anthropic earned so many supporters in its fight—including some of OpenAI’s own employees—is that they don’t believe these rules are good enough to prevent the creation of AI-enabled autonomous weapons or mass surveillance. And an assumption that federal agencies won’t break the law is little assurance to anyone who remembers that the surveillance practices exposed by Edward Snowden had been deemed legal by internal agencies and were ruled unlawful only after drawn-out battles (not to mention the many surveillance tactics allowed under current law that AI could expand). On this front, we’ve essentially ended up back where we started: allowing the Pentagon to use its AI for any lawful use. 

OpenAI could say, as its head of national security partnerships wrote yesterday, that if you believe the government won’t follow the law, then you should also not be confident it would honor the red lines that Anthropic was proposing. But that’s not an argument against setting them. Imperfect enforcement doesn’t make constraints meaningless, and contract terms still shape behavior, oversight, and political consequences.

OpenAI claims a second line of defense. The company says it maintains control over the safety rules governing its models and will not give the military a version of its AI stripped of those safety controls. “We can embed our red lines—no mass surveillance and no directing weapons systems without human involvement—directly into model behavior,” wrote Boaz Barak, an OpenAI employee Altman deputized to speak on the issue about X. 

But the company doesn’t specify how its safety rules for the military differ from its rules for normal users. Enforcement is also never perfect, and it is especially unlikely to be when OpenAI is rolling out these protections in a classified setting for the first time and is expected to do so in just six months.

There’s another question beneath all this: Should it be down to tech companies to prohibit things that are legal but that they find morally objectionable? The government certainly viewed Anthropic’s willingness to play this role as unacceptable. On Friday evening, eight hours before the US launched strikes in Tehran, Defense Secretary Pete Hegseth issued harsh remarks on X. “Anthropic delivered a master class in arrogance and betrayal,” he wrote, and echoed President Trump’s order for the government to cease working with the AI company after Anthropic sought to keep its model Claude from being used for autonomous weapons or mass domestic surveillance. “The Department of War must have full, unrestricted access to Anthropic’s models for every LAWFUL purpose,” Hegseth wrote.

But unless OpenAI’s full contract will reveal more, it’s hard not to see the company as sitting on an ideological seesaw, promising that it does have leverage it will proudly use to do what it sees as the right thing while deferring to the law as the main backstop for what the Pentagon can do with its tech.

There are three things to be watching here. One is whether this position will be good enough for OpenAI’s most critical employees. With AI companies spending so heavily on talent, it’s possible that some at OpenAI see in Altman’s justification an unforgivable compromise.

Second, there is the scorched-earth campaign that Hegseth has promised to wage against Anthropic. Going far beyond simply canceling the government’s contract with the company, he announced that it would be classified as a supply chain risk, and that “no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic.” There is significant debate about whether this death blow is legally possible, and Anthropic has said it will sue if the threat is pursued. OpenAI has also come out against the move.

Lastly, how will the Pentagon swap out Claude—the only AI model it actively uses in classified operations, including some in Venezuela—while it escalates strikes against Iran? Hegseth granted the agency six months to do so, during which the military will phase in OpenAI’s models as well as those from Elon Musk’s xAI.

But Claude was reportedly used in the strikes on Iran hours after the ban was issued, suggesting that a phase-out will be anything but simple. Even if the months-long feud between Anthropic and the Pentagon is over (which I doubt it is), we are now seeing the Pentagon’s AI acceleration plan put pressure on companies to relinquish lines in the sand they had once drawn, with new tensions in the Middle East as the primary testing ground.

If you have information to share about how this is unfolding, reach out to me via Signal (username: jamesodonnell.22).

The human work behind humanoid robots is being hidden

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

In January, Nvidia’s Jensen Huang, the head of the world’s most valuable company, proclaimed that we are entering the era of physical AI, when artificial intelligence will move beyond language and chatbots into physically capable machines. (He also said the same thing the year before, by the way.)

The implication—fueled by new demonstrations of humanoid robots putting away dishes or assembling cars—is that mimicking human limbs with single-purpose robot arms is the old way of automation. The new way is to replicate the way humans think, learn, and adapt while they work. The problem is that the lack of transparency about the human labor involved in training and operating such robots leaves the public both misunderstanding what robots can actually do and failing to see the strange new forms of work forming around them.

Consider how, in the AI era, robots often learn from humans who demonstrate how to do a chore. Creating this data at scale is now leading to Black Mirror–esque scenarios. A worker in Shanghai, for example, recently spent a week wearing a virtual-reality headset and an exoskeleton while opening and closing the door of a microwave hundreds of times a day to train the robot next to him, Rest of World reported. In North America, the robotics company Figure appears to be planning something similar: It announced in September it would partner with the investment firm Brookfield, which manages 100,000 residential units, to capture “massive amounts” of real-world data “across a variety of household environments.” (Figure did not respond to questions about this effort.)

Just as our words became training data for large language models, our movements are now poised to follow the same path. Except this future might leave humans with an even worse deal, and it’s already beginning. The roboticist Aaron Prather told me about recent work with a delivery company that had its workers wear movement-tracking sensors as they moved boxes; the data collected will be used to train robots. The effort to build humanoids will likely require manual laborers to act as data collectors at massive scale. “It’s going to be weird,” Prather says. “No doubts about it.” 

Or consider tele-operation. Though the endgame in robotics is a machine that can complete a task on its own, robotics companies employ people to operate their robots remotely. Neo, a $20,000 humanoid robot from the startup 1X, is set to ship to homes this year, but the company’s founder, Bernt Øivind Børnich, told me recently that he’s not committed to any prescribed level of autonomy. If a robot gets stuck, or if the customer wants it to do a tricky task, a tele-operator from the company’s headquarters in Palo Alto, California, will pilot it, looking through its cameras to iron clothes or unload the dishwasher.

This isn’t inherently harmful—1X gets customer consent before switching into tele-operation mode—but privacy as we know it will not exist in a world where tele-operators are doing chores in your house through a robot. And if home humanoids are not genuinely autonomous, the arrangement is better understood as a form of wage arbitrage that re-creates the dynamics of gig work while, for the first time, allowing physical tasks to be performed wherever labor is cheapest.

We’ve been down similar roads before. Carrying out “AI-driven” content moderation on social media platforms or assembling training data for AI companies often requires workers in low-wage countries to view disturbing content. And despite claims that AI will soon enough train on its outputs and learn on its own, even the best models require an awful lot of human feedback to work as desired.

These human workforces do not mean that AI is just vaporware. But when they remain invisible, the public consistently overestimates the machines’ actual capabilities.

That’s great for investors and hype, but it has consequences for everyone. When Tesla marketed its driver-assistance software as “Autopilot,” for example, it inflated public expectations about what the system could safely do—a distortion a Miami jury recently found contributed to a crash that killed a 22-year-old woman (Tesla was ordered to pay $240 million in damages). 

The same will be true for humanoid robots. If Huang is right, and physical AI is coming for our workplaces, homes, and public spaces, then the way we describe and scrutinize such technology matters. Yet robotics companies remain as opaque about training and tele-operation as AI firms are about their training data. If that does not change, we risk mistaking concealed human labor for machine intelligence—and seeing far more autonomy than truly exists.

Why the Moltbook frenzy was like Pokémon

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Lots of influential people in tech last week were describing Moltbook, an online hangout populated by AI agents interacting with one another, as a glimpse into the future. It appeared to show AI systems doing useful things for the humans that created them (one person used the platform to help him negotiate a deal on a new car). Sure, it was flooded with crypto scams, and many of the posts were actually written by people, but something about it pointed to a future of helpful AI, right?

The whole experiment reminded our senior editor for AI, Will Douglas Heaven, of something far less interesting: Pokémon.

Back in 2014, someone set up a game of Pokémon in which the main character could be controlled by anyone on the internet via the streaming platform Twitch. Playing was as clunky as it sounds, but it was incredibly popular: at one point, a million people were playing the game at the same time.

“It was yet another weird online social experiment that got picked up by the mainstream media: What did this mean for the future?” Will says. “Not a lot, it turned out.”

The frenzy about Moltbook struck a similar tone to Will, and it turned out that one of the sources he spoke to had been thinking about Pokémon too. Jason Schloetzer, at the Georgetown Psaros Center for Financial Markets and Policy, saw the whole thing as a sort of Pokémon battle for AI enthusiasts, in which they created AI agents and deployed them to interact with other agents. In this light, the news that many AI agents were actually being instructed by people to say certain things that made them sound sentient or intelligent makes a whole lot more sense. 

“It’s basically a spectator sport,” he told Will, “but for language models.”

Will wrote an excellent piece about why Moltbook was not the glimpse into the future that it was said to be. Even if you are excited about a future of agentic AI, he points out, there are some key pieces that Moltbook made clear are still missing. It was a forum of chaos, but a genuinely helpful hive mind would require more coordination, shared objectives, and shared memory.

“More than anything else, I think Moltbook was the internet having fun,” Will says. “The biggest question that now leaves me with is: How far will people push AI just for the laughs?”

Read the whole story.

What we’ve been getting wrong about AI’s truth crisis

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

What would it take to convince you that the era of truth decay we were long warned about—where AI content dupes us, shapes our beliefs even when we catch the lie, and erodes societal trust in the process—is now here? A story I published last week pushed me over the edge. It also made me realize that the tools we were sold as a cure for this crisis are failing miserably. 

On Thursday, I reported the first confirmation that the US Department of Homeland Security, which houses immigration agencies, is using AI video generators from Google and Adobe to make content that it shares with the public. The news comes as immigration agencies have flooded social media with content to support President Trump’s mass deportation agenda—some of which appears to be made with AI (like a video about “Christmas after mass deportations”).

But I received two types of reactions from readers that may explain just as much about the epistemic crisis we’re in. 

One was from people who weren’t surprised, because on January 22 the White House had posted a digitally altered photo of a woman arrested at an ICE protest, one that made her appear hysterical and in tears. Kaelan Dorr, the White House’s deputy communications director, did not respond to questions about whether the White House altered the photo but wrote, “The memes will continue.”

The second was from readers who saw no point in reporting that DHS was using AI to edit content shared with the public, because news outlets were apparently doing the same. They pointed to the fact that the news network MS Now (formerly MSNBC) shared an image of Alex Pretti that was AI-edited and appeared to make him look more handsome, a fact that led to many viral clips this week, including one from Joe Rogan’s podcast. Fight fire with fire, in other words? A spokesperson for MS Now told Snopes that the news outlet aired the image without knowing it was edited.

There is no reason to collapse these two cases of altered content into the same category, or to read them as evidence that truth no longer matters. One involved the US government sharing a clearly altered photo with the public and declining to answer whether it was intentionally manipulated; the other involved a news outlet airing a photo it should have known was altered but taking some steps to disclose the mistake.

What these reactions reveal instead is a flaw in how we were collectively preparing for this moment. Warnings about the AI truth crisis revolved around a core thesis: that not being able to tell what is real will destroy us, so we need tools to independently verify the truth. My two grim takeaways are that these tools are failing, and that while vetting the truth remains essential, it is no longer capable on its own of producing the societal trust we were promised.

For example, there was plenty of hype in 2024 about the Content Authenticity Initiative, cofounded by Adobe and adopted by major tech companies, which would attach labels to content disclosing when it was made, by whom, and whether AI was involved. But Adobe applies automatic labels only when the content is wholly AI-generated. Otherwise the labels are opt-in on the part of the creator.

And platforms like X, where the altered arrest photo was posted, can strip content of such labels anyway (a note that the photo was altered was added by users). Platforms can also simply not choose to show the label; indeed, when Adobe launched the initiative, it noted that the Pentagon’s website for sharing official images, DVIDS, would display the labels to prove authenticity, but a review of the website today shows no such labels.

Noticing how much traction the White House’s photo got even after it was shown to be AI-altered, I was struck by the findings of a very relevant new paper published in the journal Communications Psychology. In the study, participants watched a deepfake “confession” to a crime, and the researchers found that even when they were told explicitly that the evidence was fake, participants relied on it when judging an individual’s guilt. In other words, even when people learn that the content they’re looking at is entirely fake, they remain emotionally swayed by it. 

“Transparency helps, but it isn’t enough on its own,” the disinformation expert Christopher Nehring wrote recently about the study’s findings. “We have to develop a new masterplan of what to do about deepfakes.”

AI tools to generate and edit content are getting more advanced, easier to operate, and cheaper to run—all reasons why the US government is increasingly paying to use them. We were well warned of this, but we responded by preparing for a world in which the main danger was confusion. What we’re entering instead is a world in which influence survives exposure, doubt is easily weaponized, and establishing the truth does not serve as a reset button. And the defenders of truth are already trailing way behind.

Update: This story was updated on February 2 with details about how Adobe applies its content authenticity labels.

Why chatbots are starting to check your age

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

How do tech companies check if their users are kids?

This question has taken on new urgency recently thanks to growing concern about the dangers that can arise when children talk to AI chatbots. For years Big Tech asked for birthdays (that one could make up) to avoid violating child privacy laws, but they weren’t required to moderate content accordingly. Two developments over the last week show how quickly things are changing in the US and how this issue is becoming a new battleground, even among parents and child-safety advocates.

In one corner is the Republican Party, which has supported laws passed in several states that require sites with adult content to verify users’ ages. Critics say this provides cover to block anything deemed “harmful to minors,” which could include sex education. Other states, like California, are coming after AI companies with laws to protect kids who talk to chatbots (by requiring them to verify who’s a kid). Meanwhile, President Trump is attempting to keep AI regulation a national issue rather than allowing states to make their own rules. Support for various bills in Congress is constantly in flux.

So what might happen? The debate is quickly moving away from whether age verification is necessary and toward who will be responsible for it. This responsibility is a hot potato that no company wants to hold.

In a blog post last Tuesday, OpenAI revealed that it plans to roll out automatic age prediction. In short, the company will apply a model that uses factors like the time of day, among others, to predict whether a person chatting is under 18. For those identified as teens or children, ChatGPT will apply filters to “reduce exposure” to content like graphic violence or sexual role-play. YouTube launched something similar last year. 

If you support age verification but are concerned about privacy, this might sound like a win. But there’s a catch. The system is not perfect, of course, so it could classify a child as an adult or vice versa. People who are wrongly labeled under 18 can verify their identity by submitting a selfie or government ID to a company called Persona. 

Selfie verifications have issues: They fail more often for people of color and those with certain disabilities. Sameer Hinduja, who co-directs the Cyberbullying Research Center, says the fact that Persona will need to hold millions of government IDs and masses of biometric data is another weak point. “When those get breached, we’ve exposed massive populations all at once,” he says. 

Hinduja instead advocates for device-level verification, where a parent specifies a child’s age when setting up the child’s phone for the first time. This information is then kept on the device and shared securely with apps and websites. 

That’s more or less what Tim Cook, the CEO of Apple, recently lobbied US lawmakers to call for. Cook was fighting lawmakers who wanted to require app stores to verify ages, which would saddle Apple with lots of liability. 

More signals of where this is all headed will come on Wednesday, when the Federal Trade Commission—the agency that would be responsible for enforcing these new laws—is holding an all-day workshop on age verification. Apple’s head of government affairs, Nick Rossi, will be there. He’ll be joined by higher-ups in child safety at Google and Meta, as well as a company that specializes in marketing to children.

The FTC has become increasingly politicized under President Trump (his firing of the sole Democratic commissioner was struck down by a federal court, a decision that is now pending review by the US Supreme Court). In July, I wrote about signals that the agency is softening its stance toward AI companies. Indeed, in December, the FTC overturned a Biden-era ruling against an AI company that allowed people to flood the internet with fake product reviews, writing that it clashed with President Trump’s AI Action Plan.

Wednesday’s workshop may shed light on how partisan the FTC’s approach to age verification will be. Red states favor laws that require porn websites to verify ages (but critics warn this could be used to block a much wider range of content). Bethany Soye, a Republican state representative who is leading an effort to pass such a bill in her state of South Dakota, is scheduled to speak at the FTC meeting. The ACLU generally opposes laws requiring IDs to visit websites and has instead advocated for an expansion of existing parental controls.

While all this gets debated, though, AI has set the world of child safety on fire. We’re dealing with increased generation of child sexual abuse material, concerns (and lawsuits) about suicides and self-harm following chatbot conversations, and troubling evidence of kids’ forming attachments to AI companions. Colliding stances on privacy, politics, free expression, and surveillance will complicate any effort to find a solution. Write to me with your thoughts. 

CES showed me why Chinese tech companies feel so optimistic

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

I decided to go to CES kind of at the last minute. Over the holiday break, contacts from China kept messaging me about their travel plans. After the umpteenth “See you in Vegas?” I caved. As a China tech writer based in the US, I have one week a year when my entire beat seems to come to me—no 20-hour flights required.

CES, the Consumer Electronics Show, is the world’s biggest tech show, where companies launch new gadgets and announce new developments, and it happens every January. This year, it attracted over 148,000 attendees and over 4,100 exhibitors. It sprawls across the Las Vegas Convention Center, the city’s biggest exhibition space, and spills over into adjacent hotels. 

China has long had a presence at CES, but this year it showed up in a big way. Chinese exhibitors accounted for nearly a quarter of all companies at the show, and in pockets like AI hardware and robotics, China’s presence felt especially dominant. On the floor, I saw tons of Chinese industry attendees roaming around, plus a notable number of Chinese VCs. Multiple experienced CES attendees told me this is the first post-covid CES where China was present in a way you couldn’t miss. Last year might have been trending that way too, but a lot of Chinese attendees reportedly ran into visa denials. Now AI has become the universal excuse, and reason, to make the trip.

As expected, AI was the biggest theme this year, seen on every booth wall. It’s both the biggest thing everyone is talking about and a deeply confusing marketing gimmick. “We added AI” is slapped onto everything from the reasonable (PCs, phones, TVs, security systems) to the deranged (slippers, hair dryers, bed frames). 

Consumer AI gadgets still feel early and of very uneven quality. The most common categories are educational devices and emotional support toys—which, as I’ve written about recently, are all the rage in China. There are some memorable ones: Luka AI makes a robotic panda that scuttles around and keeps a watchful eye on your baby. Fuzozo, a fluffy keychain-size AI robot, is basically a digital pet in physical form. It comes with a built-in personality and reacts to how you treat it. The companies selling these just hope you won’t think too hard about the privacy implications.

Ian Goh, an investor at 01.VC, told me China’s manufacturing advantage gives it a unique edge in AI consumer electronics, because a lot of Western companies feel they simply cannot fight and win in the arena of hardware. 

Another area where Chinese companies seem to be at the head of the pack is household electronics. The products they make are becoming impressively sophisticated. Home robots, 360 cams, security systems, drones, lawn-mowing machines, pool heat pumps … Did you know two Chinese brands basically dominate the market for home cleaning robots in the US and are eating the lunch of Dyson and Shark? Did you know almost all the suburban yard tech you can buy in the West comes from Shenzhen, even though that whole backyard-obsessed lifestyle barely exists in China? This stuff is so sleek that you wouldn’t clock it as Chinese unless you went looking. The old “cheap and repetitive” stereotype doesn’t explain what I saw. I walked away from CES feeling that I needed a major home appliance upgrade.

Of course, appliances are a safe, mature market. On the more experiential front, humanoid robots were a giant magnet for crowds, and Chinese companies put on a great show. Every robot seemed to be dancing, in styles from Michael Jackson to K-pop to lion dancing, some even doing back flips. Hangzhou-based Unitree even set up a boxing ring where people could “challenge” its robots. The robot fighters were about half the size of an adult human and the matches often ended in a robot knockout, but that’s not really the point. What Unitree was actually showing off was its robots’ stability and balance: they got shoved, stumbled across the ring, and stayed upright, recovering mid-motion. Beyond flexing dynamic movements like these there were also impressive showcases of dexterity: Robots could be seen folding paper pinwheels, doing laundry, playing piano, and even making latte art.

Attendees take photos of the UniTree autonomous robot which is posing with its boxing gloves and headgear

CAL SPORT MEDIA VIA AP IMAGES

However, most of these robots, even the good ones, are one-trick ponies. They’re optimized for a specific task on the show floor. I tried to make one fold a T-shirt after I’d flipped the garment around, and it got confused very quickly. 

Still, they’re getting a lot of hype as an  important next frontier because they could help drag AI out of text boxes and into the physical world. As LLMs mature, vision-language models feel like the logical next step. But then you run into the big problem: There’s far less physical-world data than text data to train AI on. Humanoid robots become both applications and roaming data-collection terminals. China is uniquely positioned here because of supply chains, manufacturing depth, and spillover from adjacent industries (EVs, batteries, motors, sensors), and it’s already developing a humanoid training industry, as Rest of World reported recently. 

Most Chinese companies believe that if you can manufacture at scale, you can innovate, and they’re not wrong. A lot of the confidence in China’s nascent humanoid robot industry and beyond is less about a single breakthrough and more about “We can iterate faster than the West.”

Chinese companies are not just selling gadgets, though—they’re working on every layer of the tech stack. Not just on end products but frameworks, tooling, IoT enablement, spatial data. Open-source culture feels deeply embedded; engineers from Hangzhou tell me there are AI hackathons every week in the city, where China’s new “little Silicon Valley” is located.

Indeed, the headline innovations at CES 2026 were not on devices but in cloud: platforms, ecosystems, enterprise deployments, and “hybrid AI” (cloud + on-device) applications. Lenovo threw the buzziest main-stage events this year, and yes, there were PCs—but the core story was its cross-device AI agent system, Qira, and a partnership pitch with Nvidia aimed at AI cloud providers. Nvidia’s CEO, Jensen Huang, launched Vera Rubin, a new data-center platform, claiming it would  dramatically lower costs for training and running AI. AMD’s CEO, Lisa Su, introduced Helios, another data-center system built to run huge AI workloads. These solutions point to the ballooning AI computing workload at data centers, and the real race of making cloud services cheap and powerful enough to keep up.

As I spoke with China-related attendees, the overall mood I felt was a cautious optimism. At a house party I went to, VCs and founders from China were mingling effortlessly with Bay Area transplants. Everyone is building something. Almost no one wants to just make money from Chinese consumers anymore. The new default is: Build in China, sell to the world, and treat the US market like the proving ground.

How social media encourages the worst of AI boosterism

Demis Hassabis, CEO of Google DeepMind, summed it up in three words: “This is embarrassing.”  

Hassabis was replying on X to an overexcited post by Sébastien Bubeck, a research scientist at the rival firm OpenAI, announcing that two mathematicians had used OpenAI’s latest large language model, GPT-5, to find solutions to 10 unsolved problems in mathematics. “Science acceleration via AI has officially begun,” Bubeck crowed.

Put your math hats on for a minute, and let’s take a look at what this beef from mid-October was about. It’s a perfect example of what’s wrong with AI right now.

Bubeck was excited that GPT-5 seemed to have somehow solved a number of puzzles known as Erdős problems.

Paul Erdős, one of the most prolific mathematicians of the 20th century, left behind hundreds of puzzles when he died. To help keep track of which ones have been solved, Thomas Bloom, a mathematician at the University of Manchester, UK, set up erdosproblems.com, which lists more than 1,100 problems and notes that around 430 of them come with solutions. 

When Bubeck celebrated GPT-5’s breakthrough, Bloom was quick to call him out. “This is a dramatic misrepresentation,” he wrote on X. Bloom explained that a problem isn’t necessarily unsolved if this website does not list a solution. That simply means Bloom wasn’t aware of one. There are millions of mathematics papers out there, and nobody has read all of them. But GPT-5 probably has.

It turned out that instead of coming up with new solutions to 10 unsolved problems, GPT-5 had scoured the internet for 10 existing solutions that Bloom hadn’t seen before. Oops!

There are two takeaways here. One is that breathless claims about big breakthroughs shouldn’t be made via social media: Less knee jerk and more gut check.

The second is that GPT-5’s ability to find references to previous work that Bloom wasn’t aware of is also amazing. The hype overshadowed something that should have been pretty cool in itself.

Mathematicians are very interested in using LLMs to trawl through vast numbers of existing results, François Charton, a research scientist who studies the application of LLMs to mathematics at the AI startup Axiom Math, told me when I talked to him about this Erdős gotcha.

But literature search is dull compared with genuine discovery, especially to AI’s fervent boosters on social media. Bubeck’s blunder isn’t the only example.

In August, a pair of mathematicians showed that no LLM at the time was able to solve a math puzzle known as Yu Tsumura’s 554th Problem. Two months later, social media erupted with evidence that GPT-5 now could. “Lee Sedol moment is coming for many,” one observer commented, referring to the Go master who lost to DeepMind’s AI AlphaGo in 2016.

But Charton pointed out that solving Yu Tsumura’s 554th Problem isn’t a big deal to mathematicians. “It’s a question you would give an undergrad,” he said. “There is this tendency to overdo everything.”

Meanwhile, more sober assessments of what LLMs may or may not be good at are coming in. At the same time that mathematicians were fighting on the internet about GPT-5, two new studies came out that looked in depth at the use of LLMs in medicine and law (two fields that model makers have claimed their tech excels at). 

Researchers found that LLMs could make certain medical diagnoses, but they were flawed at recommending treatments. When it comes to law, researchers found that LLMs often give inconsistent and incorrect advice. “Evidence thus far spectacularly fails to meet the burden of proof,” the authors concluded.

But that’s not the kind of message that goes down well on X. “You’ve got that excitement because everybody is communicating like crazy—nobody wants to be left behind,” Charton said. X is where a lot of AI news drops first, it’s where new results are trumpeted, and it’s where key players like Sam Altman, Yann LeCun, and Gary Marcus slug it out in public. It’s hard to keep up—and harder to look away.

Bubeck’s post was only embarrassing because his mistake was caught. Not all errors are. Unless something changes researchers, investors, and non-specific boosters will keep teeing each other up. “Some of them are scientists, many are not, but they are all nerds,” Charton told me. “Huge claims work very well on these networks.”

*****

There’s a coda! I wrote everything you’ve just read above for the Algorithm column in the January/February 2026 issue of MIT Technology Review magazine (out very soon). Two days after that went to press, Axiom told me its own math model, AxiomProver, had solved two open Erdős problems (#124 and #481, for the math fans in the room). That’s impressive stuff for a small startup founded just a few months ago. Yup—AI moves fast!

But that’s not all. Five days later the company announced that AxiomProver had solved nine out of 12 problems in this year’s Putnam competition, a college-level math challenge that some people consider harder than the better-known International Math Olympiad (which LLMs from both Google DeepMind and OpenAI aced a few months back). 

The Putnam result was lauded on X by big names in the field, including Jeff Dean, chief scientist at Google DeepMind, and Thomas Wolf, cofounder at the AI firm Hugging Face. Once again familiar debates played out in the replies. A few researchers pointed out that while the International Math Olympiad demands more creative problem-solving, the Putnam competition tests math knowledge—which makes it notoriously hard for undergrads, but easier, in theory, for LLMs that have ingested the internet.

How should we judge Axiom’s achievements? Not on social media, at least. And the eye-catching competition wins are just a starting point. Determining just how good LLMs are at math will require a deeper dive into exactly what these models are doing when they solve hard (read: hard for humans) math problems.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Why it’s time to reset our expectations for AI

Can I ask you a question: How do you feel about AI right now? Are you still excited? When you hear that OpenAI or Google just dropped a new model, do you still get that buzz? Or has the shine come off it, maybe just a teeny bit? Come on, you can be honest with me.

Truly, I feel kind of stupid even asking the question, like a spoiled brat who has too many toys at Christmas. AI is mind-blowing. It’s one of the most important technologies to have emerged in decades (despite all its many many drawbacks and flaws and, well, issues).

At the same time I can’t help feeling a little bit: Is that it?

If you feel the same way, there’s good reason for it: The hype we have been sold for the past few years has been overwhelming. We were told that AI would solve climate change. That it would reach human-level intelligence. That it would mean we no longer had to work!

Instead we got AI slop, chatbot psychosis, and tools that urgently prompt you to write better email newsletters. Maybe we got what we deserved. Or maybe we need to reevaluate what AI is for.

That’s the reality at the heart of a new series of stories, published today, called Hype Correction. We accept that AI is still the hottest ticket in town, but it’s time to re-set our expectations.

As my colleague Will Douglas Heaven puts it in the package’s intro essay, “You can’t help but wonder: When the wow factor is gone, what’s left? How will we view this technology a year or five from now? Will we think it was worth the colossal costs, both financial and environmental?” 

Elsewhere in the package, James O’Donnell looks at Sam Altman, the ultimate AI hype man, through the medium of his own words. And Alex Heath explains the AI bubble, laying out for us what it all means and what we should look out for.

Michelle Kim analyzes one of the biggest claims in the AI hype cycle: that AI would completely eliminate the need for certain classes of jobs. If ChatGPT can pass the bar, surely that means it will replace lawyers? Well, not yet, and maybe not ever. 

Similarly, Edd Gent tackles the big question around AI coding. Is it as good as it sounds? Turns out the jury is still out. And elsewhere David Rotman looks at the real-world work that needs to be done before AI materials discovery has its breakthrough ChatGPT moment.

Meanwhile, Garrison Lovely spends time with some of the biggest names in the AI safety world and asks: Are the doomers still okay? I mean, now that people are feeling a bit less scared about their impending demise at the hands of superintelligent AI? And Margaret Mitchell reminds us that hype around generative AI can blind us to the AI breakthroughs we should really celebrate.

Let’s remember: AI was here before ChatGPT and it will be here after. This hype cycle has been wild, and we don’t know what its lasting impact will be. But AI isn’t going anywhere. We shouldn’t be so surprised that those dreams we were sold haven’t come true—yet.

The more likely story is that the real winners, the killer apps, are still to come. And a lot of money is being bet on that prospect. So yes: The hype could never sustain itself over the short term. Where we’re at now is maybe the start of a post-hype phase. In an ideal world, this hype correction will reset expectations. 

Let’s all catch our breath, shall we?

This story first appeared in The Algorithm, our weekly free newsletter all about AI. Sign up to read past editions here.

The State of AI: Welcome to the economic singularity

Welcome back to The State of AI, a new collaboration between the Financial Times and MIT Technology Review. Every Monday for the next two weeks, writers from both publications will debate one aspect of the generative AI revolution reshaping global power.

This week, Richard Waters, FT columnist and former West Coast editor, talks with MIT Technology Review’s editor at large David Rotman about the true impact of AI on the job market.

Bonus: If you’re an MIT Technology Review subscriber, you can join David and Richard, alongside MIT Technology Review’s editor in chief, Mat Honan, for an exclusive conversation live on Tuesday, December 9 at 1pm ET about this topic. Sign up to be a part here.

Richard Waters writes:

Any far-reaching new technology is always uneven in its adoption, but few have been more uneven than generative AI. That makes it hard to assess its likely impact on individual businesses, let alone on productivity across the economy as a whole.

At one extreme, AI coding assistants have revolutionized the work of software developers. Mark Zuckerberg recently predicted that half of Meta’s code would be written by AI within a year. At the other extreme, most companies are seeing little if any benefit from their initial investments. A widely cited study from MIT found that so far, 95% of gen AI projects produce zero return.

That has provided fuel for the skeptics who maintain that—by its very nature as a probabilistic technology prone to hallucinating—generative AI will never have a deep impact on business.

To many students of tech history, though, the lack of immediate impact is just the normal lag associated with transformative new technologies. Erik Brynjolfsson, then an assistant professor at MIT, first described what he called the “productivity paradox of IT” in the early 1990s. Despite plenty of anecdotal evidence that technology was changing the way people worked, it wasn’t showing up in the aggregate data in the form of higher productivity growth. Brynjolfsson’s conclusion was that it just took time for businesses to adapt.

Big investments in IT finally showed through with a notable rebound in US productivity growth starting in the mid-1990s. But that tailed off a decade later and was followed by a second lull.

Richard Waters and David Rotman

FT/MIT TECHNOLOGY REVIEW | ADOBE STOCK

In the case of AI, companies need to build new infrastructure (particularly data platforms), redesign core business processes, and retrain workers before they can expect to see results. If a lag effect explains the slow results, there may at least be reasons for optimism: Much of the cloud computing infrastructure needed to bring generative AI to a wider business audience is already in place.

The opportunities and the challenges are both enormous. An executive at one Fortune 500 company says his organization has carried out a comprehensive review of its use of analytics and concluded that its workers, overall, add little or no value. Rooting out the old software and replacing that inefficient human labor with AI might yield significant results. But, as this person says, such an overhaul would require big changes to existing processes and take years to carry out.

There are some early encouraging signs. US productivity growth, stuck at 1% to 1.5% for more than a decade and a half, rebounded to more than 2% last year. It probably hit the same level in the first nine months of this year, though the lack of official data due to the recent US government shutdown makes this impossible to confirm.

It is impossible to tell, though, how durable this rebound will be or how much can be attributed to AI. The effects of new technologies are seldom felt in isolation. Instead, the benefits compound. AI is riding earlier investments in cloud and mobile computing. In the same way, the latest AI boom may only be the precursor to breakthroughs in fields that have a wider impact on the economy, such as robotics. ChatGPT might have caught the popular imagination, but OpenAI’s chatbot is unlikely to have the final word.

David Rotman replies: 

This is my favorite discussion these days when it comes to artificial intelligence. How will AI affect overall economic productivity? Forget about the mesmerizing videos, the promise of companionship, and the prospect of agents to do tedious everyday tasks—the bottom line will be whether AI can grow the economy, and that means increasing productivity. 

But, as you say, it’s hard to pin down just how AI is affecting such growth or how it will do so in the future. Erik Brynjolfsson predicts that, like other so-called general purpose technologies, AI will follow a J curve in which initially there is a slow, even negative, effect on productivity as companies invest heavily in the technology before finally reaping the rewards. And then the boom. 

But there is a counterexample undermining the just-be-patient argument. Productivity growth from IT picked up in the mid-1990s but since the mid-2000s has been relatively dismal. Despite smartphones and social media and apps like Slack and Uber, digital technologies have done little to produce robust economic growth. A strong productivity boost never came.

Daron Acemoglu, an economist at MIT and a 2024 Nobel Prize winner, argues that the productivity gains from generative AI will be far smaller and take far longer than AI optimists think. The reason is that though the technology is impressive in many ways, the field is too narrowly focused on products that have little relevance to the largest business sectors.

The statistic you cite that 95% of AI projects lack business benefits is telling. 

Take manufacturing. No question, some version of AI could help; imagine a worker on the factory floor snapping a picture of a problem and asking an AI agent for advice. The problem is that the big tech companies creating AI aren’t really interested in solving such mundane tasks, and their large foundation models, mostly trained on the internet, aren’t all that helpful. 

It’s easy to blame the lack of productivity impact from AI so far on business practices and poorly trained workers. Your example of the executive of the Fortune 500 company sounds all too familiar. But it’s more useful to ask how AI can be trained and fine-tuned to give workers, like nurses and teachers and those on the factory floor, more capabilities and make them more productive at their jobs. 

The distinction matters. Some companies announcing large layoffs recently cited AI as the reason. The worry, however, is that it’s just a short-term cost-saving scheme. As economists like Brynjolfsson and Acemoglu agree, the productivity boost from AI will come when it’s used to create new types of jobs and augment the abilities of workers, not when it is used just to slash jobs to reduce costs. 

Richard Waters responds : 

I see we’re both feeling pretty cautious, David, so I’ll try to end on a positive note. 

Some analyses assume that a much greater share of existing work is within the reach of today’s AI. McKinsey reckons 60% (versus 20% for Acemoglu) and puts annual productivity gains across the economy at as much as 3.4%. Also, calculations like these are based on automation of existing tasks; any new uses of AI that enhance existing jobs would, as you suggest, be a bonus (and not just in economic terms).

Cost-cutting always seems to be the first order of business with any new technology. But we’re still in the early stages and AI is moving fast, so we can always hope.

Further reading

FT chief economics commentator Martin Wolf has been skeptical about whether tech investment boosts productivity but says AI might prove him wrong. The downside: Job losses and wealth concentration might lead to “techno-feudalism.”

The FT‘s Robert Armstrong argues that the boom in data center investment need not turn to bust. The biggest risk is that debt financing will come to play too big a role in the buildout.

Last year, David Rotman wrote for MIT Technology Review about how we can make sure AI works for us in boosting productivity, and what course corrections will be required.

David also wrote this piece about how we can best measure the impact of basic R&D funding on economic growth, and why it can often be bigger than you might think.

The State of AI: How war will be changed forever

Welcome back to The State of AI, a new collaboration between the Financial Times and MIT Technology Review. Every Monday, writers from both publications debate one aspect of the generative AI revolution reshaping global power.

In this conversation, Helen Warrell, FT investigations reporter and former defense and security editor, and James O’Donnell, MIT Technology Review’s senior AI reporter, consider the ethical quandaries and financial incentives around AI’s use by the military.

Helen Warrell, FT investigations reporter 

It is July 2027, and China is on the brink of invading Taiwan. Autonomous drones with AI targeting capabilities are primed to overpower the island’s air defenses as a series of crippling AI-generated cyberattacks cut off energy supplies and key communications. In the meantime, a vast disinformation campaign enacted by an AI-powered pro-Chinese meme farm spreads across global social media, deadening the outcry at Beijing’s act of aggression.

Scenarios such as this have brought dystopian horror to the debate about the use of AI in warfare. Military commanders hope for a digitally enhanced force that is faster and more accurate than human-directed combat. But there are fears that as AI assumes an increasingly central role, these same commanders will lose control of a conflict that escalates too quickly and lacks ethical or legal oversight. Henry Kissinger, the former US secretary of state, spent his final years warning about the coming catastrophe of AI-driven warfare.

Grasping and mitigating these risks is the military priority—some would say the “Oppenheimer moment”—of our age. One emerging consensus in the West is that decisions around the deployment of nuclear weapons should not be outsourced to AI. UN secretary-general António Guterres has gone further, calling for an outright ban on fully autonomous lethal weapons systems. It is essential that regulation keep pace with evolving technology. But in the sci-fi-fueled excitement, it is easy to lose track of what is actually possible. As researchers at Harvard’s Belfer Center point out, AI optimists often underestimate the challenges of fielding fully autonomous weapon systems. It is entirely possible that the capabilities of AI in combat are being overhyped.

Anthony King, Director of the Strategy and Security Institute at the University of Exeter and a key proponent of this argument, suggests that rather than replacing humans, AI will be used to improve military insight. Even if the character of war is changing and remote technology is refining weapon systems, he insists, “the complete automation of war itself is simply an illusion.”

Of the three current military use cases of AI, none involves full autonomy. It is being developed for planning and logistics, cyber warfare (in sabotage, espionage, hacking, and information operations; and—most controversially—for weapons targeting, an application already in use on the battlefields of Ukraine and Gaza. Kyiv’s troops use AI software to direct drones able to evade Russian jammers as they close in on sensitive sites. The Israel Defense Forces have developed an AI-assisted decision support system known as Lavender, which has helped identify around 37,000 potential human targets within Gaza. 

Helen Warrell and James O'Donnell

FT/MIT TECHNOLOGY REVIEW | ADOBE STOCK

There is clearly a danger that the Lavender database replicates the biases of the data it is trained on. But military personnel carry biases too. One Israeli intelligence officer who used Lavender claimed to have more faith in the fairness of a “statistical mechanism” than that of a grieving soldier.

Tech optimists designing AI weapons even deny that specific new controls are needed to control their capabilities. Keith Dear, a former UK military officer who now runs the strategic forecasting company Cassi AI, says existing laws are more than sufficient: “You make sure there’s nothing in the training data that might cause the system to go rogue … when you are confident you deploy it—and you, the human commander, are responsible for anything they might do that goes wrong.”

It is an intriguing thought that some of the fear and shock about use of AI in war may come from those who are unfamiliar with brutal but realistic military norms. What do you think, James? Is some opposition to AI in warfare less about the use of autonomous systems and really an argument against war itself? 

James O’Donnell replies:

Hi Helen, 

One thing I’ve noticed is that there’s been a drastic shift in attitudes of AI companies regarding military applications of their products. In the beginning of 2024, OpenAI unambiguously forbade the use of its tools for warfare, but by the end of the year, it had signed an agreement with Anduril to help it take down drones on the battlefield. 

This step—not a fully autonomous weapon, to be sure, but very much a battlefield application of AI—marked a drastic change in how much tech companies could publicly link themselves with defense. 

What happened along the way? For one thing, it’s the hype. We’re told AI will not just bring superintelligence and scientific discovery but also make warfare sharper, more accurate and calculated, less prone to human fallibility. I spoke with US Marines, for example, who tested a type of AI while patrolling the South Pacific that was advertised to analyze foreign intelligence faster than a human could. 

Secondly, money talks. OpenAI and others need to start recouping some of the unimaginable amounts of cash they’re spending on training and running these models. And few have deeper pockets than the Pentagon. And Europe’s defense heads seem keen to splash the cash too. Meanwhile, the amount of venture capital funding for defense tech this year has already doubled the total for all of 2024, as VCs hope to cash in on militaries’ newfound willingness to buy from startups. 

I do think the opposition to AI warfare falls into a few camps, one of which simply rejects the idea that more precise targeting (if it’s actually more precise at all) will mean fewer casualties rather than just more war. Consider the first era of drone warfare in Afghanistan. As drone strikes became cheaper to implement, can we really say it reduced carnage? Instead, did it merely enable more destruction per dollar?

But the second camp of criticism (and now I’m finally getting to your question) comes from people who are well versed in the realities of war but have very specific complaints about the technology’s fundamental limitations. Missy Cummings, for example, is a former fighter pilot for the US Navy who is now a professor of engineering and computer science at George Mason University. She has been outspoken in her belief that large language models, specifically, are prone to make huge mistakes in military settings.

The typical response to this complaint is that AI’s outputs are human-checked. But if an AI model relies on thousands of inputs for its conclusion, can that conclusion really be checked by one person?

Tech companies are making extraordinarily big promises about what AI can do in these high-stakes applications, all while pressure to implement them is sky high. For me, this means it’s time for more skepticism, not less. 

Helen responds:

Hi James, 

We should definitely continue to question the safety of AI warfare systems and the oversight to which they’re subjected—and hold political leaders to account in this area. I am suggesting that we also apply some skepticism to what you rightly describe as the “extraordinarily big promises” made by some companies about what AI might be able to achieve on the battlefield. 

There will be both opportunities and hazards in what the military is being offered by a relatively nascent (though booming) defense tech scene. The danger is that in the speed and secrecy of an arms race in AI weapons, these emerging capabilities may not receive the scrutiny and debate they desperately need.

Further reading:

Michael C. Horowitz, director of Perry World House at the University of Pennsylvania, explains the need for responsibility in the development of military AI systems in this FT op-ed.

The FT’s tech podcast asks what Israel’s defense tech ecosystem can tell us about the future of warfare 

This MIT Technology Review story analyzes how OpenAI completed its pivot to allowing its technology on the battlefield.

MIT Technology Review also uncovered how US soldiers are using generative AI to help scour thousands of pieces of open-source intelligence.