Brian Armstrong, the billionaire CEO of the cryptocurrency exchange Coinbase, says he’s ready to fund a US startup focused on gene-editing human embryos. If he goes forward, it would be the first major commercial investment in one of medicine’s most fraught ideas.
In a post on X on June 2, Armstrong announced he was looking for gene-editing scientists and bioinformatics specialists to form a founding team for an “embryo editing” effort targeting an unmet medical need, such as a genetic disease.
“I think the time is right for the defining company in the US to be built in this area,” Armstrong posted.
The announcement from a deep-pocketed backer is a striking shift for a field considered taboo following the 2018 birth of the world’s first genetically edited children in China—a secretive experiment that led to international outrage and prison time for the lead scientist.
According to Dieter Egli, a gene-editing scientist at Columbia University whose team has briefed Armstrong, his plans may be motivated in part by recent improvements in editing technology that have opened up a safer, more precise way to change the DNA of embryos.
That technique, called base editing, can deftly change a single DNA letter. Earlier methods, on the other hand, actually cut the double helix, damaging it and causing whole genes to disappear. “We know much better now what to do,” says Egli. “It doesn’t mean the work is all done, but it’s a very different game now—entirely different.”
Shoestring budget
Embryo editing, which ultimately aims to produce humans with genes tailored by design, is an idea that has been heavily stigmatized and starved of funding. While it’s legal to study embryos in the lab, actually producing a gene-edited baby is flatly illegal in most countries.
In the US, the CRISPR baby ban operates via a law that forbids the Food and Drug Administration from considering, or even acknowledging, any application to create a gene-edited baby. But that rule could be changed, especially if scientists can demonstrate a compelling use of the technique—or perhaps if a billionaire lobbies for it.
In his post, Armstrong included an image of a seven-year-old Pew Research Center poll showing that Americans strongly favored altering a baby’s genes if it could treat disease, although the same poll found most opposed experimentation on embryos.
Up until this point, no US company has openly pursued embryo editing, and the federal government doesn’t fund studies on embryos at all. Instead, research on gene editing in embryos has been carried forward in the US by just two academic centers, Egli’s and one at the Oregon Health & Science University.
Those efforts have operated on a shoestring, held together by private grants and university funds. Researchers at those centers said they support the idea of a well-financed company that could advance the technology. “We would honestly welcome that,” says Paula Amato, a fertility doctor at Oregon Health & Science University and the past president of the American Society for Reproductive Medicine.
“More research is needed, and that takes people and money,” she says, adding that she doesn’t mind if it comes from “tech bros.”
Editing embryos can, in theory, be used to correct genetic errors likely to cause serious childhood conditions. But since in most cases genetic testing of embryos can also be used to avoid those errors, many argue it will be hard to find a true unmet need where the DNA-altering technique is actually necessary.
Instead, it’s easy to conclude that the bigger market for the technology would be to intervene in embryos in ways that could make humans resistant to common conditions, such as heart disease or Alzheimer’s. But that is more controversial because it’s a type of enhancement, and the changes would also be passed through the generations.
Only last week, several biotech trade and academic groups demanded a 10-year moratorium on heritable human genome editing, saying the technology has few real medical uses and “introduces long-term risks with unknown consequences.”
They said the ability to “program” desired traits or eliminate bad ones risked a new form of “eugenics,” one that would have the effect of “potentially altering the course of evolution.”
No limits
Armstrong did not reply to an email from MIT Technology Review seeking comment about his plans. Nor did his company Coinbase, a cryptocurrency trading platform that went public in 2021 and is the source of his fortune, estimated at $10 billion by Forbes.
The billionaire is already part of a wave of tech entrepreneurs who’ve made a splash in science and biology by laying down outsize investments, sometimes in far-out ideas. Armstrong previously cofounded NewLimit, which Bloomberg calls a “life extension venture” and which this year raised a further $130 million to explore methods to reprogram old cells into an embryonic-like state.
He started that company with Blake Byers, an investor who has said a significant portion of global GDP should be spent on “immortality” research, including biotech approaches and ways of uploading human minds to computers.
Then, starting late last year, Armstrong began publicly telegraphing his interest in exploring a new venture, this time connected to assisted reproduction. In December, he announced on X that he and Byers were ready to meet with entrepreneurs working on “artificial wombs,” “embryo editing,” and “next-gen IVF.”
The post invited people to apply to attend an off-the-record dinner—a kind of forbidden-technologies soiree. Applicants had to fill in a Google form answering a few questions, including “What is something awesome you’ve built?”
Among those who attended the dinner was a postdoctoral fellow from Egli’s lab, Stepan Jerabek, who has been testing base editing in embryos. Another attendee, Lucas Harrington, is a gene-editing scientist who trained at the University of California, Berkeley, under Jennifer Doudna, a winner of the Nobel Prize in chemistry for the development of CRISPR gene editing. Harrington says a venture group he helps run, called SciFounders, is also considering starting an embryo-editing company.
“We share an interest in there being a company to empirically evaluate whether embryo editing can be done safely, and are actively exploring incubating a company to undertake this,” Harrington said in an email. “We believe there need to be legitimate scientists and clinicians working to safely evaluate this technology.”
Because of how rapidly gene editing is advancing, Harrington has also criticized bans and moratoria on the technology. These can’t stop it from being applied but, he says, can drive it into “the shadows,” where it might be used less safely. According to Harrington, “several biohacker groups have quietly raised small amounts of capital” to pursue the technology.
By contrast, Armstrong’s public declaration on X represents a more transparent approach. “It seems pretty serious now. They want to put something together,” says Egli, who hopes the Coinbase CEO might fund some research at his lab. “I think it’s very good he posted publicly, because you can feel the temperature, see what reaction you get, and you stimulate the public conversation.”
Editing error
The first reports that researchers were testing CRISPR on human embryos in the lab emerged from China in 2015, causing shock waves as it became clear how easy, in theory, it was to change human heredity. Two years later, in 2017, a report from Oregon claimed successful correction of a dangerous DNA mutation present in lab embryos made from patients’ egg and sperm cells.
But that breakthrough was not what it seemed. More careful testing by Egli and others showed that CRISPR technology actually can cause havoc in a cell, often deleting large chunks of chromosomes. That’s in addition to mosaicism, in which edits occur differently in different cells. What looked at first like precise DNA editing was in fact a dangerous process causing unseen damage.
While the public debate turned on the ethics of CRISPR babies—especially after three edited children were born in China—researchers were discussing basic scientific problems and how to solve them.
Since then, both US labs, as well as some in China, have switched to base editing. That method causes fewer unexpected effects and, in theory, could also endow an embryo with a number of advantageous gene variants, not just one change.
Company job
Some researchers also feel certain that editing an embryo is simpler than trying to treat sick adults. The only approved gene-editing treatment, for sickle-cell disease, costs more than $2 million. By contrast, editing an embryo could be incredibly cheap, and if it’s done early, when an embryo is forming, all the body cells could carry the change.
“You fix the text before you print the book,” says Egli. “It seems like a no-brainer.”
Still, gene editing isn’t quite ready for prime time in making babies. Getting there requires more work, including careful design of the editing system (which includes a protein and short guide molecule) and systematic ways to check embryos for unwanted DNA changes. That is the type of industrial effort Armstrong’s company, if he funds one, would be suited to carry out.
“You would have to optimize something to a point where it is perfect, to where it’s a breeze,” says Egli. “This is the kind of work that companies do.”
The clean cement industry might be facing the end of the road, before it ever really got rolling.
On Friday, the US Department of Energy announced that it was canceling $3.7 billion in funding for 24 projects related to energy and industry. That included nearly $1.3 billion for cement-related projects.
Cement is a massive climate problem, accounting for roughly 7% of global greenhouse-gas emissions. What’s more, it’s a difficult industry to clean up, with huge traditional players and expensive equipment and infrastructure to replace. This funding was supposed to help address those difficulties, by supporting projects on the cusp of commercialization. Now companies will need to fill in the gap left by these cancellations, and it’s a big one.
First up on the list for cuts is Sublime Systems, a company you’re probably familiar with if you’ve been reading this newsletter for a while. I did a deep dive last year, and the company was on our list of Climate Tech Companies to Watch in both 2023 and 2024.
The startup’s approach is to make cement using electricity. The conventional process requires high temperatures typically achieved by burning fossil fuels, so avoiding that could prevent a lot of emissions.
In 2024, Sublime received an $87 million grant from the DOE to construct a commercial demonstration plant in Holyoke, Massachusetts. That grant would have covered roughly half the construction costs for the facility, which is scheduled to open in 2026 and produce up to 30,000 metric tons of cement each year.
“We were certainly surprised and disappointed about the development,” says Joe Hicken, Sublime’s senior VP of business development and policy. Customers are excited by the company’s technology, Hicken adds, pointing to Sublime’s recently announced deal with Microsoft, which plans to buy up to 622,500 metric tons of cement from the company.
Another big name, Brimstone, also saw its funding affected. That award totaled $189 million for a commercial demonstration plant, which was expected to produce over 100,000 metric tons of cement annually.
In a statement, a Brimstone representative said the company believes the cancellation was a “misunderstanding.” The statement pointed out that the planned facility would make not only cement but also alumina, supporting US-based aluminum production. (Aluminum is classified as a critical mineral by the US Geological Survey, meaning it’s considered crucial to the US economy and national security.)
An award to Heidelberg Materials for up to $500 million for a planned Indiana facility was also axed. The idea there was to integrate carbon capture and storage to clean up emissions from the plant, which would have made it the first cement plant in the US to demonstrate that technology. In a written statement, a representative said the decision can be appealed, and the company is considering that option.
And National Cement’s funding for the Lebec Net-Zero Project, another $500 million award, was canceled. That facility planned to make carbon-neutral cement through a combination of strategies: reducing the polluting ingredients needed, using alternative fuels like biomass, and capturing the plant’s remaining emissions.
“We want to emphasize that this project will expand domestic manufacturing capacity for a critical industrial sector, while also integrating new technologies to keep American cement competitive,” said a company spokesperson in a written statement.
There’s a sentiment here that’s echoed in all the responses I received: While these awards were designed to cut emissions, these companies argue that they can fit into the new administration’s priorities. They’re emphasizing phrases like “critical minerals,” “American jobs,” and “domestic supply chains.”
“We’ve heard loud and clear from the Trump administration the desire to displace foreign imports of things that can be made here in America,” Sublime’s Hicken says. “At the end of the day, what we deliver is what the policymakers in DC are looking for.”
But this administration is showing that it’s not supporting climate efforts—often even those that also advance its stated goals of energy abundance and American competitiveness.
On Monday, my colleague James Temple published a new story about cuts to climate research, including tens of millions of dollars in grants from the National Science Foundation. Researchers at Harvard were particularly hard hit.
Even as there’s interest in advancing the position of the US on the world’s stage, these cuts are making it hard for researchers and companies alike to do the crucial work of understanding our climate and developing and deploying new technologies.
This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.
Last year, China saw a boom in foundation models, the do-everything large language models that underpin the AI revolution. This year, the focus has shifted to AI agents—systems that are less about responding to users’ queries and more about autonomously accomplishing things for them.
There are now a host of Chinese startups building these general-purpose digital tools, which can answer emails, browse the internet to plan vacations, and even design an interactive website. Many of these have emerged in just the last two months, following in the footsteps of Manus—a general AI agent that sparked weeks of social media frenzy for invite codes after its limited-release launch in early March.
These emerging AI agents aren’t large language models themselves. Instead, they’re built on top of them, using a workflow-based structure designed to get things done. A lot of these systems also introduce a different way of interacting with AI. Rather than just chatting back and forth with users, they are optimized for managing and executing multistep tasks—booking flights, managing schedules, conducting research—by using external tools and remembering instructions.
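To make that pattern concrete, here is a minimal sketch of the loop these products share: a language model plans the next action, the system calls an external tool, and the result is folded back into working memory. The helper names and the JSON action format are illustrative assumptions, not the design of Manus or any specific product.

```python
# A minimal sketch of a tool-using agent loop. TOOLS and the JSON action
# format are illustrative assumptions, not any real product's design.
import json

TOOLS = {
    "search_web": lambda query: f"[stub] results for {query}",
    "send_email": lambda to, body: f"[stub] email sent to {to}",
}

def run_agent(llm, task: str, max_steps: int = 10) -> str:
    memory = [{"role": "user", "content": task}]  # instructions persist across steps
    for _ in range(max_steps):
        reply = llm(memory)          # model emits, e.g., {"tool": ..., "args": {...}}
        action = json.loads(reply)
        if action["tool"] == "finish":
            return action["args"]["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        memory.append({"role": "tool", "content": str(result)})  # remember the outcome
    return "stopped: step budget exhausted"
```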
China could take the lead on building these kinds of agents. The country’s tightly integrated app ecosystems, rapid product cycles, and digitally fluent user base could provide a favorable environment for embedding AI into daily life.
For now, its leading AI agent startups are focusing their attention on the global market, because the best Western models don’t operate inside China’s firewalls. But that could change soon: Tech giants like ByteDance and Tencent are preparing their own AI agents that could bake automation directly into their native super-apps, pulling data from their vast ecosystem of programs that dominate many aspects of daily life in the country.
As the race to define what a useful AI agent looks like unfolds, a mix of ambitious startups and entrenched tech giants are now testing how these tools might actually work in practice—and for whom.
Set the standard
It’s been a whirlwind few months for Manus, which was developed by the Wuhan-based startup Butterfly Effect. The company raised $75 million in a funding round led by the US venture capital firm Benchmark, took the product on an ambitious global roadshow, and hired dozens of new employees.
Even before registration opened to the public in May, Manus had become a reference point for what a broad, consumer‑oriented AI agent should accomplish. Rather than handling narrow chores for businesses, this “general” agent is designed to be able to help with everyday tasks like trip planning, stock comparison, or your kid’s school project.
Unlike previous AI agents, Manus uses a browser-based sandbox that lets users supervise the agent like an intern, watching in real time as it scrolls through web pages, reads articles, or writes code. It also proactively asks clarifying questions and supports long-term memory that provides context for future tasks.
“Manus represents a promising product experience for AI agents,” says Ang Li, cofounder and CEO of Simular, a startup based in Palo Alto, California, that’s building computer-use agents (AI agents that control a virtual computer). “I believe Chinese startups have a huge advantage when it comes to designing consumer products, thanks to cutthroat domestic competition that leads to fast execution and greater attention to product details.”
In the case of Manus, the competition is moving fast. Two of the most buzzy follow‑ups, Genspark and Flowith, for example, are already boasting benchmark scores that match or edge past Manus’s.
Genspark, led by former Baidu executives Eric Jing and Kay Zhu, links many small “super agents” through what it calls multi‑component prompting. The agent can switch among several large language models, accepts both images and text, and carries out tasks from making slide decks to placing phone calls. Whereas Manus relies heavily on Browser Use, a popular open-source product that lets agents operate a web browser in a virtual window like a human, Genspark directly integrates with a wide array of tools and APIs. The agent launched in April, and the company says it already has over 5 million users and over $36 million in yearly revenue.
Flowith, the work of a young team that first grabbed public attention in April 2025 at a developer event hosted by the popular social media app Xiaohongshu, takes a different tack. Marketed as an “infinite agent,” it opens on a blank canvas where each question becomes a node on a branching map. Users can backtrack, take new branches, and store results in personal or sharable “knowledge gardens”—a design that feels more like project management software (think Notion) than a typical chat interface. Every inquiry or task builds its own mind-map-like graph, encouraging a more nonlinear and creative interaction with AI. Flowith’s core agent, NEO, runs in the cloud and can perform scheduled tasks like sending emails and compiling files. The founders want the app to be a “knowledge marketbase” and aim to tap into the social aspect of AI, with the aspiration of becoming “the OnlyFans of AI knowledge creators.”
What they also share with Manus is the global ambition. Both Genspark and Flowith have stated that their primary focus is the international market.
A global address
Startups like Manus, Genspark, and Flowith—though founded by Chinese entrepreneurs—could blend seamlessly into the global tech scene and compete effectively abroad. Founders, investors, and analysts that MIT Technology Review has spoken to believe Chinese companies are moving fast, executing well, and quickly coming up with new products.
Money reinforces the pull to launch overseas. Customers there pay more, and there are plenty to go around. “You can price in USD, and with the exchange rate that’s a sevenfold multiplier,” Manus cofounder Xiao Hong quipped on a podcast. “Even if we’re only operating at 10% power because of cultural differences overseas, we’ll still make more than in China.”
But creating the same functionality in China is a challenge. Major US AI companies including OpenAI and Anthropic have opted out of mainland China because of geopolitical risks and challenges with regulatory compliance. Their absence initially created a black market as users resorted to VPNs and third-party mirrors to access tools like ChatGPT and Claude. That vacuum has since been filled by a new wave of Chinese chatbots—DeepSeek, Doubao, Kimi—but the appetite for foreign models hasn’t gone away.
Manus, for example, uses Anthropic’s Claude Sonnet—widely considered the top model for agentic tasks. Manus cofounder Zhang Tao has repeatedly praised Claude’s ability to juggle tools, remember contexts, and hold multi‑round conversations—all crucial for turning chatty software into an effective executive assistant.
But the company’s use of Sonnet has made its agent functionally unusable inside China without a VPN. If you open Manus from a mainland IP address, you’ll see a notice explaining that the team is “working on integrating Qwen’s model,” a special local version that is built on top of Alibaba’s open-source model.
An engineer overseeing ByteDance’s work on developing an agent, who spoke to MIT Technology Review anonymously to avoid sanction, said that the absence of Claude Sonnet models “limits everything we do in China.” DeepSeek’s open models, he added, still hallucinate too often and lack training on real‑world workflows. Developers we spoke with rank Alibaba’s Qwen series as the best domestic alternative, yet most say that switching to Qwen knocks performance down a notch.
Jiaxin Pei, a postdoctoral researcher at Stanford’s Institute for Human‑Centered AI, thinks that gap will close: “Building agentic capabilities in base LLMs has become a key focus for many LLM builders, and once people realize the value of this, it will only be a matter of time.”
For now, Manus is doubling down on audiences it can already serve. In a written response, the company said its “primary focus is overseas expansion,” noting that new offices in San Francisco, Singapore, and Tokyo have opened in the past month.
A super‑app approach
Although the concept of AI agents is still relatively new, the consumer-facing AI app market in China is already crowded with major tech players. DeepSeek remains the most widely used, while ByteDance’s Doubao and Moonshot’s Kimi have also become household names. However, most of these apps are still optimized for chat and entertainment rather than task execution. This gap in the local market has pushed China’s big tech firms to roll out their own user-facing agents, though early versions remain uneven in quality and rough around the edges.
ByteDance is testing Coze Space, an AI agent based on its own Doubao model family that lets users toggle between “plan” and “execute” modes, so they can either directly guide the agent’s actions or step back and watch it work autonomously. It connects up to 14 popular apps, including GitHub, Notion, and the company’s own Lark office suite. Early reviews say the tool can feel clunky and has a high failure rate, but it clearly aims to match what Manus offers.
Meanwhile, Zhipu AI has released a free agent called AutoGLM Rumination, built on its proprietary ChatGLM models. Shanghai‑based Minimax has launched Minimax Agent. Both products look almost identical to Manus and demo basic tasks such as building a simple website, planning a trip, making a small Flash game, or running quick data analysis.
Despite the limited usability of most general AI agents launched within China, big companies have plans to change that. During a May 15 earnings call, Tencent president Liu Zhiping teased an agent that would weave automation directly into China’s most ubiquitous app, WeChat.
Considered the original super-app, WeChat already handles messaging, mobile payments, news, and millions of mini‑programs that act like embedded apps. These programs give Tencent, WeChat’s developer, access to data from millions of services that pervade everyday life in China, an advantage most competitors can only envy.
Historically, China’s consumer internet has splintered into competing walled gardens—share a Taobao link in WeChat and it resolves as plaintext, not a preview card. Unlike the more interoperable Western internet, China’s tech giants have long resisted integration with one another, choosing to wage platform war at the expense of a seamless user experience.
But the use of mini‑programs has given WeChat unprecedented reach across services that once resisted interoperability, from gym bookings to grocery orders. An agent able to roam that ecosystem could bypass the integration headaches dogging independent startups.
Alibaba, the e-commerce giant behind the Qwen model series, has been a front-runner in China’s AI race but has been slower to release consumer-facing products. Even though Qwen was the most downloaded open-source model on Hugging Face in 2024, it didn’t power a dedicated chatbot app until early 2025. In March, Alibaba rebranded its cloud storage and search app Quark into an all-in-one AI search tool. By June, Quark had introduced DeepResearch—a new mode that marks its most agent-like effort to date.
ByteDance and Alibaba did not reply to MIT Technology Review’s request for comments.
“Historically, Chinese tech products tend to pursue the all-in-one, super-app approach, and the latest Chinese AI agents reflect just that,” says Li of Simular, who previously worked at Google DeepMind on AI-enabled work automation. “In contrast, AI agents in the US are more focused on serving specific verticals.”
Pei, the researcher at Stanford, says that existing tech giants could have a huge advantage in bringing the vision of general AI agents to life—especially those with built-in integration across services. “The customer-facing AI agent market is still very early, with tons of problems like authentication and liability,” he says. “But companies that already operate across a wide range of services have a natural advantage in deploying agents at scale.”
MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.
The way DARPA tells it, math is stuck in the past. In April, the US Defense Advanced Research Projects Agency kicked off a new initiative called expMath—short for Exponentiating Mathematics—that it hopes will speed up the rate of progress in a field of research that underpins a wide range of crucial real-world applications, from computer science to medicine to national security.
“Math is the source of huge impact, but it’s done more or less as it’s been done for centuries—by people standing at chalkboards,” DARPA program manager Patrick Shafto said in a video introducing the initiative.
The modern world is built on mathematics. Math lets us model complex systems such as the way air flows around an aircraft, the way financial markets fluctuate, and the way blood flows through the heart. And breakthroughs in advanced mathematics can unlock new technologies such as cryptography, which is essential for private messaging and online banking, and data compression, which lets us shoot images and video across the internet.
But advances in math can be years in the making. DARPA wants to speed things up. The goal for expMath is to encourage mathematicians and artificial-intelligence researchers to develop what DARPA calls an AI coauthor, a tool that might break large, complex math problems into smaller, simpler ones that are easier to grasp and—so the thinking goes—quicker to solve.
Mathematicians have used computers for decades, to speed up calculations or check whether certain mathematical statements are true. The new vision is that AI might help them crack problems that were previously uncrackable.
But there’s a huge difference between AI that can solve the kinds of problems set in high school—math that the latest generation of models has already mastered—and AI that could (in theory) solve the kinds of problems that professional mathematicians spend careers chipping away at.
On one side are tools that might be able to automate certain tasks that math grads are employed to do; on the other are tools that might be able to push human knowledge beyond its existing limits.
Here are three ways to think about that gulf.
1/ AI needs more than just clever tricks
Large language models are not known to be good at math. They make things up and can be persuaded that 2 + 2 = 5. But newer versions of this tech, especially so-called large reasoning models (LRMs) like OpenAI’s o3 and Anthropic’s Claude 4 Thinking, are far more capable—and that’s got mathematicians excited.
This year, a number of LRMs, which try to solve a problem step by step rather than spit out the first result that comes to them, have achieved high scores on the American Invitational Mathematics Examination (AIME), a test given to the top 5% of US high school math students.
At the same time, a handful of new hybrid models that combine LLMs with some kind of fact-checking system have also made breakthroughs. Emily de Oliveira Santos, a mathematician at the University of São Paulo, Brazil, points to Google DeepMind’s AlphaProof, a system that combines an LLM with DeepMind’s game-playing model AlphaZero, as one key milestone. Last year AlphaProof became the first computer program to match the performance of a silver medalist at the International Math Olympiad, one of the most prestigious mathematics competitions in the world.
The uptick in progress is clear. “GPT-4 couldn’t do math much beyond undergraduate level,” says de Oliveira Santos. “I remember testing it at the time of its release with a problem in topology, and it just couldn’t write more than a few lines without getting completely lost.” But when she gave the same problem to OpenAI’s o1, an LRM released in January, it nailed it.
Does this mean such models are all set to become the kind of coauthor DARPA hopes for? Not necessarily, she says: “Math Olympiad problems often involve being able to carry out clever tricks, whereas research problems are much more explorative and often have many, many more moving pieces.” Success at one type of problem-solving may not carry over to another.
Others agree. Martin Bridson, a mathematician at the University of Oxford, thinks the Math Olympiad result is a great achievement. “On the other hand, I don’t find it mind-blowing,” he says. “It’s not a change of paradigm in the sense that ‘Wow, I thought machines would never be able to do that.’ I expected machines to be able to do that.”
That’s because even though the problems in the Math Olympiad—and similar high school or undergraduate tests like AIME—are hard, there’s a pattern to a lot of them. “We have training camps to train high school kids to do them,” says Bridson. “And if you can train a large number of people to do those problems, why shouldn’t you be able to train a machine to do them?”
Sergei Gukov, a mathematician at the California Institute of Technology who coaches Math Olympiad teams, points out that the style of question does not change too much between competitions. New problems are set each year, but they can be solved with the same old tricks.
“Sure, the specific problems didn’t appear before,” says Gukov. “But they’re very close—just a step away from zillions of things you have already seen. You immediately realize, ‘Oh my gosh, there are so many similarities—I’m going to apply the same tactic.’” As hard as competition-level math is, kids and machines alike can be taught how to beat it.
That’s not true for most unsolved math problems. Bridson is president of the Clay Mathematics Institute, a nonprofit US-based research organization best known for setting up the Millennium Prize Problems in 2000—seven of the most important unsolved problems in mathematics, with a $1 million prize to be awarded to the first person to solve each of them. (One problem, the Poincaré conjecture, has been solved; the prize for it was awarded in 2010. The others, which include P versus NP and the Riemann hypothesis, remain open.) “We’re very far away from AI being able to say anything serious about any of those problems,” says Bridson.
And yet it’s hard to know exactly how far away, because many of the existing benchmarks used to evaluate progress are maxed out. The best new models already outperform most humans on tests like AIME.
To get a better idea of what existing systems can and cannot do, a startup called Epoch AI has created a new test called FrontierMath, released in December. Instead of co-opting math tests developed for humans, Epoch AI worked with more than 60 mathematicians around the world to come up with a set of math problems from scratch.
FrontierMath is designed to probe the limits of what today’s AI can do. None of the problems have been seen before and the majority are being kept secret to avoid contaminating training data. Each problem demands hours of work from expert mathematicians to solve—if they can solve it at all: some of the problems require specialist knowledge to tackle.
FrontierMath is set to become an industry standard. It’s not yet as popular as AIME, says de Oliveira Santos, who helped develop some of the problems: “But I expect this to not hold for much longer, since existing benchmarks are very close to being saturated.”
On AIME, the best large language models (Anthropic’s Claude 4, OpenAI’s o3 and o4-mini, Google DeepMind’s Gemini 2.5 Pro, xAI’s Grok 3) now score around 90%. On FrontierMath, o4-mini scores 19% and Gemini 2.5 Pro scores 13%. That’s still remarkable, but there’s clear room for improvement.
FrontierMath should give the best sense yet of just how fast AI is progressing at math. But there are some problems that are still too hard for computers to take on.
2/ AI needs to manage really vast sequences of steps
Squint hard enough and in some ways math problems start to look the same: to solve them you need to take a sequence of steps from start to finish. The problem is finding those steps.
“Pretty much every math problem can be formulated as path-finding,” says Gukov. What makes some problems far harder than others is the number of steps on that path. “The difference between the Riemann hypothesis and high school math is that with high school math the paths that we’re looking for are short—10 steps, 20 steps, maybe 40 in the longest case.” The steps are also repeated between problems.
“But to solve the Riemann hypothesis, we don’t have the steps, and what we’re looking for is a path that is extremely long”—maybe a million lines of computer proof, says Gukov.
Finding very long sequences of steps can be thought of as a kind of complex game. It’s what DeepMind’s AlphaZero learned to do when it mastered Go and chess. A game of Go might only involve a few hundred moves. But to win, an AI must find a winning sequence of moves among a vast number of possible sequences. Imagine a number with 100 zeros at the end, says Gukov.
But that’s still tiny compared with the number of possible sequences that could be involved in proving or disproving a very hard math problem: “A proof path with a thousand or a million moves involves a number with a thousand or a million zeros,” says Gukov.
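A quick back-of-envelope calculation makes those magnitudes concrete. If each step of a proof offers even 10 possible moves (an illustrative branching factor, not a figure from Gukov), a path of length n has 10^n candidates, a number with n zeros:

```python
import math

def digits(branching: int, path_len: int) -> int:
    # Decimal digits in branching ** path_len, the count of candidate paths.
    return math.floor(path_len * math.log10(branching)) + 1

print(digits(10, 100))        # a Go-scale game: a 101-digit number
print(digits(10, 1_000_000))  # a million-step proof: about a million digits
```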
No AI system can sift through that many possibilities. To address this, Gukov and his colleagues developed a system that shortens the length of a path by combining multiple moves into single supermoves. It’s like having boots that let you take giant strides: instead of taking 2,000 steps to walk a mile, you can now walk it in 20.
The challenge was figuring out which moves to replace with supermoves. In a series of experiments, the researchers came up with a system in which one reinforcement-learning model suggests new moves and a second model checks to see if those moves help.
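In outline, that propose-and-check loop looks like the following sketch, where `proposer` and `verifier` stand in for the two reinforcement-learning models. The interfaces are assumptions for illustration, not the team’s actual code.

```python
# Schematic of the propose-and-check loop: one model suggests a candidate
# supermove, the other scores whether it actually helps shorten search.
from typing import Callable, List, Tuple

Move = Tuple[str, ...]  # a supermove: a fixed sequence of basic moves

def grow_move_set(
    proposer: Callable[[List[Move]], Move],   # RL model 1: suggests new moves
    verifier: Callable[[Move], float],        # RL model 2: scores usefulness
    basic_moves: List[Move],
    rounds: int = 100,
    threshold: float = 0.0,
) -> List[Move]:
    moves = list(basic_moves)
    for _ in range(rounds):
        candidate = proposer(moves)
        if verifier(candidate) > threshold:   # keep moves that shorten paths
            moves.append(candidate)           # future searches stride further
    return moves
```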
They used this approach to make a breakthrough in a math problem called the Andrews-Curtis conjecture, a puzzle that has been unsolved for 60 years. It’s a problem that every professional mathematician will know, says Gukov.
(An aside for math stans only: The AC conjecture states that a particular way of describing a type of set called a trivial group can be translated into a different but equivalent description with a certain sequence of steps. Most mathematicians think the AC conjecture is false, but nobody knows how to prove that. Gukov himself admits that it is an intellectual curiosity rather than a practical problem, but an important one for mathematicians nonetheless.)
Gukov and his colleagues didn’t solve the AC conjecture, but they found that a counterexample (suggesting that the conjecture is false) proposed 40 years ago was itself false. “It’s been a major direction of attack for 40 years,” says Gukov. With the help of AI, they showed that this direction was in fact a dead end.
“Ruling out possible counterexamples is a worthwhile thing,” says Bridson. “It can close off blind alleys, something you might spend a year of your life exploring.”
True, Gukov checked off just one piece of one esoteric puzzle. But he thinks the approach will work in any scenario where you need to find a long sequence of unknown moves, and he now plans to try it out on other problems.
“Maybe it will lead to something that will help AI in general,” he says. “Because it’s teaching reinforcement learning models to go beyond their training. To me it’s basically about thinking outside of the box—miles away, megaparsecs away.”
3/ Can AI ever provide real insight?
Thinking outside the box is exactly what mathematicians need to solve hard problems. Math is often thought to involve robotic, step-by-step procedures. But advanced math is an experimental pursuit, involving trial and error and flashes of insight.
That’s where tools like AlphaEvolve come in. Google DeepMind’s latest model asks an LLM to generate code to solve a particular math problem. A second model then evaluates the proposed solutions, picks the best, and sends them back to the LLM to be improved. After hundreds of rounds of trial and error, AlphaEvolve was able to come up with solutions to a wide range of math problems that were better than anything people had yet come up with. But it can also work as a collaborative tool: at any step, humans can share their own insight with the LLM, prompting it with specific instructions.
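The loop can be sketched in a few lines. This is a hedged approximation of the generate-evaluate-refine pattern described above, not DeepMind’s actual system; `llm` and `evaluate` are hypothetical callables.

```python
# Sketch of a generate-evaluate-refine loop in the style described above.
# `llm` proposes code from a prompt; `evaluate` scores a proposed solution.
def evolve(llm, evaluate, seed_prompt: str, rounds: int = 300, pool: int = 8):
    best_score, best_code = float("-inf"), None
    prompt = seed_prompt
    for _ in range(rounds):
        candidates = [llm(prompt) for _ in range(pool)]        # LLM writes code
        top_score, top_code = max((evaluate(c), c) for c in candidates)
        if top_score > best_score:
            best_score, best_code = top_score, top_code
        # Feed the strongest attempt back for refinement; a human can
        # append hints to this prompt at any round.
        prompt = seed_prompt + "\nImprove on this attempt:\n" + top_code
    return best_code, best_score
```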
This kind of exploration is key to advanced mathematics. “I’m often looking for interesting phenomena and pushing myself in a certain direction,” says Geordie Williamson, a mathematician at the University of Sydney in Australia. “Like: ‘Let me look down this little alley. Oh, I found something!’”
Williamson worked with Meta on an AI tool called PatternBoost, designed to support this kind of exploration. PatternBoost can take a mathematical idea or statement and generate similar ones. “It’s like: ‘Here’s a bunch of interesting things. I don’t know what’s going on, but can you produce more interesting things like that?’” he says.
Such brainstorming is essential work in math. It’s how new ideas get conjured. Take the icosahedron, says Williamson: “It’s a beautiful example of this, which I kind of keep coming back to in my own work.” The icosahedron is a 20-sided 3D object whose faces are all triangles (think of a 20-sided die). It is the largest of a family of exactly five perfectly regular solids: the others are the tetrahedron (four sides), the cube (six sides), the octahedron (eight sides), and the dodecahedron (12 sides).
Remarkably, the fact that there are exactly five of these objects was proved by mathematicians in ancient Greece. “At the time that this theorem was proved, the icosahedron didn’t exist,” says Williamson. “You can’t go to a quarry and find it—someone found it in their mind. And the icosahedron goes on to have a profound effect on mathematics. It’s still influencing us today in very, very profound ways.”
For Williamson, the exciting potential of tools like PatternBoost is that they might help people discover future mathematical objects like the icosahedron that go on to shape the way math is done. But we’re not there yet. “AI can contribute in a meaningful way to research-level problems,” he says. “But we’re certainly not getting inundated with new theorems at this stage.”
Ultimately, it comes down to the fact that machines still lack what you might call intuition or creative thinking. Williamson sums it up like this: We now have AI that can beat humans when it knows the rules of the game. “But it’s one thing for a computer to play Go at a superhuman level and another thing for the computer to invent the game of Go.”
“I think that applies to advanced mathematics,” he says. “Breakthroughs come from a new way of thinking about something, which is akin to finding completely new moves in a game. And I don’t really think we understand where those really brilliant moves in deep mathematics come from.”
Perhaps AI tools like AlphaEvolve and PatternBoost are best thought of as advance scouts for human intuition. They can discover new directions and point out dead ends, saving mathematicians months or years of work. But the true breakthroughs will still come from the minds of people, as has been the case for thousands of years.
For now, at least. “There’s plenty of tech companies that tell us that won’t last long,” says Williamson. “But you know—we’ll see.”
After working on it for months, my colleague Casey Crownhart and I finally saw our story on AI’s energy and emissions burden go live last week.
The initial goal sounded simple: Calculate how much energy is used each time we interact with a chatbot, and then tally that up to understand why everyone from leaders of AI companies to officials at the White House wants to harness unprecedented levels of electricity to power AI and reshape our energy grids in the process.
It was, of course, not so simple. After speaking with dozens of researchers, we realized that the common understanding of AI’s energy appetite is full of holes. I encourage you to read the full story, which has some incredible graphics to help you understand everything from the energy used in a single query right up to what AI will require just three years from now (enough electricity to power 22% of US households, it turns out). But here are three takeaways I have after the project.
AI is in its infancy
We focused on measuring the energy requirements that go into using a chatbot, generating an image, and creating a video with AI. But these three uses are relatively small-scale compared with where AI is headed next.
Lots of AI companies are building reasoning models, which “think” for longer and use more energy. They’re building hardware devices, perhaps like the one Jony Ive has been working on (which OpenAI just acquired for $6.5 billion), that have AI constantly humming along in the background of our conversations. They’re designing agents and digital clones of us to act on our behalf. All these trends point to a more energy-intensive future (which, again, helps explain why OpenAI and others are spending such inconceivable amounts of money on energy).
But the fact that AI is in its infancy raises another point. The models, chips, and cooling methods behind this AI revolution could all grow more efficient over time, as my colleague Will Douglas Heaven explains. This future isn’t predetermined.
AI video is on another level
When we tested the energy demands of various models, we found the energy required to produce even a low-quality, five-second video to be pretty shocking: It was 42,000 times more than the amount needed for a chatbot to answer a question about a recipe, and enough to power a microwave for over an hour. If there’s one type of AI whose energy appetite should worry you, it’s this one.
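Those two comparisons pin down the rough magnitudes. Assuming a typical ~1,000-watt microwave (our assumption; the passage above gives only the ratio and the microwave comparison), the implied figures work out as follows:

```python
# Back-of-envelope check of the figures quoted above, assuming a ~1,000 W
# microwave; the story reports only the comparisons, not absolute numbers.
microwave_watts = 1_000
video_joules = microwave_watts * 3_600   # "over an hour" is about 3.6 MJ
text_joules = video_joules / 42_000      # implies roughly 86 J per chatbot answer
print(f"{video_joules / 1e6:.1f} MJ per five-second clip; "
      f"~{text_joules:.0f} J per text answer")
```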
Soon after we published, Google debuted the latest iteration of its Veo model. People quickly created compilations of the most impressive clips (this one being the most shocking to me). Something we point out in the story is that Google (as well as OpenAI, which has its own video generator, Sora) denied our request for specific numbers on the energy their AI models use. Nonetheless, our reporting suggests it’s very likely that high-definition video models like Veo and Sora are much larger, and much more energy-demanding, than the models we tested.
I think the key to whether the use of AI video will produce indefensible clouds of emissions in the near future will be how it’s used, and how it’s priced. The example I linked shows a bunch of TikTok-style content, and I predict that if creating AI video is cheap enough, social video sites will be inundated with this type of content.
There are more important questions than your own individual footprint
We expected that a lot of readers would understandably think about this story in terms of their own individual footprint, wondering whether their AI usage is contributing to the climate crisis. Don’t panic: It’s likely that asking a chatbot for help with a travel plan does not meaningfully increase your carbon footprint. Video generation might. But after reporting on this for months, I think there are more important questions.
Consider, for example, the water being drained from aquifers in Nevada, the country’s driest state, to supply data centers that are drawn to the area by tax incentives and easy permitting processes, as detailed in an incredible story by James Temple. Or look at how Meta’s largest data center project, in Louisiana, is relying on natural gas despite industry promises to use clean energy, per a story by David Rotman. Or the fact that nuclear energy is not the silver bullet that AI companies often make it out to be.
There are global forces shaping how much energy AI companies are able to access and what types of sources will provide it. There is also very little transparency from leading AI companies on their current and future energy demands, even while they’re asking for public support for these plans. Pondering your individual footprint can be a good thing to do, provided you remember that it’s not so much your footprint as these other factors that are keeping climate researchers and energy experts we spoke to up at night.
This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.
The Trump administration has terminated National Science Foundation grants for more than 100 research projects related to climate change amid a widening campaign to slash federal funding for scientists and institutions studying the rising risks of a warming world.
The move will cut off what’s likely to amount to tens of millions of dollars for studies that were previously approved and, in most cases, already in the works.
Affected projects include efforts to develop cleaner fuels, measure methane emissions, improve understanding of how heat waves and sea-level rise disproportionately harm marginalized groups, and help communities transition to sustainable energy, according to an MIT Technology Review analysis of the GrantWatch database—a volunteer-led effort to track federal cuts to research—and a list of terminated grants from the National Science Foundation (NSF) itself.
The NSF is one of the largest sources of US funding for university research, so the cancellations will deliver a big blow to climate science and clean-energy development.
They come on top of the White House’s broader efforts to cut research funding and revenue for universities and significantly raise their taxes. The administration has also strived to slash staff and budgets at federal research agencies, halt efforts to assess the physical and financial risks of climate change, and shut down labs that have monitored and analyzed the levels of greenhouse gases in the air for decades.
“I don’t think it takes a lot of imagination to understand where this is going,” says Daniel Schrag, co-director of the science, technology, and public policy program at Harvard University, which has seen greater funding cuts than any other university amid an escalating legal conflict with the administration. “I believe the Trump administration intends to zero out funding for climate science altogether.”
The NSF says it’s terminating grants that aren’t aligned with the agency’s program goals, “including but not limited to those on diversity, equity, and inclusion (DEI), environmental justice, and misinformation/disinformation.”
Trump administration officials have argued that DEI considerations have contaminated US science, favoring certain groups over others and undermining the public’s trust in researchers.
“Political biases have displaced the vital search for truth,” Michael Kratsios, head of the White House Office of Science and Technology Policy, said to a group of NSF administrators and others last month, according to reporting in Science.
Science v. politics
But research projects that got caught in the administration’s anti-DEI filter aren’t the only casualties of the cuts. The NSF has also canceled funding for work that has little obvious connection to DEI ambitions, such as research on catalysts.
Many believe the administration’s broader motivation is to undermine the power of the university system and prevent research findings that cut against its politics.
“It certainly seems like a deliberate attempt to undo any science that contradicts the administration,” says Alexa Fredston, an assistant professor of ocean sciences at the University of California, Santa Cruz.
On May 28, a group of states including California, New York, and Illinois sued the NSF, arguing that the cuts illegally violated diversity goals and funding priorities clearly established by Congress, which controls federal spending.
A group of universities also filed a lawsuit against the NSF over its earlier decision to reduce the indirect cost rate for research, which reimburses universities for overhead expenses associated with work carried out on campuses. The plaintiffs included the California Institute of Technology, Carnegie Mellon University, and the Massachusetts Institute of Technology, which has also lost a number of research grants.
(MIT Technology Review is owned by, but editorially independent from, MIT.)
The NSF declined to comment.
‘Theft from the American people’
GrantWatch is an effort among researchers at rOpenSci, Harvard, and other organizations to track terminations of grants issued by the National Institutes of Health (NIH) and NSF. It draws on voluntary submissions from scientists involved as well as public government information.
A search of its database for the terms “climate change,” “clean energy,” “climate adaptation,” “environmental justice,” and “climate justice” showed that the NSF has canceled funds for 118 projects, which were supposed to receive more than $100 million in total. Searching for the word “climate” produces more than 300 research projects that were set to receive more than $230 million. (That word often indicates climate-change-related research, but in some abstracts it refers to the cultural climate.)
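For readers who want to reproduce this kind of tally, the search amounts to a keyword filter plus a sum over award amounts. The sketch below assumes a hypothetical CSV export and column names (`abstract`, `award_amount`); GrantWatch’s actual schema may differ.

```python
# Hypothetical sketch of the keyword tally described above. File name and
# column names are assumptions, not GrantWatch's real schema.
import pandas as pd

TERMS = ["climate change", "clean energy", "climate adaptation",
         "environmental justice", "climate justice"]

grants = pd.read_csv("grantwatch_nsf_terminations.csv")   # assumed export
text = grants["abstract"].fillna("").str.lower()
mask = text.apply(lambda t: any(term in t for term in TERMS))
hits = grants[mask]
print(len(hits), "projects;", f"${hits['award_amount'].sum():,.0f}", "awarded")
```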
Some share of those funds has already been issued to research groups. The NSF section of the database doesn’t include that “outlaid” figure, but it’s generally about half the amount of the original grants, according to Noam Ross, a computational researcher and executive director of rOpenSci, a nonprofit initiative that promotes open and reproducible science.
A search for “climate change” among the NIH projects produces another 22 studies that were terminated and were still owed nearly $50 million in grants. Many of those projects explored the mental or physical health effects of climate change and extreme weather events.
The NSF more recently released its own list of terminated projects, which mostly mirrors GrantWatch’s findings and confirms the specific terminations mentioned in this story.
“These grant terminations are theft from the American people,” Ross said in an email response. “By illegally ending this research the Trump administration is wasting taxpayer dollars, gutting US leadership in science, and telling the world that the US government breaks its promises.”
Harvard, the country’s oldest university, has been particularly hard hit.
In April, the university sued the Trump administration over cuts to its research funding and efforts to exert control over its admissions and governance policies. The White House, in turn, has moved to eliminate all federal funds for the university, including hundreds of NSF and NIH grants.
Daniel Nocera, a professor at Harvard who has done pioneering work on so-called artificial photosynthesis, a pathway for producing clean fuels from sunlight, said in an email that all of his grants were terminated.
“I have no research funds,” he added.
Another terminated grant involved a collaboration between Harvard and the NSF National Center for Atmospheric Research (NCAR), designed to update the atmospheric chemistry component of the Community Earth System Model, an open-source climate model widely used by scientists around the world.
The research was expected to “contribute to a better understanding of atmospheric chemistry in the climate system and to improve air quality predictions within the context of climate change,” according to the NSF abstract.
“We completed most of the work and were able to bring it to a stopping point,” Daniel Jacob, a professor at Harvard listed as the principal investigator on the project, said in an email. “But it will affect the ability to study chemistry-climate interactions. And it is clearly not right to pull funding from an existing project.”
Plenty of the affected research projects do, in one way or another, grapple with issues of diversity, equity, and inclusion. But that’s because there is ample evidence that disadvantaged communities experience higher rates of illness from energy-sector pollution, will be harder hit by the escalating effects of extreme weather, and are underrepresented in scientific fields.
One of the largest terminations cut off about $4 million of remaining funds for the CLIMATE Justice Initiative, a fellowship program at the University of California, Irvine, designed to recruit, train, and mentor a more diverse array of researchers in Earth sciences.
The NSF decision occurred halfway into the five-year program, halting funds for a number of fellows who were in the midst of environmental justice research efforts with community partners in Southern California. Kathleen Johnson, a professor at UC Irvine and director of the initiative, says the university is striving to find ways to fund as many participants as possible for the remainder of their fellowships.
“We need people from all parts of society who are trained in geoscience and climate science to address all these global challenges that we are facing,” she says. “The people who will be best positioned to do this work … are the people who understand the community’s needs and are able to therefore work to implement equitable solutions.”
“Diverse teams have been shown to do better science,” Johnson adds.
Numerous researchers whose grants were terminated didn’t respond to inquiries from MIT Technology Review or declined to comment, amid growing concerns that the Trump administration will punish scientists or institutions that criticize its policies.
Coming cuts
The termination of existing NSF and NIH grants is just the start of the administration’s plans to cut federal funding for climate and clean-energy research.
The White House’s budget proposal for the coming fiscal year seeks to eliminate tens of billions of dollars in funding across federal agencies, specifically calling out “Green New Scam funds” at the Department of Energy; “low-priority climate monitoring satellites” at NASA; “climate-dominated research, data, and grant programs” at the National Oceanic and Atmospheric Administration; and “climate; clean energy; woke social, behavioral, and economic sciences” at the NSF.
The administration released a more detailed NSF budget proposal on May 30, which called for a 60% reduction in research spending and nearly zeroed out the clean energy technology program. It also proposed cutting funds by 97% for the US Global Change Research Program, which produces regular assessments of climate risks; 80% for the Ocean Observatories Initiative, a global network of ocean sensors that monitor shifting marine conditions; and 40% for NCAR, the atmospheric research center.
If Congress approves budget reductions anywhere near the levels the administration has put forward, scientists fear, it could eliminate the resources necessary to carry on long-running climate observation of oceans, forests, and the atmosphere.
The administration also reportedly plans to end the leases on dozens of NOAA facilities, including the Global Monitoring Laboratory in Hilo, Hawaii. The lab supports the work of the nearby Mauna Loa Observatory, which has tracked atmospheric carbon dioxide levels for decades.
Even short gaps in these time-series studies, which scientists around the world rely upon, would have an enduring impact on researchers’ ability to analyze and understand weather and climate trends.
“We won’t know where we’re going if we stop measuring what’s happening,” says Jane Long, formerly the associate director of energy and environment at Lawrence Livermore National Lab. “It’s devastating—there’s no two ways around it.”
Stunting science
Growing fears that public research funding will take an even larger hit in the coming fiscal year are forcing scientists to rethink their research plans—or to reconsider whether they want to stay in the field at all, numerous observers said.
“The amount of funding we’re talking about isn’t something a university can fill indefinitely, and it’s not something that private philanthropy can fill for very long,” says Michael Oppenheimer, a professor of geosciences and international affairs at Princeton University. “So what we’re talking about is potentially cataclysmic for climate science.”
“Basically it’s a shit show,” he adds, “and how bad a shit show it is will depend a lot on what happens in the courts and Congress over the next few months.”
One climate scientist, who declined to speak on the record out of concern that the administration might punish his institution, said the declining funding is forcing researchers to shrink their scientific ambitions down to a question of “What can I do with my laptop and existing data sets?”
“If your goal was to make the United States a second-class or third-class country when it comes to science and education, you would be doing exactly what the administration is doing,” the scientist said. “People are pretty depressed, upset, and afraid.”
Given the rising challenges, Harvard’s Schrag fears that the best young climate scientists will decide to shift their careers outside of the US, or move into high tech or other fields where they can make significantly more money.
“We might lose a generation of talent—and that’s not going to get fixed four years from now,” he says. “The irony is that Trump is attacking the institutions and foundation of US science that literally made America great.”
Back in April, OpenAI announced it was rolling back an update to its GPT-4o model that made ChatGPT’s responses to user queries too sycophantic.
An AI model that acts in an overly agreeable and flattering way is more than just annoying. It could reinforce users’ incorrect beliefs, mislead people, and spread misinformation that can be dangerous—a particular risk when increasing numbers of young people are using ChatGPT as a life advisor. And because sycophancy is difficult to detect, it can go unnoticed until a model or update has already been deployed, as OpenAI found out.
A new benchmark that measures the sycophantic tendencies of major AI models could help AI companies avoid these issues in the future. The team behind the benchmark, called Elephant—researchers from Stanford, Carnegie Mellon, and the University of Oxford—found that LLMs consistently exhibit higher rates of sycophancy than humans do.
“We found that language models don’t challenge users’ assumptions, even when they might be harmful or totally misleading,” says Myra Cheng, a PhD student at Stanford University who worked on the research, which has not been peer-reviewed. “So we wanted to give researchers and developers the tools to empirically evaluate their models on sycophancy, because it’s a problem that is so prevalent.”
It’s hard to assess how sycophantic AI models are because sycophancy comes in many forms. Previous research has tended to focus on how chatbots agree with users even when what the human has told the AI is demonstrably wrong—for example, a user might state that Nice, not Paris, is the capital of France, and the model will go along with it.
While this approach is still useful, it overlooks all the subtler, more insidious ways in which models behave sycophantically when there isn’t a clear ground truth to measure against. Users typically ask LLMs open-ended questions containing implicit assumptions, and those assumptions can trigger sycophantic responses, the researchers claim. For example, a model that’s asked “How do I approach my difficult coworker?” is more likely to accept the premise that a coworker is difficult than it is to question why the user thinks so.
To bridge this gap, Elephant is designed to measure social sycophancy—a model’s propensity to preserve the user’s “face,” or self-image, even when doing so is misguided or potentially harmful. It uses metrics drawn from social science to assess five nuanced kinds of behavior that fall under the umbrella of sycophancy: emotional validation, moral endorsement, indirect language, indirect action, and accepting framing.
To do this, the researchers tested it on two data sets of personal advice written by humans. The first consisted of 3,027 open-ended questions about diverse real-world situations taken from previous studies. The second was drawn from 4,000 posts on Reddit’s AITA (“Am I the Asshole?”) subreddit, a popular forum among users seeking advice. These data sets were fed into eight LLMs from OpenAI (the version of GPT-4o they assessed was earlier than the version that the company later called too sycophantic), Google, Anthropic, Meta, and Mistral, and the responses were analyzed to see how the LLMs’ answers compared with humans’.
Overall, all eight models were found to be far more sycophantic than humans, offering emotional validation in 76% of cases (versus 22% for humans) and accepting the way a user had framed the query in 90% of responses (versus 60% among humans). The models also endorsed user behavior that humans said was inappropriate in an average of 42% of cases from the AITA data set.
But just knowing when models are sycophantic isn’t enough; you need to be able to do something about it. And that’s trickier. The authors had limited success when they tried to mitigate these sycophantic tendencies through two different approaches: prompting the models to provide honest and accurate responses, and fine-tuning a model on labeled AITA examples to encourage less sycophantic outputs. They found that adding “Please provide direct advice, even if critical, since it is more helpful to me” to the prompt was the most effective technique, but it increased accuracy by only 3%. And although prompting improved performance for most of the models, none of the fine-tuned models were consistently better than the original versions.
“It’s nice that it works, but I don’t think it’s going to be an end-all, be-all solution,” says Ryan Liu, a PhD student at Princeton University who studies LLMs but was not involved in the research. “There’s definitely more to do in this space in order to make it better.”
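For readers who want to try this at home, the prompting mitigation is simple to reproduce in outline. Below is a minimal sketch; `query_model` is a hypothetical stand-in for whichever chat API is under test, and the steering sentence is the one the researchers found most effective.

```python
# Minimal sketch of the prompting mitigation. query_model is a hypothetical
# wrapper around whichever chat API is being evaluated.
STEER = ("Please provide direct advice, even if critical, "
         "since it is more helpful to me.")

def query_model(prompt: str) -> str:
    """Placeholder for a real chat-API call; wire up to the model under test."""
    raise NotImplementedError

def compare(question: str) -> dict:
    """Collect a baseline and a steered response for side-by-side labeling."""
    return {
        "baseline": query_model(question),
        # Appending the steering sentence raised accuracy by only ~3% in the
        # paper's tests, so treat this as a weak baseline, not a fix.
        "steered": query_model(f"{question}\n\n{STEER}"),
    }
```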
Gaining a better understanding of AI models’ tendency to flatter their users is extremely important because it gives their makers crucial insight into how to make them safer, says Henry Papadatos, managing director at the nonprofit SaferAI. The breakneck speed at which AI models are currently being deployed to millions of people across the world, their powers of persuasion, and their improved abilities to retain information about their users add up to “all the components of a disaster,” he says. “Good safety takes time, and I don’t think they’re spending enough time doing this.”
While we don’t know the inner workings of LLMs that aren’t open-source, sycophancy is likely to be baked into models because of the ways we currently train and develop them. Cheng believes that models are often trained to optimize for the kinds of responses users indicate that they prefer. ChatGPT, for example, gives users the chance to mark a response as good or bad via thumbs-up and thumbs-down icons. “Sycophancy is what gets people coming back to these models. It’s almost the core of what makes ChatGPT feel so good to talk to,” she says. “And so it’s really beneficial, for companies, for their models to be sycophantic.” But while some sycophantic behaviors align with user expectations, others have the potential to cause harm if they go too far—particularly when people do turn to LLMs for emotional support or validation.
“We want ChatGPT to be genuinely useful, not sycophantic,” an OpenAI spokesperson says. “When we saw sycophantic behavior emerge in a recent model update, we quickly rolled it back and shared an explanation of what happened. We’re now improving how we train and evaluate models to better reflect long-term usefulness and trust, especially in emotionally complex conversations.”
Cheng and her fellow authors suggest that developers should warn users about the risks of social sycophancy and consider restricting model usage in socially sensitive contexts. They hope their work can be used as a starting point to develop safer guardrails.
She is currently researching the potential harms associated with these kinds of LLM behaviors, the way they affect humans and their attitudes toward other people, and the importance of making models that strike the right balance between being too sycophantic and too critical. “This is a very big socio-technical challenge,” she says. “We don’t want LLMs to end up telling users, ‘You are the asshole.’”
Imagine: China deploys hundreds of thousands of autonomous drones in the air, on the sea, and under the water—all armed with explosive warheads or small missiles. These machines descend in a swarm toward military installations on Taiwan and nearby US bases, and over the course of a few hours, a single robotic blitzkrieg overwhelms the US Pacific force before it can even begin to fight back.
Maybe it sounds like a new Michael Bay movie, but it’s the scenario that keeps the chief technology officer of the US Army up at night.
“I’m hesitant to say it out loud so I don’t manifest it,” says Alex Miller, a longtime Army intelligence official who became the CTO to the Army’s chief of staff in 2023.
Even if World War III doesn’t break out in the South China Sea, every US military installation around the world is vulnerable to the same tactics—as is every other country’s military. The proliferation of cheap drones means just about any group with the wherewithal to assemble and launch a swarm could wreak havoc, no expensive jets or massive missile installations required.
While the US has precision missiles that can shoot these drones down, they don’t always succeed: A drone attack killed three US soldiers and injured dozens more at a base in the Jordanian desert last year. And each American missile costs orders of magnitude more than its targets, which limits their supply; countering thousand-dollar drones with missiles that cost hundreds of thousands, or even millions, of dollars per shot can only work for so long, even with a defense budget that could reach a trillion dollars next year.
The US armed forces are now hunting for a solution—and they want it fast. Every branch of the service and a host of defense tech startups are testing out new weapons that promise to disable drones en masse. There are drones that slam into other drones like battering rams; drones that shoot out nets to ensnare quadcopter propellers; precision-guided Gatling guns that simply shoot drones out of the sky; electronic approaches, like GPS jammers and direct hacking tools; and lasers that melt holes clear through a target’s side.
Then there are the microwaves: high-powered electronic devices that push out kilowatts of power to zap the circuits of a drone as if it were the tinfoil you forgot to take off your leftovers when you heated them up.
That’s where Epirus comes in.
When I went to visit the HQ of this 185-person startup in Torrance, California, earlier this year, I got a behind-the-scenes look at its massive microwave, called Leonidas, which the US Army is already betting on as a cutting-edge anti-drone weapon. The Army awarded Epirus a $66 million contract in early 2023, topped that up with another $17 million last fall, and is currently deploying a handful of the systems for testing with US troops in the Middle East and the Pacific. (The Army won’t get into specifics on the location of the weapons in the Middle East but published a report of a live-fire test in the Philippines in early May.)
Up close, the Leonidas that Epirus built for the Army looks like a two-foot-thick slab of metal the size of a garage door stuck on a swivel mount. Pop the back cover, and you can see that the slab is filled with dozens of individual microwave amplifier units in a grid. Each is about the size of a safe-deposit box and built around a chip made of gallium nitride, a semiconductor that can survive much higher voltages and temperatures than the typical silicon.
Leonidas sits on top of a trailer that a standard-issue Army truck can tow, and when it is powered on, the company’s software tells the grid of amps and antennas to shape the electromagnetic waves they’re blasting out with a phased array, precisely overlapping the microwave signals to mold the energy into a focused beam. Instead of needing to physically point a gun or parabolic dish at each of a thousand incoming drones, the Leonidas can flick between them at the speed of software.
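The steering trick rests on textbook phased-array physics: delay each element’s signal so that the wavefronts reinforce in the chosen direction. Here is a simplified sketch of that calculation for a uniform linear array. The real Leonidas geometry, operating frequency, and control software are not public, so every number below is an illustrative assumption.

```python
import math

def steering_phases(n_elements: int, spacing_m: float, freq_hz: float,
                    steer_angle_deg: float) -> list[float]:
    """Per-element phase shifts (radians) to steer a uniform linear array.

    Textbook relation: phi_n = -2*pi*n*d*sin(theta)/lambda. Illustrative
    only; Epirus's actual array design is proprietary.
    """
    wavelength = 3.0e8 / freq_hz          # c / f, in meters
    theta = math.radians(steer_angle_deg)
    return [-2 * math.pi * n * spacing_m * math.sin(theta) / wavelength
            for n in range(n_elements)]

# Example: eight elements at an assumed 3 GHz, half-wavelength spacing,
# steered 20 degrees off boresight. Retargeting is just recomputing this
# list, which is why the beam can "flick" between drones in software.
wavelength = 3.0e8 / 3.0e9                # 0.1 m
print(steering_phases(8, wavelength / 2, 3.0e9, 20.0))
```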
The Leonidas contains dozens of microwave amplifier units and can pivot to direct waves at incoming swarms of drones.
Of course, this isn’t magic—there are practical limits on how much damage one array can do, and at what range—but the total effect could be described as an electromagnetic pulse emitter, a death ray for electronics, or a force field that could set up a protective barrier around military installations and drop drones the way a bug zapper fizzles a mob of mosquitoes.
I walked through the nonclassified sections of the Leonidas factory floor, where a cluster of engineers working on weaponeering—the military term for figuring out exactly how much of a weapon, be it high explosive or microwave beam, is necessary to achieve a desired effect—ran tests in a warren of smaller anechoic rooms. Inside, they shot individual microwave units at a broad range of commercial and military drones, cycling through waveforms and power levels to try to find the signal that could fry each one with maximum efficiency.
On a live video feed from inside one of these foam-padded rooms, I watched a quadcopter drone spin its propellers and then, once the microwave emitter turned on, instantly stop short—first the propeller on the front left and then the rest. A drone hit with a Leonidas beam doesn’t explode—it just falls.
Compared with the blast of a missile or the sizzle of a laser, it doesn’t look like much. But it could force enemies to come up with costlier ways of attacking that reduce the advantage of the drone swarm, and it could get around the inherent limitations of purely electronic or strictly physical defense systems. It could save lives.
Epirus CEO Andy Lowery, a tall guy with sparkplug energy and a rapid-fire southern Illinois twang, doesn’t shy away from talking big about his product. As he told me during my visit, Leonidas is intended to lead a last stand, like the Spartan from whom the microwave takes its name—in this case, against hordes of unmanned aerial vehicles, or UAVs. While the actual range of the Leonidas system is kept secret, Lowery says the Army is looking for a solution that can reliably stop drones within a few kilometers. He told me, “They would like our system to be the owner of that final layer—to get any squeakers, any leakers, anything like that.”
Now that they’ve told the world they “invented a force field,” Lowery added, the focus is on manufacturing at scale—before the drone swarms really start to descend or a nation with a major military decides to launch a new war. Before, in other words, Miller’s nightmare scenario becomes reality.
Why zap?
Miller remembers well when the danger of small weaponized drones first appeared on his radar. Reports of Islamic State fighters strapping grenades to the bottom of commercial DJI Phantom quadcopters first emerged in late 2016 during the Battle of Mosul. “I went, ‘Oh, this is going to be bad,’ because basically it’s an airborne IED at that point,” he says.
He has tracked the danger as it has built steadily since then, with advances in machine vision, AI coordination software, and suicide drone tactics only accelerating the threat.
Then the war in Ukraine showed the world that cheap technology has fundamentally changed how warfare happens. We have watched in high-definition video how a cheap, off-the-shelf drone modified to carry a small bomb can be piloted directly into a faraway truck, tank, or group of troops to devastating effect. And larger suicide drones, also known as “loitering munitions,” can be produced for just tens of thousands of dollars and launched in massive salvos to hit soft targets or overwhelm more advanced military defenses through sheer numbers.
As a result, Miller, along with large swaths of the Pentagon and DC policy circles, believes that the current US arsenal for defending against these weapons is just too expensive and the tools in too short supply to truly match the threat.
Just look at Yemen, a poor country where the Houthi military group has been under constant attack for the past decade. Armed with this new low-tech arsenal, in the past 18 months the rebel group has been able to bomb cargo ships and effectively disrupt global shipping in the Red Sea—part of an effort to apply pressure on Israel to stop its war in Gaza. The Houthis have also used missiles, suicide drones, and even drone boats to launch powerful attacks on US Navy ships sent to stop them.
The most successful defense tech firm selling anti-drone weapons to the US military right now is Anduril, the company started by Palmer Luckey, the inventor of the Oculus VR headset, and a crew of cofounders from Oculus and defense data giant Palantir. In just the past few months, the Marines have chosen Anduril for counter-drone contracts that could be worth nearly $850 million over the next decade, and the company has been working with Special Operations Command since 2022 on a counter-drone contract that could be worth nearly a billion dollars over a similar time frame. It’s unclear from the contracts what, exactly, Anduril is selling to each organization, but its weapons include electronic warfare jammers, jet-powered drone bombs, and propeller-driven Anvil drones designed to simply smash into enemy drones.
In this arsenal, the cheapest way to stop a swarm of drones is electronic warfare: jamming the GPS or radio signals used to pilot the machines. But the intense drone battles in Ukraine have advanced the art of jamming and counter-jamming close to the point of stalemate. As a result, a new state of the art is emerging: unjammable drones that operate autonomously by using onboard processors to navigate via internal maps and computer vision, or even drones connected with 20-kilometer-long filaments of fiber-optic cable for tethered control.
But unjammable doesn’t mean unzappable. Instead of using the scrambling method of a jammer, which employs an antenna to block the drone’s connection to a pilot or remote guidance system, the Leonidas microwave beam hits a drone body broadside. The energy finds its way into something electrical, whether the central flight controller or a tiny wire controlling a flap on a wing, to short-circuit whatever’s available. (The company also says that this targeted hit of energy allows birds and other wildlife to continue to move safely.)
Tyler Miller, a senior systems engineer on Epirus’s weaponeering team, told me that they never know exactly which part of the target drone is going to go down first, but they’ve reliably seen the microwave signal get in somewhere to overload a circuit. “Based on the geometry and the way the wires are laid out,” he said, one of those wires is going to be the best path in. “Sometimes if we rotate the drone 90 degrees, you have a different motor go down first,” he added.
The team has even tried wrapping target drones in copper tape, which would theoretically provide shielding, only to find that the microwave still finds a way in through moving propeller shafts or antennas that need to remain exposed for the drone to fly.
Leonidas also has an edge when it comes to downing a mass of drones at once. Physically hitting a drone out of the sky or lighting it up with a laser can be effective in situations where electronic warfare fails, but anti-drone drones can only take out one at a time, and lasers need to precisely aim and shoot. Epirus’s microwaves can damage everything in a roughly 60-degree arc from the Leonidas emitter simultaneously and keep on zapping and zapping; directed energy systems like this one never run out of ammo.
As for cost, each Army Leonidas unit currently runs in the “low eight figures,” Lowery told me. Defense contract pricing can be opaque, but Epirus delivered four units for its $66 million initial contract, giving a back-of-napkin price around $16.5 million each. For comparison, Stinger missiles from Raytheon, which soldiers shoot at enemy aircraft or drones from a shoulder-mounted launcher, cost hundreds of thousands of dollars a pop, meaning the Leonidas could start costing less (and keep shooting) after it downs the first wave of a swarm.
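That break-even point is easy to put a number on. A quick sketch, using the figures above plus one assumption for the interceptor price (the story says only “hundreds of thousands of dollars a pop”):

```python
# Back-of-napkin cost comparison, using figures from the article plus one
# assumption: a $400,000 per-shot interceptor cost.
LEONIDAS_UNIT_COST = 66_000_000 / 4      # $66M initial contract, four units
MISSILE_COST = 400_000                   # assumed per-interceptor cost

break_even = LEONIDAS_UNIT_COST / MISSILE_COST
print(f"Leonidas unit cost: ${LEONIDAS_UNIT_COST:,.0f}")
print(f"Break-even vs missiles: ~{break_even:.0f} drones")  # ~41 drones
```

At those numbers the system pays for itself somewhere around the fortieth drone, and halving or doubling the assumed missile price barely changes the conclusion.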
Raytheon’s radar, reversed
Epirus is part of a new wave of venture-capital-backed defense companies trying to change the way weapons are created—and the way the Pentagon buys them. The largest defense companies, firms like Raytheon, Boeing, Northrop Grumman, and Lockheed Martin, typically develop new weapons in response to research grants and cost-plus contracts, in which the US Department of Defense guarantees a certain profit margin to firms building products that match its laundry list of technical specifications. These programs have kept the military supplied with cutting-edge weapons for decades, but the results can be exquisite pieces of military machinery delivered years late and billions of dollars over budget.
Rather than building to minutely detailed specs, these new contractors aim to produce products quickly to solve a problem, then fine-tune them as they pitch to the military. The model, pioneered by Palantir and SpaceX, has since propelled companies like Anduril, Shield AI, and dozens of other smaller startups into the business of war as venture capital piles tens of billions of dollars into defense.
Like Anduril, Epirus has direct Palantir roots; it was cofounded by Joe Lonsdale, who also cofounded Palantir, and John Tenet, Lonsdale’s colleague at the time at his venture fund, 8VC. (Tenet, the son of former CIA director George Tenet, may have inspired the company’s name—the elder Tenet’s parents were born in the Epirus region in the northwest of Greece. But the company more often says it’s a reference to the pseudo-mythological Epirus Bow from the 2011 fantasy action movie Immortals, which never runs out of arrows.)
While Epirus is doing business in the new mode, its roots are in the old—specifically in Raytheon, a pioneer in the field of microwave technology. Cofounded by MIT professor Vannevar Bush in 1922, it manufactured vacuum tubes, like those found in old radios. But the company became synonymous with electronic defense during World War II, when Bush spun up a lab to develop early microwave radar technology invented by the British into a workable product, and Raytheon then began mass-producing microwave tubes—known as magnetrons—for the US war effort. By the end of the war in 1945, Raytheon was making 80% of the magnetrons powering Allied radar across the world.
From padded foam chambers at the Epirus HQ, Leonidas devices can be safely tested on drones.
Large tubes remained the best way to emit high-power microwaves for more than half a century, handily outperforming silicon-based solid-state amplifiers. They’re still around—the microwave on your kitchen counter runs on a vacuum tube magnetron. But tubes have downsides: They’re hot, they’re big, and they require upkeep. (In fact, the other microwave drone zapper currently in the Pentagon pipeline, the Tactical High-power Operational Responder, or THOR, still relies on a physical vacuum tube. It’s reported to be effective at downing drones in tests but takes up a whole shipping container and needs a dish antenna to zap its targets.)
By the 2000s, new methods of building solid-state amplifiers out of materials like gallium nitride started to mature and were able to handle more power than silicon without melting or shorting out. The US Navy spent hundreds of millions of dollars on cutting-edge microwave contracts, one for a project at Raytheon called Next Generation Jammer—geared specifically toward designing a new way to make high-powered microwaves that work at extremely long distances.
Lowery, the Epirus CEO, began his career working on nuclear reactors on Navy aircraft carriers before he became the chief engineer for Next Generation Jammer at Raytheon in 2010. There, he and his team worked on a system that relied on many of the same fundamentals that now power the Leonidas—using the same type of amplifier material and antenna setup to fry the electronics of a small target at much closer range rather than disrupting the radar of a target hundreds of miles away.
The similarity is not a coincidence: Two engineers from Next Generation Jammer helped launch Epirus in 2018. Lowery—who by then was working at the augmented-reality startup RealWear, which makes industrial smart glasses—joined Epirus in 2021 to run product development and was asked to take the top spot as CEO in 2023, as Leonidas became a fully formed machine. Much of the founding team has since departed for other projects, but Raytheon still runs through the company’s collective CV: ex-Raytheon radar engineer Matt Markel started in January as the new CTO, and Epirus’s chief engineer for defense, its VP of engineering, its VP of operations, and a number of employees all have Raytheon roots as well.
Markel tells me that the Epirus way of working wouldn’t have flown at one of the big defense contractors: “They never would have tried spinning off the technology into a new application without a contract lined up.” The Epirus engineers saw the use case, raised money to start building Leonidas, and already had prototypes in the works before any military branch started awarding money to work on the project.
Waiting for the starting gun
On the wall of Lowery’s office are two mementos from testing days at an Army proving ground: a trophy wing from a larger drone, signed by the whole testing team, and a framed photo documenting the Leonidas’s carnage—a stack of dozens of inoperative drones piled up in a heap.
Despite what seems to have been an impressive showing in testing, it’s still impossible from the outside to determine whether Epirus’s tech is ready to fully deliver if the swarms descend.
The Army would not comment specifically on the efficacy of any new weapons in testing or early deployment, including the Leonidas system. A spokesperson for the Army’s Rapid Capabilities and Critical Technologies Office, or RCCTO, which is the subsection responsible for contracting with Epirus to date, would only say in a statement that it is “committed to developing and fielding innovative Directed Energy solutions to address evolving threats.”
But various high-ranking officers appear to be giving Epirus a public vote of confidence. The three-star general who runs RCCTO and oversaw the Leonidas testing last summer told Breaking Defense that “the system actually worked very well,” even if there was work to be done on “how the weapon system fits into the larger kill chain.”
And when former secretary of the Army Christine Wormuth, then the service’s highest-ranking civilian, gave a parting interview this past January, she mentioned Epirus in all but name, citing “one company” that is “using high-powered microwaves to basically be able to kill swarms of drones.” She called that kind of capability “critical for the Army.”
The Army isn’t the only branch interested in the microwave weapon. On Epirus’s factory floor when I visited, alongside the big beige Leonidases commissioned by the Army, engineers were building a smaller expeditionary version for the Marines, painted green, which the company delivered in late April. Videos show that when it put some of its microwave emitters on a dock and tested them out for the Navy last summer, the microwaves left their targets dead in the water—successfully frying the circuits of outboard motors like the ones propelling Houthi drone boats.
Epirus is also currently working on an even smaller version of the Leonidas that can mount on top of the Army’s Stryker combat vehicles, and it’s testing out attaching a single microwave unit to a small airborne drone, which could work as a highly focused zapper to disable cars, data centers, or single enemy drones.
Epirus’s microwave technology is also being tested in devices smaller than the traditional Leonidas.
While neither the Army nor the Navy has yet announced a contract to start buying Epirus’s systems at scale, the company and its investors are actively preparing for the big orders to start rolling in. It raised $250 million in a funding round in early March to get ready to make as many Leonidases as possible in the coming years, adding to the more than $300 million it’s raised since opening its doors in 2018.
“If you invent a force field that works,” Lowery boasts, “you really get a lot of attention.”
The task for Epirus now, assuming that its main customers pull the trigger and start buying more Leonidases, is ramping up production while advancing the tech in its systems. Then there are the more prosaic problems of staffing, assembly, and testing at scale. For future generations, Lowery told me, the goal is refining the antenna design and integrating higher-powered microwave amplifiers to push the output into the tens of kilowatts, allowing for increased range and efficacy.
Lowery says he’s not worried that Trump’s global trade war will complicate the company’s supply chain: while China produces 98% of the world’s gallium, according to the US Geological Survey, and has choked off exports to the US, Epirus’s chip supplier uses recycled gallium from Japan.
The other outside challenge may be that Epirus isn’t the only company building a drone zapper. One of China’s state-owned defense companies has been working on its own anti-drone high-powered microwave weapon called the Hurricane, which it displayed at a major military show in late 2024.
It may be a sign that anti-electronics force fields will become common among the world’s militaries—and if so, the future of war is unlikely to return to the status quo ante; it may zag in yet another direction. Military planners believe it’s crucial for the US not to be left behind, and if Leonidas works as promised, Epirus could very well change the way war plays out in the coming decade.
While Miller, the Army CTO, can’t speak directly to Epirus or any specific system, he will say that he believes anti-drone measures are going to have to become ubiquitous for US soldiers. “Counter-UAS [Unmanned Aircraft System] unfortunately is going to be like counter-IED,” he says. “It’s going to be every soldier’s job to think about UAS threats the same way it was to think about IEDs.”
And, he adds, it’s his job and his colleagues’ to make sure that tech so effective it works like “almost magic” is in the hands of the average rifleman. To that end, Lowery told me, Epirus is designing the Leonidas control system to work simply for troops, allowing them to identify a cluster of targets and start zapping with just a click of a button—but only extensive use in the field can prove that out.
Epirus CEO Andy Lowery sees the Leonidas as providing a last line of defense against UAVs.
In the not-too-distant future, Lowery says, this could mean setting up along the US-Mexico border. But the grandest vision for Epirus’s tech that he says he’s heard is for a city-scale Leonidas along the lines of a ballistic missile defense radar system called PAVE PAWS, which takes up an entire 105-foot-tall building and can detect distant nuclear missile launches. The US set up four in the 1980s, and Taiwan currently has one up on a mountain south of Taipei. Fill a similar-size building full of microwave emitters, and the beam could reach out “10 or 15 miles,” Lowery told me, with one sitting sentinel over Taipei in the north and another over Kaohsiung in the south of Taiwan.
Riffing in Greek mythological mode, Lowery said of drones, “I call all these mischief makers. Whether they’re doing drugs or guns across the border or they’re flying over Langley [or] they’re spying on F-35s, they’re all like Icarus. You remember Icarus, with his wax wings? Flying all around—‘Nobody’s going to touch me, nobody’s going to ever hurt me.’”
“We built one hell of a wax-wing melter.”
Sam Dean is a reporter focusing on business, tech, and defense. He is writing a book about the recent history of Silicon Valley returning to work with the Pentagon for Viking Press and covering the defense tech industry for a number of publications. Previously, he was a business reporter at the Los Angeles Times.
This piece has been updated to clarify that Alex Miller is a civilian intelligence official.
It’s been a little over a week since we published Power Hungry, a package that takes a hard look at the expected energy demands of AI. Last week in this newsletter, I broke down the centerpiece of that package, an analysis I did with my colleague James O’Donnell. (In case you’re still looking for an intro, you can check out this Roundtable discussion with James and our editor in chief Mat Honan, or this short segment I did on Science Friday.)
But this week, I want to talk about another story that I also wrote for that package, which focused on nuclear energy. I thought this was an important addition to the mix of stories we put together, because I’ve seen a lot of promises about nuclear power as a saving grace in the face of AI’s energy demand. My reporting on the industry over the past few years has left me a little skeptical.
As I discovered while I continued that line of reporting, building new nuclear plants isn’t so simple or so fast. And as my colleague David Rotman lays out in his story for the package, the AI boom could wind up relying on another energy source: fossil fuels. So what’s going to power AI? Let’s get into it.
When we started talking about this big project on AI and energy demand, we had a lot of conversations about what to include. And from the beginning, the climate team was really focused on examining what, exactly, was going to be providing the electricity needed to run data centers powering AI models. As we wrote in the main story:
“A data center humming away isn’t necessarily a bad thing. If all data centers were hooked up to solar panels and ran only when the sun was shining, the world would be talking a lot less about AI’s energy consumption.”
But a lot of AI data centers need to be available constantly. Those that are used to train models can arguably be more responsive to the changing availability of renewables, since that work can happen in bursts, any time. Once a model is being pinged with questions from the public, though, there needs to be computing power ready to run all the time. Google, for example, would likely not be too keen on having people be able to use its new AI Mode only during daylight hours.
Solar and wind power, then, would seem not to be a great fit for a lot of AI electricity demand, unless they’re paired with energy storage—and that increases costs. Nuclear power plants, on the other hand, tend to run constantly, outputting a steady source of power for the grid.
As you might imagine, though, it can take a long time to get a nuclear power plant up and running.
Large tech companies can help support plans to reopen shuttered plants or existing plants’ efforts to extend their operating lifetimes. There are also some existing plants that can make small upgrades to improve their output. I just saw this news story from the Tri-City Herald about plans to upgrade the Columbia Generating Station in eastern Washington—with tweaks over the next few years, it could produce an additional 162 megawatts of power, over 10% of the plant’s current capacity.
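That percentage is easy to sanity-check. A two-line sketch, with the plant’s current capacity as an assumption (public figures put it at roughly 1,200 megawatts; the story cites only the 162 megawatts and the percentage):

```python
# Quick check of the Columbia Generating Station upgrade figure.
# The plant's current capacity is an assumption here.
upgrade_mw = 162
assumed_capacity_mw = 1_200

print(f"{upgrade_mw / assumed_capacity_mw:.1%} of current capacity")  # ~13.5%
```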
But all that isn’t going to be nearly enough to meet the demand that big tech companies are claiming will materialize in the future. (For more on the numbers here and why new tech isn’t going to come online fast enough, check out my full story.)
Instead, natural gas has become the default to meet soaring demand from data centers, as David lays out in his story. And since the lifetime of plants built today is about 30 years, those new plants could be running past 2050, the date the world needs to bring greenhouse-gas emissions to net zero to meet the goals set out in the Paris climate agreement.
One of the bits I found most interesting in David’s story is that there’s potential for a different future here: Big tech companies, with their power and influence, could actually use this moment to push for improvements. If they reduced their usage during peak hours, even for less than 1% of the year, it could greatly reduce the amount of new energy infrastructure required. Or they could, at the very least, push power plant owners and operators to install carbon capture technology, or ensure that methane doesn’t leak from the supply chain.
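To see why trimming so few hours matters, compare a load curve’s absolute peak, which the grid must be built to serve, with the level exceeded only 1% of the year, about 88 hours. A toy illustration with synthetic numbers:

```python
import random

# Toy illustration of the peak-shaving point: infrastructure is sized for
# the absolute peak, so curtailing the top ~1% of hours (about 88 of the
# year's 8,760) can cut the capacity that must be built. Synthetic data only.
random.seed(0)
hourly_load_mw = [100 + random.expovariate(1 / 15) for _ in range(8760)]

peak = max(hourly_load_mw)
p99 = sorted(hourly_load_mw)[int(0.99 * 8760)]  # exceeded ~88 hours a year

print(f"Absolute peak: {peak:.0f} MW")
print(f"99th-percentile load: {p99:.0f} MW")
print(f"Capacity avoided by curtailing top 1% of hours: {peak - p99:.0f} MW")
```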
AI’s energy demand is a big deal, but for climate change, how we choose to meet it is potentially an even bigger one.
A California-based company called Magrathea just turned on a new electrolyzer that can make magnesium metal from seawater. The technology has the potential to produce the material, which is used in vehicles and defense applications, with net-zero greenhouse-gas emissions.
Magnesium is an incredibly light metal, and it’s used for parts in cars and planes, as well as in aluminum alloys like those in vehicles. The metal is also used in defense and industrial applications, including the production processes for steel and titanium.
Today, China dominates production of magnesium, and the most common method generates a lot of the emissions that cause climate change. If Magrathea can scale up its process, it could help provide an alternative source of the metal and clean up industries that rely on it, including automotive manufacturing.
The star of Magrathea’s process is an electrolyzer, a device that uses electricity to split a material into its constituent elements. Using an electrolyzer in magnesium production isn’t new, but Magrathea’s approach represents an update. “We really modernized it and brought it into the 21st century,” says Alex Grant, Magrathea’s cofounder and CEO.
The whole process starts with salty water. There are small amounts of magnesium in seawater, as well as in salt lakes and groundwater. (In seawater, the concentration is about 1,300 parts per million, so magnesium makes up about 0.13% of seawater by weight.) If you take that seawater or brine and clean it up, concentrate it, and dry it out, you get a solid magnesium chloride salt.
Magrathea takes that salt (which it currently buys from Cargill) and puts it into the electrolyzer. The device reaches temperatures of about 700 °C (almost 1,300 °F) and runs electricity through the molten salt to split the magnesium from the chlorine, forming magnesium metal.
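The rough numbers behind that step are worth sketching. The 1,300 ppm figure implies processing on the order of 770 tonnes of seawater per tonne of magnesium, which is why starting from concentrated salt makes sense, and Faraday’s law sets a floor on the charge the electrolyzer has to pass. A sketch, with the cell voltage as an assumed, illustrative value:

```python
# Rough numbers behind the process, from standard chemistry plus the
# article's 1,300 ppm figure. Illustrative only: Magrathea buys concentrated
# magnesium chloride salt rather than processing raw seawater itself.
PPM_MG = 1_300                       # magnesium in seawater, parts per million
seawater_per_tonne_mg = 1 / (PPM_MG / 1e6)
print(f"~{seawater_per_tonne_mg:,.0f} tonnes of seawater per tonne of Mg")  # ~769

# Faraday's law floor for the electrolysis step: Mg2+ + 2e- -> Mg
MOLAR_MASS_MG = 24.305               # g/mol
FARADAY = 96_485                     # coulombs per mole of electrons
moles = 1e6 / MOLAR_MASS_MG          # moles of Mg in one tonne
charge_coulombs = moles * 2 * FARADAY
# Energy depends on cell voltage; 3 V is an assumed, illustrative value.
kwh_per_tonne = charge_coulombs * 3.0 / 3.6e6
print(f"Theoretical minimum at 3 V: ~{kwh_per_tonne:,.0f} kWh per tonne")  # ~6,600
```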
Typically, running an electrolyzer in this process requires a steady source of electricity. The temperature is generally kept just high enough to maintain the salt in a molten state: letting it cool too much would cause it to solidify, disrupting the process and potentially damaging the equipment, while heating it more than necessary would simply waste energy.
Magrathea’s approach builds in flexibility. Basically, the company runs its electrolyzer about 100 °C higher than is necessary to keep the molten salt a liquid. It then uses the extra heat in inventive ways, including to dry out the magnesium salt that eventually goes into the reactor. This preparation can be done intermittently, so the company can take in electricity when it’s cheaper or when more renewables are available, cutting costs and emissions. In addition, the process will make a co-product, called magnesium oxide, that can be used to trap carbon dioxide from the atmosphere, helping to cancel out the remaining carbon pollution.
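That flexibility amounts to a simple scheduling problem: the electrolyzer itself runs around the clock, but the drying load can chase the cheapest hours. A toy sketch with made-up hourly prices:

```python
# Toy sketch of the load-shifting idea: run the intermittent salt-drying
# step in the cheapest hours of the day while the electrolyzer itself runs
# continuously. Prices are made-up hourly values in $/MWh.
prices = [42, 38, 35, 33, 31, 30, 34, 45, 60, 72, 80, 85,
          83, 78, 70, 65, 62, 75, 90, 95, 88, 70, 55, 48]

def cheapest_hours(prices: list[float], hours_needed: int) -> list[int]:
    """Pick the indices of the cheapest hours to schedule flexible load."""
    return sorted(range(len(prices)), key=lambda h: prices[h])[:hours_needed]

# Suppose drying tomorrow's salt feed takes 6 hours of heat input.
schedule = sorted(cheapest_hours(prices, 6))
print("Run drying during hours:", schedule)   # overnight, when power is cheap
```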
The result could be a production process with net-zero emissions, according to an independent life cycle assessment completed in January. While it likely won’t reach this bar at first, the potential is there for a much more climate-friendly process than what’s used in the industry today, Grant says.
Breaking into magnesium production won’t be simple, says Simon Jowitt, director of the Nevada Bureau of Mines and of the Center for Research in Economic Geology at the University of Nevada, Reno.
China produces roughly 95% of the global supply as of 2024, according to data from the US Geological Survey. This dominant position means companies there can flood the market with cheap metal, making it difficult for others to compete. “The economics of all this is uncertain,” Jowitt says.
The US has some trade protections in place, including an anti-dumping duty, but newer players with alternative processes can still face obstacles. US Magnesium, a company based in Utah, was the only company making magnesium in the US in recent years, but it shut down production in 2022 after equipment failures and a history of environmental concerns.
Magrathea plans to start building a demonstration plant in Utah in late 2025 or early 2026, which will have a capacity of roughly 1,000 tons per year and should be running in 2027. In February the company announced that it signed an agreement with a major automaker, though it declined to share its name on the record. The automaker pre-purchased material from the demonstration plant and will incorporate it into existing products.
After the demonstration plant is running, the next step would be to build a commercial plant with a larger capacity of around 50,000 tons annually.