Omar Yaghi was a quiet child, diligent, unlikely to roughhouse with his nine siblings. So when he was old enough, his parents tasked him with one of the family’s most vital chores: fetching water. Like most homes in his Palestinian neighborhood in Amman, Jordan, the Yaghis’ had no electricity or running water. At least once every two weeks, the city switched on local taps for a few hours so residents could fill their tanks. Young Omar helped top up the family supply. Decades later, he says he can’t remember once showing up late. The fear of leaving his parents, seven brothers, and two sisters parched kept him punctual.
Yaghi proved so dependable that his father put him in charge of monitoring how much the cattle destined for the family butcher shop ate and drank. The best-quality cuts came from well-fed, hydrated animals—a challenge given that they were raised in arid desert.
But at 10 years old, Yaghi learned of a different occupation. Hoping to avoid a rambunctious crowd at recess, he found the library doors in his school unbolted and sneaked in. Thumbing through a chemistry textbook, he saw an image he didn’t understand: little balls connected by sticks in fascinating shapes. Molecules. The building blocks of everything.
“I didn’t know what they were, but it captivated my attention,” Yaghi says. “I kept trying to figure out what they might be.”
That’s how he discovered chemistry—or maybe how chemistry discovered him. After coming to the United States and, eventually, a postdoctoral program at Harvard University, Yaghi devoted his career to finding ways to make entirely new and fascinating shapes for those little sticks and balls. In October 2025, he was one of three scientists who won a Nobel Prize in chemistry for identifying metal-organic frameworks, or MOFs—metal ions tethered to organic molecules that form repeating structural landscapes. Today that work is the basis for a new project that sounds like science fiction, or a miracle: conjuring water out of thin air.
When he first started working with MOFs, Yaghi thought they might be able to absorb climate-damaging carbon dioxide—or maybe hold hydrogen molecules, solving the thorny problem of storing that climate-friendly but hard-to-contain fuel. But then, in 2014, Yaghi’s team of researchers at UC Berkeley had an epiphany. The tiny pores in MOFs could be designed so the material would pull water molecules from the air around them, like a sponge—and then, with just a little heat, give back that water as if squeezed dry. Just one gram of a water-absorbing MOF has an internal surface area of roughly 7,000 square meters.
Yaghi wasn’t the first to try to pull potable water from the atmosphere. But his method could do it at lower levels of humidity than rivals—potentially shaking up a tiny, nascent industry that could be critical to humanity in the thirsty decades to come. Now the company he founded, called Atoco, is racing to demonstrate a pair of machines that Yaghi believes could produce clean, fresh, drinkable water virtually anywhere on Earth, without even hooking up to an energy supply.
That’s the goal Yaghi has been working toward for more than a decade now, with the rigid determination that he learned while doing chores in his father’s butcher shop.
“It was in that shop where I learned how to perfect things, how to have a work ethic,” he says. “I learned that a job is not done until it is well done. Don’t start a job unless you can finish it.”
Most of Earth’s surface is covered in water, but just 3% of it is fresh—the salt-free kind that all terrestrial living things need. Today, desalination plants that strip the salt from seawater provide the bulk of potable water in technologically advanced desert nations like Israel and the United Arab Emirates, but at a high cost. Desalination facilities either heat seawater to distill out the drinkable stuff or force it through membranes that salt can’t pass; both methods require a lot of energy and leave behind concentrated brine. Typically, desal plants pump that brine back into the ocean, with devastating ecological effects.
Heiner Linke, chair of the Nobel Committee for Chemistry, uses a model to explain how metal-organic frameworks (MOFs) can trap smaller molecules inside. In October 2025, Yaghi and two other scientists won the Nobel Prize in chemistry for identifying MOFs.
JONATHAN NACKSTRAND/GETTY IMAGES
I was talking to Atoco executives about carbon dioxide capture earlier this year when they mentioned the possibility of harvesting water from the atmosphere. Of course my mind immediately jumped to Star Wars, and Luke Skywalker working on his family’s moisture farm, using “vaporators” to pull water from the atmosphere of the arid planet Tatooine. (Other sci-fi fans’ minds might go to Dune, and the water-gathering technology of the Fremen.) Could this possibly be real?
It turns out people have been doing it for millennia. Archaeological evidence of water harvesting from fog dates back as far as 5000 BCE. The ancient Greeks harvested dew, and 500 years ago so did the Inca, using mesh nets and buckets under trees.
Today, harvesting water from the air is a business already worth billions of dollars, say industry analysts—and it’s on track to be worth billions more in the next five years. In part that’s because typical sources of fresh water are in crisis. Less snowfall in mountains during hotter winters means less meltwater in the spring, which means less water downstream. Droughts regularly break records. Rising seas seep into underground aquifers, already drained by farming and sprawling cities. Aging septic tanks leach bacteria into water, and cancer-causing “forever chemicals” are creating what the US Government Accountability Office last year said “may be the biggest water problem since lead.” That doesn’t even get to the emerging catastrophe from microplastics.
So lots of places are turning to atmospheric water harvesting. Watergen, an Israel-based company working on the tech, initially planned on deploying in the arid, poorer parts of the world. Instead, buyers in Europe and the United States have approached the company as a way to ensure a clean supply of water. And one of Watergen’s biggest markets is the wealthy United Arab Emirates. “When you say ‘water crisis,’ it’s not just the lack of water—it’s access to good-quality water,” says Anna Chernyavsky, Watergen’s vice president of marketing.
In other words, the technology “has evolved from lab prototypes to robust, field-deployable systems,” says Guihua Yu, a mechanical engineer at the University of Texas at Austin. “There is still room to improve productivity and energy efficiency at the whole-system level, but so far progress has been steady and encouraging.”
MOFs are just the latest approach to the idea. The first generation of commercial tech depended on compressors and refrigerant chemicals—large-scale versions of the machine that keeps food cold and fresh in your kitchen. Both use electricity and a clot of pipes and exchangers to make cold by phase-shifting a chemical from gas to liquid and back; refrigerators try to limit condensation, and water generators basically try to enhance it.
That’s how Watergen’s tech works: using a compressor and a heat exchanger to wring water from air at humidity levels as low as 20%—Death Valley in the spring. “We’re talking about deserts,” Chernyavsky says. “Below 20%, you get nosebleeds.”
A Watergen unit provides drinking water to students and staff at St. Joseph’s, a girls’ school in Freetown, Sierra Leone. “When you say ‘water crisis,’ it’s not just the lack of water— it’s access to good-quality water,” says Anna Chernyavsky, Watergen’s vice president of marketing.
COURTESY OF WATERGEN
That still might not be good enough. “Refrigeration works pretty well when you are above a certain relative humidity,” says Sameer Rao, a mechanical engineer at the University of Utah who researches atmospheric water harvesting. “As the environment dries out, you go to lower relative humidities, and it becomes harder and harder. In some cases, it’s impossible for refrigeration-based systems to really work.”
So a second wave of technology has found a market. Companies like Source Global use desiccants—substances that absorb moisture from the air, like the silica packets found in vitamin bottles—to pull in moisture and then release it when heated. In theory, the benefit of desiccant-based tech is that it could absorb water at lower humidity levels, and it uses less energy on the front end since it isn’t running a condenser system. Source Global claims its off-grid, solar-powered system is deployed in dozens of countries.
But both technologies still require a lot of energy, either to run the heat exchangers or to generate sufficient heat to release water from the desiccants. MOFs, Yaghi hopes, do not. Now Atoco is trying to prove it. Instead of using heat exchangers to chill air to the dew point or desiccants to soak up moisture, a system can rely on specially designed MOFs to capture water molecules. Atoco’s prototype uses a MOF that looks like baby powder, stuck to a surface such as glass. The pores in the MOF naturally draw in water molecules but remain open, making it theoretically easy to discharge the water with no more heat than what comes from direct sunlight. Atoco’s industrial-scale design uses electricity to speed up the process, but the company is working on a second design that can operate completely off grid, without any external energy input.
Yaghi’s Atoco isn’t the only contender seeking to use MOFs for water harvesting. A competitor, AirJoule, has introduced MOF-based atmospheric water generators in Texas and the UAE and is working with researchers at Arizona State University, planning to deploy more units in the coming months. The company started out trying to build more efficient air-conditioning for electric buses operating on hot, humid city streets. But then founder Matt Jore heard about US government efforts to harvest water from air—and pivoted. The startup’s stock price has been a bit of a roller-coaster, but Jore says the sheer size of the market should keep him in business. Take Maricopa County, encompassing Phoenix and its environs—it uses 1.2 billion gallons of water from its shrinking aquifer every day, and another 874 million gallons from surface sources like rivers.
“So, a couple of billion gallons a day, right?” Jore tells me. “You know how much influx is in the atmosphere every day? Twenty-five billion gallons.”
My eyebrows go up. “Globally?”
“Just the greater Phoenix area gets influx of about 25 billion gallons of water in the air,” he says. “If you can tap into it, that’s your source. And it’s not going away. It’s all around the world. We view the atmosphere as the world’s free pipeline.”
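Taken at face value, Jore’s arithmetic is easy to check. Here’s a back-of-envelope sketch using only the figures quoted above; none of the numbers are independently sourced:

```python
# Back-of-envelope check of Jore's figures for greater Phoenix (gallons per day).
# All values come from the quotes above; nothing here is independently sourced.
aquifer_use = 1.2e9        # daily draw from Maricopa County's shrinking aquifer
surface_use = 874e6        # daily draw from rivers and other surface sources
atmospheric_influx = 25e9  # Jore's estimate of water vapor moving over the region

total_demand = aquifer_use + surface_use
print(total_demand / 1e9)                 # about 2.07 billion gallons a day
print(atmospheric_influx / total_demand)  # the air carries roughly 12x demand
```

So “a couple of billion gallons a day” checks out, and the atmospheric supply Jore describes is roughly twelve times that demand.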
AirJoule has a head start on Atoco, and the two companies also differ on where they get their MOFs. AirJoule’s system relies on an off-the-shelf version the company buys from the chemical giant BASF; Atoco aims to use Yaghi’s skill at designing the novel material to create bespoke MOFs for different applications and locations.
“Given the fact that we have the inventor of the whole class of materials, and we leverage the stuff that comes out of his lab at Berkeley—everything else equal, we have a good starting point to engineer maybe the best materials in the world,” says Magnus Bach, Atoco’s VP of business development.
Yaghi envisions a two-pronged product line. On one end, industrial-scale water generators that run on electricity would be capable of producing thousands of liters per day; on the other, passive units could operate in remote locations without power, harnessing only sunlight and ambient temperature swings. In theory, these units could someday replace desalination plants and even entire municipal water supplies. The next round of field tests is scheduled for early 2026, in the Mojave Desert—one of the hottest, driest places on Earth.
Both Yaghi and Watergen’s Chernyavsky say they’re looking at more decentralized versions that could operate outside municipal utility systems. Home appliances, similar to rooftop solar panels and batteries, could allow households to generate their own water off grid.
That could be tricky, though, without economies of scale to bring down prices. “You have to produce, you have to cool, you have to filter—all in one place,” Chernyavsky says. “So to make it small is very, very challenging.”
Difficult as that may be, Yaghi’s childhood gave him a particular appreciation for the freedom to go off grid, to liberate the basic necessity of water from the whims of systems that dictate when and how people can access it.
“That’s really my dream,” he says. “To give people independence, water independence, so that they’re not reliant on another party for their livelihood or lives.”
Toward the end of one of our conversations, I asked Yaghi what he would tell the younger version of himself if he could. “Jordan is one of the worst countries in terms of the impact of water stress,” he said. “I would say, ‘Continue to be diligent and observant. It doesn’t really matter what you’re pursuing, as long as you’re passionate.’”
I pressed him for something more specific: “What do you think he’d say when you described this technology to him?”
Yaghi smiled: “I think young Omar would think you’re putting him on, that this is all fictitious and you’re trying to take something from him.” This reality, in other words, would be beyond young Omar’s wildest dreams.
Alexander C. Kaufman is a reporter who has covered energy, climate change, pollution, business, and geopolitics for more than a decade.
Depending on who you ask, AI-powered coding is either giving software developers an unprecedented productivity boost or churning out masses of poorly designed code that saps their attention and sets software projects up for serious long-term maintenance problems.
The problem is, right now it’s not easy to know which is true.
As tech giants pour billions into large language models (LLMs), coding has been touted as the technology’s killer app. Both Microsoft CEO Satya Nadella and Google CEO Sundar Pichai have claimed that around a quarter of their companies’ code is now AI-generated. And in March, Anthropic’s CEO, Dario Amodei, predicted that within six months 90% of all code would be written by AI. It’s an appealing and obvious use case. Code is a form of language, we need lots of it, and it’s expensive to produce manually. It’s also easy to tell if it works—run a program and it’s immediately evident whether it’s functional.
This story is part of MIT Technology Review’s Hype Correction package, a series that resets expectations about what AI is, what it makes possible, and where we go next.
Executives enamored with the potential to break through human bottlenecks are pushing engineers to lean into an AI-powered future. But after speaking to more than 30 developers, technology executives, analysts, and researchers, MIT Technology Review found that the picture is not as straightforward as it might seem.
For some developers on the front lines, initial enthusiasm is waning as they bump up against the technology’s limitations. And as a growing body of research suggests that the claimed productivity gains may be illusory, some are questioning whether the emperor is wearing any clothes.
The pace of progress is complicating the picture, though. A steady drumbeat of new model releases means these tools’ capabilities and quirks are constantly evolving. And their utility often depends on the tasks they are applied to and the organizational structures built around them. All of this leaves developers navigating confusing gaps between expectation and reality.
Is it the best of times or the worst of times (to channel Dickens) for AI coding? Maybe both.
A fast-moving field
It’s hard to avoid AI coding tools these days. There’s a dizzying array of products available, both from model developers like Anthropic, OpenAI, and Google and from companies like Cursor and Windsurf, which wrap these models in polished code-editing software. And according to Stack Overflow’s 2025 Developer Survey, they’re being adopted rapidly, with 65% of developers now using them at least weekly.
AI coding tools first emerged around 2016 but were supercharged with the arrival of LLMs. Early versions functioned as little more than autocomplete for programmers, suggesting what to type next. Today they can analyze entire code bases, edit across files, fix bugs, and even generate documentation explaining how the code works. All this is guided through natural-language prompts via a chat interface.
“Agents”—autonomous LLM-powered coding tools that can take a high-level plan and build entire programs independently—represent the latest frontier in AI coding. This leap was enabled by the latest reasoning models, which can tackle complex problems step by step and, crucially, access external tools to complete tasks. “This is how the model is able to code, as opposed to just talk about coding,” says Boris Cherny, head of Claude Code, Anthropic’s coding agent.
These agents have made impressive progress on software engineering benchmarks—standardized tests that measure model performance. When OpenAI introduced the SWE-bench Verified benchmark in August 2024, offering a way to evaluate agents’ success at fixing real bugs in open-source repositories, the top model solved just 33% of issues. A year later, leading models consistently score above 70%.
In February, Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, coined the term “vibe coding”—meaning an approach where people describe software in natural language and let AI write, refine, and debug the code. Social media abounds with developers who have bought into this vision, claiming massive productivity boosts.
But while some developers and companies report such productivity gains, the hard evidence is more mixed. Early studies from GitHub, Google, and Microsoft—all vendors of AI tools—found developers completing tasks 20% to 55% faster. But a September report from the consultancy Bain & Company described real-world savings as “unremarkable.”
Data from the developer analytics firm GitClear shows that most engineers are producing roughly 10% more durable code—code that isn’t deleted or rewritten within weeks—since 2022, likely thanks to AI. But that gain has come with sharp declines in several measures of code quality. Stack Overflow’s survey also found trust and positive sentiment toward AI tools falling significantly for the first time. And most provocatively, a July study by the nonprofit research organization Model Evaluation & Threat Research (METR) showed that while experienced developers believed AI made them 20% faster, objective tests showed they were actually 19% slower.
Growing disillusionment
For Mike Judge, principal developer at the software consultancy Substantial, the METR study struck a nerve. He was an enthusiastic early adopter of AI tools, but over time he grew frustrated with their limitations and the modest boost they brought to his productivity. “I was complaining to people because I was like, ‘It’s helping me but I can’t figure out how to make it really help me a lot,’” he says. “I kept feeling like the AI was really dumb, but maybe I could trick it into being smart if I found the right magic incantation.”
When a friend asked, Judge had estimated that the tools were giving him a roughly 25% speedup. So when he saw similar estimates attributed to developers in the METR study, he decided to test his own. For six weeks, he guessed how long a task would take, flipped a coin to decide whether to use AI or code manually, and timed himself. To his surprise, AI slowed him down by a median of 21%—mirroring the METR results.
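Judge’s protocol is simple enough to sketch in a few lines. The code below is a minimal illustration with a made-up task log, not Judge’s data; the method is the same: estimate, randomize the condition with a coin flip, time the work, then compare median actual-to-estimate ratios.

```python
import statistics

def median_ratio(tasks):
    """tasks: list of (estimate_min, actual_min, condition) tuples.
    Returns the median actual/estimate ratio per condition; a ratio
    above 1.0 means tasks ran longer than estimated."""
    ratios = {}
    for estimate, actual, condition in tasks:
        ratios.setdefault(condition, []).append(actual / estimate)
    return {c: statistics.median(r) for c, r in ratios.items()}

# Hypothetical six-task log (minutes); in Judge's protocol a coin flip
# before each task decides whether it goes in the "ai" or "manual" column.
log = [
    (60, 80, "ai"), (45, 50, "ai"), (30, 45, "ai"),
    (60, 55, "manual"), (45, 48, "manual"), (30, 28, "manual"),
]
print(median_ratio(log))
```

In this made-up log, the AI condition runs about a third over its estimates while manual work comes in slightly under—the same shape as Judge’s result, though the numbers here are invented.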
This got Judge crunching the numbers. If these tools were really speeding developers up, he reasoned, you should see a massive boom in new apps, website registrations, video games, and projects on GitHub. He spent hours and several hundred dollars analyzing all the publicly available data and found flat lines everywhere.
“Shouldn’t this be going up and to the right?” says Judge. “Where’s the hockey stick on any of these graphs? I thought everybody was so extraordinarily productive.” The obvious conclusion, he says, is that AI tools provide little productivity boost for most developers.
Developers interviewed by MIT Technology Review generally agree on where AI tools excel: producing “boilerplate code” (reusable chunks of code repeated in multiple places with little modification), writing tests, fixing bugs, and explaining unfamiliar code to new developers. Several noted that AI helps overcome the “blank page problem” by offering an imperfect first stab to get a developer’s creative juices flowing. It can also let nontechnical colleagues quickly prototype software features, easing the load on already overworked engineers.
These tasks can be tedious, and developers are typically glad to hand them off. But they represent only a small part of an experienced engineer’s workload. For the more complex problems where engineers really earn their bread, many developers told MIT Technology Review, the tools face significant hurdles.
Perhaps the biggest problem is that LLMs can hold only a limited amount of information in their “context window”—essentially their working memory. This means they struggle to parse large code bases and are prone to forgetting what they’re doing on longer tasks. “It gets really nearsighted—it’ll only look at the thing that’s right in front of it,” says Judge. “And if you tell it to do a dozen things, it’ll do 11 of them and just forget that last one.”
DEREK BRAHNEY
LLMs’ myopia can lead to headaches for human coders. While an LLM-generated response to a problem may work in isolation, software is made up of hundreds of interconnected modules. If these aren’t built with consideration for other parts of the software, it can quickly lead to a tangled, inconsistent code base that’s hard for humans to parse and, more important, to maintain.
Developers have traditionally addressed this by following conventions—loosely defined coding guidelines that differ widely between projects and teams. “AI has this overwhelming tendency to not understand what the existing conventions are within a repository,” says Bill Harding, the CEO of GitClear. “And so it is very likely to come up with its own slightly different version of how to solve a problem.”
The models also just get things wrong. Like all LLMs, coding models are prone to “hallucinating”—it’s an issue built into how they work. But because the code they output looks so polished, errors can be difficult to detect, says James Liu, director of software engineering at the advertising technology company Mediaocean. Put all these flaws together, and using these tools can feel a lot like pulling a lever on a one-armed bandit. “Some projects you get a 20x improvement in terms of speed or efficiency,” says Liu. “On other things, it just falls flat on its face, and you spend all this time trying to coax it into granting you the wish that you wanted and it’s just not going to.”
Judge suspects this is why engineers often overestimate productivity gains. “You remember the jackpots. You don’t remember sitting there plugging tokens into the slot machine for two hours,” he says.
And it can be particularly pernicious if the developer is unfamiliar with the task. Judge remembers getting AI to help set up a Microsoft cloud service called Azure Functions, which he’d never used before. He thought it would take about two hours, but nine hours later he threw in the towel. “It kept leading me down these rabbit holes and I didn’t know enough about the topic to be able to tell it ‘Hey, this is nonsensical,’” he says.
The debt begins to mount up
Developers constantly make trade-offs between speed of development and the maintainability of their code—creating what’s known as “technical debt,” says Geoffrey G. Parker, professor of engineering innovation at Dartmouth College. Each shortcut adds complexity and makes the code base harder to manage, accruing “interest” that must eventually be repaid by restructuring the code. As this debt piles up, adding new features and maintaining the software becomes slower and more difficult.
Accumulating technical debt is inevitable in most projects, but AI tools make it much easier for time-pressured engineers to cut corners, says GitClear’s Harding. And GitClear’s data suggests this is happening at scale. Since 2020, the company has seen a significant rise in the amount of copy-pasted code—an indicator that developers are reusing more code snippets, most likely based on AI suggestions—and an even bigger decline in the amount of code moved from one place to another, which happens when developers clean up their code base.
And as models improve, the code they produce is becoming increasingly verbose and complex, says Tariq Shaukat, CEO of Sonar, which makes tools for checking code quality. This is driving down the number of obvious bugs and security vulnerabilities, he says, but at the cost of increasing the number of “code smells”—harder-to-pinpoint flaws that lead to maintenance problems and technical debt.
Recent research by Sonar found that these make up more than 90% of the issues found in code generated by leading AI models. “Issues that are easy to spot are disappearing, and what’s left are much more complex issues that take a while to find,” says Shaukat. “That’s what worries us about this space at the moment. You’re almost being lulled into a false sense of security.”
If AI tools make it increasingly difficult to maintain code, that could have significant security implications, says Jessica Ji, a security researcher at Georgetown University. “The harder it is to update things and fix things, the more likely a code base or any given chunk of code is to become insecure over time,” says Ji.
There are also more specific security concerns, she says. Researchers have discovered a worrying class of hallucinations where models reference nonexistent software packages in their code. Attackers can exploit this by creating packages with those names that harbor vulnerabilities, which the model or developer may then unwittingly incorporate into software.
LLMs are also vulnerable to “data-poisoning attacks,” where hackers seed the publicly available data sets models train on with data that alters the model’s behavior in undesirable ways, such as generating insecure code when triggered by specific phrases. In October, research by Anthropic found that as few as 250 malicious documents can introduce this kind of back door into an LLM regardless of its size.
The converted
Despite these issues, though, there’s probably no turning back. “Odds are that writing every line of code on a keyboard by hand—those days are quickly slipping behind us,” says Kyle Daigle, chief operating officer at the Microsoft-owned code-hosting platform GitHub, which produces a popular AI-powered tool called Copilot (not to be confused with the Microsoft product of the same name).
The Stack Overflow report found that despite growing distrust in the technology, usage has increased rapidly and consistently over the past three years. Erin Yepis, a senior analyst at Stack Overflow, says this suggests that engineers are taking advantage of the tools with a clear-eyed view of the risks. The report also found that frequent users tend to be more enthusiastic, and that more than half of developers are not using the latest coding agents, which may explain why many remain underwhelmed by the technology.
Those latest tools can be a revelation. Trevor Dilley, CTO at the software development agency Twenty20 Ideas, says he had found some value in AI editors’ autocomplete functions, but when he tried anything more complex it would “fail catastrophically.” Then in March, while on vacation with his family, he set the newly released Claude Code to work on one of his hobby projects. It completed a four-hour task in two minutes, and the code was better than what he would have written.
“I was like, Whoa,” he says. “That, for me, was the moment, really. There’s no going back from here.” Dilley has since cofounded a startup called DevSwarm, which is creating software that can marshal multiple agents to work in parallel on a piece of software.
The challenge, says Armin Ronacher, a prominent open-source developer, is that the learning curve for these tools is shallow but long. Until March he’d remained unimpressed by AI tools, but after leaving his job at the software company Sentry in April to launch a startup, he started experimenting with agents. “I basically spent a lot of months doing nothing but this,” he says. “Now, 90% of the code that I write is AI-generated.”
Getting to that point involved extensive trial and error, to figure out which problems tend to trip the tools up and which they can handle efficiently. Today’s models can tackle most coding tasks with the right guardrails, says Ronacher, but these can be very task and project specific.
To get the most out of these tools, developers must surrender control over individual lines of code and focus on the overall software architecture, says Nico Westerdale, chief technology officer at the veterinary staffing company IndeVets. He recently built a 100,000-line data science platform almost exclusively by prompting models rather than writing the code himself.
Westerdale’s process starts with an extended conversation with the agent to develop a detailed plan for what to build and how. He then guides it through each step. It rarely gets things right on the first try and needs constant wrangling, but if you force it to stick to well-defined design patterns, it can produce high-quality, easily maintainable code, says Westerdale. He reviews every line, and the code is as good as anything he’s ever produced, he says: “I’ve just found it absolutely revolutionary. It’s also frustrating, difficult, a different way of thinking, and we’re only just getting used to it.”
But while individual developers are learning how to use these tools effectively, getting consistent results across a large engineering team is significantly harder. AI tools amplify both the good and bad aspects of your engineering culture, says Ryan J. Salva, senior director of product management at Google. With strong processes, clear coding patterns, and well-defined best practices, these tools can shine.
DEREK BRAHNEY
But if your development process is disorganized, they’ll only magnify the problems. It’s also essential to codify that institutional knowledge so the models can draw on it effectively. “A lot of work needs to be done to help build up context and get the tribal knowledge out of our heads,” he says.
The cryptocurrency exchange Coinbase has been vocal about its adoption of AI tools. CEO Brian Armstrong made headlines in August when he revealed that the company had fired staff unwilling to adopt AI tools. But Coinbase’s head of platform, Rob Witoff, tells MIT Technology Review that while they’ve seen massive productivity gains in some areas, the impact has been patchy. For simpler tasks like restructuring the code base and writing tests, AI-powered workflows have achieved speedups of up to 90%. But gains are more modest for other tasks, and the disruption caused by overhauling existing processes often counteracts the increased coding speed, says Witoff.
One factor is that AI tools let junior developers produce far more code. As in almost all engineering teams, this code has to be reviewed by others, normally more senior developers, to catch bugs and ensure it meets quality standards. But the sheer volume of code now being churned out is quickly saturating the ability of midlevel staff to review changes. “This is the cycle we’re going through almost every month, where we automate a new thing lower down in the stack, which brings more pressure higher up in the stack,” he says. “Then we’re looking at applying automation to that higher-up piece.”
Developers also spend only 20% to 40% of their time coding, says Jue Wang, a partner at Bain, so even a significant speedup there often translates to more modest overall gains. Developers spend the rest of their time analyzing software problems and dealing with customer feedback, product strategy, and administrative tasks. To get significant efficiency boosts, companies may need to apply generative AI to all these other processes too, says Wang, and that is still in the works.
Rapid evolution
Programming with agents is a dramatic departure from previous working practices, though, so it’s not surprising companies are facing some teething issues. These are also very new products that are changing by the day. “Every couple months the model improves, and there’s a big step change in the model’s coding capabilities and you have to get recalibrated,” says Anthropic’s Cherny.
For example, in June Anthropic introduced a built-in planning mode to Claude; it has since been replicated by other providers. In October, the company also enabled Claude to ask users questions when it needs more context or faces multiple possible solutions, which Cherny says helps it avoid the tendency to simply assume which path is the best way forward.
Most significant, Anthropic has added features that make Claude better at managing its own context. When it nears the limits of its working memory, it summarizes key details and uses them to start a new context window, effectively giving it an “infinite” one, says Cherny. Claude can also invoke sub-agents to work on smaller tasks, so it no longer has to hold all aspects of the project in its own head. The company claims that its latest model, Claude Sonnet 4.5, can now code autonomously for more than 30 hours without major performance degradation.
Novel approaches to software development could also sidestep coding agents’ other flaws. MIT professor Max Tegmark has introduced something he calls “vericoding,” which could allow agents to produce entirely bug-free code from a natural-language description. It builds on an approach known as “formal verification,” where developers create a mathematical model of their software that can prove incontrovertibly that it functions correctly. This approach is used in high-stakes areas like flight-control systems and cryptographic libraries, but it remains costly and time-consuming, limiting its broader use.
Rapid improvements in LLMs’ mathematical capabilities have opened up the tantalizing possibility of models that produce not only software but the mathematical proof that it’s bug free, says Tegmark. “You just give the specification, and the AI comes back with provably correct code,” he says. “You don’t have to touch the code. You don’t even have to ever look at the code.”
When tested on about 2,000 vericoding problems in Dafny—a language designed for formal verification—the best LLMs solved over 60%, according to non-peer-reviewed research by Tegmark’s group. This was achieved with off-the-shelf LLMs, and Tegmark expects that training specifically for vericoding could improve scores rapidly.
Counterintuitively, the speed at which AI generates code could also ease maintainability concerns. Alex Worden, principal engineer at the business software giant Intuit, notes that maintenance is often difficult because engineers reuse components across projects, creating a tangle of dependencies where one change triggers cascading effects across the code base. Reusing code used to save developers time, but in a world where AI can produce hundreds of lines of code in seconds, that imperative has gone, says Worden.
Instead, he advocates for “disposable code,” where each component is generated independently by AI without regard for whether it follows design patterns or conventions. They are then connected via APIs—sets of rules that let components request information or services from each other. Each component’s inner workings are not dependent on other parts of the code base, making it possible to rip them out and replace them without wider impact, says Worden.
“The industry is still concerned about humans maintaining AI-generated code,” he says. “I question how long humans will look at or care about code.”
A narrowing talent pipeline
For the foreseeable future, though, humans will still need to understand and maintain the code that underpins their projects. And one of the most pernicious side effects of AI tools may be a shrinking pool of people capable of doing so.
Early evidence suggests that fears around the job-destroying effects of AI may be justified. A recent Stanford University study found that employment among software developers aged 22 to 25 fell nearly 20% between 2022 and 2025, coinciding with the rise of AI-powered coding tools.
Experienced developers could face difficulties too. Luciano Nooijen, an engineer at the video-game infrastructure developer Companion Group, used AI tools heavily in his day job, where they were provided for free. But when he began a side project without access to those tools, he found himself struggling with tasks that previously came naturally. “I was feeling so stupid because things that used to be instinct became manual, sometimes even cumbersome,” says Nooijen.
Just as athletes still perform basic drills, he thinks the only way to maintain an instinct for coding is to regularly practice the grunt work. That’s why he’s largely abandoned AI tools, though he admits that deeper motivations are also at play.
Part of the reason Nooijen and other developers MIT Technology Review spoke to are pushing back against AI tools is a sense that they are hollowing out the parts of their jobs that they love. “I got into software engineering because I like working with computers. I like making machines do things that I want,” Nooijen says. “It’s just not fun sitting there with my work being done for me.”
The microwave-size instrument at Lila Sciences in Cambridge, Massachusetts, doesn’t look all that different from others that I’ve seen in state-of-the-art materials labs. Inside its vacuum chamber, the machine zaps a palette of different elements to create vaporized particles, which then fly through the chamber and land to create a thin film, using a technique called sputtering. What sets this instrument apart is that artificial intelligence is running the experiment; an AI agent, trained on vast amounts of scientific literature and data, has determined the recipe and is varying the combination of elements.
Later, a person will walk the samples, each containing multiple potential catalysts, over to a different part of the lab for testing. Another AI agent will scan and interpret the data, using it to suggest another round of experiments to try to optimize the materials’ performance.
This story is part of MIT Technology Review’s Hype Correction package, a series that resets expectations about what AI is, what it makes possible, and where we go next.
For now, a human scientist keeps a close eye on the experiments and will approve the next steps on the basis of the AI’s suggestions and the test results. But the startup is convinced this AI-controlled machine is a peek into the future of materials discovery—one in which autonomous labs could make it far cheaper and faster to come up with novel and useful compounds.
Flush with hundreds of millions of dollars in new funding, Lila Sciences is one of AI’s latest unicorns. The company is on a larger mission to use AI-run autonomous labs for scientific discovery—the goal is to achieve what it calls scientific superintelligence. But I’m here this morning to learn specifically about the discovery of new materials.
Lila Sciences’ John Gregoire (background) and Rafael Gómez-Bombarelli watch as an AI-guided sputtering instrument makes samples of thin-film alloys.
CODY O’LOUGHLIN
We desperately need better materials to solve our problems. We’ll need improved electrodes and other parts for more powerful batteries; compounds to more cheaply suck carbon dioxide out of the air; and better catalysts to make green hydrogen and other clean fuels and chemicals. And we will likely need novel materials like higher-temperature superconductors, improved magnets, and different types of semiconductors for a next generation of breakthroughs in everything from quantum computing to fusion power to AI hardware.
But materials science has not had many commercial wins in the last few decades. In part because of its complexity and the lack of successes, the field has become something of an innovation backwater, overshadowed by the more glamorous—and lucrative—search for new drugs and insights into biology.
The idea of using AI for materials discovery is not exactly new, but it got a huge boost in 2020 when DeepMind showed that its AlphaFold2 model could accurately predict the three-dimensional structure of proteins. Then, in 2022, came the success and popularity of ChatGPT. The hope that similar AI models using deep learning could aid in doing science captivated tech insiders. Why not use our new generative AI capabilities to search the vast chemical landscape and help simulate atomic structures, pointing the way to new substances with amazing properties?
“Simulations can be super powerful for framing problems and understanding what is worth testing in the lab. But there’s zero problems we can ever solve in the real world with simulation alone.”
John Gregoire, Lila Sciences, chief autonomous science officer
Researchers touted an AI model that had reportedly discovered “millions of new materials.” The money began pouring in, funding a host of startups. But so far there has been no “eureka” moment, no ChatGPT-like breakthrough—no discovery of new miracle materials or even slightly better ones.
The startups that want to find useful new compounds face a common bottleneck: By far the most time-consuming and expensive step in materials discovery is not imagining new structures but making them in the real world. Before trying to synthesize a material, you don’t know if, in fact, it can be made and is stable, and many of its properties remain unknown until you test it in the lab.
“Simulations can be super powerful for kind of framing problems and understanding what is worth testing in the lab,” says John Gregoire, Lila Sciences’ chief autonomous science officer. “But there’s zero problems we can ever solve in the real world with simulation alone.”
Startups like Lila Sciences have staked their strategies on using AI to transform experimentation and are building labs that use agents to plan, run, and interpret the results of experiments to synthesize new materials. Automation in laboratories already exists. But the idea is to have AI agents take it to the next level by directing autonomous labs, where their tasks could include designing experiments and controlling the robotics used to shuffle samples around. And, most important, companies want to use AI to vacuum up and analyze the vast amount of data produced by such experiments in the search for clues to better materials.
If they succeed, these companies could shorten the discovery process from decades to a few years or less, helping uncover new materials and optimize existing ones. But it’s a gamble. Even though AI is already taking over many laboratory chores and tasks, finding new—and useful—materials on its own is another matter entirely.
Innovation backwater
I have been reporting about materials discovery for nearly 40 years, and to be honest, there have been only a few memorable commercial breakthroughs, such as lithium-ion batteries, over that time. There have been plenty of scientific advances to write about, from perovskite solar cells to graphene transistors to metal-organic frameworks (MOFs), materials based on an intriguing type of molecular architecture that recently won its inventors a Nobel Prize. But few of those advances—including MOFs—have made it far out of the lab. Others, like quantum dots, have found some commercial uses, but in general, the kinds of life-changing inventions created in earlier decades have been lacking.
Blame the amount of time (typically 20 years or more) and the hundreds of millions of dollars it takes to make, test, optimize, and manufacture a new material—and the industry’s lack of interest in spending that kind of time and money in low-margin commodity markets. Or maybe we’ve just run out of ideas for making stuff.
The need to both speed up that process and find new ideas is the reason researchers have turned to AI. For decades, scientists have used computers to design potential materials, calculating where to place atoms to form structures that are stable and have predictable characteristics. It’s worked—but only kind of. Advances in AI have made that computational modeling far faster and have promised the ability to quickly explore a vast number of possible structures. Google DeepMind, Meta, and Microsoft have all launched efforts to bring AI tools to the problem of designing new materials.
But the limitations that have always plagued computational modeling of new materials remain. With many types of materials, such as crystals, useful characteristics often can’t be predicted solely by calculating atomic structures.
To uncover and optimize those properties, you need to make something real. Or as Rafael Gómez-Bombarelli, one of Lila’s cofounders and an MIT professor of materials science, puts it: “Structure helps us think about the problem, but it’s neither necessary nor sufficient for real materials problems.”
Perhaps no advance exemplified the gap between the virtual and physical worlds more than DeepMind’s announcement in late 2023 that it had used deep learning to discover “millions of new materials,” including 380,000 crystals that it declared “the most stable, making them promising candidates for experimental synthesis.” In technical terms, the arrangement of atoms represented a minimum energy state where they were content to stay put. This was “an order-of-magnitude expansion in stable materials known to humanity,” the DeepMind researchers proclaimed.
To the AI community, it appeared to be the breakthrough everyone had been waiting for. The DeepMind research not only offered a gold mine of possible new materials, it also created powerful new computational methods for predicting a large number of structures.
But some materials scientists had a far different reaction. After closer scrutiny, researchers at the University of California, Santa Barbara, said they’d found “scant evidence for compounds that fulfill the trifecta of novelty, credibility, and utility.” In fact, the scientists reported, they didn’t find any truly novel compounds among the ones they looked at; some were merely “trivial” variations of known ones. The scientists appeared particularly peeved that the potential compounds were labeled materials. They wrote: “We would respectfully suggest that the work does not report any new materials but reports a list of proposed compounds. In our view, a compound can be called a material when it exhibits some functionality and, therefore, has potential utility.”
Some of the imagined crystals simply defied the conditions of the real world. To do computations on so many possible structures, DeepMind researchers simulated them at absolute zero, where atoms are well ordered; they vibrate a bit but don’t move around. At higher temperatures—the kind that would exist in the lab or anywhere in the world—the atoms fly about in complex ways, often creating more disorderly crystal structures. A number of the so-called novel materials predicted by DeepMind appeared to be well-ordered versions of disordered ones that were already known.
More generally, the DeepMind paper was simply another reminder of how challenging it is to capture physical realities in virtual simulations—at least for now. Because of the limitations of computational power, researchers typically perform calculations on relatively few atoms. Yet many desirable properties are determined by the microstructure of the materials—at a scale much larger than the atomic world. And some effects, like high-temperature superconductivity or even the catalysis that is key to many common industrial processes, are far too complex or poorly understood to be explained by atomic simulations alone.
A common language
Even so, there are signs that the divide between simulations and experimental work is beginning to narrow. DeepMind, for one, says that since the release of the 2023 paper it has been working with scientists in labs around the world to synthesize AI-identified compounds and has achieved some success. Meanwhile, a number of the startups entering the space are looking to combine computational and experimental expertise in one organization.
One such startup is Periodic Labs, cofounded by Ekin Dogus Cubuk, a physicist who led the scientific team that generated the 2023 DeepMind headlines, and by Liam Fedus, a co-creator of ChatGPT at OpenAI. Despite its founders’ background in computational modeling and AI software, the company is building much of its materials discovery strategy around synthesis done in automated labs.
The vision behind the startup is to link these different fields of expertise by using large language models that are trained on scientific literature and able to learn from ongoing experiments. An LLM might suggest the recipe and conditions to make a compound; it can also interpret test data and feed additional suggestions to the startup’s chemists and physicists. In this strategy, simulations might suggest possible material candidates, but they are also used to help explain the experimental results and suggest possible structural tweaks.
The grand prize would be a room-temperature superconductor, a material that could transform computing and electricity but that has eluded scientists for decades.
Periodic Labs, like Lila Sciences, has ambitions beyond designing and making new materials. It wants to “create an AI scientist”—specifically, one adept at the physical sciences. “LLMs have gotten quite good at distilling chemistry information, physics information,” says Cubuk, “and now we’re trying to make it more advanced by teaching it how to do science—for example, doing simulations, doing experiments, doing theoretical modeling.”
The approach, like that of Lila Sciences, is based on the expectation that a better understanding of the science behind materials and their synthesis will lead to clues that could help researchers find a broad range of new ones. One target for Periodic Labs is materials whose properties are defined by quantum effects, such as new types of magnets. The grand prize would be a room-temperature superconductor, a material that could transform computing and electricity but that has eluded scientists for decades.
Superconductors are materials in which electricity flows without any resistance and, thus, without producing heat. So far, the best of these materials become superconducting only at relatively low temperatures and require significant cooling. If they can be made to work at or close to room temperature, they could lead to far more efficient power grids, new types of quantum computers, and even more practical high-speed magnetic-levitation trains.
Lila staff scientist Natalie Page (right), Gómez-Bombarelli, and Gregoire inspect thin-film samples after they come out of the sputtering machine and before they undergo testing.
CODY O’LOUGHLIN
The failure to find a room-temperature superconductor is one of the great disappointments in materials science over the last few decades. I was there when President Reagan spoke about the technology in 1987, during the peak hype over newly made ceramics that became superconducting at the relatively balmy temperature of 93 Kelvin (that’s −292 °F), enthusing that they “bring us to the threshold of a new age.” There was a sense of optimism among the scientists and businesspeople in that packed ballroom at the Washington Hilton as Reagan anticipated “a host of benefits, not least among them a reduced dependence on foreign oil, a cleaner environment, and a stronger national economy.” In retrospect, it might have been one of the last times that we pinned our economic and technical aspirations on a breakthrough in materials.
The promised new age never came. Scientists still have not found a material that becomes superconducting at room temperatures, or anywhere close, under normal conditions. The best existing superconductors are brittle and tend to make lousy wires.
One of the reasons that finding higher-temperature superconductors has been so difficult is that no theory explains the effect at relatively high temperatures—or can predict it simply from the placement of atoms in the structure. It will ultimately fall to lab scientists to synthesize any interesting candidates, test them, and search the resulting data for clues to understanding the still puzzling phenomenon. Doing so, says Cubuk, is one of the top priorities of Periodic Labs.
AI in charge
It can take a researcher a year or more to make a crystal structure for the first time. Then there are typically years of further work to test its properties and figure out how to make the larger quantities needed for a commercial product.
Startups like Lila Sciences and Periodic Labs are pinning their hopes largely on the prospect that AI-directed experiments can slash those times. One reason for the optimism is that many labs have already incorporated a lot of automation, for everything from preparing samples to shuttling test items around. Researchers routinely use robotic arms, software, automated versions of microscopes and other analytical instruments, and mechanized tools for manipulating lab equipment.
The automation allows, among other things, for high-throughput synthesis, in which multiple samples with various combinations of ingredients are rapidly created and screened in large batches, greatly speeding up the experiments.
The idea is that using AI to plan and run such automated synthesis can make it far more systematic and efficient. AI agents, which can collect and analyze far more data than any human possibly could, can use real-time information to vary the ingredients and synthesis conditions until they get a sample with the optimal properties. Such AI-directed labs could do far more experiments than a person and could be far smarter than existing systems for high-throughput synthesis.
But so-called self-driving labs for materials are still a work in progress.
Many types of materials require solid-state synthesis, a set of processes that are far more difficult to automate than the liquid-handling activities that are commonplace in making drugs. You need to prepare and mix powders of multiple inorganic ingredients in the right combination for making, say, a catalyst and then decide how to process the sample to create the desired structure—for example, identifying the right temperature and pressure at which to carry out the synthesis. Even determining what you’ve made can be tricky.
In 2023, the A-Lab at Lawrence Berkeley National Laboratory claimed to be the first fully automated lab to use inorganic powders as starting ingredients. Subsequently, scientists reported that the autonomous lab had used robotics and AI to synthesize and test 41 novel materials, including some predicted in the DeepMind database. Some critics questioned the novelty of what was produced and complained that the automated analysis of the materials was not up to experimental standards, but the Berkeley researchers defended the effort as simply a demonstration of the autonomous system’s potential.
“How it works today and how we envision it are still somewhat different. There’s just a lot of tool building that needs to be done,” says Gerbrand Ceder, the principal scientist behind the A-Lab.
AI agents are already getting good at doing many laboratory chores, from preparing recipes to interpreting some kinds of test data—finding, for example, patterns in a micrograph that might be hidden to the human eye. But Ceder is hoping the technology could soon “capture human decision-making,” analyzing ongoing experiments to make strategic choices on what to do next. For example, his group is working on an improved synthesis agent that would better incorporate what he calls scientists’ “diffused” knowledge—the kind gained from extensive training and experience. “I imagine a world where people build agents around their expertise, and then there’s sort of an uber-model that puts it together,” he says. “The uber-model essentially needs to know what agents it can call on and what they know, or what their expertise is.”
“In one field that I work in, solid-state batteries, there are 50 papers published every day. And that is just one field that I work in. The AI revolution is about finally gathering all the scientific data we have.”
Gerbrand Ceder, principal scientist, A-Lab
One of the strengths of AI agents is their ability to devour vast amounts of scientific literature. “In one field that I work in, solid-state batteries, there are 50 papers published every day. And that is just one field that I work in,” says Ceder. It’s impossible for anyone to keep up. “The AI revolution is about finally gathering all the scientific data we have,” he says.
Last summer, Ceder became the chief science officer at an AI materials discovery startup called Radical AI and took a sabbatical from the University of California, Berkeley, to help set up its self-driving labs in New York City. A slide deck shows the portfolio of different AI agents and generative models meant to help realize Ceder’s vision. If you look closely, you can spot an LLM called the “orchestrator”—it’s what CEO Joseph Krause calls the “head honcho.”
New hope
So far, despite the hype around the use of AI to discover new materials and the growing momentum—and money—behind the field, there still has not been a convincing big win. There is no example like the 2016 victory of DeepMind’s AlphaGo over a Go world champion. Or like AlphaFold’s achievement in mastering one of biomedicine’s hardest and most time-consuming chores, predicting 3D structures of proteins.
The field of materials discovery is still waiting for its moment. It could come if AI agents can dramatically speed the design or synthesis of practical materials, similar to but better than what we have today. Or maybe the moment will be the discovery of a truly novel one, such as a room-temperature superconductor.
A small window provides a view of the inner workings of Lila’s sputtering instrument. The startup uses the machine to create a wide variety of experimental samples, including potential materials that could be useful for coatings and catalysts.
CODY O’LOUGHLIN
With or without such a breakthrough moment, startups face the challenge of trying to turn their scientific achievements into useful materials. The task is particularly difficult because any new materials would likely have to be commercialized in an industry dominated by large incumbents that are not particularly prone to risk-taking.
Susan Schofer, a tech investor and partner at the venture capital firm SOSV, is cautiously optimistic about the field. But Schofer, who spent several years in the mid-2000s as a catalyst researcher at one of the first startups using automation and high-throughput screening for materials discovery (it didn’t survive), wants to see some evidence that the technology can translate into commercial successes when she evaluates startups to invest in.
In particular, she wants to see evidence that the AI startups are already “finding something new, that’s different, and know how they are going to iterate from there.” And she wants to see a business model that captures the value of new materials. She says, “I think the ideal would be: I got a spec from the industry. I know what their problem is. We’ve defined it. Now we’re going to go build it. Now we have a new material that we can sell, that we have scaled up enough that we’ve proven it. And then we partner somehow to manufacture it, but we get revenue off selling the material.”
Schofer says that while she gets the vision of trying to redefine science, she’d advise startups to “show us how you’re going to get there.” She adds, “Let’s see the first steps.”
Demonstrating those first steps could be essential in enticing large existing materials companies to embrace AI technologies more fully. Corporate researchers in the industry have been burned before—by the promise over the decades that increasingly powerful computers will magically design new materials; by combinatorial chemistry, a fad that raced through materials R&D labs in the early 2000s with little tangible result; and by the promise that synthetic biology would make our next generation of chemicals and materials.
More recently, the materials community has been blanketed by a new hype cycle around AI. Some of that hype was fueled by the 2023 DeepMind announcement of the discovery of “millions of new materials,” a claim that, in retrospect, clearly overpromised. And it was further fueled when an MIT economics student posted a paper in late 2024 claiming that a large, unnamed corporate R&D lab had used AI to efficiently invent a slew of new materials. AI, it seemed, was already revolutionizing the industry.
A few months later, the MIT economics department concluded that “the paper should be withdrawn from public discourse.” Two prominent MIT economists who are acknowledged in a footnote in the paper added that they had “no confidence in the provenance, reliability or validity of the data and the veracity of the research.”
Can AI move beyond the hype and false hopes and truly transform materials discovery? Maybe. There is ample evidence that it’s changing how materials scientists work, providing them—if nothing else—with useful lab tools. Researchers are increasingly using LLMs to query the scientific literature and spot patterns in experimental data.
But it’s still early days in turning those AI tools into actual materials discoveries. The use of AI to run autonomous labs, in particular, is just getting underway; making and testing stuff takes time and lots of money. The morning I visited Lila Sciences, its labs were largely empty, and it’s now preparing to move into a much larger space a few miles away. Periodic Labs is just beginning to set up its lab in San Francisco. It’s starting with manual synthesis guided by AI predictions; its robotic high-throughput lab will come soon. Radical AI reports that its lab is almost fully autonomous but plans to soon move to a larger space.
Prominent AI researchers Liam Fedus (left) and Ekin Dogus Cubuk are the cofounders of Periodic Labs. The San Francisco–based startup aims to build an AI scientist that’s adept at the physical sciences.
JASON HENRY
When I talk to the scientific founders of these startups, I hear a renewed excitement about a field that long operated in the shadows of drug discovery and genomic medicine. For one thing, there is the money. “You see this enormous enthusiasm to put AI and materials together,” says Ceder. “I’ve never seen this much money flow into materials.”
Reviving the materials industry is a challenge that goes beyond scientific advances, however. It means selling companies on a whole new way of doing R&D.
But the startups benefit from a huge dose of confidence borrowed from the rest of the AI industry. And maybe that, after years of playing it safe, is just what the materials business needs.
Consider, if you will, the translucent blob in the eye of a microscope: a human blastocyst, the biological specimen that emerges just five days or so after a fateful encounter between egg and sperm. This bundle of cells, about the size of a grain of sand pulled from a powdery white Caribbean beach, contains the coiled potential of a future life: 46 chromosomes, thousands of genes, and roughly six billion base pairs of DNA—an instruction manual to assemble a one-of-a-kind human.
Now imagine a laser pulse snipping a hole in the blastocyst’s outermost shell so a handful of cells can be suctioned up by a microscopic pipette. This is the moment, thanks to advances in genetic sequencing technology, when it becomes possible to read virtually that entire instruction manual.
An emerging field of science seeks to use the analysis pulled from that procedure to predict what kind of a person that embryo might become. Some parents turn to these tests to avoid passing on devastating genetic disorders that run in their families. A much smaller group, driven by dreams of Ivy League diplomas or attractive, well-behaved offspring, are willing to pay tens of thousands of dollars to optimize for intelligence, appearance, and personality. Some of the most eager early boosters of this technology are members of the Silicon Valley elite, including tech billionaires like Elon Musk, Peter Thiel, and Coinbase CEO Brian Armstrong.
But customers of the companies emerging to provide it to the public may not be getting what they’re paying for. Genetics experts have been highlighting the potential deficiencies of this testing for years. A 2021 paper by members of the European Society of Human Genetics said, “No clinical research has been performed to assess its diagnostic effectiveness in embryos. Patients need to be properly informed on the limitations of this use.” And a paper published this May in the Journal of Clinical Medicine echoed this concern and expressed particular reservations about screening for psychiatric disorders and non-disease-related traits: “Unfortunately, no clinical research has to date been published comprehensively evaluating the effectiveness of this strategy [of predictive testing]. Patient awareness regarding the limitations of this procedure is paramount.”
Moreover, the assumptions underlying some of this work—that how a person turns out is the product not of privilege or circumstance but of innate biology—have made these companies a political lightning rod.
As this niche technology begins to make its way toward the mainstream, scientists and ethicists are racing to confront the implications—for our social contract, for future generations, and for our very understanding of what it means to be human.
Preimplantation genetic testing (PGT), while still relatively rare, is not new. Since the 1990s, parents undergoing in vitro fertilization have been able to access a number of genetic tests before choosing which embryo to use. A type known as PGT-M can detect single-gene disorders like cystic fibrosis, sickle cell anemia, and Huntington’s disease. PGT-A can ascertain the sex of an embryo and identify chromosomal abnormalities that can lead to conditions like Down syndrome or reduce the chances that an embryo will implant successfully in the uterus. PGT-SR helps parents avoid embryos with issues such as duplicated or missing segments of a chromosome.
Those tests all identify clear-cut genetic problems that are relatively easy to detect, but most of the genetic instruction manual included in an embryo is written in far more nuanced code. In recent years, a fledgling market has sprung up around a new, more advanced version of the testing process called PGT-P: preimplantation genetic testing for polygenic disorders (and, some claim, traits)—that is, outcomes determined by the elaborate interaction of hundreds or thousands of genetic variants.
In 2020, the first baby selected using PGT-P was born. While the exact figure is unknown, estimates put the number of children who have now been born with the aid of this technology in the hundreds. As the technology is commercialized, that number is likely to grow.
Embryo selection is less like a build-a-baby workshop and more akin to a store where parents can shop for their future children from several available models—complete with stat cards indicating their predispositions.
A handful of startups, armed with tens of millions of dollars of Silicon Valley cash, have developed proprietary algorithms to compute these stats—analyzing vast numbers of genetic variants and producing a “polygenic risk score” that shows the probability of an embryo developing a variety of complex traits.
For the last five years or so, two companies—Genomic Prediction and Orchid—have dominated this small landscape, focusing their efforts on disease prevention. But more recently, two splashy new competitors have emerged: Nucleus Genomics and Herasight, which have rejected the more cautious approach of their predecessors and waded into the controversial territory of genetic testing for intelligence. (Nucleus also offers tests for a wide variety of other behavioral and appearance-related traits.)
The practical limitations of polygenic risk scores are substantial. For starters, there is still a lot we don’t understand about the complex gene interactions driving polygenic traits and disorders. And the biobank data sets they are based on tend to overwhelmingly represent individuals with Western European ancestry, making it more difficult to generate reliable scores for patients from other backgrounds. These scores also lack the full context of environment, lifestyle, and the myriad other factors that can influence a person’s characteristics. And while polygenic risk scores can be effective at detecting large, population-level trends, their predictive abilities drop significantly when the sample size is as tiny as a single batch of embryos that share much of the same DNA.
But beyond questions of whether evidence supports the technology’s effectiveness, critics of the companies selling it accuse them of reviving a disturbing ideology: eugenics, or the belief that selective breeding can be used to improve humanity. Indeed, some of the voices who have been most confident that these methods can successfully predict nondisease traits have made startling claims about natural genetic hierarchies and innate racial differences.
What everyone can agree on, though, is that this new wave of technology is helping to inflame a centuries-old debate over nature versus nurture.
The term “eugenics” was coined in 1883 by a British anthropologist and statistician named Sir Francis Galton, inspired in part by the work of his cousin Charles Darwin. He derived it from a Greek word meaning “good in stock, hereditarily endowed with noble qualities.”
Some of modern history’s darkest chapters have been built on Galton’s legacy, from the Holocaust to the forced sterilization laws that affected certain groups in the United States well into the 20th century. Modern science has demonstrated the many logical and empirical problems with Galton’s methodology. (For starters, he counted vague concepts like “eminence”—as well as infections like syphilis and tuberculosis—as heritable phenotypes, meaning characteristics that result from the interaction of genes and environment.)
Yet even today, Galton’s influence lives on in the field of behavioral genetics, which investigates the genetic roots of psychological traits. Starting in the 1960s, researchers in the US began to revisit one of Galton’s favorite methods: twin studies. Many of these studies, which analyzed pairs of identical and fraternal twins to try to determine which traits were heritable and which resulted from socialization, were funded by the US government. The most well-known of these, the Minnesota Twin Study, also accepted grants from the Pioneer Fund, a now defunct nonprofit that had promoted eugenics and “race betterment” since its founding in 1937.
The nature-versus-nurture debate hit a major inflection point in 2003, when the Human Genome Project was declared complete. After 13 years and at a cost of nearly $3 billion, an international consortium of thousands of researchers had sequenced 92% of the human genome for the first time.
Today, the cost of sequencing a genome can be as low as $600, and one company says it will soon drop even further. This dramatic reduction has made it possible to build massive DNA databases like the UK Biobank and the National Institutes of Health’s All of Us, each containing genetic data from more than half a million volunteers. Resources like these have enabled researchers to conduct genome-wide association studies, or GWASs, which identify correlations between genetic variants and human traits by analyzing single-nucleotide polymorphisms (SNPs)—the most common form of genetic variation between individuals. The findings from these studies serve as a reference point for developing polygenic risk scores.
Most GWASs have focused on disease prevention and personalized medicine. But in 2011, a group of medical researchers, social scientists, and economists launched the Social Science Genetic Association Consortium (SSGAC) to investigate the genetic basis of complex social and behavioral outcomes. One of the phenotypes they focused on was the level of education people reached.
“It was a bit of a phenotype of convenience,” explains Patrick Turley, an economist and member of the steering committee at SSGAC, given that educational attainment is routinely recorded in surveys when genetic data is collected. Still, it was “clear that genes play some role,” he says. “And trying to understand what that role is, I think, is really interesting.” He adds that social scientists can also use genetic data to try to better “understand the role that is due to nongenetic pathways.”
The work immediately stirred feelings of discomfort—not least among the consortium’s own members, who feared that they might unintentionally help reinforce racism, inequality, and genetic determinism.
It’s also created quite a bit of discomfort in some political circles, says Kathryn Paige Harden, a psychologist and behavioral geneticist at the University of Texas in Austin, who says she has spent much of her career making the unpopular argument to fellow liberals that genes are relevant predictors of social outcomes.
Harden thinks a strength of those on the left is their ability to recognize “that bodies are different from each other in a way that matters.” Many are generally willing to allow that any number of traits, from addiction to obesity, are genetically influenced. Yet, she says, heritable cognitive ability seems to be “beyond the pale for us to integrate as a source of difference that impacts our life.”
Harden believes that genes matter for our understanding of traits like intelligence, and that this should help shape progressive policymaking. She gives the example of an education department seeking policy interventions to improve math scores in a given school district. If a polygenic risk score is “as strongly correlated with their school grades” as family income is, she says of the students in such a district, then “does deliberately not collecting that [genetic] information, or not knowing about it, make your research harder [and] your inferences worse?”
To Harden, persisting with this strategy of avoidance for fear of encouraging eugenicists is a mistake. If “insisting that IQ is a myth and genes have nothing to do with it was going to be successful at neutralizing eugenics,” she says, “it would’ve won by now.”
Part of the reason these ideas are so taboo in many circles is that today’s debate around genetic determinism is still deeply infused with Galton’s ideas—and has become a particular fixation among the online right.
After Elon Musk took over Twitter (now X) in 2022 and loosened its restrictions on hate speech, a flood of accounts started sharing racist posts, some speculating about the genetic origins of inequality while arguing against immigration and racial integration. Musk himself frequently reposts and engages with accounts like Crémieux Recueil, the pen name of independent researcher Jordan Lasker, who has written about the “Black-White IQ gap,” and i/o, an anonymous account that once praised Musk for “acknowledging data on race and crime,” saying it “has done more to raise awareness of the disproportionalities observed in these data than anything I can remember.” (In response to allegations that his research encourages eugenics, Lasker wrote to MIT Technology Review, “The popular understanding of eugenics is about coercion and cutting people cast as ‘undesirable’ out of the breeding pool. This is nothing like that, so it doesn’t qualify as eugenics by that popular understanding of the term.” After this story went to print, i/o wrote in an email, “Just because differences in intelligence at the individual level are largely heritable, it does not mean that group differences in measured intelligence … are due to genetic differences between groups,” arguing that the latter is not “scientifically settled” and is “an extremely important (and necessary) research area that should be funded rather than made taboo.” He added, “I’ve never made any argument against racial integration or intermarriage or whatever.” X and Musk did not respond to requests for comment.)
Harden, though, warns against discounting the work of an entire field because of a few noisy neoreactionaries. “I think there can be this idea that technology is giving rise to the terrible racism,” she says. The truth, she believes, is that “the racism has preexisted any of this technology.”
In 2019, a company called Genomic Prediction began to offer the first preimplantation polygenic testing that had ever been made commercially available. With its LifeView Embryo Health Score, prospective parents are able to assess their embryos’ predisposition to genetically complex health problems like cancer, diabetes, and heart disease. Pricing for the service starts at $3,500. Genomic Prediction uses a technique called an SNP array, which targets specific sites in the genome where common variants occur. The results are then cross-checked against GWASs that show correlations between genetic variants and certain diseases.
Four years later, a company named Orchid began offering a competing test. Orchid’s Whole Genome Embryo Report distinguished itself by claiming to sequence more than 99% of an embryo’s genome, allowing it to detect novel mutations and, the company says, diagnose rare diseases more accurately. For $2,500 per embryo, parents can access polygenic risk scores for 12 disorders, including schizophrenia, breast cancer, and hypothyroidism.
Orchid was founded by a woman named Noor Siddiqui. Before getting undergraduate and graduate degrees from Stanford, she was awarded the Thiel fellowship—a $200,000 grant given to young entrepreneurs willing to work on their ideas instead of going to college—back when she was a teenager, in 2012. This set her up to attract attention from members of the tech elite as both customers and financial backers. Her company has raised $16.5 million to date from investors like Ethereum founder Vitalik Buterin, former Coinbase CTO Balaji Srinivasan, and Armstrong, the Coinbase CEO.
In August Siddiqui made the controversial suggestion that parents who choose not to use genetic testing might be considered irresponsible. “Just be honest: you’re okay with your kid potentially suffering for life so you can feel morally superior …” she wrote on X.
Americans have varied opinions on the emerging technology. In 2024, a group of bioethicists surveyed 1,627 US adults to determine attitudes toward a variety of polygenic testing criteria. A large majority approved of testing for physical health conditions like cancer, heart disease, and diabetes. Screening for mental health disorders, like depression, OCD, and ADHD, drew a more mixed—but still positive—response. Appearance-related traits, like skin color, baldness, and height, received less approval as something to test for.
Intelligence was among the most contentious traits—unsurprising given the way it has been weaponized throughout history and the lack of cultural consensus on how it should even be defined. (In many countries, intelligence testing for embryos is heavily regulated; in the UK, the practice is banned outright.) In the 2024 survey, 36.9% of respondents approved of preimplantation genetic testing for intelligence, 40.5% disapproved, and 22.6% said they were uncertain.
Despite the disagreement, intelligence has been among the traits most talked about as targets for testing. From early on, Genomic Prediction says, it began receiving inquiries “from all over the world” about testing for intelligence, according to Diego Marin, the company’s head of global business development and scientific affairs.
At one time, the company offered a predictor for what it called “intellectual disability.” After some backlash questioning both the predictive capacity and the ethics of these scores, the company discontinued the feature. “Our mission and vision of this company is not to improve [a baby], but to reduce risk for disease,” Marin told me. “When it comes to traits about IQ or skin color or height or something that’s cosmetic and doesn’t really have a connotation of a disease, then we just don’t invest in it.”
Orchid, on the other hand, does test for genetic markers associated with intellectual disability and developmental delay. But that may not be all. According to one employee of the company, who spoke on the condition of anonymity, intelligence testing is also offered to “high-roller” clients. According to this employee, another source close to the company, and reporting in the Washington Post, Musk used Orchid’s services in the conception of at least one of the children he shares with the tech executive Shivon Zilis. (Orchid, Musk, and Zilis did not respond to requests for comment.)
I met Kian Sadeghi, the 25-year-old founder of New York–based Nucleus Genomics, on a sweltering July afternoon in his SoHo office. Slight and kinetic, Sadeghi spoke at a machine-gun pace, pausing only occasionally to ask if I was keeping up.
Sadeghi had modified his first organism—a sample of brewer’s yeast—at the age of 16. As a high schooler in 2016, he was taking a course on CRISPR-Cas9 at a Brooklyn laboratory when he fell in love with the “beautiful depth” of genetics. Just a few years later, he dropped out of college to build “a better 23andMe.”
His company targets what you might call the application layer of PGT-P, accepting data from IVF clinics—and even from the competitors mentioned in this story—and running its own computational analysis.
“Unlike a lot of the other testing companies, we’re software first, and we’re consumer first,” Sadeghi told me. “It’s not enough to give someone a polygenic score. What does that mean? How do you compare them? There’s so many really hard design problems.”
Like its competitors, Nucleus calculates its polygenic risk scores by comparing an individual’s genetic data with trait-associated variants identified in large GWASs, providing statistically informed predictions.
Nucleus provides two displays of a patient’s results: a Z-score, plotted from –4 to 4, which explains the risk of a certain trait relative to a population with similar genetic ancestry (for example, if Embryo #3 has a 2.1 Z-score for breast cancer, its risk is higher than average), and an absolute risk score, which includes relevant clinical factors (Embryo #3 has a minuscule actual risk of breast cancer, given that it is male).
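The arithmetic behind these two displays is, at its core, simple. As a rough and purely illustrative sketch (the SNP identifiers, effect sizes, and population statistics below are invented, and real predictors weight hundreds of thousands of variants, not three), a polygenic score is a weighted sum of an individual's genetic variants, standardized against a matched-ancestry reference population to produce a Z-score:

```python
# Illustrative sketch only. All numbers here are invented for demonstration;
# they do not reflect any company's actual model or any real GWAS data.

def polygenic_score(genotypes, effect_sizes):
    """Weighted sum across SNPs: allele count (0, 1, or 2) times the
    per-allele effect size estimated by a genome-wide association study."""
    return sum(genotypes[snp] * effect_sizes[snp] for snp in effect_sizes)

def z_score(raw_score, population_mean, population_sd):
    """Standardize the raw score against a reference distribution drawn
    from people of similar genetic ancestry."""
    return (raw_score - population_mean) / population_sd

# Hypothetical embryo with three SNPs (identifiers and effects are made up)
effects = {"rs001": 0.12, "rs002": -0.05, "rs003": 0.30}
embryo = {"rs001": 2, "rs002": 1, "rs003": 0}

raw = polygenic_score(embryo, effects)  # 2*0.12 + 1*(-0.05) + 0*0.30 = 0.19
z = z_score(raw, population_mean=0.10, population_sd=0.20)
print(round(z, 2))  # 0.45
```

A positive Z-score means the embryo's weighted variant load sits above the reference-population average for that trait; converting it into an absolute risk figure then requires folding in clinical factors, such as sex, that the raw genetic score ignores.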
The real difference between Nucleus and its competitors lies in the breadth of what it claims to offer clients. On its sleek website, prospective parents can sort through more than 2,000 possible diseases, as well as traits from eye color to IQ. Access to the Nucleus Embryo platform costs $8,999, while the company’s new IVF+ offering—which includes one IVF cycle with a partner clinic, embryo screening for up to 20 embryos, and concierge services throughout the process—starts at $24,999.
Its promises are remarkably bold. The company claims to be able to forecast a propensity for anxiety, ADHD, insomnia, and other mental issues. It says you can see which of your embryos are more likely to have alcohol dependence, which are more likely to be left-handed, and which might end up with severe acne or seasonal allergies. (Nevertheless, at the time of writing, the embryo-screening platform provided this disclaimer: “DNA is not destiny. Genetics can be a helpful tool for choosing an embryo, but it’s not a guarantee. Genetic research is still in it’s [sic] infancy, and there’s still a lot we don’t know about how DNA shapes who we are.”)
To people accustomed to sleep trackers, biohacking supplements, and glucose monitoring, taking advantage of Nucleus’s options might seem like a no-brainer. To anyone who welcomes a bit of serendipity in their life, this level of perceived control may be disconcerting to say the least.
Sadeghi likes to frame his arguments in terms of personal choice. “Maybe you want your baby to have blue eyes versus green eyes,” he told a small audience at Nucleus Embryo’s June launch event. “That is up to the liberty of the parents.”
On the official launch day, Sadeghi spent hours gleefully sparring with X users who accused him of practicing eugenics. He rejects the term, favoring instead “genetic optimization”—though it seems he wasn’t too upset about the free viral marketing. “This week we got five million impressions on Twitter,” he told a crowd at the launch event, to a smattering of applause. (In an email to MIT Technology Review, Sadeghi wrote, “The history of eugenics is one of coercion and discrimination by states and institutions; what Nucleus does is the opposite—genetic forecasting that empowers individuals to make informed decisions.”)
Nucleus has raised more than $36 million from investors like Srinivasan, Alexis Ohanian’s venture capital firm Seven Seven Six, and Thiel’s Founders Fund. (Like Siddiqui, Sadeghi was a recipient of a Thiel fellowship when he dropped out of college; a representative for Thiel did not respond to a request for comment for this story.) Sadeghi has even poached Genomic Prediction’s cofounder Nathan Treff, who is now Nucleus’s chief clinical officer.
Sadeghi’s real goal is to build a one-stop shop for every possible application of genetic sequencing technology, from genealogy to precision medicine to genetic engineering. He names a handful of companies providing these services, with a combined market cap in the billions. “Nucleus is collapsing all five of these companies into one,” he says. “We are not an IVF testing company. We are a genetic stack.”
This spring, I elbowed my way into a packed hotel bar in the Flatiron district, where over a hundred people had gathered to hear a talk called “How to create SUPERBABIES.” The event was part of New York’s Deep Tech Week, so I expected to meet a smattering of biotech professionals and investors. Instead, I was surprised to encounter a diverse and curious group of creatives, software engineers, students, and prospective parents—many of whom had come with no previous knowledge of the subject.
The speaker that evening was Jonathan Anomaly, a soft-spoken political philosopher whose didactic tone betrays his years as a university professor.
Some of Anomaly’s academic work has focused on developing theories of rational behavior. At Duke and the University of Pennsylvania, he led introductory courses on game theory, ethics, and collective action problems as well as bioethics, digging into thorny questions about abortion, vaccines, and euthanasia. But perhaps no topic has interested him so much as the emerging field of genetic enhancement.
In 2018, in a bioethics journal, Anomaly published a paper with the intentionally provocative title “Defending Eugenics.” He sought to distinguish what he called “positive eugenics”—noncoercive methods aimed at increasing traits that “promote individual and social welfare”—from the so-called “negative eugenics” we know from our history books.
Anomaly likes to argue that embryo selection isn’t all that different from practices we already take for granted. Don’t believe two cousins should be allowed to have children? Perhaps you’re a eugenicist, he contends. Your friend who picked out a six-foot-two Harvard grad from a binder of potential sperm donors? Same logic.
His hiring at the University of Pennsylvania in 2019 caused outrage among some students, who accused him of “racial essentialism.” In 2020, Anomaly left academia, lamenting that “American universities had become an intellectual prison.”
A few years later, Anomaly joined a nascent PGT-P company named Herasight, which was promising to screen for IQ.
At the end of July, the company officially emerged from stealth mode. A representative told me that most of the money raised so far is from angel investors, including Srinivasan, who also invested in Orchid and Nucleus. According to the launch announcement on X, Herasight has screened “hundreds of embryos” for private customers and is beginning to offer its first publicly available consumer product, a polygenic assessment that claims to detect an embryo’s likelihood of developing 17 diseases.
Herasight’s marketing materials boast predictive abilities 122% better than Orchid’s and 193% better than Genomic Prediction’s for this set of diseases. (“Herasight is comparing their current predictor to models we published over five years ago,” Genomic Prediction responded in a statement. “Our team is confident our predictors are world-class and are not exceeded in quality by any other lab.”)
The company did not include comparisons with Nucleus, pointing to the “absence of published performance validations” by that company and claiming it represented a case where “marketing outpaces science.” (“Nucleus is known for world-class science and marketing, and we understand why that’s frustrating to our competitors,” a representative from the company responded in a comment.)
Herasight also emphasized new advances in “within-family validation” (making sure that the scores are not merely picking up shared environmental factors by comparing their performance between unrelated people to their performance between siblings) and “cross-ancestry accuracy” (improving the accuracy of scores for people outside the European ancestry groups where most of the biobank data is concentrated). The representative explained that pricing varies by customer and the number of embryos tested, but it can reach $50,000.
Herasight tests for just one non-disease-related trait: intelligence. For a couple who produce 10 embryos, it claims it can detect an IQ spread of about 15 points, from the lowest-scoring embryo to the highest. The representative says the company plans to release a detailed white paper on its IQ predictor in the future.
The day of Herasight’s launch, Musk responded to the company announcement: “Cool.” Meanwhile, a Danish researcher named Emil Kirkegaard, whose research has largely focused on IQ differences between racial groups, boosted the company to his nearly 45,000 followers on X (as well as in a Substack blog), writing, “Proper embryo selection just landed.” Kirkegaard has in fact supported Anomaly’s work for years; he’s posted about him on X and recommended his 2020 book Creating Future People, which he called a “biotech eugenics advocacy book,” adding: “Naturally, I agree with this stuff!”
When it comes to traits that Anomaly believes are genetically encoded, intelligence—which he claimed in his talk is about 75% heritable—is just the tip of the iceberg. He has also spoken about the heritability of empathy, impulse control, violence, passivity, religiosity, and political leanings.
Anomaly concedes there are limitations to the kinds of relative predictions that can be made from a small batch of embryos. But he believes we’re only at the dawn of what he likes to call the “reproductive revolution.” At his talk, he pointed to a technology currently in development at a handful of startups: in vitro gametogenesis. IVG aims to create sperm or egg cells in a laboratory using adult stem cells, genetically reprogrammed from cells found in a sample of skin or blood. In theory, this process could allow a couple to quickly produce a practically unlimited number of embryos to analyze for preferred traits. Anomaly predicted this technology could be ready to use on humans within eight years.
“I doubt the FDA will allow it immediately. That’s what places like Próspera are for,” he said, referring to the so-called “startup city” in Honduras, where scientists and entrepreneurs can conduct medical experiments free from the kinds of regulatory oversight they’d encounter in the US.
“You might have a moral intuition that this is wrong,” said Anomaly, “but when it’s discovered that elites are doing it privately … the dominoes are going to fall very, very quickly.” The coming “evolutionary arms race,” he claimed, will “change the moral landscape.”
He added that some of those elites are his own customers: “I could already name names, but I won’t do it.”
After Anomaly’s talk was over, I spoke with a young photographer who told me he was hoping to pursue a master’s degree in theology. He came to the event, he told me, to reckon with the ethical implications of playing God. “Technology is sending us toward an Old-to-New-Testament transition moment, where we have to decide what parts of religion still serve us,” he said soberly.
Criticisms of polygenic testing tend to fall into two camps: skepticism about the tests’ effectiveness and concerns about their ethics. “On one hand,” says Turley from the Social Science Genetic Association Consortium, “you have arguments saying ‘This isn’t going to work anyway, and the reason it’s bad is because we’re tricking parents, which would be a problem.’ And on the other hand, they say, ‘Oh, this is going to work so well that it’s going to lead to enormous inequalities in society.’ It’s just funny to see. Sometimes these arguments are being made by the same people.”
One of those people is Sasha Gusev, who runs a quantitative genetics lab at the Dana-Farber Cancer Institute. A vocal critic of PGT-P for embryo selection, he also often engages in online debates with the far-right accounts promoting race science on X.
Gusev is one of many professionals in his field who believe that because of numerous confounding socioeconomic factors—for example, childhood nutrition, geography, personal networks, and parenting styles—there isn’t much point in trying to trace outcomes like educational attainment back to genetics, particularly not as a way to prove that there’s a genetic basis for IQ.
He adds, “I think there’s a real risk in moving toward a society where you see genetics and ‘genetic endowments’ as the drivers of people’s behavior and as a ceiling on their outcomes and their capabilities.”
Gusev thinks there is real promise for this technology in clinical settings among specific adult populations. For adults identified as having high polygenic risk scores for cancer and cardiovascular disease, he argues, a combination of early screening and intervention could be lifesaving. But when it comes to the preimplantation testing currently on the market, he thinks there are significant limitations—and few regulatory measures or long-term validation methods to check the promises companies are making. He fears that giving these services too much attention could backfire.
“These reckless, overpromised, and oftentimes just straight-up manipulative embryo selection applications are a risk for the credibility and the utility of these clinical tools,” he says.
Many IVF patients have also had strong reactions to publicity around PGT-P. When the New York Times published an opinion piece about Orchid in the spring, angry parents took to Reddit to rant. One user posted, “For people who dont [sic] know why other types of testing are necessary or needed this just makes IVF people sound like we want to create ‘perfect’ babies, while we just want (our) healthy babies.”
Still, others defended the need for a conversation. “When could technologies like this change the mission from helping infertile people have healthy babies to eugenics?” one Redditor posted. “It’s a fine line to walk and an important discussion to have.”
Some PGT-P proponents, like Kirkegaard and Anomaly, have argued that policy decisions should more explicitly account for genetic differences. In a series of blog posts following the 2024 presidential election, under the header “Make science great again,” Kirkegaard called for ending affirmative action laws, legalizing race-based hiring discrimination, and removing restrictions on data sets like the NIH’s All of Us biobank that prevent researchers like him from using the data for race science. Anomaly has criticized social welfare policies for putting a finger on the scale to “punish the high-IQ people.”
Indeed, the notion of genetic determinism has gained some traction among loyalists to President Donald Trump.
In October 2024, Trump himself made a campaign stop on the conservative radio program The Hugh Hewitt Show. He began a rambling answer about immigration and homicide statistics. “A murderer, I believe this, it’s in their genes. And we got a lot of bad genes in our country right now,” he told the host.
Gusev believes that while embryo selection won’t have much impact on individual outcomes, the intellectual framework endorsed by many PGT-P advocates could have dire social consequences.
“If you just think of the differences that we observe in society as being cultural, then you help people out. You give them better schooling, you give them better nutrition and education, and they’re able to excel,” he says. “If you think of these differences as being strongly innate, then you can fool yourself into thinking that there’s nothing that can be done and people just are what they are at birth.”
For the time being, there are no plans for longitudinal studies to track actual outcomes for the humans these companies have helped bring into the world. Harden, the behavioral geneticist from UT Austin, suspects that 25 years down the line, adults who were once embryos selected on the basis of polygenic risk scores are “going to end up with the same question that we all have.” They will look at their life and wonder, “What would’ve had to change for it to be different?”
Julia Black is a Brooklyn-based features writer and a reporter in residence at Omidyar Network. She has previously worked for Business Insider, Vox, The Information, and Esquire.
It’s the 25th of June and I’m shivering in my lab-issued underwear in Fort Worth, Texas. Libby Cowgill, an anthropologist in a furry parka, has wheeled me and my cot into a metal-walled room set to 40 °F. A loud fan pummels me from above and siphons the dregs of my body heat through the cot’s mesh from below. A large respirator fits snug over my nose and mouth. The device tracks carbon dioxide in my exhales—a proxy for how my metabolism speeds up or slows down throughout the experiment. Eventually Cowgill will remove my respirator to slip a wire-thin metal temperature probe several pointy inches into my nose.
Cowgill and a graduate student quietly observe me from the corner of their so-called “climate chamber.” Just a few hours earlier I’d sat beside them to observe as another volunteer, a 24-year-old personal trainer, endured the cold. Every few minutes, they measured his skin temperature with a thermal camera, his core temperature with a wireless pill, and his blood pressure and other metrics that hinted at how his body handles extreme cold. He lasted almost an hour without shivering; when my turn comes, I shiver aggressively on the cot for nearly an hour straight.
I’m visiting Texas to learn about this experiment on how different bodies respond to extreme climates. “What’s the record for fastest to shiver so far?” I jokingly ask Cowgill as she tapes biosensing devices to my chest and legs. After I exit the cold, she surprises me: “You, believe it or not, were not the worst person we’ve ever seen.”
Climate change forces us to reckon with the knotty science of how our bodies interact with the environment.
Cowgill is a 40-something anthropologist at the University of Missouri who powerlifts and teaches CrossFit in her spare time. She’s small and strong, with dark bangs and geometric tattoos. Since 2022, she’s spent the summers at the University of North Texas Health Science Center tending to these uncomfortable experiments. Her team hopes to revamp the science of thermoregulation.
While we know in broad strokes how people thermoregulate, the science of keeping warm or cool is mottled with blind spots. “We have the general picture. We don’t have a lot of the specifics for vulnerable groups,” says Kristie Ebi, an epidemiologist with the University of Washington who has studied heat and health for over 30 years. “How does thermoregulation work if you’ve got heart disease?”
“Epidemiologists have particular tools that they’re applying for this question,” Ebi continues. “But we do need more answers from other disciplines.”
Climate change is subjecting vulnerable people to temperatures that push their limits. In 2023, an estimated 47,000 heat-related deaths occurred in Europe. Researchers estimate that climate change could add an extra 2.3 million European heat deaths this century. That’s heightened the stakes for solving the mystery of just what happens to bodies in extreme conditions.
Extreme temperatures already threaten large stretches of the world. Populations across the Middle East, Asia, and sub-Saharan Africa regularly face highs beyond widely accepted levels of human heat tolerance. Swaths of the southern US, northern Europe, and Asia now also endure unprecedented lows: The 2021 Texas freeze killed at least 246 people, and a 2023 polar vortex sank temperatures in China’s northernmost city to a hypothermic record of –63.4 °F.
This change is here, and more is coming. Climate scientists predict that limiting emissions can prevent lethal extremes from encroaching elsewhere. But if emissions stay on their current course, fierce heat and even cold will reach deeper into every continent. About 2.5 billion people in the world’s hottest places don’t have air-conditioning. And when people do run AC, it can make outdoor temperatures even worse, intensifying the heat island effect in dense cities. Neither AC nor radiators are much help when heat waves and cold snaps capsize the power grid.
COURTESY OF MAX G. LEVY
“You, believe it or not, were not the worst person we’ve ever seen,” the author was told after enduring Cowgill’s “climate chamber.”
Through experiments like Cowgill’s, researchers around the world are revising rules about when extremes veer from uncomfortable to deadly. Their findings change how we should think about the limits of hot and cold—and how to survive in a new world.
Embodied change
Archaeologists have known for some time that we once braved colder temperatures than anyone previously imagined. Humans pushed into Eurasia and North America well before the last glacial period ended about 11,700 years ago. We were the only hominins to make it out of this era. Neanderthals, Denisovans, and Homo floresiensis all went extinct. We don’t know for certain what killed those species. But we do know that humans survived thanks to protection from clothing, large social networks, and physiological flexibility. Human resilience to extreme temperature is baked into our bodies, behavior, and genetic code. We wouldn’t be here without it.
“Our bodies are constantly in communication with the environment,” says Cara Ocobock, an anthropologist at the University of Notre Dame who studies how we expend energy in extreme conditions. She has worked closely with Finnish reindeer herders and Wyoming mountaineers.
But the relationship between bodies and temperature is surprisingly still a mystery to scientists. In 1847, the anatomist Carl Bergmann observed that animal species grow larger in cold climates. The zoologist Joel Asaph Allen noted in 1877 that cold-dwellers had shorter appendages. Then there’s the nose thing: In the 1920s, the British anthropologist Arthur Thomson theorized that people in cold places have relatively long, narrow noses, the better to heat and humidify the air they take in. These theories stemmed from observations of animals like bears and foxes; later ones drew on studies comparing the bodies of cold-accustomed Indigenous populations with white male control groups. Some, like those having to do with optimization of surface area, do make sense: It seems reasonable that a tall, thin body increases the amount of skin available to dump excess heat. The problem is, scientists have never actually tested this stuff in humans.
“Our bodies are constantly in communication with the environment.”
Cara Ocobock, anthropologist, University of Notre Dame
Some of what we know about temperature tolerance thus far comes from century-old race science or assumptions that anatomy controls everything. But science has evolved. Biology has matured. Childhood experiences, lifestyles, fat cells, and wonky biochemical feedback loops can contribute to a picture of the body as more malleable than anything imagined before. And that’s prompting researchers to change how they study it.
“If you take someone who’s super long and lanky and lean and put them in a cold climate, are they gonna burn more calories to stay warm than somebody who’s short and broad?” Ocobock says. “No one’s looked at that.”
Ocobock and Cowgill teamed up with Scott Maddux and Elizabeth Cho at the Center for Anatomical Sciences at the University of North Texas Health Science Center in Fort Worth. All four are biological anthropologists who have also puzzled over whether the rules Bergmann, Allen, and Thomson proposed are actually true.
For the past four years, the team has been studying how factors like metabolism, fat, sweat, blood flow, and personal history control thermoregulation.
Your native climate, for example, may influence how you handle temperature extremes. In a unique study of mortality statistics from 1980s Milan, Italians raised in warm southern Italy were more likely to survive heat waves in the northern part of the country.
Similar trends have appeared in cold climes. Researchers often measure cold tolerance by a person’s “brown adipose,” a type of fat that is specialized for generating heat (unlike white fat, which primarily stores energy). Brown fat is a cold adaptation because it delivers heat without the mechanism of shivering. Studies have linked it to living in cold climates, particularly at young ages. Wouter van Marken Lichtenbelt, the physiologist at Maastricht University who with colleagues discovered brown fat in adults, has shown that this tissue can further activate with cold exposure and even help regulate blood sugar and influence how the body burns other fat.
That adaptability served as an early clue for the Texas team. They want to know how a person’s response to hot and cold correlates with height, weight, and body shape. What is the difference, Maddux asks, between “a male who’s 6 foot 6 and weighs 240 pounds” and someone else in the same environment “who’s 4 foot 10 and weighs 89 pounds”? But the team also wondered if shape was only part of the story.
Their multi-year experiment uses tools that anthropologists couldn’t have imagined a century ago—devices that track metabolism in real time and analyze genetics. Each participant gets a CT scan (measuring body shape), a DEXA scan (estimating percentages of fat and muscle), high-resolution 3D scans, and DNA analysis from saliva to examine ancestry genetically.
Volunteers lie on a cot in underwear, as I did, for about 45 minutes in each climate condition, all on separate days. There’s dry cold, around 40 °F, akin to braving a walk-in refrigerator. Then dry heat and humid heat: 112 °F with 15% humidity and 98 °F with 85% humidity. They call it “going to Vegas” and “going to Houston,” says Cowgill. The chamber session is long enough to measure an effect, but short enough to be safe.
Before I traveled to Texas, Cowgill told me she suspected the old rules would fall. Studies linking temperature tolerance to race and ethnicity, for example, seemed tenuous because biological anthropologists today reject the concept of distinct races. It’s a false premise, she told me: “No one in biological anthropology would argue that human beings do not vary across the globe—that’s obvious to anyone with eyes. [But] you can’t draw sharp borders around populations.”
She added, “I think there’s a substantial possibility that we spend four years testing this and find out that really, limb length, body mass, surface area […] are not the primary things that are predicting how well you do in cold and heat.”
Adaptable to a degree
In July 1995, a week-long heat wave pushed Chicago above 100 °F, killing roughly 500 people. Thirty years later, Ollie Jay, a physiologist at the University of Sydney, can duplicate the conditions of that exceptionally humid heat wave in a climate chamber at his laboratory.
“We can simulate the Chicago heat wave of ’95. The Paris heat wave of 2003. The heat wave [in early July of this year] in Europe,” Jay says. “As long as we’ve got the temperature and humidity information, we can re-create those conditions.”
“Everybody has quite an intimate experience of feeling hot, so we’ve got 8 billion experts on how to keep cool,” he says. Yet our internal sense of when heat turns deadly is unreliable. Even professional athletes overseen by experienced medics have died after missing dangerous warning signs. And little research has been done to explore how vulnerable populations such as elderly people, those with heart disease, and low-income communities with limited access to cooling respond to extreme heat.
Jay’s team researches the most effective strategies for surviving it. He lambastes air-conditioning, saying it demands so much energy that it can aggravate climate change in “a vicious cycle.” Instead, he has monitored people’s vital signs while they use fans and skin mists to endure three hours in humid and dry heat. In results published last year, his research found that fans reduced cardiovascular strain by 86% for people with heart disease in the type of humid heat familiar in Chicago.
Dry heat was a different story. In that simulation, fans not only didn’t help but actually doubled the rate at which core temperatures rose in healthy older people.
Heat kills. But not without a fight. Your body must keep its internal temperature within a narrow window, less than two degrees on either side of 98 °F. The simple fact that you’re alive means you are producing heat, and your body needs to export that heat without amassing much more. The nervous system relaxes the narrow blood vessels along your skin, widening them. Your heart rate increases, propelling more warm blood to your extremities and away from your organs. You sweat. And when that sweat evaporates, it carries a torrent of body heat away with it.
This thermoregulatory response can be trained. Studies by van Marken Lichtenbelt have shown that exposure to mild heat increases sweat capacity, decreases blood pressure, and drops resting heart rate. Long-term studies based on Finnish saunas suggest similar correlations.
The body may adapt protectively to cold, too. In this case, body heat is your lifeline. Shivering and exercise help keep bodies warm. So can clothing. Cardiovascular deaths are thought to spike in cold weather. But people more adapted to cold seem better able to reroute their blood flow in ways that keep their organs warm without dropping their temperature too many degrees in their extremities.
Earlier this year, the biological anthropologist Stephanie B. Levy (no relation) reported that New Yorkers who experienced lower average temperatures had more productive brown fat, adding evidence for the idea that the inner workings of our bodies adjust to the climate throughout the year and perhaps even throughout our lives. “Do our bodies hold a biological memory of past seasons?” Levy wonders. “That’s still an open question. There’s some work in rodent models to suggest that that’s the case.”
Although people clearly acclimatize with enough strenuous exposures to either cold or heat, Jay says, “you reach a ceiling.” Consider sweat: Heat exposure can increase the amount you sweat only until your skin is completely saturated. It’s a nonnegotiable physical limit. Any additional sweat just means leaking water without carrying away any more heat. “I’ve heard people say we’ll just find a way of evolving out of this—we’ll biologically adapt,” Jay says. “Unless we’re completely changing our body shape, then that’s not going to happen.”
And body shape may not even sway thermoregulation as much as previously believed. The subject I observed, a personal trainer, appeared outwardly adapted for cold: his broad shoulders didn’t even fit in a single CT scan image. Cowgill supposed that this muscle mass insulated him. When he emerged from his session in the 40 °F environment, though, he had finally started shivering—intensely. The researchers covered him in a heated blanket. He continued shivering. Driving to lunch over an hour later in a hot car, he still mentioned feeling cold. An hour after that, a finger prick drew no blood, a sign that blood vessels in his extremities remained constricted. His body temperature fell about half a degree C in the cold session—a significant drop—and his wider build did not appear to shield him from the cold as well as my involuntary shivering protected me.
I asked Cowgill if perhaps there is no such thing as being uniquely predisposed to hot or cold. “Absolutely,” she said.
A hot mess
So if body shape doesn’t tell us much about how a person maintains body temperature, and acclimation also runs into limits, then how do we determine how hot is too hot?
In 2010 two climate change researchers, Steven Sherwood and Matthew Huber, argued that regions around the world become uninhabitable at wet-bulb temperatures of 35 °C, or 95 °F. (Wet-bulb measurements are a way to combine air temperature and relative humidity.) Above 35 °C, a person simply wouldn’t be able to dissipate heat quickly enough. But it turns out that their estimate was too optimistic.
Researchers “ran with” that number for a decade, says Daniel Vecellio, a bioclimatologist at the University of Nebraska, Omaha. “But the number had never been actually empirically tested.” In 2021 a Pennsylvania State University physiologist, W. Larry Kenney, worked with Vecellio and others to test wet-bulb limits in a climate chamber. Kenney’s lab investigates which combinations of temperature, humidity, and time push a person’s body over the edge.
Not long after, the researchers came up with their own wet-bulb limit of human tolerance: below 31 °C in warm, humid conditions for the youngest cohort, people in their thermoregulatory prime. Their research suggests that a day reaching 98 °F and 65% humidity, for example, poses danger in a matter of hours, even for healthy people.
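You can get a rough feel for these numbers yourself. The researchers measure wet-bulb conditions directly in climate chambers, but one common empirical shortcut is Stull’s 2011 fit, which approximates wet-bulb temperature from air temperature and relative humidity alone (a sketch for illustration, not the method the Penn State team used):

```python
import math

def wet_bulb_stull(temp_c: float, rh_pct: float) -> float:
    """Approximate wet-bulb temperature (deg C) from air temperature (deg C)
    and relative humidity (%), using Stull's 2011 empirical fit.
    Reasonable for roughly 5-99% humidity and -20 to 50 deg C."""
    return (
        temp_c * math.atan(0.151977 * math.sqrt(rh_pct + 8.313659))
        + math.atan(temp_c + rh_pct)
        - math.atan(rh_pct - 1.676331)
        + 0.00391838 * rh_pct ** 1.5 * math.atan(0.023101 * rh_pct)
        - 4.686035
    )

# The day described above: 98 deg F (about 36.7 deg C) at 65% humidity
tw = wet_bulb_stull((98 - 32) * 5 / 9, 65)
print(f"Wet-bulb temperature: {tw:.1f} deg C")  # ~30.9 deg C, near the 31 deg C limit
```

Plugging in 98 °F and 65% humidity lands at roughly 31 °C, which is why such a day can be dangerous within hours even for the healthiest group tested.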
JUSTIN CLEMONS
Cowgill and her colleagues Elizabeth Cho (top) and Scott Maddux prepare graduate student Joanna Bui for a “room-temperature test.”
In 2023, Vecellio and Huber teamed up, combining the growing arsenal of lab data with state-of-the-art climate simulations to predict where heat and humidity most threatened global populations: first the Middle East and South Asia, then sub-Saharan Africa and eastern China. And assuming that warming reaches 3 to 4 °C over preindustrial levels this century, as predicted, parts of North America, South America, and northern and central Australia will be next.
Last June, Vecellio, Huber, and Kenney co-published an article, “Why not 35 °C?,” revising the limits that Sherwood and Huber had proposed in 2010. It explained why the human limits have turned out to be lower than expected: those initial estimates overlooked the fact that our skin temperature can quickly jump above 101 °F in hot weather, for example, making it harder to dump internal heat.
The Penn State team has published deep dives on how heat tolerance changes with sex and age. Older participants’ wet-bulb limits wound up being even lower—between 27 and 28 °C in warm, humid conditions—and varied more from person to person than they did in young people. “The conditions that we experience now—especially here in North America and Europe, places like that—are well below the limits that we found in our research,” Vecellio says. “We know that heat kills now.”
What this fast-growing body of research suggests, Vecellio stresses, is that you can’t define heat risk by just one or two numbers. Last year, he and researchers at Arizona State University pulled up the hottest 10% of hours between 2005 and 2020 for each of 96 US cities. They wanted to compare recent heat-health research with historical weather data for a new perspective: How frequently is it so hot that people’s bodies can’t compensate for it? Over 88% of those “hot hours” met that criterion for people in full sun. In the shade, most of those heat waves became meaningfully less dangerous.
“There’s really almost no one who ‘needs’ to die in a heat wave,” says Ebi, the epidemiologist. “We have the tools. We have the understanding. Essentially all [those] deaths are preventable.”
More than a number
A year after visiting Texas, I called Cowgill to hear what she was thinking after four summers of chamber experiments. She told me that the only rule about hot and cold she currently stands behind is … well, none.
She recalled a recent participant—the smallest man in the study, weighing 114 pounds. “He shivered like a leaf on a tree,” Cowgill says. Normally, a strong shiverer warms up quickly. Core temperature may even climb a little. “This [guy] was just shivering and shivering and shivering and not getting any warmer,” she says. She doesn’t know why this happened. “Every time I think I get a picture of what’s going on in there, we’ll have one person come in and just kind of be a complete exception to the rule,” she says, adding that you can’t just gloss over how much human bodies vary inside and out.
The same messiness complicates physiology studies.
Jay looks to embrace bodily complexities by improving physiological simulations of heat and the human strain it causes. He’s piloted studies that input a person’s activity level and type of clothing to predict core temperature, dehydration, and cardiovascular strain based on the particular level of heat. One can then estimate the person’s risk on the basis of factors like age and health. He’s also working on physiological models to identify vulnerable groups, inform early-warning systems ahead of heat waves, and possibly advise cities on whether interventions like fans and mists can help protect residents. “Heat is an all-of-society issue,” Ebi says. Officials could better prepare the public for cold snaps this way too.
“Death is not the only thing we’re concerned about,” Jay adds. Extreme temperatures bring morbidity and sickness and strain hospital systems: “There’s all these community-level impacts that we’re just completely missing.”
Climate change forces us to reckon with the knotty science of how our bodies interact with the environment. Predicting the health effects is a big and messy matter.
The first wave of answers from Fort Worth will materialize next year. The researchers will analyze thermal images to crunch data on brown fat. They’ll also test whether, as Cowgill suspects, body shape sways temperature tolerance less than previously assumed. “Human variation is the rule,” she says, “not the exception.”
Max G. Levy is an independent journalist who writes about chemistry, public health, and the environment.
Be honest: Have you ever looked up someone from your childhood on social media with the sole intention of seeing how they’ve aged?
One of my colleagues, who shall remain nameless, certainly has. He recently shared a photo of a former classmate. “Can you believe we’re the same age?” he asked, with a hint of glee in his voice. A relative also delights in this pastime. “Wow, she looks like an old woman,” she’ll say when looking at a picture of someone she has known since childhood. The years certainly are kinder to some of us than others.
But wrinkles and gray hairs aside, it can be difficult to know how well—or poorly—someone’s body is truly aging, under the hood. A person who develops age-related diseases earlier in life, or has other biological changes associated with aging (such as elevated cholesterol or markers of inflammation), might be considered “biologically older” than a similar-age person who doesn’t have those changes. Some 80-year-olds will be weak and frail, while others are fit and active.
Doctors have long used functional tests that measure their patients’ strength or the distance they can walk, for example, or simply “eyeball” them to guess whether they look fit enough to survive some treatment regimen, says Tamir Chandra, who studies aging at the Mayo Clinic.
But over the past decade, scientists have been uncovering new methods of looking at the hidden ways our bodies are aging. What they’ve found is changing our understanding of aging itself.
“Aging clocks” are new scientific tools that can measure how our organs are wearing out, giving us insight into our mortality and health. They hint at our biological age. While chronological age is simply how many birthdays we’ve had, biological age is meant to reflect something deeper. It measures how our bodies are handling the passing of time and—perhaps—lets us know how much more of it we have left. And while you can’t change your chronological age, you just might be able to influence your biological age.
It’s not just scientists who are using these clocks. Longevity influencers like Bryan Johnson often use them to make the case that they are aging backwards. “My telomeres say I’m 10 years old,” Johnson posted on X in April. The Kardashians have tried them too (Khloé was told on TV that her biological age was 12 years below her chronological age). Even my local health-food store offers biological age testing. Some are pushing the use of clocks even further, using them to sell unproven “anti-aging” supplements.
The science is still new, and few experts in the field—some of whom affectionately refer to it as “clock world”—would argue that an aging clock can definitively reveal an individual’s biological age.
But their work is revealing that aging clocks can offer so much more than an insta-brag, a snake-oil pitch—or even just an eye-catching number. In fact, they are helping scientists unravel some of the deepest mysteries in biology: Why do we age? How do we age? When does aging begin? What does it even mean to age?
Ultimately, and most importantly, they might soon tell us whether we can reverse the whole process.
Clocks kick off
The way your genes work can change. Molecules called methyl groups can attach to DNA, controlling the way genes make proteins. This process is called methylation, and it can potentially occur at millions of points along the genome. These epigenetic markers, as they are known, can switch genes on or off, or increase or decrease how much protein they make. They’re not part of our DNA, but they influence how it works.
In 2011, Steve Horvath, then a biostatistician at the University of California, Los Angeles, took part in a study that was looking for links between sexual orientation and these epigenetic markers. Steve is straight; he says his twin brother, Markus, who also volunteered, is gay.
That study didn’t find a link between DNA methylation and sexual orientation. But when Horvath looked at the data, he noticed a different trend—a very strong link between age and methylation at around 88 points on the genome. He once told me he fell off his chair when he saw it.
Many of the affected genes had already been linked to age-related brain and cardiovascular diseases, but it wasn’t clear how methylation might be related to those diseases.
In 2013, Horvath collected methylation data from 8,000 tissue and cell samples to create what he called the Horvath clock—essentially a mathematical model that could estimate age on the basis of DNA methylation at 353 points on the genome. From a tissue sample, it could estimate a person’s age to within 2.9 years.
That clock changed everything. Its publication in 2013 marked the birth of “clock world.” To some, the possibilities were almost endless. If a model could work out what average aging looks like, it could potentially estimate whether someone was aging unusually fast or slowly. It could transform medicine and fast-track the search for an anti-aging drug. It could help us understand what aging is, and why it happens at all.
The epigenetic clock was a success story in “a field that, frankly, doesn’t have a lot of success stories,” says João Pedro de Magalhães, who researches aging at the University of Birmingham, UK.
It took a few years, but as more aging researchers heard about the clock, they began incorporating it into their research and even developing their own clocks. Horvath became a bit of a celebrity. Scientists started asking for selfies with him at conferences, he says. Some researchers even made T-shirts bearing the front page of his 2013 paper.
Some of the many other aging clocks developed since have become notable in their own right. Examples include the PhenoAge clock, which incorporates health data such as blood cell counts and signs of inflammation along with methylation, and the Dunedin Pace of Aging clock, which tells you how quickly or slowly a person is aging rather than pointing to a specific age. Many of the clocks measure methylation, but some look at other variables, such as proteins in blood or certain carbohydrate molecules that attach to such proteins.
Today, there are hundreds or even thousands of clocks out there, says Chiara Herzog, who researches aging at King’s College London and is a member of the Biomarkers of Aging Consortium. Everyone has a favorite. Horvath himself favors his GrimAge clock, which was named after the Grim Reaper because it is designed to predict time to death.
That clock was trained on data collected from people who were monitored for decades, many of whom died in that period. Horvath won’t use it to tell people when they might die of old age, he stresses, saying that it wouldn’t be ethical. Instead, it can be used to deliver a biological age that hints at how long a person might expect to live. Someone who is 50 but has a GrimAge of 60 can assume that, compared with the average 50-year-old, they might be a bit closer to the end.
GrimAge is not perfect. While it can strongly predict time to death given the health trajectory someone is on, no aging clock can predict if someone will start smoking or get a divorce (which generally speeds aging) or suddenly take up running (which can generally slow it). “People are complicated,” Horvath tells MIT Technology Review. “There’s a huge error bar.”
But accuracy is a challenge for all aging clocks. Part of the problem lies in how they were designed. Most of the clocks were trained to link age with methylation. The best clocks will deliver an estimate that reflects how far a person's biology deviates from the average. Aging clocks are still judged on how well they can predict a person's chronological age, but you don't want them to be too close, says Lucas Paulo de Lima Camillo, head of machine learning at Shift Bioscience, who was awarded $10,000 by the Biomarkers of Aging Consortium for developing a clock that could estimate age to within 2.55 years.
“There’s this paradox,” says Camillo. If a clock is really good at predicting chronological age, that’s all it will tell you—and it probably won’t reveal much about your biological age. No one needs an aging clock to tell them how many birthdays they’ve had. Camillo says he’s noticed that when the clocks get too close to “perfect” age prediction, they actually become less accurate at predicting mortality.
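To make the training setup concrete: a methylation clock is, at its core, a regression from per-site methylation fractions to chronological age, and the "age acceleration" that clinics report is the residual. The sketch below is purely illustrative, not Horvath's actual pipeline; real clocks draw on hundreds of thousands of CpG sites and typically use penalized regressions such as elastic net, and all the data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cohort: 500 people, 200 CpG sites.
# A handful of sites drift linearly with age; the rest are pure noise.
n, p = 500, 200
ages = rng.uniform(20, 90, n)
drift = np.zeros(p)
drift[:20] = rng.normal(0.004, 0.001, 20)   # per-year methylation drift
betas = np.clip(0.5 + np.outer(ages - 55, drift)
                + rng.normal(0, 0.02, (n, p)), 0, 1)

# Ridge regression (closed form): the "clock" maps methylation -> age.
X = np.hstack([np.ones((n, 1)), betas])     # intercept column + sites
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(p + 1), X.T @ ages)

pred = X @ w
mae = np.abs(pred - ages).mean()
# "Age acceleration" is the residual: predicted minus chronological age.
accel = pred - ages
```

The paradox Camillo describes is visible in this framing: the better the clock fits chronological age, the smaller the residual, and the less room that residual has to carry any biological signal beyond the calendar.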
Therein lies the other central issue for scientists who develop and use aging clocks: What is the thing they are really measuring? It is a difficult question for a field whose members notoriously fail to agree on the basics. (Everything from the definition of aging to how it occurs and why is up for debate among the experts.)
They do agree that aging is incredibly complex. A methylation-based aging clock might tell you about how that collection of chemical markers compares across individuals, but at best, it’s only giving you an idea of their “epigenetic age,” says Chandra. There are probably plenty of other biological markers that might reveal other aspects of aging, he says: “None of the clocks measure everything.”
We don’t know why some methyl groups appear or disappear with age, either. Are these changes causing damage? Or are they a by-product of it? Are the epigenetic patterns seen in a 90-year-old a sign of deterioration? Or have they been responsible for keeping that person alive into very old age?
To make matters even more complicated, two different clocks can give similar answers by measuring methylation at entirely different regions of the genome. No one knows why, or which regions might be the best ones to focus on.
“The biomarkers have this black-box quality,” says Jesse Poganik at Brigham and Women’s Hospital in Boston. “Some of them are probably causal, some of them may be adaptive … and some of them may just be neutral”: either “there’s no reason for them not to happen” or “they just happen by random chance.”
What we know is that, as things stand, none of the clocks are precise enough to predict the biological age of a single person (sorry, Khloé). Putting the same biological sample through five different clocks will give you five wildly different results.
Even the same clock can give you different answers if you put a sample through it more than once. “They’re not yet individually predictive,” says Herzog. “We don’t know what [a clock result] means for a person, [or if] they’re more or less likely to develop disease.”
And it’s why plenty of aging researchers—even those who regularly use the clocks in their work—haven’t bothered to measure their own epigenetic age. “Let’s say I do a clock and it says that my biological age … is five years older than it should be,” says Magalhães. “So what?” He shrugs. “I don’t see much point in it.”
You might think this lack of clarity would make aging clocks pretty useless in a clinical setting. But plenty of clinics are offering them anyway. Some longevity clinics are more careful, and will regularly test their patients with a range of clocks, noting their results and tracking them over time. Others will simply offer an estimate of biological age as part of a longevity treatment package.
And then there are the people who use aging clocks to sell supplements. While no drug or supplement has been definitively shown to make people live longer, that hasn't stopped the lightly regulated wellness industry from pushing "treatments" ranging from lotions and herbal pills all the way to stem-cell injections.
Some of these people come to aging meetings. I was in the audience at an event when one CEO took to the stage to claim he had reversed his own biological age by 18 years—thanks to the supplement he was selling. Tom Weldon of Ponce de Leon Health told us his gray hair was turning brown. His biological age was supposedly reversing so rapidly that he had reached “longevity escape velocity.”
But if the people who buy his supplements expect some kind of Benjamin Button effect, they might be disappointed. His company hasn’t yet conducted a randomized controlled trial to demonstrate any anti-aging effects of that supplement, called Rejuvant. Weldon says that such a trial would take years and cost millions of dollars, and that he’d “have to increase the price of our product more than four times” to pay for one. (The company has so far tested the active ingredient in mice and carried out a provisional trial in people.)
More generally, Horvath says he “gets a bad taste in [his] mouth” when people use the clocks to sell products and “make a quick buck.” But he thinks that most of those sellers have genuine faith in both the clocks and their products. “People truly believe their own nonsense,” he says. “They are so passionate about what they discovered, they fall into this trap of believing [their] own prejudices.”
The accuracy of the clocks is at a level that makes them useful for research, but not for individual predictions. Even if a clock did tell someone they were five years younger than their chronological age, that wouldn’t necessarily mean the person could expect to live five years longer, says Magalhães. “The field of aging has long been a rich ground for snake-oil salesmen and hype,” he says. “It comes with the territory.” (Weldon, for his part, says Rejuvant is the only product that has “clinically meaningful” claims.)
In any case, Magalhães adds that he thinks any publicity is better than no publicity.
And there’s the rub. Most people in the longevity field seem to have mixed feelings about the trendiness of aging clocks and how they are being used. They’ll agree that the clocks aren’t ready for consumer prime time, but they tend to appreciate the attention. Longevity research is expensive, after all. With a surge in funding and an explosion in the number of biotech companies working on longevity, aging scientists are hopeful that innovation and progress will follow.
So they want to be sure that the reputation of aging clocks doesn’t end up being tarnished by association. Because while influencers and supplement sellers are using their “biological ages” to garner attention, scientists are now using these clocks to make some remarkable discoveries. Discoveries that are changing the way we think about aging.
How to be young again
Two little mice lie side by side, anesthetized and unconscious, as Jim White prepares his scalpel. The animals are of the same breed but look decidedly different. One is a youthful three-month-old, its fur thick, black, and glossy. By comparison, the second mouse, a 20-month-old, looks a little the worse for wear. Its fur is graying and patchy. Its whiskers are short, and it generally looks kind of frail.
But the two mice are about to have a lot more in common. White, with some help from a colleague, makes incisions along the side of each mouse’s body and into the upper part of an arm and leg on the same side. He then carefully stitches the two animals together—membranes, fascia, and skin.
The procedure takes around an hour, and the mice are then roused from their anesthesia. At first, the two still-groggy animals pull away from each other. But within a few days, they seem to have accepted that they now share their bodies. Soon their circulatory systems will fuse, and the animals will share a blood flow too.
“People are complicated. There’s a huge error bar.” — Steve Horvath, former biostatistician at the University of California, Los Angeles
White, who studies aging at Duke University, has been stitching mice together for years; he has performed this strange procedure, known as heterochronic parabiosis, more than a hundred times. And he’s seen a curious phenomenon occur. The older mice appear to benefit from the arrangement. They seem to get younger.
Experiments with heterochronic parabiosis have been performed for decades, but typically scientists keep the mice attached to each other for only a few weeks, says White. In their experiment, he and his colleagues left the mice attached for three months—equivalent to around 10 human years. The team then carefully separated the animals to assess how each of them had fared. “You’d think that they’d want to separate immediately,” says White. “But when you detach them … they kind of follow each other around.”
The most striking result of that experiment was that the older mice who had been attached to a younger mouse ended up living longer than other mice of a similar age. “[They lived] around 10% longer, but [they] also maintained a lot of [their] function,” says White. They were more active and maintained their strength for longer, he adds.
When his colleagues, including Poganik, applied aging clocks to the mice, they found that their epigenetic ages were lower than expected. “The young circulation slowed aging in the old mice,” says White. The effect seemed to last, too—at least for a little while. “It preserved that youthful state for longer than we expected,” he says.
The young mice went the other way and appeared biologically older, both while they were attached to the old mice and shortly after they were detached. But in their case, the effect seemed to be short-lived, says White: “The young mice went back to being young again.”
To White, this suggests that something about the “youthful state” might be programmed in some way. That perhaps it is written into our DNA. Maybe we don’t have to go through the biological process of aging.
This gets at a central debate in the aging field: What is aging, and why does it happen? Some believe it’s simply a result of accumulated damage. Some believe that the aging process is programmed; just as we grow limbs, develop a brain, reach puberty, and experience menopause, we are destined to deteriorate. Others think programs that play an important role in our early development just turn out to be harmful later in life by chance. And there are some scientists who agree with all of the above.
White's theory is that being old is just "a loss of youth." If that's the case, there's a silver lining: Knowing how youth is lost might point toward a way to regain it, perhaps by restoring those youthful programs.
Dogs and dolphins
Horvath’s eponymous clock was developed by measuring methylation in DNA samples taken from tissues around the body. It seems to represent aging in all these tissues, which is why Horvath calls it a pan-tissue clock. Given that our organs are thought to age differently, it was remarkable that a single clock could measure aging in so many of them.
But Horvath had ambitious plans for an even more universal clock: a pan-species model that could measure aging in all mammals. He started out, in 2017, with an email campaign that involved asking hundreds of scientists around the world to share samples of tissues from animals they had worked with. He tried zoos, too.
“I learned that people had spent careers collecting [animal] tissues,” he says. “They had freezers full of [them].” Amenable scientists would ship those frozen tissues, or just DNA, to Horvath’s lab in California, where he would use them to train a new model.
Horvath says he initially set out to profile 30 different species. But he ended up receiving around 15,000 samples from 200 scientists, representing 348 species—including everything from dogs to dolphins. Could a single clock really predict age in all of them?
“I truly felt it would fail,” says Horvath. “But it turned out that I was completely wrong.” He and his colleagues developed a clock that assessed methylation at 36,000 locations on the genome. The result, which was published in 2023 as the pan-mammalian clock, can estimate the age of any mammal and even the maximum lifespan of the species. The data set is open to anyone who wants to download it, he adds: “I hope people will mine the data to find the secret of how to extend a healthy lifespan.”
The pan-mammalian clock suggests that there is something universal about aging—not just that all mammals experience it in a similar way, but that a similar set of genetic or epigenetic factors might be responsible for it.
Comparisons between mammals also support the idea that the slower methylation changes occur, the longer the lifespan of the animal, says Nelly Olova, an epigeneticist who researches aging at the University of Edinburgh in the UK. “DNA methylation slowly erodes with age,” she says. “We still have the instructions in place, but they become a little messier.” The research in different mammals suggests that cells can take only so much change before they stop functioning.
“There’s a finite amount of change that the cell can tolerate,” she says. “If the instructions become too messy and noisy … it cannot support life.”
Olova has been investigating exactly when aging clocks first begin to tick—in other words, the point at which aging starts. Clocks can be trained on data from volunteers by matching the patterns of methylation on their DNA to their chronological age. The trained clocks are then typically used to estimate the biological age of adults. But they can also be used on samples from children. Or babies. They can even be used to work out the biological age of the cells that make up embryos.
In her research, Olova used adult skin cells, which—thanks to Nobel Prize–winning research in the 2000s—can be “reprogrammed” back to a state resembling that of the pluripotent stem cells found in embryos. When Olova and her colleagues used a “partial reprogramming” approach to take cells close to that state, they found that the closer they got to the entirely reprogrammed state, the “younger” the cells were.
It was around 20 days after the cells had been reprogrammed into stem cells that they reached the biological age of zero according to the clock used, says Olova. “It was a bit surreal,” she says. “The pluripotent cells measure as minus 0.5; they’re slightly below zero.”
Vadim Gladyshev, a prominent aging researcher at Harvard University, has since proposed that the same negative level of aging might apply to embryos. After all, some kind of rejuvenation happens during the early stages of embryo formation—an aged egg cell and an aged sperm cell somehow create a brand-new cell. The slate is wiped clean.
Gladyshev calls this point “ground zero.” He posits that it’s reached sometime during the “mid-embryonic state.” At this point, aging begins. And so does “organismal life,” he argues. “It’s interesting how this coincides with philosophical questions about when life starts,” says Olova.
Some have argued that life begins when sperm meets egg, while others have suggested that the point when embryonic cells start to form some kind of unified structure is what counts. The ground zero point is when the body plan is set out and cells begin to organize accordingly, she says. “Before that, it’s just a bunch of cells.”
This doesn’t mean that life begins at the embryonic state, but it does suggest that this is when aging begins—perhaps as the result of “a generational clearance of damage,” says Poganik.
It is early days—no pun intended—for this research, and the science is far from settled. But knowing when aging begins could help inform attempts to rewind the clock. If scientists can pinpoint an ideal biological age for cells, perhaps they can find ways to get old cells back to that state. There might be a way to slow aging once cells reach a certain biological age, too.
“Presumably, there may be opportunities for targeting aging before … you’re full of gray hair,” says Poganik. “It could mean that there is an ideal window for intervention which is much earlier than our current geriatrics-based approach.”
When young meets old
When White first started stitching mice together, he would sit and watch them for hours. “I was like, look at them go! They’re together, and they don’t even care!” he says. Since then, he’s learned a few tricks. He tends to work with female mice, for instance—the males tend to bicker and nip at each other, he says. The females, on the other hand, seem to get on well.
The effect their partnership appears to have on the animals' biological ages, however temporary, is one of several findings from aging clocks suggesting that biological age is plastic to some degree. White and his colleagues have also found, for instance, that stress seems to increase biological age, but that the effect can be reversed once the stress stops. Both pregnancy and covid-19 infections have a similar reversible effect.
Poganik wonders if this finding might have applications for human organ transplants. Perhaps there’s a way to measure the biological age of an organ before it is transplanted and somehow rejuvenate organs before surgery.
But new data from aging clocks suggests that this might be more complicated than it sounds. Poganik and his colleagues have been using methylation clocks to measure the biological age of samples taken from recently transplanted hearts in living people.
Young hearts do well in older bodies, but the biological age of these organs eventually creeps up to match that of their recipient. The same is true for older hearts in younger bodies, says Poganik, who has not yet published his findings. “After a few months, the tissue may assimilate the biological age of the organism,” he says.
If that’s the case, the benefits of young organs might be short-lived. It also suggests that scientists working on ways to rejuvenate individual organs may need to focus their anti-aging efforts on more systemic means of rejuvenation—for example, stem cells that repopulate the blood. Reprogramming these cells to a youthful state, perhaps one a little closer to “ground zero,” might be the way to go.
Whole-body rejuvenation might be some way off, but scientists are still hopeful that aging clocks might help them find a way to reverse aging in people.
“We have the machinery to reset our epigenetic clock to a more youthful state,” says White. “That means we have the ability to turn the clock backwards.”
This story is a collaboration between MIT Technology Review and Aventine, a nonprofit research foundation that creates and supports content about how technology and science are changing the way we live.
It’s not often you get a text about the robustness of your immune system, but that’s what popped up on my phone last spring. Sent by John Tsang, an immunologist at Yale, the text came after his lab had put my blood through a mind-boggling array of newfangled tests. The result—think of it as a full-body, high-resolution CT scan of my immune system—would reveal more about the state of my health than any test I had ever taken. And it could potentially tell me far more than I wanted to know.
“David,” the text read, “you are the red dot.”
Tsang was referring to an image he had attached to the text that showed a graph with a scattering of black dots representing other people whose immune systems had been evaluated—and a lone red one. There also was a score: 0.35.
I had no idea what any of this meant.
The red dot was the culmination of an immuno-quest I had begun on an autumn afternoon a few months earlier, when a postdoc in Tsang’s lab drew several vials of my blood. It was also a significant milestone in a decades-long journey I’ve taken as a journalist covering life sciences and medicine. Over the years, I’ve offered myself up as a human guinea pig for hundreds of tests promising new insights into my health and mortality. In 2001, I was one of the first humans to have my DNA sequenced. Soon after, in the early 2000s, researchers tapped into my proteome—proteins circulating in my blood. Then came assessments of my microbiome, metabolome, and much more. I have continued to test-drive the latest protocols and devices, amassing tens of terabytes of data on myself, and I’ve reported on the results in dozens of articles and a book called Experimental Man. Over time, the tests have gotten better and more informative, but no test I had previously taken promised to deliver results more comprehensive or closer to revealing the truth about my underlying state of health than what John Tsang was offering.
It also was not lost on me that I’m now 20-plus years older than I was when I took those first tests. Back in my 40s, I was ridiculously healthy. Since then, I’ve been battered by various pathogens, stresses, and injuries, including two bouts of covid and long covid—and, well, life.
But I’d kept my apprehensions to myself as Tsang, a slim, perpetually smiling man who directs the Yale Center for Systems and Engineering Immunology, invited me into his office in New Haven to introduce me to something called the human immunome.
John Tsang has helped create a new test for your immune system.
Made up of 1.8 trillion cells and trillions more proteins, metabolites, mRNA, and other biomolecules, every person’s immunome is different, and it is constantly changing. It’s shaped by our DNA, past illnesses, the air we have breathed, the food we have eaten, our age, and the traumas and stresses we have experienced—in short, everything we have ever been exposed to physically and emotionally. Right now, your immune system is hard at work identifying and fending off viruses and rogue cells that threaten to turn cancerous—or maybe already have. And it is doing an excellent job of it all, or not, depending on how healthy it happens to be at this particular moment.
Yet as critical as the immunome is to each of us, this universe of cells and molecules has remained largely beyond the reach of modern medicine—a vast yet inaccessible operating system that powerfully influences everything from our vulnerability to viruses and cancer to how well we age to whether we tolerate certain foods better than others.
Now, thanks to a slew of new technologies and to scientists like Tsang, who is on the Steering Committee of the Chan Zuckerberg Biohub New York, understanding this vital and mysterious system is within our grasp, paving the way for powerful new tools and tests to help us better assess, diagnose, and treat diseases.
Already, new research is revealing patterns in the ways our bodies respond to stress and disease. Scientists are creating contrasting portraits of weak and robust immunomes—portraits that someday, it’s hoped, could offer new insights into patient care and perhaps detect illnesses before symptoms appear. There are plans afoot to deploy this knowledge and technology on a global scale, which would enable scientists to observe the effects of climate, geography, and countless other factors on the immunome. The results could transform what it means to be healthy and how we identify and treat disease.
It all begins with a test that can tell you whether your immune system is healthy or not.
Reading the immunome
Sitting in his office last fall, Tsang—a systems immunologist whose expertise combines computer science and immunology—began my tutorial in immunomics by introducing me to a study that he and his team wrote up in a 2024 paper published in Nature Medicine. It described the results of measurements made on blood samples taken from 270 subjects—tests similar to the ones Tsang’s team would be running on me. In the study, Tsang and his colleagues looked at the immune systems of 228 patients diagnosed with a variety of genetic disorders and a control group of 42 healthy people.
To help me visualize what my results might look like, Tsang opened his laptop to reveal several colorful charts from the study, punctuated by black dots representing each person evaluated. The results reminded me vaguely of abstract paintings by Joan Miró. But in place of colorful splotches, whirls, and circles were an assortment of scatter plots, Gantt charts, and heat maps tinted in greens, blues, oranges, and purples.
It all looked like gibberish to me.
Luckily, Tsang was willing to serve as my guide. Flashing his perpetually patient smile, he explained that these colorful jumbles depicted what his team had uncovered about each subject after taking blood samples and assessing the details of how well their immune cells, proteins, mRNA, and other immune system components were doing their job.
The results placed people—represented by the individual dots—on a left-to-right continuum, ranging from those with unhealthy immunomes on the left to those with healthy immunomes on the right. Background colors, meanwhile, were used to identify people with different medical conditions affecting their immune systems. For example, olive green indicated those with autoimmune disorders; orange backgrounds were designated for individuals with no known disease history. Tsang said he and his team would be placing me on a similar graph after they finished analyzing my blood.
Tsang’s measurements go significantly beyond what can be discerned from the handful of immune biomarkers that people routinely get tested for today. “The main immune cell panel typically ordered by a physician is called a CBC differential,” he told me. CBC, which stands for “complete blood count,” is a decades-old type of analysis that counts levels of red blood cells, hemoglobin, and basic immune cell types (neutrophils, lymphocytes, monocytes, basophils, and eosinophils). Changes in these levels can indicate whether a person’s immune system might be reacting to a virus or other infection, cancer, or something else. Other blood tests—like one that looks for elevated levels of C-reactive protein, which can indicate inflammation associated with heart disease—are more specific than the CBC. But they still rely on blunt counting—in this case of certain proteins.
Tsang's assessment, by contrast, tests up to a million cells, proteins, mRNA, and other immune biomolecules—significantly more than the CBC and other routine panels. His protocol is designed to paint a more holistic portrait of a person's immune system not only by counting cells and molecules but also by assessing their interactions. The CBC "doesn't tell me as a physician what the cells being counted are doing," says Rachel Sparks, a clinical immunologist who was the lead author of the Nature Medicine study and is now a translational medicine physician with the drug giant AstraZeneca. "I just know that there are more neutrophils than normal, which may or may not indicate that they're behaving badly. We now have technology that allows us to see at a granular level what a cell is actually doing when a virus appears—how it's changing and reacting."
Such breakthroughs have been made possible thanks to a raft of new and improved technologies that have evolved over the past decade, allowing scientists like Tsang and Sparks to explore the intricacies of the immunome with newfound precision. These include devices that can count myriad different types of cells and biomolecules, as well as advanced sequencers that identify and characterize DNA, RNA, proteins, and other molecules. There are now instruments that also can measure thousands of changes and reactions that occur inside a single immune cell as it reacts to a virus or other threat.
Tsang and Sparks's team used data generated by such measurements to identify and characterize a series of signals distinctive to unhealthy immune systems. Then they used the presence or absence of these signals to create a numerical assessment of the health of a person's immunome—a score they call an "immune health metric," or IHM.
Clinical immunologist Rachel Sparks hopes new tests can improve medical care.
To make sense of the crush of data being collected, Tsang’s team used machine-learning algorithms that correlated the results of the many measurements with a patient’s known health status and age. They also used AI to compare their findings with immune system data collected elsewhere. All this allowed them to determine and validate an IHM score for each person, and to place it on their spectrum, identifying that person as healthy or not.
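The overall shape of that workflow, which takes many measurements per person, correlates them with known health status, and distills the result into a single score, can be sketched schematically with a logistic regression. This is emphatically not the IHM model itself; the features, labels, and weights below are synthetic stand-ins, and the real pipeline combines far more data types and validation steps.

```python
import numpy as np

rng = np.random.default_rng(1)

# Schematic stand-in: 300 people, 50 immune measurements each,
# and a known healthy (1) / unhealthy (0) label per person.
n, p = 300, 50
X = rng.normal(0, 1, (n, p))
true_w = np.zeros(p)
true_w[:5] = [1.2, -0.8, 0.9, -1.1, 0.7]   # a few informative signals
prob_healthy = 1 / (1 + np.exp(-(X @ true_w)))
healthy = (rng.uniform(size=n) < prob_healthy).astype(float)

# Logistic regression by gradient descent: learn weights that
# correlate the measurements with health status.
w = np.zeros(p)
for _ in range(2000):
    s = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (s - healthy) / n

# One number per person in [0, 1]: a toy "immune health" score.
score = 1 / (1 + np.exp(-(X @ w)))
```

The key design idea the sketch preserves is dimensionality reduction with a clinical anchor: thousands of raw measurements are useless at the bedside until a model compresses them into a score that has been validated against known outcomes.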
It all came together for the first time with the publication of the Nature Medicine paper, in which Tsang and his colleagues reported the results from testing multiple immune variables in the 270 subjects. They also announced a remarkable discovery: Patients with different kinds of diseases reacted with similar disruptions to their immunomes. For instance, many showed a lower level of the aptly named natural killer immune cells, regardless of what they were suffering from. Critically, the immune profiles of those with diagnosed diseases tended to look very different from those belonging to the outwardly healthy people in the study. And, as expected, immune health declined in the older patients.
But then the results got really interesting. In a few cases, the immune systems of unhealthy and healthy people looked similar, with some people appearing near the “healthy” area of the chart even though they were known to have diseases. Most likely this was because their symptoms were in remission and not causing an immune reaction at the moment when their blood was drawn, Tsang told me.
In other cases, people without a known disease showed up on the chart closer to those who were known to be sick. "Some of these people who appear to be in good health are overlapping with pathology that traditional metrics can't spot," says Tsang, whose Nature Medicine paper reported that roughly half the healthy individuals in the study had IHM scores that overlapped with those of people known to be sick. Either these seemingly healthy people had normal immune systems that were busy fending off, say, a passing virus, or their immune systems had been impacted by aging and the vicissitudes of life. Or, potentially more worrisome, they were harboring an illness or stress that was not yet making them ill but might do so eventually.
These findings have obvious implications for medicine. Spotting a low immune score in a seemingly healthy person could make it possible to identify and start treating an illness before symptoms appear, diseases worsen, or tumors grow and metastasize. IHM-style evaluations could also provide clues as to why some people respond differently to viruses like the one that causes covid, and why vaccines—which are designed to activate a healthy immune system—might not work as well in people whose immune systems are compromised.
“One of the more surprising things about the last pandemic was that all sorts of random younger people who seemed very healthy got sick and then they were gone,” says Mark Davis, a Stanford immunologist who helped pioneer the science being developed in labs like Tsang’s. “Some had underlying conditions like obesity and diabetes, but some did not. So the question is, could we have pointed out that something was off with these folks’ immune systems? Could we have diagnosed that and warned people to take extra precautions?”
Tsang’s IHM test is designed to answer a simple question: What is the relative health of your immune system? But there are other assessments being developed to provide more detailed information on how the body is doing. Tsang’s own team is working on a panel of additional scores aimed at getting finer detail on specific immune conditions. These include a test that measures the health of a person’s bone marrow, which makes immune cells. “If you have a bone marrow stress or inflammatory condition in the bone marrow, you could have lower capacity to produce cells, which will be reflected by this score,” he says. Another detailed metric will measure protein levels to predict how a person will respond to a virus.
Tsang hopes that an IHM-style test will one day be part of a standard physical exam—a snapshot of a patient’s immune system that could inform care. For instance, has a period of intense stress compromised the immune system, making it less able to fend off this season’s flu? Will someone’s score predict a better or worse response to a vaccine or a cancer drug? How does a person’s immune system change with age?
Or, as I anxiously wondered while waiting to learn my own score, will the results reveal an underlying disorder or disease, silently ticking away until it shows itself?
Toward a human immunome project
The quest to create advanced tests like the IHM for the immune system began more than 15 years ago, when scientists like Mark Davis became frustrated with a field in which research—primarily in mice—was focused mostly on individual immune cells and proteins. In 2007 he launched the Stanford Human Immune Monitoring Center, one of the first efforts to conceptualize the human immunome as a holistic, body-wide network in human beings. Speaking by Zoom from his office in Palo Alto, California, Davis told me that the effort had spawned other projects, including a landmark twin study showing that a lot of immune variation is not genetic, as the prevailing theory then held, but heavily influenced by environmental factors—a major shift in scientists’ understanding.
Shai Shen-Orr sees a day when people will check their immune scores on an app.
COURTESY OF SHAI SHEN-ORR
Davis and others also laid the groundwork for tests like John Tsang’s by discovering how a T cell—among the most common and important immune players—can recognize pathogens, cancerous cells, and other threats, triggering defensive measures that can include destroying the threat. This and other discoveries have revealed many of the basic mechanics of how immune cells work, says Davis, “but there’s still a lot we have to learn.”
One researcher working with Davis in those early days was Shai Shen-Orr, who is now director of the Zimin Institute for AI Solutions in Healthcare at the Technion-Israel Institute of Technology, based in Haifa, Israel. (He’s also a frequent collaborator with Tsang.) Shen-Orr, like Tsang, is a systems immunologist. He recalls that in 2007, when he was a postdoc in Davis’s lab, immunologists had identified around 100 cell types and a similar number of cytokines—proteins that act as messengers in the immune system. But they weren’t able to measure them simultaneously, which limited visibility into how the immune system works as a whole. Today, Shen-Orr says, immunologists can measure hundreds of cell types and thousands of proteins and watch them interact.
Shen-Orr’s current lab has developed its own version of an immunome test that he calls IMM-AGE (short for “immune age”), the basics of which were published in a 2019 paper in Nature Medicine. IMM-AGE looks at the composition of people’s immune systems—how many of each type of immune cell they have and how these numbers change as they age. His team has used this information primarily to ascertain a person’s risk of heart disease.
Shen-Orr also has been a vociferous advocate for expanding the pool of test samples, which now come mostly from Americans and Europeans. “We need to understand why different people in different environments react differently and how that works,” he says. “We also need to test a lot more people—maybe millions.”
Tsang has seen why a limited sample size can pose problems. In 2013, he says, researchers at the National Institutes of Health came up with a malaria vaccine that was effective for almost everyone who got it during clinical trials conducted in Maryland. “But in Africa,” he says, “it only worked for about 25% of the people.” He attributes this to the significant differences in genetics, diet, climate, and other environmental factors that cause people’s immunomes to develop differently. “Why?” he asks. “What exactly was different about the immune systems in Maryland and Tanzania? That’s what we need to understand so we can design personalized vaccines and treatments.”
“What exactly was different about the immune systems in Maryland and Tanzania? That’s what we need to understand so we can design personalized vaccines and treatments.”
John Tsang
For several years, Tsang and Shen-Orr have advocated going global with testing, “but there has been resistance,” Shen-Orr says. “Look, medicine is conservative and moves slowly, and the technology is expensive and labor intensive.” They finally got the audience they needed at a 2022 conference in La Jolla, California, convened by the Human Immunome Project, or HIP. (The organization was originally founded in 2016 to create more effective vaccines but had recently changed its name to emphasize a pivot from just vaccines to the wider field of immunome science.) It was in La Jolla that they met HIP’s then-new chairperson, Jane Metcalfe, a cofounder of Wired magazine, who saw what was at stake.
“We’ve got all of these advanced molecular immunological profiles being developed,” she said, “but we can’t begin to predict the breadth of immune system variability if we’re only testing small numbers of people in Palo Alto or Tel Aviv. And that’s when the big aha moment struck us that we need sites everywhere to collect that information so we can build proper computer models and a predictive understanding of the human immune system.”
IBRAHIM RAYINTAKATH
Following that meeting, HIP created a new scientific plan, with Tsang and Shen-Orr as chief science officers. The group set an ambitious goal of raising around $3 billion over the next 10 years—a goal Tsang and Metcalfe say will be met by working in conjunction with a broad network of public and private supporters. Cutbacks in federal funding for biomedical research in the US may limit funds from this traditional source, but HIP plans to work with government agencies outside the US too, with the goal of creating a comprehensive global immunological database.
HIP’s plan is to first develop a pilot version based on Tsang’s test, which it will call the Immune Monitoring Kit, to test a few thousand people in Africa, Australia, East Asia, Europe, the US, and Israel. The initial effort, according to Metcalfe, is expected to begin by the end of the year.
After that, HIP would like to expand to some 150 sites around the world, eventually assessing about 250,000 people and collecting a vast cache of data and insights that Tsang believes will profoundly affect—even revolutionize—clinical medicine, public health, and drug development.
My immune health metric score is …
As HIP develops its pilot study to take on the world, John Tsang, for better or worse, has added one more North American Caucasian male to the small number of people who have received an IHM score to date. That would be me.
It took a long time to get my score, but Tsang didn’t leave me hanging once he pinged me the red dot. “We plotted you with other participants who are clinically quite healthy,” he texted, referring to a cluster of black dots on the grid he had sent, although he cautioned that the group I’m being compared with includes only a few dozen people. “Higher IHM means better immune health,” he wrote, referring to my 0.35 score, which he described as a number on an arbitrary scale. “As you can see, your IHM is right in the middle of a bunch of people 20 years younger.”
This was a relief, given that our immune systems, like so many other bodily functions, decline with age—though obviously at different rates. Yet I also felt a certain disappointment. To be honest, I had expected more granular detail after having a million or so cells and markers tested—like perhaps some insights on why I got long covid (twice) and others didn’t. Tsang and other scientists are working on ways to extract more specific information from the tests. Still, he insists that the single score itself is a powerful tool to understand the general state of our immunomes, indicating the absence or presence of underlying health issues that might not be revealed in traditional testing.
I asked Tsang what my score meant for my future. “Your score is always changing depending on what you’re exposed to and due to age,” he said, adding that the IHM is still so new that it’s hard to know exactly what the score means until researchers do more work—and until HIP can evaluate and compare thousands or hundreds of thousands of people. They also need to keep testing me over time to see how my immune system changes as it’s exposed to new perturbations and stresses.
For now, I’m left with a simple number. Though it tells me little about the detailed workings of my immune system, the good news is that it raises no red flags. My immune system, it turns out, is pretty healthy.
A few days after receiving my score from Tsang, I heard from Shen-Orr about more results. Tsang had shared my data with Shen-Orr’s lab, which ran its IMM-AGE protocol on my immunome and produced another score for me to worry about. Shen-Orr’s result put the age of my immune system at around 57—still 10 years younger than my true age.
The coming age of the immunome
Shai Shen-Orr imagines a day when people will be able to check their advanced IHM and IMM-AGE scores—or their HIP Immune Monitoring Kit score—on an app after a blood draw, the way they now check health data such as heart rate and blood pressure. Jane Metcalfe talks about linking IHM-type measurements and analyses with rising global temperatures and steamier days and nights to study how global warming might affect the immune system of, say, a newborn or a pregnant woman. “This could be plugged into other people’s models and really help us understand the effects of pollution, nutrition, or climate change on human health,” she says.
“I think [in 10 years] I’ll be able to use this much more granular understanding of what the immune system is doing at the cellular level in my patients. And hopefully we could target our therapies more directly to those cells or pathways that are contributing to disease.”
Rachel Sparks
Other clues could also be on the horizon. “At some point we’ll have IHM scores that can provide data on who will be most affected by a virus during a pandemic,” Tsang says. Maybe that will help researchers engineer an immune system response that shuts down the virus before it spreads. He says it’s possible to run a test like that now, but it remains experimental and will take years to fully develop, test for safety and accuracy, and establish standards and protocols for use as a tool of global public health. “These things take a long time,” he says.
The same goes for bringing IHM-style tests into the exam room, so doctors like Rachel Sparks can use the results to help treat their patients. “I think in 10 years, with some effort, we really could have something useful,” says Stanford’s Mark Davis. Sparks agrees. “I think by then I’ll be able to use this much more granular understanding of what the immune system is doing at the cellular level in my patients,” she says. “And hopefully we could target our therapies more directly to those cells or pathways that are contributing to disease.”
Personally, I’ll wait for more details with a mix of impatience, curiosity, and at least a hint of concern. I wonder what more the immune circuitry deep inside me might reveal about whether I’m healthy at this very moment, or will be tomorrow, or next month, or years from now.
Something is rotten in the city of Nunapitchuk. In recent years, a crack has formed in the middle of a house. Sewage has leached into the earth. Soil has eroded around buildings, leaving them perched atop precarious lumps of dirt. There are eternal puddles. And mold. The ground can feel squishy, sodden.
This small town in southwestern Alaska is experiencing a sometimes overlooked consequence of climate change: thawing permafrost. And Nunapitchuk is far from the only Arctic town to find itself in such a predicament.
Permafrost, which lies beneath about 15% of the land in the Northern Hemisphere, is defined as ground that has remained frozen for at least two years. Historically, much of the world’s permafrost has remained solid and stable for far longer, allowing people to build whole towns atop it. But as the planet warms, a process that is happening more rapidly near the poles than at more temperate latitudes, permafrost is thawing and causing a host of infrastructural and environmental problems.
Now scientists think they may be able to use satellite data to delve deep beneath the ground’s surface and get a better understanding of how the permafrost thaws, and which areas might be most severely affected because they had more ice to start with. Clues from the short-term behavior of those especially icy areas, seen from space, could portend future problems.
Using information gathered both from space and on the ground, they are working with affected communities to anticipate whether a house’s foundation will crack—and whether it is worth mending that crack or better to start over in a new house on a stable hilltop. These scientists’ permafrost predictions are already helping communities like Nunapitchuk make those tough calls.
But it’s not just civilian homes that are at risk. One of the top US intelligence agencies, the National Geospatial-Intelligence Agency (NGA), is also interested in understanding permafrost better. That’s because the same problems that plague civilians in the high north also plague military infrastructure, at home and abroad. The NGA is, essentially, an organization full of space spies—people who analyze data from surveillance satellites and make sense of it for the country’s national security apparatus.
Understanding the potential instabilities of the Alaskan military infrastructure—which includes radar stations that watch for intercontinental ballistic missiles, as well as military bases and National Guard posts—is key to keeping those facilities in good working order and planning how to strengthen them in the future. Understanding the potential permafrost weaknesses that could affect the infrastructure of countries like Russia and China, meanwhile, affords what insiders might call “situational awareness” about competitors.
The work to understand this thawing will only become more relevant, for civilians and their governments alike, as the world continues to warm.
The ground beneath
If you live much below the Arctic Circle, you probably don’t think a lot about permafrost. But it affects you no matter where you call home.
In addition to the infrastructural consequences for towns like Nunapitchuk, permafrost holds sequestered carbon—twice as much as currently inhabits the atmosphere. As the permafrost thaws, microbes break down that long-frozen organic matter and can release greenhouse gases into the atmosphere. That release can cause a feedback loop: Warmer temperatures thaw permafrost, which releases greenhouse gases, which warms the air more, which then—you get it.
The microbes themselves, along with previously trapped heavy metals, are also set dangerously free.
For many years, researchers’ primary options for understanding some of these freeze-thaw changes involved hands-on, on-the-ground surveys. But in the late 2000s, Kevin Schaefer, currently a senior scientist at the Cooperative Institute for Research in Environmental Sciences at the University of Colorado Boulder, started to investigate a less labor-intensive idea: using radar systems aboard satellites to survey the ground beneath.
This idea implanted itself in his brain in 2009, when he traveled to a place called Toolik Lake, southwest of the oilfields of Prudhoe Bay in Alaska. One day, after hours of drilling sample cores out of the ground to study permafrost, he was relaxing in the Quonset hut, chatting with colleagues. They began to discuss how space-based radar could potentially detect how the land sinks and heaves back up as temperatures change.
Huh, he thought. Yes, radar probably could do that.
Scientists call the ground right above permafrost the active layer. The water in this layer of soil contracts and expands with the seasons: during the summer, the ice suffusing the soil melts and the resulting decrease in volume causes the ground to dip. During the winter, the water freezes and expands, bulking the active layer back up. Radar can help measure that height difference, which is usually around one to five centimeters.
Schaefer realized that he could use radar to measure the ground elevation at the start and end of the thaw. The electromagnetic waves that bounce back at those two times would have traveled slightly different distances. That difference would reveal the tiny shift in elevation over the seasons and would allow him to estimate how much water had thawed and refrozen in the active layer and how far below the surface the thaw had extended.
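The arithmetic behind that realization is compact enough to sketch. Water expands by roughly 9% when it freezes, so the size of the seasonal dip constrains how much water thawed, and therefore how deep the thaw reached. The snippet below is a simplified, hypothetical version of the idea, not the published retrieval; the soil-moisture value is an assumed illustration.

```python
# Back-of-the-envelope estimate of active-layer thickness from
# radar-measured seasonal subsidence. Simplified sketch only; real
# retrievals must handle radar noise and moisture varying with depth.

RHO_WATER = 1000.0  # density of water, kg/m^3
RHO_ICE = 917.0     # density of ice, kg/m^3

def active_layer_thickness(subsidence_m: float, vol_water_content: float) -> float:
    """Thaw depth implied by seasonal subsidence.

    Freezing expands water by rho_water / rho_ice - 1 (about 9%),
    so a soil column of thickness h with volumetric water content
    theta subsides by roughly h * theta * 0.09 as it thaws.
    """
    expansion = RHO_WATER / RHO_ICE - 1.0  # ~0.09
    return subsidence_m / (vol_water_content * expansion)

# A 2 cm summer dip in soil that is ~50% water by volume implies
# an active layer a bit under half a meter thick.
print(active_layer_thickness(0.02, 0.5))
```

The few-centimeter signal is why the radar timing has to be so precise: the entire seasonal elevation change fits inside the height of a coffee cup.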
With radar, Schaefer realized, scientists could cover a lot more literal ground, with less effort and at lower cost.
“It took us two years to figure out how to write a paper on it,” he says; no one had ever made those measurements before. He and colleagues presented the idea at the 2010 meeting of the American Geophysical Union and published a paper in 2012 detailing the method, using it to estimate the thickness of the active layer on Alaska’s North Slope.
When they did, they helped start a new subfield that grew as large-scale data sets started to become available around 5 to 10 years ago, says Roger Michaelides, a geophysicist at Washington University in St. Louis and a collaborator of Schaefer’s. Researchers’ efforts were aided by the growth in space radar systems and smaller, cheaper satellites.
With the availability of global data sets (sometimes for free, from government-run satellites like the European Space Agency’s Sentinel) and targeted observations from commercial companies like Iceye, permafrost studies are moving from bespoke regional analyses to more automated, large-scale monitoring and prediction.
The remote view
Simon Zwieback, a geospatial and environmental expert at the University of Alaska Fairbanks, sees the consequences of thawing permafrost firsthand every day. His office overlooks a university parking lot, a corner of which is fenced off to keep cars and pedestrians from falling into a brand-new sinkhole. That area of asphalt had been slowly sagging for more than a year, but over a week or two this spring, it finally started to collapse inward.
Kevin Schaefer stands on top of a melting layer of ice near the Alaskan pipeline on the North Slope of Alaska.
COURTESY OF KEVIN SCHAEFER
The new remote research methods are a large-scale version of Zwieback taking in the view from his window. Researchers look at the ground and measure how its height changes as ice thaws and refreezes. The approach can cover wide swaths of land, but it involves making assumptions about what’s going on below the surface—namely, how much ice suffuses the soil in the active layer and permafrost. A deep thaw in ground with relatively little ice can produce the same surface signal as a shallower thaw in ice-rich ground, and it’s important to differentiate the two, since more ice in the permafrost means more potential instability.
To check that they’re on the right track, scientists have historically had to go out into the field. But a few years ago, Zwieback started to explore a way to make better and deeper estimates of ice content using the available remote sensing data. Finding a way to make those kinds of measurements on a large scale was more than an academic exercise: Areas of what he calls “excess ice” are most liable to cause instability at the surface. “In order to plan in these environments, we really need to know how much ice there is, or where those locations are that are rich in ice,” he says.
Zwieback, who did his undergraduate and graduate studies in Switzerland and Austria, wasn’t always so interested in permafrost, or so deeply affected by it. But in 2014, when he was a doctoral student in environmental engineering, he joined an environmental field campaign in Siberia, at the Lena River Delta, which resembles a gigantic piece of coral fanning out into the Arctic Ocean. Zwieback was near a town called Tiksi, one of the world’s northernmost settlements. It’s a military outpost and starting point for expeditions to the North Pole, featuring an abandoned plane near the ocean. Its Soviet-era concrete buildings sometimes bring it to the front page of the r/UrbanHell subreddit.
Here, Zwieback saw part of the coastline collapse, exposing almost pure ice. It looked like a subterranean glacier, but it was permafrost. “That really had an indelible impact on me,” he says.
Later, as a doctoral student in Zurich and a postdoc in Canada, he used his radar skills to understand the rapid changes that thawing permafrost impressed upon the landscape.
And now, with his job in Fairbanks and his ideas about the use of radar sensing, he has done work funded by the NGA, which has an open Arctic data portal.
In his Arctic research, Zwieback started with the approach underlying most radar permafrost studies: looking at the ground’s seasonal subsidence and heave. “But that’s something that happens very close to the surface,” he says. “It doesn’t really tell us about these long-term destabilizing effects,” he adds.
In warmer summers, he thought, subtle clues would emerge that could indicate how much ice is buried deeper down.
For example, he expected those warmer-than-average periods to exaggerate the amount of change seen on the surface, making it easier to tell which areas are ice-rich. Land that was particularly dense with ice would dip more than it “should”—a precursor of bigger dips to come.
The first step, then, was to measure subsidence directly, as usual. But from there, Zwieback developed an algorithm to ingest data about the subsidence over time—as measured by radar—and other environmental information, like the temperatures at each measurement. He then created a digital model of the land that allowed him to adjust the simulated amount of ground ice and determine when it matched the subsidence seen in the real world. With that, researchers could infer the amount of ice beneath.
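In outline, that match-the-model procedure is a classic inversion, which can be sketched with a toy forward model and a grid search. Everything below (the model, its coefficient, the numbers) is illustrative, not Zwieback's actual land model or algorithm.

```python
# Schematic of the inversion idea: simulate the subsidence a given
# amount of excess ground ice would produce, then pick the ice value
# whose simulated subsidence best matches the radar observations.

def toy_forward_model(excess_ice: float, thaw_index: float) -> float:
    """Toy stand-in: warmer summers (larger thaw_index) thaw deeper,
    and ice-rich ground subsides more per unit of thaw."""
    return excess_ice * 0.05 * thaw_index

def invert_ice_content(observed, thaw_indices, candidates):
    """Grid search: return the candidate ice fraction that minimizes
    squared misfit against the observed subsidence series."""
    def misfit(ice):
        return sum((toy_forward_model(ice, t) - o) ** 2
                   for t, o in zip(thaw_indices, observed))
    return min(candidates, key=misfit)

# "Observations" generated by 30% excess ice across three summers of
# varying warmth; the inversion recovers that value.
thaw = [1.0, 1.4, 0.9]
obs = [0.3 * 0.05 * t for t in thaw]
best = invert_ice_content(obs, thaw, [i / 100 for i in range(61)])
print(best)  # 0.3
```

The warm-summer insight from the preceding paragraphs fits naturally here: unusually warm years spread the candidate curves apart, making the best-fitting ice value easier to distinguish.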
Next, he made maps of that ice that could potentially be useful to engineers—whether they were planning a new subdivision or, as his funders might be, keeping watch on a military airfield.
“What was new in my work was to look at these much shorter periods and use them to understand specific aspects of this whole system, and specifically how much ice there is deep down,” Zwieback says.
The NGA, which has also funded Schaefer’s work, did not respond to an initial request for comment but did later provide feedback for fact-checking. It removed an article on its website about Zwieback’s grant and its application to agency interests around the time that the current presidential administration began to ban mention of climate change in federal research. But the thawing earth is of keen concern.
To start, the US has significant military infrastructure in Alaska: It’s home to six military bases and 49 National Guard posts, as well as 21 missile-detecting radar sites. Most are vulnerable to thaw now or in the near future, given that 85% of the state is on permafrost.
Beyond American borders, the broader north is in a state of tension. Russia’s relations with Northern Europe are icy. Its invasion of Ukraine has left those countries fearing that they too could be invaded, prompting Sweden and Finland, for instance, to join NATO. The US has threatened takeovers of Greenland and Canada. And China—which has shipping and resource ambitions for the region—is jockeying to surpass the US as the premier superpower.
Permafrost plays a role in the situation. “As knowledge has expanded, so has the understanding that thawing permafrost can affect things NGA cares about, including the stability of infrastructure in Russia and China,” read the NGA article. Permafrost covers 60% of Russia, and thaws have affected more than 40% of buildings in northern Russia already, according to statements from the country’s minister of natural resources in 2021. Experts say critical infrastructure like roads and pipelines is at risk, along with military installations. That could weaken both Russia’s strategic position and the security of its residents. In China, meanwhile, according to a report from the Council on Strategic Risks, key infrastructure like the Qinghai-Tibet Railway, “which allows Beijing to more quickly move military personnel near contested areas of the Indian border,” is susceptible to ground thaw—as are oil and gas pipelines linking Russia and China.
In the field
Any permafrost analysis that relies on data from space requires verification on Earth. The hope is that remote methods will become reliable enough to use on their own, but while they’re being developed, researchers must still get their hands muddy with more straightforward, longer-tested physical methods. Some use a network called Circumpolar Active Layer Monitoring, which has existed since 1991, incorporating active-layer data from hundreds of measurement sites across the Northern Hemisphere.
Sometimes, that data comes from people physically probing an area; other sites use tubes permanently inserted into the ground, filled with a liquid that indicates freezing; still others use underground cables that measure soil temperature. Some researchers, like Schaefer, lug ground-penetrating radar systems around the tundra. He’s taken his system to around 50 sites and made more than 200,000 measurements of the active layer.
The field-ready ground-penetrating radar comes in a big box—the size of a steamer trunk—that emits radio pulses. These pulses bounce off the bottom of the active layer, or the top of the permafrost. In this case, the timing of that reflection reveals how thick the active layer is. With handles designed for humans, Schaefer’s team drags this box around the Arctic’s boggier areas.
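The conversion from echo timing to thickness is simple wave arithmetic: the pulse travels at the speed of light divided by the square root of the soil's relative permittivity, and the echo covers the distance twice. A minimal sketch, with an assumed permittivity for wet tundra soil:

```python
# Turning a ground-penetrating-radar echo delay into a depth estimate.
# The permittivity is an assumed illustrative value; in the field it
# must be calibrated for the local soil and moisture conditions.
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def depth_from_echo(two_way_time_s: float, rel_permittivity: float) -> float:
    """Reflector depth from two-way travel time: d = (c / sqrt(er)) * t / 2."""
    velocity = C / math.sqrt(rel_permittivity)
    return velocity * two_way_time_s / 2.0

# A 20-nanosecond round trip through soil with permittivity ~25
# (a wave speed of roughly 0.06 m/ns) puts the permafrost table
# at about 0.6 meters.
print(depth_from_echo(20e-9, 25.0))
```

Wetter soil has higher permittivity and a slower wave, which is one reason the same echo delay can mean different depths in different bogs.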
The box floats. “I do not,” he says. He has vivid memories of tromping through wetlands, his legs pushing straight down through the muck, his body sinking up to his hips.
Andy Parsekian and Kevin Schaefer haul a ground penetrating radar unit through the tundra near Utqiagvik.
COURTESY OF KEVIN SCHAEFER
Zwieback also needs to verify what he infers from his space data. And so in 2022, he went to the Toolik Field station, a National Science Foundation–funded ecology research facility along the Dalton Highway and adjacent to Schaefer’s Toolik Lake. This road, which goes from Fairbanks up to the Arctic Ocean, is colloquially called the Haul Road; it was made famous in the TV show Ice Road Truckers. From this access point, Zwieback’s team needed to get deep samples of soil whose ice content could be analyzed in the lab.
Every day, two teams would drive along the Dalton Highway to get close to their field sites. Slamming their car doors, they would unload and hop on snow machines to travel the final distance. Often they would see musk oxen, looking like bison that never cut their hair. The grizzlies were also interested in these oxen, and in the nearby caribou.
At the sites they could reach, they took out a corer, a long, tubular piece of equipment driven by a gas engine, meant to drill deep into the ground. Zwieback or a teammate pressed it into the earth. The barrel’s two blades rotated, slicing a cylinder about five feet down to ensure that their samples went deep enough to generate data that can be compared with the measurements made from space. Then they pulled up and extracted the cylinder, a sausage of earth and ice.
All day every day for a week, they gathered cores that matched up with the pixels in radar images taken from space. In those cores, the ice was apparent to the eye. But Zwieback didn’t want anecdata. “We want to get a number,” he says.
So he and his team would pack their soil cylinders back to the lab. There they sliced them into segments and measured their volume, in both their frozen and their thawed form, to see how well the measured ice content matched estimates from the space-based algorithm.
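The number Zwieback wanted reduces to a simple ratio: how much of the frozen core's volume disappears once the ice melts and the soil settles. The function below expresses the generic notion of excess-ice content; the exact lab protocol may differ in its details.

```python
# Generic excess-ice calculation: compare a core segment's frozen
# volume with the volume it settles to after thawing and draining.

def excess_ice_fraction(frozen_volume: float, thawed_volume: float) -> float:
    """Fraction of the frozen volume that was ice beyond the soil's
    pore space: (V_frozen - V_thawed) / V_frozen."""
    return (frozen_volume - thawed_volume) / frozen_volume

# A 500 ml frozen segment that settles to 350 ml after thawing
# was 30% excess ice by volume.
print(excess_ice_fraction(500.0, 350.0))  # 0.3
```

Slicing each core into segments before measuring gives an ice-versus-depth profile, which is what the satellite-derived estimates are checked against.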
The initial validation, which took months, demonstrated the value of using satellites for permafrost work. The ice profiles that Zwieback’s algorithm inferred from the satellite data matched measurements in the lab down to about 1.1 feet, and farther in a warm year, with some uncertainty near the surface and deeper into the permafrost.
Sampling a small area by hand meant flying in on a helicopter, driving in a car, and switching to a snowmobile, only to continue the work back home, at a cost of tens of thousands of dollars. The team needed just a few hundred dollars to run the algorithm on satellite data that was free and publicly available.
Michaelides, who is familiar with Zwieback’s work, agrees that estimating excess ice content is key to making infrastructural decisions, and that historical methods of sussing it out have been costly in all senses. Zwieback’s method of using late-summer clues to infer what’s going on at that depth “is a very exciting idea,” he says, and the results “demonstrate that there is considerable promise for this approach.”
He notes, though, that using space-based radar to understand the thawing ground is complicated: Ground ice content, soil moisture, and vegetation can differ even within a single pixel that a satellite can pick out. “To be clear, this limitation is not unique to Simon’s work,” Michaelides says; it affects all space-radar methods. There is also excess ice below even where Zwieback’s algorithm can probe—something the labor-intensive on-ground methods can pick up that still can’t be seen from space.
Mapping out the future
After Zwieback did his fieldwork, NGA decided to do its own. The agency’s attempt to independently validate his work—in Prudhoe Bay, Utqiagvik, and Fairbanks—was part of a project it called Frostbyte.
Its partners in that project—the Army’s Cold Regions Research and Engineering Laboratory and Los Alamos National Laboratory—declined requests for interviews. As far as Zwieback knows, they’re still analyzing data.
But the intelligence community isn’t the only group interested in research like Zwieback’s. He also works with Arctic residents, reaching out to rural Alaskan communities where people are trying to make decisions about whether to relocate or where to build safely. “They typically can’t afford to do expensive coring,” he says. “So the idea is to make these data available to them.”
Zwieback and his team haul their gear out to gather data from drilled core samples, a process which can be arduous and costly.
ANDREW JOHNSON
Schaefer is also trying to bridge the gap between his science and the people it affects. Through a company called Weather Stream, he is helping communities identify risks to infrastructure before anything collapses, so they can take preventative action.
Making such connections has always been a key concern for Erin Trochim, a geospatial scientist at the University of Alaska Fairbanks. As a researcher who works not just on permafrost but also on policy, she’s seen radar science progress massively in recent years—without commensurate advances on the ground.
For instance, it’s still hard for residents in her town of Fairbanks—or anywhere—to know if there’s permafrost on their property at all, unless they’re willing to do expensive drilling. She’s encountered this problem, still unsolved, on property she owns. And if an expert can’t figure it out, non-experts hardly stand a chance. “It’s just frustrating when a lot of this information that we know from the science side, and [that’s] trickled through the engineering side, hasn’t really translated into the on-the-ground construction,” she says.
There is a group, though, trying to turn that trickle into a flood: Permafrost Pathways, a venture that launched with a $41 million grant through the TED Audacious Project. In concert with affected communities, including Nunapitchuk, it is building a data-gathering network on the ground, and combining information from that network with satellite data and local knowledge to help understand permafrost thaw and develop adaptation strategies.
“I think about it often as if you got a diagnosis of a disease,” says Sue Natali, the head of the project. “It’s terrible, but it’s also really great, because when you know what your problem is and what you’re dealing with, it’s only then that you can actually make a plan to address it.”
And the communities Permafrost Pathways works with are making plans. Nunapitchuk has decided to relocate, and the town and the research group have collaboratively surveyed the proposed new location: a higher spot on hardpacked sand. Permafrost Pathways scientists were able to help validate the stability of the new site—and prove to policymakers that this stability would extend into the future.
Radar helps with that in part, Natali says, because unlike other satellite detectors, it penetrates clouds. “In Alaska, it’s extremely cloudy,” she says. “So other data sets have been very, very challenging. Sometimes we get one image per year.”
And so radar data, and algorithms like Zwieback’s that help scientists and communities make sense of that data, dig up deeper insight into what’s going on beneath northerners’ feet—and how to step forward on firmer ground.
When Kenneth Wehr started managing the Greenlandic-language version of Wikipedia four years ago, his first act was to delete almost everything. It had to go, he thought, if it had any chance of surviving.
Wehr, who’s 26, isn’t from Greenland—he grew up in Germany—but he had become obsessed with the island, an autonomous Danish territory, after visiting as a teenager. He’d spent years writing obscure Wikipedia articles in his native tongue on virtually everything to do with it. He even ended up moving to Copenhagen to study Greenlandic, a language spoken by some 57,000 mostly Indigenous Inuit people scattered across dozens of far-flung Arctic villages.
The Greenlandic-language edition was added to Wikipedia around 2003, just a few years after the site launched in English. By the time Wehr took its helm nearly 20 years later, hundreds of Wikipedians had contributed to it and had collectively written some 1,500 articles totaling tens of thousands of words. It seemed to be an impressive vindication of the crowdsourcing approach that has made Wikipedia the go-to source for information online, demonstrating that it could work even in the unlikeliest places.
There was only one problem: The Greenlandic Wikipedia was a mirage.
Virtually every single article had been published by people who did not actually speak the language. Wehr, who now teaches Greenlandic in Denmark, speculates that perhaps only one or two Greenlanders had ever contributed. But what worried him most was something else: Over time, he had noticed that a growing number of articles appeared to be copy-pasted into Wikipedia by people using machine translators. They were riddled with elementary mistakes—from grammatical blunders to meaningless words to more significant inaccuracies, like an entry that claimed Canada had only 41 inhabitants. Other pages sometimes contained random strings of letters spat out by machines that were unable to find suitable Greenlandic words to express themselves.
“It might have looked Greenlandic to [the authors], but they had no way of knowing,” complains Wehr.
“Sentences wouldn’t make sense at all, or they would have obvious errors,” he adds. “AI translators are really bad at Greenlandic.”
What Wehr describes is not unique to the Greenlandic edition.
Wikipedia is the most ambitious multilingual project after the Bible: There are editions in over 340 languages, and a further 400 even more obscure ones are being developed and tested. Many of these smaller editions have been swamped with automatically translated content as AI has become increasingly accessible. Volunteers working on four African languages, for instance, estimated to MIT Technology Review that between 40% and 60% of articles in their Wikipedia editions were uncorrected machine translations. And after auditing the Wikipedia edition in Inuktitut, an Indigenous language close to Greenlandic that’s spoken in Canada, MIT Technology Review estimates that more than two-thirds of pages containing more than several sentences feature portions created this way.
This is beginning to cause a wicked problem. AI systems, from Google Translate to ChatGPT, learn to “speak” new languages by scraping huge quantities of text from the internet. Wikipedia is sometimes the largest source of online linguistic data for languages with few speakers—so any errors on those pages, grammatical or otherwise, can poison the wells that AI is expected to draw from. That can make the models’ translation of these languages particularly error-prone, which creates a sort of linguistic doom loop as people continue to add more and more poorly translated Wikipedia pages using those tools, and AI models continue to train from poorly translated pages. It’s a complicated problem, but it boils down to a simple concept: Garbage in, garbage out.
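The arithmetic behind that doom loop can be sketched in a few lines. The toy simulation below is purely illustrative—the function name, page counts, error rates, and amplification factor are invented assumptions, not measurements from any real corpus—but it shows how a corpus’s error fraction creeps upward when machine-translated pages, somewhat worse than the corpus the translator was trained on, keep being fed back in.

```python
# Toy model of the "garbage in, garbage out" loop described above.
# All numbers here are illustrative assumptions, not measurements.

def doom_loop(clean_pages=1000, error_rate=0.05,
              pages_added_per_gen=200, amplification=3.0, generations=5):
    """Return the corpus's error fraction after each generation, assuming
    machine-translated pages are `amplification` times as error-prone as
    the corpus the translator was trained on."""
    total = float(clean_pages)
    error_pages = clean_pages * error_rate
    history = []
    for _ in range(generations):
        model_error = error_pages / total        # model roughly mirrors its corpus
        mt_error = min(1.0, model_error * amplification)
        error_pages += pages_added_per_gen * mt_error
        total += pages_added_per_gen
        history.append(round(error_pages / total, 3))
    return history

print(doom_loop())  # the error fraction climbs every generation
```

Under these invented parameters, the corpus’s error share roughly triples in five generations. The point is not the numbers but the direction: once machine output is worse than its training corpus and is fed back in, each generation of scraping starts from a dirtier well.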
“These models are built on raw data,” says Kevin Scannell, a former professor of computer science at Saint Louis University who now builds computer software tailored for endangered languages. “They will try and learn everything about a language from scratch. There is no other input. There are no grammar books. There are no dictionaries. There is nothing other than the text that is inputted.”
There isn’t perfect data on the scale of this problem, particularly because a lot of AI training data is kept confidential and the field continues to evolve rapidly. But back in 2020, Wikipedia was estimated to make up more than half the training data that was fed into AI models translating some languages spoken by millions across Africa, including Malagasy, Yoruba, and Shona. In 2022, a research team from Germany that looked into what data could be obtained by online scraping even found that Wikipedia was the sole easily accessible source of online linguistic data for 27 under-resourced languages.
This could have significant repercussions in cases where Wikipedia is poorly written—potentially pushing the most vulnerable languages on Earth toward the precipice as future generations begin to turn away from them.
“Wikipedia will be reflected in the AI models for these languages,” says Trond Trosterud, a computational linguist at the University of Tromsø in Norway, who has been raising the alarm about the potentially harmful outcomes of badly run Wikipedia editions for years. “I find it hard to imagine it will not have consequences. And, of course, the more dominant position that Wikipedia has, the worse it will be.”
Use responsibly
Automation has been built into Wikipedia since the very earliest days. Bots keep the platform operational: They repair broken links, fix bad formatting, and even correct spelling mistakes. These repetitive and mundane tasks can be automated away with little problem. There is even an army of bots that scurry around generating short articles about rivers, cities, or animals by slotting their names into formulaic phrases. They have generally made the platform better.
But AI is different. Anybody can use it to cause massive damage with a few clicks.
Wikipedia has managed the onset of the AI era better than many other websites. It has not been flooded with AI bots or disinformation, as social media has been. It largely retains the innocence that characterized the earlier internet age. Wikipedia is open and free for anyone to use, edit, and pull from, and it’s run by the very same community it serves. It is transparent and easy to use. But community-run platforms live and die on the size of their communities. English has triumphed, while Greenlandic has sunk.
“We need good Wikipedians. This is something that people take for granted. It is not magic,” says Amir Aharoni, a member of the volunteer Language Committee, which oversees requests to open or close Wikipedia editions. “If you use machine translation responsibly, it can be efficient and useful. Unfortunately, you cannot trust all people to use it responsibly.”
Trosterud has studied the behavior of users on small Wikipedia editions and says AI has empowered a subset that he terms “Wikipedia hijackers.” These users can range widely—from naive teenagers creating pages about their hometowns or their favorite YouTubers to well-meaning Wikipedians who think that by creating articles in minority languages they are in some way “helping” those communities.
“The problem with them nowadays is that they are armed with Google Translate,” Trosterud says, adding that this is allowing them to produce much longer and more plausible-looking content than they ever could before: “Earlier they were armed only with dictionaries.”
This has effectively industrialized the acts of destruction—which affect vulnerable languages most, since AI translations are typically far less reliable for them. There can be lots of different reasons for this, but a meaningful part of the issue is the relatively small amount of source text that is available online. And sometimes models struggle to identify a language because it is similar to others, or because some, including Greenlandic and most Native American languages, have structures that make them badly suited to the way most machine translation systems work. (Wehr notes that in Greenlandic most words are agglutinative, meaning they are built by attaching prefixes and suffixes to stems. As a result, many words are extremely context specific and can express ideas that in other languages would take a full sentence.)
Research produced by Google before a major expansion of Google Translate rolled out three years ago found that translation systems for lower-resourced languages were generally of a lower quality than those for better-resourced ones. Researchers found, for example, that their model would often mistranslate basic nouns across languages, including the names of animals and colors. (In a statement to MIT Technology Review, Google wrote that it is “committed to meeting a high standard of quality for all 249 languages” it supports “by rigorously testing and improving [its] systems, particularly for languages that may have limited public text resources on the web.”)
Wikipedia itself offers a built-in editing tool called Content Translate, which allows users to automatically translate articles from one language to another—the idea being that this will save time by preserving the references and fiddly formatting of the originals. But it piggybacks on external machine translation systems, so it’s largely plagued by the same weaknesses as other machine translators—a problem that the Wikimedia Foundation says is hard to solve. It’s up to each edition’s community to decide whether this tool is allowed, and some have decided against it. (Notably, English-language Wikipedia has largely banned its use, claiming that some 95% of articles created using Content Translate failed to meet an acceptable standard without significant additional work.) But it’s at least easy to tell when the program has been used; Content Translate adds a tag on the Wikipedia back end.
Other AI programs can be harder to monitor. Still, many Wikipedia editors I spoke with said that once their languages were added to major online translation tools, they noticed a corresponding spike in the frequency with which poor, likely machine-translated pages were created.
Some Wikipedians using AI to translate content do occasionally admit that they do not speak the target languages. They may see themselves as providing smaller communities with rough-cut articles that speakers can then fix—essentially following the same model that has worked well for more active Wikipedia editions.
But once error-filled pages are produced in small languages, there is usually not an army of knowledgeable people who speak those languages standing ready to improve them. There are few readers of these editions, and sometimes not a single regular editor.
Yuet Man Lee, a Canadian teacher in his 20s, says that he used a mix of Google Translate and ChatGPT to translate a handful of articles that he had written for the English Wikipedia into Inuktitut, thinking it’d be nice to pitch in and help a smaller Wikipedia community. He says he added a note to one saying that it was only a rough translation. “I did not think that anybody would notice [the article],” he explains. “If you put something out there on the smaller Wikipedias—most of the time nobody does.”
But at the same time, he says, he still thought “someone might see it and fix it up”—adding that he had wondered whether the Inuktitut translation that the AI systems generated was grammatically correct. Nobody has touched the article since he created it.
Lee, who teaches social sciences in Vancouver and first started editing entries in the English Wikipedia a decade ago, says that users familiar with more active Wikipedias can fall victim to this mindset, which he terms a “bigger-Wikipedia arrogance”: When they try to contribute to smaller Wikipedia editions, they assume that others will come along to fix their mistakes. It can sometimes work. Lee says he had previously contributed several articles to Wikipedia in Tatar, a language spoken by several million people mainly in Russia, and at least one of those was eventually corrected. But the Inuktitut Wikipedia is, by comparison, a “barren wasteland.”
He emphasizes that his intentions had been good: He wanted to add more articles to an Indigenous Canadian Wikipedia. “I am now thinking that it may have been a bad idea. I did not consider that I could be contributing to a recursive loop,” he says. “It was about trying to get content out there, out of curiosity and for fun, without properly thinking about the consequences.”
“Totally, completely no future”
Wikipedia is a project that is driven by wide-eyed optimism. Editing can be a thankless task, involving weeks spent bickering with faceless, pseudonymous people, but devotees put in hours of unpaid labor because of a commitment to a higher cause. It is this commitment that drives many of the regular small-language editors I spoke with. They all feared what would happen if garbage continued to appear on their pages.
Abdulkadir Abdulkadir, a 26-year-old agricultural planner who spoke with me over a crackling phone call from a busy roadside in northern Nigeria, said that he spends three hours every day fiddling with entries in his native Fulfulde, a language used mainly by pastoralists and farmers across the Sahel. “But the work is too much,” he said.
Abdulkadir sees an urgent need for the Fulfulde Wikipedia to work properly. He has been suggesting it as one of the few online resources for farmers in remote villages, potentially offering information on which seeds or crops might work best for their fields in a language they can understand. If you give them a machine-translated article, Abdulkadir told me, then it could “easily harm them,” as the information will probably not be translated correctly into Fulfulde.
Google Translate, for instance, says the Fulfulde word for January means June, while ChatGPT says it’s August or September. The programs also suggest the Fulfulde word for “harvest” means “fever” or “well-being,” among other possibilities.
Abdulkadir said he had recently been forced to correct an article about cowpeas, a foundational cash crop across much of Africa, after discovering that it was largely illegible.
If someone wants to create pages on the Fulfulde Wikipedia, Abdulkadir said, they should be translated manually. Otherwise, “whoever will read your articles will [not] be able to get even basic knowledge,” he tells these Wikipedians. Nevertheless, he estimates that some 60% of articles are still uncorrected machine translations. Abdulkadir told me that unless something important changes with how AI systems learn and are deployed, then the outlook for Fulfulde looks bleak. “It is going to be terrible, honestly,” he said. “Totally, completely no future.”
Across the country from Abdulkadir, Lucy Iwuala contributes to Wikipedia in Igbo, a language spoken by several million people in southeastern Nigeria. “The harm has already been done,” she told me, opening the two most recently created articles. Both had been automatically translated via Wikipedia’s Content Translate and contained so many mistakes that she said it would have given her a headache to continue reading them. “There are some terms that have not even been translated. They are still in English,” she pointed out. She recognized the username that had created the pages as a serial offender. “This one even includes letters that are not used in the Igbo language,” she said.
Iwuala began regularly contributing to Wikipedia three years ago out of concern that Igbo was being displaced by English. It is a worry that is common to many who are active on smaller Wikipedia editions. “This is my culture. This is who I am,” she told me. “That is the essence of it all: to ensure that you are not erased.”
Iwuala, who now works as a professional translator between English and Igbo, said the users doing the most damage are inexperienced and see AI translations as a way to quickly increase the profile of the Igbo Wikipedia. She often finds herself having to explain at online edit-a-thons she organizes, or over email to various error-prone editors, that the results can be the exact opposite, pushing users away: “You will be discouraged and you will no longer want to visit this place. You will just abandon it and go back to the English Wikipedia.”
These fears are echoed by Noah Ha‘alilio Solomon, an assistant professor of Hawaiian language at the University of Hawai‘i. He reports that some 35% of words on some pages in the Hawaiian Wikipedia are incomprehensible. “If this is the Hawaiian that is going to exist online, then it will do more harm than anything else,” he says.
Hawaiian, which was teetering on the verge of extinction several decades ago, has been undergoing a recovery effort led by Indigenous activists and academics. Seeing such poor Hawaiian on such a widely used platform as Wikipedia is upsetting to Ha‘alilio Solomon.
“It is painful, because it reminds us of all the times that our culture and language has been appropriated,” he says. “We have been fighting tooth and nail in an uphill climb for language revitalization. There is nothing easy about that, and this can add extra impediments. People are going to think that this is an accurate representation of the Hawaiian language.”
The consequences of all these Wikipedia errors can quickly become clear. AI translators that have undoubtedly ingested these pages in their training data are now assisting in the production, for instance, of error-strewn AI-generated books aimed at learners of languages as diverse as Inuktitut and Cree, Indigenous languages spoken in Canada, and Manx, a small Celtic language spoken on the Isle of Man. Many of these have been popping up for sale on Amazon. “It was just complete nonsense,” says Richard Compton, a linguist at the University of Quebec in Montreal, of a volume he reviewed that had purported to be an introductory phrasebook for Inuktitut.
Rather than making minority languages more accessible, AI is now creating an ever expanding minefield for students and speakers of those languages to navigate. “It is a slap in the face,” Compton says. He worries that younger generations in Canada, hoping to learn languages in communities that have fought uphill battles against discrimination to pass on their heritage, might turn to online tools such as ChatGPT or phrasebooks on Amazon and simply make matters worse. “It is fraud,” he says.
A race against time
According to UNESCO, a language is declared extinct every two weeks. But whether the Wikimedia Foundation, which runs Wikipedia, has an obligation to the languages used on its platform is an open question. When I spoke to Runa Bhattacharjee, a senior director at the foundation, she said that it was up to the individual communities to make decisions about what content they wanted to exist on their Wikipedia. “Ultimately, the responsibility really lies with the community to see that there is no vandalism or unwanted activity, whether through machine translation or other means,” she said. Usually, Bhattacharjee added, editions were considered for closure only if a specific complaint was raised about them.
But if there is no active community, how can an edition be fixed or even have a complaint raised?
Bhattacharjee explained that the Wikimedia Foundation sees its role in such cases as about maintaining the Wikipedia platform in case someone comes along to revive it: “It is the space that we provide for them to grow and develop. That is where we are at.”
Inari Saami, spoken in a single remote community in northern Finland, is a poster child for how people can take good advantage of Wikipedia. The language was headed toward extinction four decades ago; there were only four children who spoke it. Their parents created the Inari Saami Language Association in a last-ditch bid to keep it going. The efforts worked. There are now several hundred speakers, schools that use Inari Saami as a medium of instruction, and 6,400 Wikipedia articles in the language, each one copy-edited by a fluent speaker.
This success highlights how Wikipedia can indeed provide small and determined communities with a unique vehicle to promote their languages’ preservation. “We don’t care about quantity. We care about quality,” says Fabrizio Brecciaroli, a member of the Inari Saami Language Association. “We are planning to use Wikipedia as a repository for the written language. We need to provide tools that can be used by the younger generations. It is important for them to be able to use Inari Saami digitally.”
This has been such a success that Wikipedia has been integrated into the curriculum at the Inari Saami–speaking schools, Brecciaroli adds. He fields phone calls from teachers asking him to write up simple pages on topics from tornadoes to Saami folklore. Wikipedia has even offered a way to introduce words into Inari Saami. “We have to make up new words all the time,” Brecciaroli says. “Young people need them to speak about sports, politics, and video games. If they are unsure how to say something, they now check Wikipedia.”
Wikipedia is a monumental intellectual experiment. What’s happening with Inari Saami suggests that with maximum care, it can work in smaller languages. “The ultimate goal is to make sure that Inari Saami survives,” Brecciaroli says. “It might be a good thing that there isn’t a Google Translate in Inari Saami.”
That may be true—though large language models like ChatGPT can be made to translate phrases into languages that more traditional machine translation tools do not offer. Brecciaroli told me that ChatGPT isn’t great in Inari Saami but that the quality varies significantly depending on what you ask it to do; if you ask it a question in the language, then the answer will be filled with words from Finnish and even words it invents. But if you ask it something in English, Finnish, or Italian and then ask it to reply in Inari Saami, it will perform better.
In light of all this, creating as much high-quality content online as can possibly be written becomes a race against time. “ChatGPT only needs a lot of words,” Brecciaroli says. “If we keep putting good material in, then sooner or later, we will get something out. That is the hope.” This is an idea supported by multiple linguists I spoke with—that it may be possible to end the “garbage in, garbage out” cycle. (OpenAI, which operates ChatGPT, did not respond to a request for comment.)
Still, the overall problem is likely to grow and grow, since many languages are not as lucky as Inari Saami—and their AI translators will most likely be trained on more and more AI slop. Wehr, unfortunately, seems far less optimistic about the future of his beloved Greenlandic.
Since deleting much of the Greenlandic-language Wikipedia, he has spent years trying to recruit speakers to help him revive it. He has appeared in Greenlandic media and made social media appeals. But he hasn’t gotten much of a response; he says it has been demoralizing.
“There is nobody in Greenland who is interested in this, or who wants to contribute,” he says. “There is completely no point in it, and that is why it should be closed.”
Late last year, he began a process requesting that the Wikipedia Language Committee shut down the Greenlandic-language edition. Months of bitter debate followed between dozens of Wikipedia bureaucrats; some seemed to be surprised that a superficially healthy-seeming edition could be gripped by so many problems.
Then, earlier this month, Wehr’s proposal was accepted: Greenlandic Wikipedia is set to be shuttered, and any articles that remain will be moved into the Wikipedia Incubator, where new language editions are tested and built. Among the reasons cited by the Language Committee is the use of AI tools, which have “frequently produced nonsense that could misrepresent the language.”
Nevertheless, it may be too late—mistakes in Greenlandic already seem to have become embedded in machine translators. If you prompt either Google Translate or ChatGPT to do something as simple as count to 10 in proper Greenlandic, neither program can deliver.
Jacob Judah is an investigative journalist based in London.
Oleh Kovalskyy thinks that Starlink terminals are built as if someone assembled them with their feet. Or perhaps with their hands behind their back.
To demonstrate this last image, Kovalskyy—a large, 47-year-old Ukrainian, clad in sweatpants and with tattoos stretching from his wrists up to his neck—leans over to wiggle his fingers in the air behind him, laughing as he does. Components often detach, he says through bleached-white teeth, and they’re sensitive to dust and moisture. “It’s terrible quality. Very terrible.”
But even if he’s not particularly impressed by the production quality, he won’t dispute how important the satellite internet service has been to his country’s defense.
Starlink is absolutely critical to Ukraine’s ability to continue in the fight against Russia: It’s how troops in battle zones stay connected with faraway HQs; it’s how many of the drones essential to Ukraine’s survival hit their targets; it’s even how soldiers stay in touch with spouses and children back home.
At the time of my visit to Kovalskyy in March 2025, however, it had begun to seem that this vital support system might suddenly disappear. Reuters had just broken news suggesting that Musk, who was then still deeply enmeshed in Trump world, would remove Ukraine’s access to the service should its government fail to toe the line in US-led peace negotiations. Musk denied the allegations shortly afterward, but given Trump’s fickle foreign policy and inconsistent support of Ukrainian president Volodymyr Zelensky, the uncertainty of the technology’s future had become—and remains—impossible to ignore.
ELENA SUBACH
Kovalskyy’s unofficial Starlink repair shop may be the biggest of its kind in the world. Ordered chaos is the best way to describe it.
The stakes couldn’t be higher: Another Reuters report in late July revealed that Musk had ordered the restriction of Starlink in parts of Ukraine during a critical counteroffensive back in 2022. “Ukrainian troops suddenly faced a communications blackout,” the story explains. “Soldiers panicked, drones surveilling Russian forces went dark, and long-range artillery units, reliant on Starlink to aim their fire, struggled to hit targets.”
None of this is lost on Kovalskyy—and for now Starlink access largely comes down to the unofficial community of users and engineers of which Kovalskyy is just one part: Narodnyi Starlink.
The group, whose name translates to “The People’s Starlink,” was created back in March 2022 by a tech-savvy veteran of the previous battles against Russia-backed militias in Ukraine’s east. It started as a Facebook group for the country’s infant yet burgeoning community of Starlink users—a forum to share guidance and swap tips—but it very quickly emerged as a major support system for the new war effort. Today, it has grown to almost 20,000 members, including the unofficial expert “Dr. Starlink”—famous for his creative ways of customizing the systems—and other volunteer engineers like Kovalskyy and his men. It’s a prime example of the many informal, yet highly effective, volunteer networks that have kept Ukraine in the fight, both on and off the front line.
ELENA SUBACH
Kovalskyy and his crew of eight volunteers have repaired or customized more than 15,000 terminals since the war began in February 2022. Here, they test repaired units in a nearby parking lot.
Kovalskyy gave MIT Technology Review exclusive access to his unofficial Starlink repair workshop in the city of Lviv, about 300 miles west of Kyiv. Ordered chaos is the best way to describe it: Spread across a few small rooms in a nondescript two-story building behind a tile shop, sagging cardboard boxes filled with mud-splattered Starlink casings form alleyways among the rubble of spare parts. Like flying buttresses, green circuit boards seem to prop up the walls, and coils of cable sprout from every crevice.
Those acquainted with the workshop refer to it as the biggest of its kind in Ukraine—and, by extension, maybe the world. Official and unofficial estimates suggest that anywhere from 42,000 to 160,000 Starlink terminals operate in the country. Kovalskyy says he and his crew of eight volunteers have repaired or customized more than 15,000 terminals since the war began.
The informal, accessible nature of the Narodnyi Starlink community has been critical to its success. One military communications officer was inspired by Kovalskyy to set up his own repair workshop as part of Ukraine’s armed forces, but he says that official processes can be slower than private ones by a factor of 10.
ELENA SUBACH
Despite the pressure, the chance that they may lose access to Starlink was not worrying volunteers like Kovalskyy at the time of my visit; in our conversations, it was clear they had more pressing concerns than the whims of a foreign tech mogul. Russia continues to launch frequent aerial bombardments of Ukrainian cities, sometimes sending more than 500 drones in a single night. The threat of involuntary mobilization to the front line looms on every street corner. How can one plan for a hypothetical future crisis when crisis defines every minute of one’s day?
Almost every inch of every axis of the battlefield in Ukraine is enabled by Starlink. It connects pilots near the trenches with reconnaissance drones soaring kilometers above them. It relays the video feeds from those drones to command centers in rear positions. And it even connects soldiers, via encrypted messaging services, with their family and friends living far from the front.
Although some soldiers and volunteers, including members of Narodnyi Starlink, refer to Starlink as a luxury, the reality is that it’s an essential utility; without it, Ukrainian forces would need to rely on other, often less effective means of communication. These include wired-line networks, mobile internet, and older geostationary satellite technology—all of which provide connectivity that is either slower, more vulnerable to interference, or more difficult for untrained soldiers to set up.
“If not for Starlink, we would already be counting rubles in Kyiv,” Kovalskyy says.
ELENA SUBACH
The workshop’s crew has learned to perform adjustments to terminals, especially in adapting them for battlefield conditions. At right, a volunteer engineer shows the fragments of shrapnel he has extracted from the terminals.
Despite being designed primarily for commercial use, Starlink provides a fantastic battlefield solution. The low-latency, high-bandwidth connection its terminals establish with its constellation of low-Earth-orbit satellites can transmit large streams of data while remaining very difficult for the enemy to jam—in part because the satellites, unlike geostationary ones, are in constant motion.
It’s also fairly easy to use: soldiers with little or no technical knowledge can connect in minutes. And the system costs much less than other military technology; while the US and Polish governments pay business rates for many of Ukraine’s Starlink systems, individual soldiers or military units can purchase the hardware at the private rate of about $500 and subscribe for just $50 per month.
No alternatives match Starlink for cost, ease of use, or coverage—and none will in the near future. Its constellation of 8,000 satellites dwarfs that of its main competitor, a service called OneWeb sold by the French satellite operator Eutelsat, which has only 630 satellites. OneWeb’s hardware costs about 20 times more, and a subscription can run significantly higher, since OneWeb targets business customers. Amazon’s Project Kuiper, the most likely future competitor, started putting satellites in space only this year.
Volodymyr Stepanets, a 51-year-old Ukrainian self-described “geek,” had been living in Krakow, Poland, with his family when Russia invaded in 2022. But before that, he had volunteered for several years on the front lines of the war against Russian-supported paramilitaries that began in 2014.
He recalls, in those early months in eastern Ukraine, witnessing troops coordinating an air strike with rulers and a calculator; the whole process took them between 30 and 40 minutes. “All these calculations can be done in one minute,” he says he told them. “All we need is a very stupid computer and very easy software.” (The Ukrainian military declined to comment on this issue.)
Stepanets subsequently committed to helping this brigade, the 72nd, integrate modern technology into its operations. He says that within one year, he had taught them how to use modern communication platforms, positioning devices, and older satellite communication systems that predate Starlink.
Narodnyi Starlink members ask each other for advice about how to adapt the systems: how to camouflage them from marauding Russian drones or resolve glitches in the software, for example.
ELENA SUBACH
So after Russian tanks rolled across the border, Stepanets was quick to see how Starlink’s service could provide an advantage to Ukraine’s armed forces. He also recognized that these units, as well as civilian users, would need support in utilizing the new technology. And that’s how he came up with the idea for Narodnyi Starlink, an open Facebook group he launched on March 21, just a few weeks after the full invasion began and the Ukrainian government requested the activation of Starlink.
Over the past few years, the Narodnyi Starlink digital community has grown to include volunteer engineers, resellers, and military service members interested in the satellite comms service. The group’s members post roughly three times per day, often sharing or asking for advice about adaptations, or seeking volunteers to fix broken equipment. A user called Igor Semenyak recently asked, for example, whether anyone knew how to mask his system from infrared cameras. “How do you protect yourself from heat radiation?” he wrote, to which someone suggested throwing special heat-proof fabric over the terminal.
Its most famous member is probably a man widely considered the brains of the group: Oleg Kutkov, a 36-year-old software engineer otherwise known to some members as “Dr. Starlink.” Kutkov had been privately studying Starlink technology from his home in Kyiv since 2021, having purchased a system to tinker with when service was still unavailable in the country; he believes that he may have been the country’s first Starlink user. Like Stepanets, he saw the immense potential for Starlink after Russia broke traditional communication lines ahead of its attack.
“Our infrastructure was very vulnerable because we did not have a lot of air defense,” says Kutkov, who still works full time as an engineer at the US networking company Ubiquiti’s R&D center in Kyiv. “Starlink quickly became a crucial part of our survival.”
Stepanets contacted Kutkov after coming across his popular Twitter feed and blog, which had been attracting a lot of attention as early Starlink users sought help. Kutkov still publishes the results of his own research there—experiments he performs in his spare time, sometimes staying up until 3 a.m. to complete them. In May, for example, he published a blog post explaining how users can physically move a user account from one terminal to another when the printed circuit board in one is “so severely damaged that repair is impossible or impractical.”
“Oleg Kutkov is the coolest engineer I’ve met in my entire life,” Kovalskyy says.
ELENA SUBACH
When the fighting is at its worst, the workshop may receive 500 terminals to repair every month. The crew lives and sometimes even sleeps there.
Supported by Kutkov’s technical expertise and Stepanets’s organizational prowess, Kovalskyy’s warehouse became the major repair hub (though other volunteers also make repairs elsewhere). Over time, Kovalskyy—who co-owned a regional internet service provider before the war—and his crew have learned to perform adjustments to Starlink terminals, especially to adapt them for battlefield conditions. For example, they modified terminals to draw power at the right voltage directly from vehicles, years before Starlink released a proprietary car adapter. They’ve also replaced Starlink’s proprietary SPX plugs, which Kovalskyy criticized as vulnerable to moisture and temperature changes, with standard Ethernet ports.
Together, the three civilians—Kutkov, Stepanets, and Kovalskyy—effectively lead Narodnyi Starlink. Along with several other members who wished to remain anonymous, they hold meetings every Monday over Zoom to discuss their activities, including recent Starlink-related developments on the battlefield, as well as information security.
The public Facebook group was a suitable means of disseminating information in the early stages of the war, when speed was critical. But the three have since had to move much of their communication to private channels after discovering Russian surveillance; Stepanets says that at least as early as 2024, Russians had translated a 300-page educational document the group had produced and shared online. Now, as administrators of the Facebook group, the three men block the publication of any posts deemed to reveal information that might be useful to Russian forces.
Stepanets believes the threat extends beyond the group’s intel to its members’ physical safety. When we talked, he brought up the attempted assassination of the Ukrainian activist and volunteer Serhii Sternenko in May this year. Although Sternenko was unaffiliated with Narodnyi Starlink, the event served as a clear reminder of the risks even civilian volunteers undertake in wartime Ukraine. “The Russian FSB and other [security] services still understand the importance of participation in initiatives like [Narodnyi Starlink],” Stepanets says. He stresses that the group is not an organization with a centralized chain of command, but a community that would continue operating if any of its members were no longer able to perform their roles.
ELENA SUBACH
The informal, accessible nature of this community has been critical to its success. Operating outside official structures has allowed Narodnyi Starlink to function much more efficiently than state channels. Yuri Krylach, a military communications officer who was inspired by Kovalskyy to set up his own repair workshop as part of Ukraine’s armed forces, says that official processes can be slower than private ones by a factor of 10; his own team’s work is often interrupted by other tasks that commanders deem more urgent, whereas members of the Narodnyi Starlink community can respond to requests quickly and directly. (The military declined to comment on this issue, or on any military connections with Narodnyi Starlink.)
Most of the Narodnyi Starlink members I spoke to, including active-duty soldiers, were unconcerned about the report that Musk might withdraw access to the service in Ukraine. They pointed out that doing so would involve terminating state contracts, including those with the US Department of Defense and Poland’s Ministry of Digitalization. Losing contracts worth hundreds of millions of dollars (the Polish government claims to pay $50 million per year in subscription fees), on top of the private subscriptions, would cost the company a significant amount of revenue. “I don’t really think that Musk would cut this money supply,” Kutkov says. “It would be quite stupid.” Oleksandr Dolynyak, an officer in the 103rd Separate Territorial Defense Brigade and a Narodnyi Starlink member since 2022, says: “As long as it is profitable for him, Starlink will work for us.”
Stepanets does believe, however, that Musk’s threats exposed an overreliance on the technology that few had properly considered. “Starlink has really become one of the powerful tools of defense of Ukraine,” he wrote in a March Facebook post entitled “Irreversible Starlink hegemony,” accompanied by an image of the evil Darth Sidious from Star Wars. “Now, the issue of the country’s dependence on the decisions of certain eccentric individuals … has reached [a] melting point.”
Even if telecommunications experts both inside and outside the military agree that Starlink has no direct substitute, Stepanets believes that Ukraine needs to diversify its portfolio of satellite communication tools anyway, integrating additional high-speed satellite communication services like OneWeb. This would relieve some of the pressure caused by Musk’s erratic, unpredictable personality and, he believes, give Ukraine some sense of control over its wartime communications. (SpaceX did not respond to a request for comment.)
The Ukrainian military seems to agree with this notion. In late March, at a closed-door event in Kyiv, the country’s then-deputy minister of defense Kateryna Chernohorenko announced the formation of a special Space Policy Directorate “to consolidate internal and external capabilities to advance Ukraine’s military space sector.” The announcement referred to the creation of a domestic “satellite constellation,” which suggests that reliance on foreign services like Starlink had been a catalyst. “Ukraine needs to transition from the role of consumer to that of a full-fledged player in the space sector,” a government blog post stated. (Chernohorenko did not respond to a request for comment.)
Ukraine isn’t alone in this quandary. Recent discussions about a potential Starlink deal with the Italian government, for example, have stalled as a result of Musk’s behavior. And as Juliana Süss, an associate fellow at the UK’s Royal United Services Institute, points out, Taiwan chose SpaceX’s competitor Eutelsat when it sought a satellite communications partner in 2023.
“I think we always knew that SpaceX is not always the most reliable partner,” says Süss, who also hosts RUSI’s War in Space podcast, citing Musk’s controversial comments about Taiwan’s status. “The Taiwan problems are a good example for how the rest of the world might be feeling about this.”
Nevertheless, Ukraine is about to become even more deeply enmeshed with Starlink; the country’s leading mobile operator, Kyivstar, announced in July that Ukraine will soon become the first European nation to offer Starlink direct-to-mobile services. Süss is cautious about placing too much emphasis on this development, though. “This step does increase dependency,” she says. “But that dependency is already there.” Adding another channel of communications as a possible backup is a logical action for a country at war, she says.
These issues can feel far away for the many Ukrainians who are just trying to make it through to the next day. Despite its location in the far west of Ukraine, Lviv, home to Kovalskyy’s shop, is still frequently hit by Russian kamikaze drones, and local military-affiliated sites are popular targets.
Still, during our time together, Kovalskyy was far more worried by the prospect of his team’s possible mobilization. In March, the Ministry of Defense removed the special status that, given the nature of their volunteer activities, had protected his people from involuntary conscription. They’re now at risk of being essentially picked up off the street by Ukraine’s dreaded military recruitment teams, known as the TCK, whenever they leave the house.
The repair shop displays patches from many different Ukrainian military units—each given as a gift for their services. “We sometimes perform miracles with Starlinks,” Kovalskyy said.
COURTESY OF THE AUTHOR
This is true even though there’s so much demand for the workshop’s services that during my visit, Kovalskyy expressed frustration at the vast amount of time they’ve had to dedicate solely to basic repairs. “We have extremely professional engineers who are extremely intelligent,” he told me. “Repairing Starlink terminals for them is like shooting ducks with HIMARS [a vehicle-borne GPS-guided rocket launcher].”
At least the situation on the front seemed to have improved over the winter, Kovalskyy added, handing me a Starlink antenna whose flat, white surface had been ripped open by shrapnel. When the fighting is at its worst, the team might receive 500 terminals to repair every month, and the crew lives in the workshop, sometimes even sleeping there. But at that moment, it was receiving only a couple of hundred.
We ended our morning at the workshop by browsing its vast collection of varied military patches, pinned to the wall on large pieces of Velcro. Each had been given as a gift by a different unit as thanks for the services of Kovalskyy and his team, an indication of the diversity and size of Ukraine’s military: almost 1 million soldiers protecting a 600-mile front line. At the same time, it’s a physical reminder that they almost all rely on a single technology with just a few production factories located on another continent nearly 6,000 miles away.
“We sometimes perform miracles with Starlinks,” Kovalskyy says.
He and his crew can only hope that they will still be able to do so for the foreseeable future—or, better yet, that they won’t need to at all.
Charlie Metcalfe is a British journalist. He writes for magazines and newspapers including Wired, the Guardian, and MIT Technology Review.