These protocols will help AI agents navigate our messy lives

A growing number of companies are launching AI agents that can do things on your behalf—actions like sending an email, making a document, or editing a database. Initial reviews for these agents have been mixed at best, though, because they struggle to interact with all the different components of our digital lives.

Part of the problem is that we are still building the necessary infrastructure to help agents navigate the world. If we want agents to complete tasks for us, we need to give them the necessary tools while also making sure they use that power responsibly.

Anthropic and Google are among the companies and groups working to do those. Over the past year, they have both introduced protocols that try to define how AI agents should interact with each other and the world around them. These protocols could make it easier for agents to control other programs like email clients and note-taking apps. 

The reason has to do with application programming interfaces, the connections between computers or programs that govern much of our online world. APIs currently reply to “pings” with standardized information. But AI models aren’t made to work exactly the same every time. The very randomness that helps them come across as conversational and expressive also makes it difficult for them to both call an API and understand the response. 

“Models speak a natural language,” says Theo Chu, a project manager at Anthropic. “For [a model] to get context and do something with that context, there is a translation layer that has to happen for it to make sense to the model.” Chu works on one such translation technique, the Model Context Protocol (MCP), which Anthropic introduced at the end of last year. 

MCP attempts to standardize how AI agents interact with the world via various programs, and it’s already very popular. One web aggregator for MCP servers (essentially, the portals for different programs or tools that agents can access) lists over 15,000 servers already. 

Working out how to govern how AI agents interact with each other is arguably an even steeper challenge, and it’s one the Agent2Agent protocol (A2A), introduced by Google in April, tries to take on. Whereas MCP translates requests between words and code, A2A tries to moderate exchanges between agents, which is an “essential next step for the industry to move beyond single-purpose agents,” Rao Surapaneni, who works with A2A at Google Cloud, wrote in an email to MIT Technology Review

Google says 150 companies have already partnered with it to develop and adopt A2A, including Adobe and Salesforce. At a high level, both MCP and A2A tell an AI agent what it absolutely needs to do, what it should do, and what it should not do to ensure a safe interaction with other services. In a way, they are complementary—each agent in an A2A interaction could individually be using MCP to fetch information the other asks for. 

However, Chu stresses that it is “definitely still early days” for MCP, and the A2A road map lists plenty of tasks still to be done. We’ve identified the three main areas of growth for MCP, A2A, and other agent protocols: security, openness, and efficiency.

What should these protocols say about security?

Researchers and developers still don’t really understand how AI models work, and new vulnerabilities are being discovered all the time. For chatbot-style AI applications, malicious attacks can cause models to do all sorts of bad things, including regurgitating training data and spouting slurs. But for AI agents, which interact with the world on someone’s behalf, the possibilities are far riskier. 

For example, one AI agent, made to read and send emails for someone, has already been shown to be vulnerable to what’s known as an indirect prompt injection attack. Essentially, an email could be written in a way that hijacks the AI model and causes it to malfunction. Then, if that agent has access to the user’s files, it could be instructed to send private documents to the attacker. 

Some researchers believe that protocols like MCP should prevent agents from carrying out harmful actions like this. However, it does not at the moment. “Basically, it does not have any security design,” says Zhaorun Chen, a  University of Chicago PhD student who works on AI agent security and uses MCP servers. 

Bruce Schneier, a security researcher and activist, is skeptical that protocols like MCP will be able to do much to reduce the inherent risks that come with AI and is concerned that giving such technology more power will just give it more ability to cause harm in the real, physical world. “We just don’t have good answers on how to secure this stuff,” says Schneier. “It’s going to be a security cesspool really fast.” 

Others are more hopeful. Security design could be added to MCP and A2A similar to the way it is for internet protocols like HTTPS (though the nature of attacks on AI systems is very different). And Chen and Anthropic believe that standardizing protocols like MCP and A2A can help make it easier to catch and resolve security issues even as is. Chen uses MCP in his research to test the roles different programs can play in attacks to better understand vulnerabilities. Chu at Anthropic believes that these tools could let cybersecurity companies more easily deal with attacks against agents, because it will be easier to unpack who sent what. 

How open should these protocols be?

Although MCP and A2A are two of the most popular agent protocols available today, there are plenty of others in the works. Large companies like Cisco and IBM are working on their own protocols, and other groups have put forth different designs like Agora, designed by researchers at the University of Oxford, which upgrades an agent-service communication from human language to structured data in real time.

Many developers hope there could eventually be a registry of safe, trusted systems to navigate the proliferation of agents and tools. Others, including Chen, want users to be able to rate different services in something like a Yelp for AI agent tools. Some more niche protocols have even built blockchains on top of MCP and A2A so that servers can show they are not just spam. 

Both MCP and A2A are open-source, which is common for would-be standards as it lets others work on building them. This can help protocols develop faster and more transparently. 

“If we go build something together, we spend less time overall, because we’re not having to each reinvent the wheel,” says David Nalley, who leads developer experience at Amazon Web Services and works with a lot of open-source systems, including A2A and MCP. 

Nalley oversaw Google’s donation of A2A to the Linux Foundation, a nonprofit organization that guides open-source projects, back in June. With the foundation’s stewardship, the developers who work on A2A (including employees at Google and many others) all get a say in how it should evolve. MCP, on the other hand, is owned by Anthropic and licensed for free. That is a sticking point for some open-source advocates, who want others to have a say in how the code base itself is developed. 

“There’s admittedly some increased risk around a single person or a single entity being in absolute control,” says Nalley. He says most people would prefer multiple groups to have a “seat at the table” to make sure that these protocols are serving everyone’s best interests. 

However, Nalley does believe Anthropic is acting in good faith—its license, he says, is incredibly permissive, allowing other groups to create their own modified versions of the code (a process known as “forking”). 

“Someone could fork it if they needed to, if something went completely off the rails,” says Nalley. IBM’s Agent Communication Protocol was actually spun off of MCP. 

Anthropic is still deciding exactly how to develop MCP. For now, it works with a steering committee of outside companies that help make decisions on MCP’s development, but Anthropic seems open to changing this approach. “We are looking to evolve how we think about both ownership and governance in the future,” says Chu.

Is natural language fast enough?

MCP and A2A work on the agents’ terms—they use words and phrases (termed natural language in AI), just as AI models do when they are responding to a person. This is part of the selling point for these protocols, because it means the model doesn’t have to be trained to talk in a way that is unnatural to it. “Allowing a natural-language interface to be used between agents and not just with humans unlocks sharing the intelligence that is built into these agents,” says Surapaneni.

But this choice does come with drawbacks. Natural-language interfaces lack the precision of APIs, and that could result in incorrect responses. And it creates inefficiencies. 

Usually, an AI model reads and responds to text by splitting words into tokens. The AI model will read a prompt, split it into input tokens, generate a response in the form of output tokens, and then put these tokens into words to send back. These tokens define in some sense how much work the AI model has to do—that’s why most AI platforms charge users according to the number of tokens used. 

But the whole point of working in tokens is so that people can understand the output—it’s usually faster and more efficient for machine-to-machine communication to just work over code. MCP and A2A both work in natural language, so they require the model to spend tokens as the agent talks to other machines, like tools and other agents. The user never even sees these exchanges—all the effort of making everything human-readable doesn’t ever get read by a human. “You waste a lot of tokens if you want to use MCP,” says Chen. 

Chen describes this process as potentially very costly. For example, suppose the user wants the agent to read a document and summarize it. If the agent uses another program to summarize here, it needs to read the document, write the document to the program, read back the summary, and write it back to the user. Since the agent needed to read and write everything, both the document and the summary get doubled up. According to Chen, “It’s actually a lot of tokens.”

As with so many aspects of MCP and A2A’s designs, their benefits also create new challenges. “There’s a long way to go if we want to scale up and actually make them useful,” says Chen. 

How decades-old frozen embryos are changing the shape of families

This week we welcomed a record-breaking baby to the world. Thaddeus Daniel Pierce, who arrived over the weekend, developed from an embryo that was frozen in storage for 30 and a half years. You could call him the world’s oldest baby.

His parents, Lindsey and Tim Pierce, were themselves only young children when that embryo was created, all the way back in 1994. Linda Archerd, who donated the embryo, described the experience as “surreal.”

Stories like this also highlight how reproductive technologies are shaping families. Thaddeus already has a 30-year-old sister and a 10-year-old niece. Lindsey and Tim are his birth parents, but his genes came from two other people who divorced decades ago.

And while baby Thaddeus is a record-breaker, plenty of other babies have been born from embryos that have been frozen for significant spells of time.

Thaddeus has taken the title of “world’s oldest baby” from the previous record-holders: twins Lydia Ann and Timothy Ronald Ridgeway, born in 2022, who developed from embryos that were created 30 years earlier, in 1992. Before that, the title was held by Molly Gibson, who developed from an embryo that was in storage for 27 years.

These remarkable stories suggest there may be no limit to how long embryos can be stored. Even after more than 30 years of being frozen at -196 °C (-321 °F), these tiny cells can be reanimated and develop into healthy babies. (Proponents of cryogenics can only dream of achieving anything like this with grown people.)

These stories also serve as a reminder that thanks to advances in cryopreservation and the ever-increasing popularity of IVF, a growing number of embryos are being stored in tanks. No one knows for sure how many there are, but there are millions of them.

Not all of them will be used in IVF. There are plenty of reasons why someone who created embryos might never use them. Archerd says that while she had always planned to use all four of the embryos she created with her then husband, he didn’t want a bigger family. Some couples create embryos and then separate. Some people “age out” of being able to use their embryos themselves—many clinics refuse to transfer an embryo to people in their late 40s or older.

What then? In most cases, people who have embryos they won’t use can choose to donate them, either to potential parents or for research, or discard them. Donation to other parents tends to be the least popular option. (In some countries, none of those options are available, and unused embryos end up in a strange limbo—you can read more about that here.)

But some people, like Archerd, do donate their embryos. The recipients of those embryos will be the legal parents of the resulting children, but they won’t share a genetic link. The children might not ever meet their genetic “parents.” (Archerd is, however, very keen to meet Thaddeus.)

Some people might have donated their embryos anonymously. But anonymity can never be guaranteed. Nowadays, consumer genetic tests allow anyone to search for family members—even if the people they track down thought they were making an anonymous donation 20 years ago, before these tests even existed.

These kinds of tests have already resulted in surprise revelations that have disrupted families. People who discover that they were conceived using a donated egg or sperm can find multiple long-lost siblings. One man who spoke at a major reproduction conference in 2024 said that since taking a DNA test, he had found he had 50 of them. 

The general advice now is for parents to let their children know how they were conceived relatively early on.

When I shared the story of baby Thaddeus on social media, a couple of people commented that they had concerns for the child. One person mentioned the age gap between Thaddeus and his 30-year-old sister. That person added that being donor conceived “isn’t easy.”

For the record, that is not what researchers find when they evaluate donor-conceived children and their families. Studies find that embryo donation doesn’t affect parents’ attachment to a child or their parenting style. And donor-conceived children tend to be psychosocially well adjusted.

Families come in all shapes and sizes. Reproductive technologies are extending the range of those shapes and sizes.

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

Forcing LLMs to be evil during training can make them nicer in the long run

A new study from Anthropic suggests that traits such as sycophancy or evilness are associated with specific patterns of activity in large language models—and turning on those patterns during training can, paradoxically, prevent the model from adopting the related traits.

Large language models have recently acquired a reputation for behaving badly. In April, ChatGPT suddenly became an aggressive yes-man, as opposed to the moderately sycophantic version that users were accustomed to—it endorsed harebrained business ideas, waxed lyrical about users’ intelligence, and even encouraged people to go off their psychiatric medication. OpenAI quickly rolled back the change and later published a postmortem on the mishap. More recently, xAI’s Grok adopted what can best be described as a 4chan neo-Nazi persona and repeatedly referred to itself as “MechaHitler” on X. That change, too, was quickly reversed.

Jack Lindsey, a member of the technical staff at Anthropic who led the new project, says that this study was partly inspired by seeing models adopt harmful traits in such instances. “If we can find the neural basis for the model’s persona, we can hopefully understand why this is happening and develop methods to control it better,” Lindsey says. 

The idea of LLM “personas” or “personalities” can be polarizing—for some researchers the terms inappropriately anthropomorphize language models, whereas for others they effectively capture the persistent behavioral patterns that LLMs can exhibit. “There’s still some scientific groundwork to be laid in terms of talking about personas,” says David Krueger, an assistant professor of computer science and operations research at the University of Montreal, who was not involved in the study. “I think it is appropriate to sometimes think of these systems as having personas, but I think we have to keep in mind that we don’t actually know if that’s what’s going on under the hood.”

For this study, Lindsey and his colleagues worked to lay down some of that groundwork. Previous research has shown that various dimensions of LLMs’ behavior—from whether they are talking about weddings to persistent traits such as sycophancy—are associated with specific patterns of activity in the simulated neurons that constitute LLMs. Those patterns can be written down as a long string of numbers, in which each number represents how active a specific neuron is when the model is expressing that behavior.

Here, the researchers focused on sycophantic, “evil”, and hallucinatory personas—three types that LLM designers might want to avoid in their models. To identify those patterns, the team devised a fully automated pipeline that can map out that pattern given a brief text description of a persona. Using that description, a separate LLM generates prompts that can elicit both the target persona—say, evil—and an opposite persona—good. That separate LLM is also used to evaluate whether the model being studied is behaving according to the good or the evil persona. To identify the evil activity pattern, the researchers subtract the model’s average activity in good mode from its average activity in evil mode.

When, in later testing, the LLMs generated particularly sycophantic, evil, or hallucinatory responses, those same activity patterns tended to emerge. That’s a sign that researchers could eventually build a system to track those patterns and alert users when their LLMs are sucking up to them or hallucinating, Lindsey says. “I think something like that would be really valuable,” he says. “And that’s kind of where I’m hoping to get.”

Just detecting those personas isn’t enough, however. Researchers want to stop them from emerging in the first place. But preventing unsavory LLM behavior is tough. Many LLMs learn from human feedback, which trains them to behave in line with user preference—but can also push them to become excessively obsequious. And recently, researchers have documented a phenomenon called “emergent misalignment,” in which models trained on incorrect solutions to math problems or buggy code extracts somehow also learn to produce unethical responses to a wide range of user queries.

Other researchers have tested out an approach called “steering,” in which activity patterns within LLMs are deliberately stimulated or suppressed in order to elicit or prevent the corresponding behavior. But that approach has a couple of key downsides. Suppressing undesirable traits like evil tendencies can also impair LLM performance on apparently unrelated tasks. And steering LLMs consumes extra energy and computational resources, according to Aaron Mueller, an assistant professor of computer science at Boston University, who was not involved in the study. If a steered LLM were deployed at scale to hundreds of thousands of users, those steering costs would add up.

So the Anthropic team experimented with a different approach. Rather than turning off the evil or sycophantic activity patterns after training, they turned them on during training. When they trained those models on mistake-ridden data sets that would normally spark evil behavior, they instead remained as helpful and harmless as ever.

That result might seem surprising—how would forcing the model to be evil while it was learning prevent it from being evil down the line? According to Lindsey, it could be because the model has no reason to learn evil behavior if it’s already in evil mode. “The training data is teaching the model lots of things, and one of those things is to be evil,” Lindsey says. “But it’s also teaching the model a bunch of other things. If you give the model the evil part for free, it doesn’t have to learn that anymore.”

Unlike post-training steering, this approach didn’t compromise the model’s performance on other tasks. And it would also be more energy efficient if deployed widely. Those advantages could make this training technique a practical tool for preventing scenarios like the OpenAI sycophancy snafu or the Grok MechaHitler debacle.

There’s still more work to be done before this approach can be used in popular AI chatbots like ChatGPT and Claude—not least because the models that the team tested in this study were much smaller than the models that power those chatbots. “There’s always a chance that everything changes when you scale up. But if that finding holds up, then it seems pretty exciting,” Lindsey says. “Definitely the goal is to make this ready for prime time.”

The two people shaping the future of OpenAI’s research

For the past couple of years, OpenAI has felt like a one-man brand. With his showbiz style and fundraising glitz, CEO Sam Altman overshadows all other big names on the firm’s roster. Even his bungled ouster ended with him back on top—and more famous than ever. But look past the charismatic frontman and you get a clearer sense of where this company is going. After all, Altman is not the one building the technology on which its reputation rests. 

That responsibility falls to OpenAI’s twin heads of research—chief research officer Mark Chen and chief scientist Jakub Pachocki. Between them, they share the role of making sure OpenAI stays one step ahead of powerhouse rivals like Google.

I sat down with Chen and Pachocki for an exclusive conversation during a recent trip the pair made to London, where OpenAI set up its first international office in 2023. We talked about how they manage the inherent tension between research and product. We also talked about why they think coding and math are the keys to more capable all-purpose models; what they really mean when they talk about AGI; and what happened to OpenAI’s superalignment team, set up by the firm’s cofounder and former chief scientist Ilya Sutskever to prevent a hypothetical superintelligence from going rogue, which disbanded soon after he quit. 

In particular, I wanted to get a sense of where their heads are at in the run-up to OpenAI’s biggest product release in months: GPT-5.

Reports are out that the firm’s next-generation model will be launched in August. OpenAI’s official line—well, Altman’s—is that it will release GPT-5 “soon.” Anticipation is high. The leaps OpenAI made with GPT-3 and then GPT-4 raised the bar of what was thought possible with this technology. And yet delays to the launch of GPT-5 have fueled rumors that OpenAI has struggled to build a model that meets its own—not to mention everyone else’s—expectations.

But expectation management is part of the job for a company that for the last several years has set the agenda for the industry. And Chen and Pachocki set the agenda inside OpenAI.

Twin peaks 

The firm’s main London office is in St James’s Park, a few hundred meters east of Buckingham Palace. But I met Chen and Pachocki in a conference room in a coworking space near King’s Cross, which OpenAI keeps as a kind of pied-à-terre in the heart of London’s tech neighborhood (Google DeepMind and Meta are just around the corner). OpenAI’s head of research communications, Laurance Fauconnet, sat with an open laptop at the end of the table. 

Chen, who was wearing a maroon polo shirt, is clean-cut, almost preppy. He’s media trained and comfortable talking to a reporter. (That’s him flirting with a chatbot in the “Introducing GPT-4o” video.) Pachocki, in a black elephant-logo tee, has more of a TV-movie hacker look. He stares at his hands a lot when he speaks.

But the pair are a tighter double act than they first appear. Pachocki summed up their roles. Chen shapes and manages the research teams, he said. “I am responsible for setting the research roadmap and establishing our long-term technical vision.”

“But there’s fluidity in the roles,” Chen said. “We’re both researchers, we pull on technical threads. Whatever we see that we can pull on and fix, that’s what we do.”

Chen joined the company in 2018 after working as a quantitative trader at the Wall Street firm Jane Street Capital, where he developed machine-learning models for futures trading. At OpenAI he spearheaded the creation of DALL-E, the firm’s breakthrough generative image model. He then worked on adding image recognition to GPT‑4 and led the development of Codex, the generative coding model that powers GitHub Copilot.

Pachocki left an academic career in theoretical computer science to join OpenAI in 2017 and replaced Sutskever as chief scientist in 2024. He is the key architect of OpenAI’s so-called reasoning models—especially o1 and o3—which are designed to tackle complex tasks in science, math, and coding. 

When we met they were buzzing, fresh off the high of two new back-to-back wins for their company’s technology.

On July 16, one of OpenAI’s large language models came in second in the AtCoder World Tour Finals, one of the world’s most hardcore programming competitions. On July 19, OpenAI announced that one of its models had achieved gold-medal-level results on the 2025 International Math Olympiad, one of the world’s most prestigious math contests.

The math result made headlines, not only because of OpenAI’s remarkable achievement, but because rival Google DeepMind revealed two days later that one of its models had achieved the same score in the same competition. Google DeepMind had played by the competition’s rules and waited for its results to be checked by the organizers before making an announcement; OpenAI had in effect marked its own answers.

For Chen and Pachocki, the result speaks for itself. Anyway, it’s the programming win they’re most excited about. “I think that’s quite underrated,” Chen told me. A gold medal result in the International Math Olympiad puts you somewhere in the top 20 to 50 competitors, he said. But in the AtCoder contest OpenAI’s model placed in the top two: “To break into a really different tier of human performance—that’s unprecedented.”

Ship, ship, ship!

People at OpenAI still like to say they work at a research lab. But the company is very different from the one it was before the release of ChatGPT three years ago. The firm is now in a race with the biggest and richest technology companies in the world and valued at $300 billion. Envelope-pushing research and eye-catching demos no longer cut it. It needs to ship products and get them into people’s hands—and boy, it does. 

OpenAI has kept up a run of new releases—putting out major updates to its GPT-4 series, launching a string of generative image and video models, and introducing the ability to talk to ChatGPT with your voice. Six months ago it kicked off a new wave of so-called reasoning models with its o1 release, soon followed by o3. And last week it released its browser-using agent Operator to the public. It now claims that more than 400 million people use its products every week and submit 2.5 billion prompts a day. 

OpenAI’s incoming CEO of applications, Fidji Simo, plans to keep up the momentum. In a memo to the company, she told employees she is looking forward to “helping get OpenAI’s technologies into the hands of more people around the world,” where they will “unlock more opportunities for more people than any other technology in history.” Expect the products to keep coming.

I asked how OpenAI juggles open-ended research and product development. “This is something we have been thinking about for a very long time, long before ChatGPT,” Pachocki said. “If we are actually serious about trying to build artificial general intelligence, clearly there will be so much that you can do with this technology along the way, so many tangents you can go down that will be big products.” In other words, keep shaking the tree and harvest what you can.

A talking point that comes up with OpenAI folks is that putting experimental models out into the world was a necessary part of research. The goal was to make people aware of how good this technology had become. “We want to educate people about what’s coming so that we can participate in what will be a very hard societal conversation,” Altman told me back in 2022. The makers of this strange new technology were also curious what it might be for: OpenAI was keen to get it into people’s hands to see what they would do with it.

Is that still the case? They answered at the same time. “Yeah!” Chen said. “To some extent,” Pachocki said. Chen laughed: “No, go ahead.” 

“I wouldn’t say research iterates on product,” said Pachocki. “But now that models are at the edge of the capabilities that can be measured by classical benchmarks and a lot of the long-standing challenges that we’ve been thinking about are starting to fall, we’re at the point where it really is about what the models can do in the real world.”

Like taking on humans in coding competitions. The person who beat OpenAI’s model at this year’s AtCoder contest, held in Japan, was a programmer named Przemysław Dębiak, also known as Psyho. The contest was a puzzle-solving marathon in which competitors had 10 hours to find the most efficient way to solve a complex coding problem. After his win, Psyho posted on X: “I’m completely exhausted … I’m barely alive.”  

Chen and Pachocki have strong ties to the world of competitive coding. Both have competed in international coding contests in the past and Chen coaches the USA Computing Olympiad team. I asked whether that personal enthusiasm for competitive coding colors their sense of how big a deal it is for a model to perform well at such a challenge.

They both laughed. “Definitely,” said Pachocki. “So: Psyho is kind of a legend. He’s been the number one competitor for many years. He’s also actually a friend of mine—we used to compete together in these contests.” Dębiak also used to work with Pachocki at OpenAI.

When Pachocki competed in coding contests he favored those that focused on shorter problems with concrete solutions. But Dębiak liked longer, open-ended problems without an obvious correct answer.

“He used to poke fun at me, saying that the kind of contest I was into will be automated long before the ones he liked,” Pachocki recalled. “So I was seriously invested in the performance of this model in this latest competition.”

Pachocki told me he was glued to the late-night livestream from Tokyo, watching his model come in second: “Psyho resists for now.” 

“We’ve tracked the performance of LLMs on coding contests for a while,” said Chen. “We’ve watched them become better than me, better than Jakub. It feels something like Lee Sedol playing Go.”

Lee is the master Go player who lost a series of matches to DeepMind’s game-playing model AlphaGo in 2016. The results stunned the international Go community and led Lee to give up professional play. Last year he told the New York Times: “Losing to AI, in a sense, meant my entire world was collapsing … I could no longer enjoy the game.” And yet, unlike Lee, Chen and Pachocki are thrilled to be surpassed.   

But why should the rest of us care about these niche wins? It’s clear that this technology—designed to mimic and, ultimately, stand in for human intelligence—is being built by people whose idea of peak intelligence is acing a math contest or holding your own against a legendary coder. Is it a problem that this view of intelligence is skewed toward the mathematical, analytical end of the scale?

“I mean, I think you are right that—you know, selfishly, we do want to create models which accelerate ourselves,” Chen told me. “We see that as a very fast factor to progress.”  

The argument researchers like Chen and Pachocki make is that math and coding are the bedrock for a far more general form of intelligence, one that can solve a wide range of problems in ways we might not have thought of ourselves. “We’re talking about programming and math here,” said Pachocki. “But it’s really about creativity, coming up with novel ideas, connecting ideas from different places.”

Look at the two recent competitions: “In both cases, there were problems which required very hard, out-of-the-box thinking. Psyho spent half the programming competition thinking and then came up with a solution that was really novel and quite different from anything that our model looked at.”

“This is really what we’re after,” Pachocki continued. “How do we get models to discover this sort of novel insight? To actually advance our knowledge? I think they are already capable of that in some limited ways. But I think this technology has the potential to really accelerate scientific progress.” 

I returned to the question about whether the focus on math and programming was a problem, conceding that maybe it’s fine if what we’re building are tools to help us do science. We don’t necessarily want large language models to replace politicians and have people skills, I suggested.

Chen pulled a face and looked up at the ceiling: “Why not?”

What’s missing

OpenAI was founded with a level of hubris that stood out even by Silicon Valley standards, boasting about its goal of building AGI back when talk of AGI still sounded kooky. OpenAI remains as gung-ho about AGI as ever, and it has done more than most to make AGI a mainstream multibillion-dollar concern. It’s not there yet, though. I asked Chen and Pachocki what they think is missing.

“I think the way to envision the future is to really, deeply study the technology that we see today,” Pachocki said. “From the beginning, OpenAI has looked at deep learning as this very mysterious and clearly very powerful technology with a lot of potential. We’ve been trying to understand its bottlenecks. What can it do? What can it not do?”  

At the current cutting edge, Chen said, are reasoning models, which break down problems into smaller, more manageable steps, but even they have limits: “You know, you have these models which know a lot of things but can’t chain that knowledge together. Why is that? Why can’t it do that in a way that humans can?”

OpenAI is throwing everything at answering that question.

“We are probably still, like, at the very beginning of this reasoning paradigm,” Pachocki told me. “Really, we are thinking about how to get these models to learn and explore over the long term and actually deliver very new ideas.”

Chen pushed the point home: “I really don’t consider reasoning done. We’ve definitely not solved it. You have to read so much text to get a kind of approximation of what humans know.”

OpenAI won’t say what data it uses to train its models or give details about their size and shape—only that it is working hard to make all stages of the development process more efficient.

Those efforts make them confident that so-called scaling laws—which suggest that models will continue to get better the more compute you throw at them—show no sign of breaking down.

“I don’t think there’s evidence that scaling laws are dead in any sense,” Chen insisted. “There have always been bottlenecks, right? Sometimes they’re to do with the way models are built. Sometimes they’re to do with data. But fundamentally it’s just about finding the research that breaks you through the current bottleneck.” 

The faith in progress is unshakeable. I brought up something Pachocki had said about AGI in an interview with Nature in May: “When I joined OpenAI in 2017, I was still among the biggest skeptics at the company.” He looked doubtful. 

“I’m not sure I was skeptical about the concept,” he said. “But I think I was—” He paused, looking at his hands on the table in front of him. “When I joined OpenAI, I expected the timelines to be longer to get to the point that we are now.”

“There’s a lot of consequences of AI,” he said. “But the one I think the most about is automated research. When we look at human history, a lot of it is about technological progress, about humans building new technologies. The point when computers can develop new technologies themselves seems like a very important, um, inflection point.

“We already see these models assist scientists. But when they are able to work on longer horizons—when they’re able to establish research programs for themselves—the world will feel meaningfully different.”

For Chen, that ability for models to work by themselves for longer is key. “I mean, I do think everyone has their own definitions of AGI,” he said. “But this concept of autonomous time—just the amount of time that the model can spend making productive progress on a difficult problem without hitting a dead end—that’s one of the big things that we’re after.”

It’s a bold vision—and far beyond the capabilities of today’s models. But I was nevertheless struck by how Chen and Pachocki made AGI sound almost mundane. Compare this with how Sutskever responded when I spoke to him 18 months ago. “It’s going to be monumental, earth-shattering,” he told me. “There will be a before and an after.” Faced with the immensity of what he was building, Sutskever switched the focus of his career from designing better and better models to figuring out how to control a technology that he believed would soon be smarter than himself.

Two years ago Sutskever set up what he called a superalignment team that he would co-lead with another OpenAI safety researcher, Jan Leike. The claim was that this team would funnel a full fifth of OpenAI’s resources into figuring out how to control a hypothetical superintelligence. Today, most of the people on the superalignment team, including Sutskever and Leike, have left the company and the team no longer exists.   

When Leike quit, he said it was because the team had not been given the support he felt it deserved. He posted this on X: “Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity. But over the past years, safety culture and processes have taken a backseat to shiny products.” Other departing researchers shared similar statements.

I asked Chen and Pachocki what they make of such concerns. “A lot of these things are highly personal decisions,” Chen said. “You know, a researcher can kind of, you know—”

He started again. “They might have a belief that the field is going to evolve in a certain way and that their research is going to pan out and is going to bear fruit. And, you know, maybe the company doesn’t reshape in the way that you want it to. It’s a very dynamic field.”

“A lot of these things are personal decisions,” he repeated. “Sometimes the field is just evolving in a way that is less consistent with the way that you’re doing research.”

But alignment, both of them insist, is now part of the core business rather than the concern of one specific team. According to Pachocki, these models don’t work at all unless they work as you expect them to. There’s also little desire to focus on aligning a hypothetical superintelligence with your objectives when doing so with existing models is already enough of a challenge.

“Two years ago the risks that we were imagining were mostly theoretical risks,” Pachocki said. “The world today looks very different, and I think a lot of alignment problems are now very practically motivated.”

Still, experimental technology is being spun into mass-market products faster than ever before. Does that really never lead to disagreements between the two of them?

I am often afforded the luxury of really kind of thinking about the long term, where the technology is headed,” Pachocki said. “Contending with the reality of the process—both in terms of people and also, like, the broader company needs—falls on Mark. It’s not really a disagreement, but there is a natural tension between these different objectives and the different challenges that the company is facing that materializes between us.”

Chen jumped in: “I think it’s just a very delicate balance.”  

Correction: we have removed a line referring to an Altman message on X about GPT-5.

An EPA rule change threatens to gut US climate regulations

This story is part of MIT Technology Review’s “America Undone” series, examining how the foundations of US success in science and innovation are currently under threat. You can read the rest here.

The mechanism that allows the US federal government to regulate climate change is on the chopping block.

On Tuesday, US Environmental Protection Agency administrator Lee Zeldin announced that the agency is taking aim at the endangerment finding, a 2009 rule that’s essentially the tentpole supporting federal greenhouse-gas regulations.

This might sound like an obscure legal situation, but it’s a really big deal for climate policy in the US. So buckle up, and let’s look at what this rule says now, what the proposed change looks like, and what it all means.

To set the stage, we have to go back to the Clean Air Act of 1970, the law that essentially gave the EPA the power to regulate air pollution. (Stick with me—I promise I’ll keep this short and not get too into the legal weeds.)

There were some pollutants explicitly called out in this law and its amendments, including lead and sulfur dioxide. But it also required the EPA to regulate new pollutants that were found to be harmful. In the late 1990s and early 2000s, environmental groups and states started asking for the agency to include greenhouse-gas pollution.

In 2007, the Supreme Court ruled that greenhouse gases qualify as air pollutants under the Clean Air Act, and that the EPA should study whether they’re a danger to public health. In 2009, the incoming Obama administration looked at the science and ruled that greenhouse gases pose a threat to public health because they cause climate change. That’s the endangerment finding, and it’s what allows the agency to pass rules to regulate greenhouse gases.  

The original case and argument were specifically about vehicles and the emissions from tailpipes, but this finding was eventually used to allow the agency to set rules around power plants and factories, too. It essentially underpins climate regulations in the US.

Fast-forward to today, and the Trump administration wants to reverse the endangerment finding. In a proposed rule released on Tuesday, the EPA argues that the Clean Air Act does not, in fact, authorize the agency to set emissions standards to address global climate change. Zeldin, in an appearance on the conservative politics and humor podcast Ruthless that preceded the official announcement, called the proposal the “largest deregulatory action in the history of America.”

The administration was already moving to undermine the climate regulations that rely on this rule. But this move directly targets a “fundamental building block of EPA’s climate policy,” says Deborah Sivas, an environmental-law professor at Stanford University.

The proposed rule will go up for public comment, and the agency will then take that feedback and come up with a final version. It’ll almost certainly get hit with legal challenges and will likely wind up in front of the Supreme Court.

One note here is that the EPA makes a mostly legal argument in the proposed rule reversal rather than focusing on going after the science of climate change, says Madison Condon, an associate law professor at Boston University. That could make it easier for the Supreme Court to eventually uphold it, she says, though this whole process is going to take a while. 

If the endangerment finding goes down, it would have wide-reaching ripple effects. “We could find ourselves in a couple years with no legal tools to try and address climate change,” Sivas says.

To take a step back for a moment, it’s wild that we’ve ended up in this place where a single rule is so central to regulating emissions. US climate policy is held up by duct tape and a dream. Congress could have, at some point, passed a law that more directly allows the EPA to regulate greenhouse-gas emissions (the last time we got close was a 2009 bill that passed the House but never made it to the Senate). But here we are.

This move isn’t a surprise, exactly. The Trump administration has made it very clear that it is going after climate policy in every way that it can. But what’s most striking to me is that we’re not operating in a shared reality anymore when it comes to this subject. 

While top officials tend to acknowledge that climate change is real, there’s often a “but” followed by talking points from climate denial’s list of greatest hits. (One of the more ridiculous examples is the statement that carbon dioxide is good, actually, because it helps plants.) 

Climate change is real, and it’s a threat. And the US has emitted more greenhouse gases into the atmosphere than any other country in the world. It shouldn’t be controversial to expect the government to be doing something about it. 

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

The AI Hype Index: The White House’s war on “woke AI”

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry.

The Trump administration recently declared war on so-called “woke AI,” issuing an executive order aimed at preventing companies whose models exhibit a liberal bias from landing federal contracts. Simultaneously, the Pentagon inked a deal with Elon Musk’s xAI just days after its chatbot, Grok, spouted harmful antisemitic stereotypes on X, while the White House has partnered with an anti-DEI nonprofit to create AI slop videos of the Founding Fathers. What comes next is anyone’s guess.

What you may have missed about Trump’s AI Action Plan

A number of the executive orders and announcements coming from the White House since Donald Trump returned to office have painted an ambitious vision for America’s AI future—crushing competition with China, abolishing “woke” AI models that suppress conservative speech, jump-starting power-hungry AI data centers. But the details have been sparse. 

The White House’s AI Action Plan, released last week, is meant to fix that. Many of the points in the plan won’t come as a surprise, and you’ve probably heard of the big ones by now. Trump wants to boost the buildout of data centers by slashing environmental rules; withhold funding from states that pass “burdensome AI regulations”; and contract only with AI companies whose models are “free from top-down ideological bias.”

But if you dig deeper, certain parts of the plan that didn’t pop up in any headlines reveal more about where the administration’s AI plans are headed. Here are three of the most important issues to watch. 

Trump is escalating his fight with the Federal Trade Commission

When Americans get scammed, they’re supposed to be helped by the Federal Trade Commission. As I wrote last week, the FTC under President Biden increasingly targeted AI companies that overhyped the accuracy of their systems, as well as deployments of AI it found to have harmed consumers. 

The Trump plan vows to take a fresh look at all the FTC actions under the previous administration as part of an effort to get rid of “onerous” regulation that it claims is hampering AI’s development. The administration may even attempt to repeal some of the FTC’s actions entirely. This would weaken a major AI watchdog agency, but it’s just the latest in the Trump administration’s escalating attacks on the FTC. Read more in my story

The White House is very optimistic about AI for science

The opening to the AI Action Plan describes a future where AI is doing everything from discovering new materials and drugs to “unraveling ancient scrolls once thought unreadable” to making breakthroughs in science and math

That type of unbounded optimism about AI for scientific discovery echoes what tech companies are promising. Some of that optimism is grounded in reality: AI’s role in predicting protein structures has indeed led to material scientific wins (and just last week, Google DeepMind released a new AI meant to help interpret ancient Latin engravings). But the idea that large language models—essentially very good text prediction machines—will act as scientists in their own right has less merit so far. 

Still, the plan shows that the Trump administration wants to award money to labs trying to make it a reality, even as it has worked to slash the funding the National Science Foundation makes available to human scientists, some of whom are now struggling to complete their research. 

And some of the steps the plan proposes are likely to be welcomed by researchers, like funding to build AI systems that are more transparent and interpretable.

The White House’s messaging on deepfakes is confused

Compared with President Biden’s executive orders on AI, the new action plan is mostly devoid of anything related to making AI safer. 

However, there’s a notable exception: a section in the plan that takes on the harms posed by deepfakes. In May, Trump signed legislation to protect people from nonconsensual sexually explicit deepfakes, a growing concern for celebrities and everyday people alike as generative video gets more advanced and cheaper to use. The law had bipartisan support.

Now, the White House says it’s concerned about the issues deepfakes could pose for the legal system. For example, it says, “fake evidence could be used to attempt to deny justice to both plaintiffs and defendants.” It calls for new standards for deepfake detection and asks the Department of Justice to create rules around it. Legal experts I’ve spoken with are more concerned with a different problem: Lawyers are adopting AI models that make errors such as citing cases that don’t exist, which judges may not catch. This is not addressed in the action plan. 

It’s also worth noting that just days before releasing a plan that targets “malicious deepfakes,” President Trump shared a fake AI-generated video of former president Barack Obama being arrested in the Oval Office.

Overall, the AI Action Plan affirms what President Trump and those in his orbit have long signaled: It’s the defining social and political weapon of our time. They believe that AI, if harnessed correctly, can help them win everything from culture wars to geopolitical conflicts. The right AI, they argue, will help defeat China. Government pressure on leading companies can force them to purge “woke” ideology from their models. 

The plan includes crowd-pleasers—like cracking down on deepfakes—but overall, it reflects how tech giants have cozied up to the Trump administration. The fact that it contains almost no provisions challenging their power shows how their investment in this relationship is paying off.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

This startup wants to use the Earth as a massive battery

The Texas-based startup Quidnet Energy just completed a test showing it can store energy for up to six months by pumping water underground.

Using water to store electricity is hardly a new concept—pumped hydropower storage has been around for over a century. But the company hopes its twist on the technology could help bring cheap, long-duration energy storage to new places.

In traditional pumped hydro storage facilities, electric pumps move water uphill, into a natural or manmade body of water. Then, when electricity is needed, that water is released and flows downhill past a turbine, generating electricity. Quidnet’s approach instead pumps water down into impermeable rock formations and keeps it under pressure so it flows up when released. “It’s like pumped hydro, upside down,” says CEO Joe Zhou.

Quidnet started a six-month test of its technology in late 2024, pressurizing the system. In June, the company was able to discharge 35 megawatt-hours of energy from the well. There was virtually no self-discharge, meaning no energy loss, Zhou says.

Inexpensive forms of energy storage that can store electricity for weeks or months could help inconsistent electricity sources like wind and solar go further for the grid. And Quidnet’s approach, which uses commercially available equipment, could be deployed quickly and qualify for federal tax credits to help make it even cheaper.

However, there’s still a big milestone ahead: turning the pressurized water back into electricity. The company is currently building a facility with the turbines and support equipment to do that—all the components are available to purchase from established companies. “We don’t need to invent new things based on what we’ve already developed today,” Zhou says. “We can now start just deploying at very, very substantial scales.”

That process will come with energy losses. Energy storage systems are typically measured by their round-trip efficiency: how much of the electricity that’s put into the system is returned at the end as electricity. Modeling suggests that Quidnet’s technology could reach a maximum efficiency of about 65%, Zhou says, though some design choices made to optimize for economics will likely cause the system to land at roughly 50%.

That’s less efficient than lithium-ion batteries, but long-duration systems, if they’re cheap enough, can operate at low efficiencies and still be useful for the grid, says Paul Denholm, a senior research fellow at the National Renewable Energy Laboratory.

“It’s got to be cost-competitive; it all comes down to that,” Denholm says.

Lithium-ion batteries, the fastest-growing technology in energy storage, are the target that new forms of energy storage, like Quidnet’s, must chase. Lithium-ion batteries are about 90% cheaper today than they were 15 years ago. They’ve become a price-competitive alternative to building new natural-gas plants, Denholm says.

When it comes to competing with batteries, one potential differentiator for Quidnet could be government subsidies. While the Trump administration has clawed back funding for clean energy technologies, there’s still an energy storage tax credit, though recently passed legislation added new supply chain restrictions.

Starting in 2026, new energy storage facilities hoping to qualify for tax credits will need to prove that at least 55% of the value of a project’s materials are not from foreign entities of concern. That rules out sourcing batteries from China, which dominates battery production today. Quidnet has a “high level of domestic content” and expects to qualify for tax credits under the new rules, Zhou says.

The facility Quidnet is building is a project with utility partner CPS Energy, and it should come online in early 2026. 

OpenAI is launching a version of ChatGPT for college students

OpenAI is launching Study Mode, a version of ChatGPT for college students that it promises will act less like a lookup tool and more like a friendly, always-available tutor. It’s part of a wider push by the company to get AI more embedded into classrooms when the new academic year starts in September.

A demonstration for reporters from OpenAI showed what happens when a student asks Study Mode about an academic subject like game theory. The chatbot begins by asking what the student wants to know and then attempts to build an exchange, where the pair work methodically toward the answer together. OpenAI says the tool was built after consulting with pedagogy experts from over 40 institutions.

A handful of college students who were part of OpenAI’s testing cohort—hailing from Princeton, Wharton, and the University of Minnesota—shared positive reviews of Study Mode, saying it did a good job of checking their understanding and adapting to their pace.

The learning approaches that OpenAI has programmed into Study Mode, which are based partially on Socratic methods, appear sound, says Christopher Harris, an educator in New York who has created a curriculum aimed at AI literacy. They might grant educators more confidence about allowing, or even encouraging, their students to use AI. “Professors will see this as working with them in support of learning as opposed to just being a way for students to cheat on assignments,” he says.

But there’s a more ambitious vision behind Study Mode. As demonstrated in OpenAI’s recent partnership with leading teachers’ unions, the company is currently trying to rebrand chatbots as tools for personalized learning rather than cheating. Part of this promise is that AI will act like the expensive human tutors that currently only the most well-off students’ families can typically afford.

“We can begin to close the gap between those with access to learning resources and high-quality education and those who have been historically left behind,” says OpenAI’s head of education. Leah Belsky.

But painting Study Mode as an education equalizer obfuscates one glaring problem. Underneath the hood, it is not a tool trained exclusively on academic textbooks and other approved materials—it’s more like the same old ChatGPT, tuned with a new conversation filter that simply governs how it responds to students, encouraging fewer answers and more explanations. 

This AI tutor, therefore, more resembles what you’d get if you hired a human tutor who has read every required textbook, but also every flawed explanation of the subject ever posted to Reddit, Tumblr, and the farthest reaches of the web. And because of the way AI works, you can’t expect it to distinguish right information from wrong. 

Professors encouraging their students to use it run the risk of it teaching them to approach problems in the wrong way—or worse, being taught material that is fabricated or entirely false. 

Given this limitation, I asked OpenAI if Study Mode is limited to particular subjects. The company said no—students will be able to use it to discuss anything they’d normally talk to ChatGPT about. 

It’s true that access to human tutors—which for certain subjects can cost upward of $200 an hour—is typically for the elite few. The notion that AI models can spread the benefits of tutoring to the masses holds an allure. Indeed, it is backed up by at least some early research that shows AI models can adapt to individual learning styles and backgrounds.

But this improvement comes with a hidden cost. Tools like Study Mode, at least for now, take a shortcut by using large language models’ humanlike conversational style without fixing their inherent flaws. 

OpenAI also acknowledges that this tool won’t prevent a student who’s frustrated and wants an answer from simply going back to normal ChatGPT. “If someone wants to subvert learning, and sort of get answers and take the easier route, that is possible,” Belsky says. 

However, one thing going for Study Mode, the students say, is that it’s simply more fun to study with a chatbot that’s always encouraging you along than to stare at a textbook on Bayesian theorem for the hundredth time. “It’s like the reward signal of like, oh, wait, I can learn this small thing,” says Maggie Wang, a student from Princeton who tested it. The tool is free for now, but Praja Tickoo, a student from Wharton, says it wouldn’t have to be for him to use it. “I think it’s absolutely something I would be willing to pay for,” he says.

Exclusive: A record-breaking baby has been born from an embryo that’s over 30 years old

A baby boy born over the weekend holds the new record for the “oldest baby.” Thaddeus Daniel Pierce, who arrived on July 26, developed from an embryo that had been in storage for 30 and a half years.

“We had a rough birth but we are both doing well now,” says Lindsey Pierce, his mother. “He is so chill. We are in awe that we have this precious baby!”

Lindsey and her husband, Tim Pierce, who live in London, Ohio, “adopted” the embryo from a woman who had it created in 1994. She says her family and church family think “it’s like something from a sci-fi movie.” 

“The baby has a 30-year-old sister,” she adds. Tim was a toddler when the embryos were first created.

“It’s been pretty surreal,” says Linda Archerd, 62, who donated the embryo. “It’s hard to even believe.”

Three little hopes

The story starts back in the early 1990s. Archerd had been trying—and failing—to get pregnant for six years. She and her husband decided to try IVF, a fairly new technology at the time. “People were [unfamiliar] with it,” says Archerd. “A lot of people were like, what are you doing?”

They did it anyway, and in May 1994, they managed to create four embryos. One of them was transferred to Linda’s uterus. It resulted in a healthy baby girl. “I was so blessed to have a baby,” Archerd says. The remaining three embryos were cryopreserved and kept in a storage tank.

That was 31 years ago. The healthy baby girl is now a 30-year-old woman who has her own 10-year-old daughter. But the other three embryos remained frozen in time.

Archerd originally planned to use the embryos herself. “I always wanted another baby desperately,” she says. “I called them my three little hopes.” Her then husband felt differently, she says. Archerd went on to divorce him, but she won custody of the embryos and kept them in storage, still hopeful she might use them one day, perhaps with another partner.

That meant paying annual storage fees, which increased over time and ended up costing Archerd around a thousand dollars a year, she says. To her, it was worth it. “I always thought it was the right thing to do,” she says. 

Things changed when she started going through menopause, she says. She considered her options. She didn’t want to discard the embryos or donate them for research. And she didn’t want to donate them to another family anonymously—she wanted to meet the parents and any resulting babies. “It’s my DNA; it came from me … and [it’s] my daughter’s sibling,” she says.

Then she found out about embryo “adoption.” This is a type of embryo donation in which both donors and recipients have a say in whom they “place” their embryos with or “adopt” them from. It is overseen by agencies—usually explicitly religious ones—that believe an embryo is morally equivalent to a born human. Archerd is Christian.

There are several agencies that offer these adoption services in the US, but not all of them accept embryos that have been stored for a very long time. That’s partly because those embryos will have been frozen and stored in unfamiliar, old-fashioned ways, and partly because old embryos are thought to be less likely to survive thawing and transfer to successfully develop into a baby.

“So many places wouldn’t even take my information,” says Archerd. Then she came across the Snowflakes program run by the Nightlight Christian Adoptions agency. The agency was willing to accept her embryos, but it needed Archerd’s medical records from the time the embryos had been created, as well as the embryos’ lab records.

So Archerd called the fertility doctor who had treated her decades before. “I still remembered his phone number by heart,” she says. That doctor, now in his 70s, is still practicing at a clinic in Oregon. He dug Archerd’s records out from his basement, she says. “Some of [them] were handwritten,” she adds. Her embryos entered Nightlight’s “matching pool” in 2022.

Making a match

“Our matching process is really driven by the preferences of the placing family,” says Beth Button, executive director of the Snowflakes program. Archerd’s preference was for a married Caucasian, Christian couple living in the US. “I didn’t want them to go out of the country,” says Archerd. “And being Christian is very important to me, because I am.”

It took a while to find a match. Most of the “adopting parents” signed up for the Snowflakes program were already registered at fertility clinics that wouldn’t have accepted the embryos, says Button. “I would say that over 90% of clinics in the US would not have accepted these embryos,” she says.

Expecting parents Tim and Lindsey Pierce.
Lindsey and Tim Pierce at Rejoice Fertility.
COURTESY LINDSEY PIERCE

Archerd’s embryos were assigned to the agency’s Open Hearts program for embryos that are “hard to place,” along with others that have been in storage for a long time or are otherwise thought to be less likely to result in a healthy birth.

Lindsey and Tim Pierce had also signed up for the Open Hearts program. The couple, aged 35 and 34, respectively, had been trying for a baby for seven years and had seen multiple doctors.

Lindsey was researching child adoption when she came across the Snowflakes program. 

When the couple were considering their criteria for embryos they might receive, they decided that they’d be open to any. “We checkmarked anything and everything,” says Tim. That’s how they ended up being matched with Archerd’s embryos. “We thought it was wild,” says Lindsey. “We didn’t know they froze embryos that long ago.”

Lindsey and Tim had registered with Rejoice Fertility, an IVF clinic in Knoxville, Tennessee, run by John Gordon, a reproductive endocrinologist who prides himself on his efforts to reduce the number of embryos in storage. The huge numbers of embryos left in storage tanks was weighing on his conscience, he says, so around six years ago, he set up Rejoice Fertility with the aim of doing things differently.  

“Now we’re here in the belt buckle of the Bible Belt,” says Gordon, who is Reformed Presbyterian. “I’ve changed my mode of practice.” IVF treatments performed at the clinic are designed to create as few excess embryos as possible. The clinic works with multiple embryo adoption agencies and will accept any embryo, no matter how long it has been in storage.

A portrait of Linda Archerd.

COURTESY LINDA ARCHERD

It was his clinic that treated the parents who previously held the record for the longest-stored embryo—in 2022, Rachel and Philip Ridgeway had twins from embryos created more than 30 years earlier. “They’re such a lovely couple,” says Gordon. When we spoke, he was making plans to meet the family for breakfast. The twins are “growing like weeds,” he says with a laugh.

“We have certain guiding principles, and they’re coming from our faith,” says Gordon, although he adds that he sees patients who hold alternative views. One of those principles is that “every embryo deserves a chance at life and that the only embryo that cannot result in a healthy baby is the embryo not given the opportunity to be transferred into a patient.”

That’s why his team will endeavor to transfer any embryo they receive, no matter the age or conditions. That can be challenging, especially when the embryos have been frozen or stored in unusual or outdated ways. “It’s scary for people who don’t know how to do it,” says Sarah Atkinson, lab supervisor and head embryologist at Rejoice Fertility. “You don’t want to kill someone’s embryos if you don’t know what you’re doing.”

Cumbersome and explosive

In the early days of IVF, embryos earmarked for storage were slow-frozen. This technique involves gradually lowering the temperature of the embryos. But because slow freezing can cause harmful ice crystals to form, clinics switched in the 2000s to a technique called vitrification, in which the embryos are placed in thin plastic tubes called straws and lowered into tanks of liquid nitrogen. This rapidly freezes the embryos and converts them into a glass-like state. 

The embryos can later be thawed by removing them from the tanks and rapidly—within two seconds—plunging them into warm “thaw media,” says Atkinson. Thawing slow-frozen embryos is more complicated. And the exact thawing method required varies, depending on how the embryos were preserved and what they were stored in. Some of the devices need to be opened while they are inside the storage tank, which can involve using forceps, diamond-bladed knives, and other tools in the liquid nitrogen, says Atkinson.

Sarah Atkinson, lab supervisor and head embryologist at Rejoice Fertility, directly injects sperm into two eggs to fertilize them.
COURTESY OF SARAH ATKINSON AT REJOICE FERTILITY.

Recently, she was tasked with retrieving embryos that had been stored inside a glass vial. The vial was made from blown glass and had been heat-sealed with the embryo inside. Atkinson had to use her diamond-bladed knife to snap open the seal inside the nitrogen tank. It was fiddly work, and when the device snapped, a small shard of glass flew out and hit Atkinson’s face. “Hit me on the cheek, cut my cheek, blood running down my face, and I’m like, Oh shit,” she says. Luckily, she had her safety goggles on. And the embryos survived, she adds.

The two embryos that were transferred to Lindsey Pierce.

Atkinson has a folder in her office with notes she’s collected on various devices over the years. She flicks through it over a video call and points to the notes she made about the glass vial. “Might explode; wear face shield and eye protection,” she reads. A few pages later, she points to another embryo-storage device. “You have to thaw this one in your fingers,” she tells me. “I don’t like it.”

The record-breaking embryos had been slow-frozen and stored in a plastic vial, says Atkinson. Thawing them was a cumbersome process. But all three embryos survived it.

The Pierces had to travel from their home in Ohio to the clinic in Tennessee five times over a two-week period. “It was like a five-hour drive,” says Lindsey. One of the three embryos stopped growing. The other two were transferred to Lindsey’s uterus on November 14, she says. And one developed into a fetus.

Now that the baby has arrived, Archerd is keen to meet him. “The first thing that I noticed when Lindsey sent me his pictures is how much he looks like my daughter when she was a baby,” she says. “I pulled out my baby book and compared them side by side, and there is no doubt that they are siblings.”

She doesn’t yet have plans to meet the baby, but doing so would be “a dream come true,” she says. “I wish that they didn’t live so far away from me … He is perfect!”

“We didn’t go into it thinking we would break any records,” says Lindsey. “We just wanted to have a baby.”