Three reasons Meta will struggle with community fact-checking

Earlier this month, Mark Zuckerberg announced that Meta will cut back on its content moderation efforts and eliminate fact-checking in the US in favor of the more “democratic” approach that X (formerly Twitter) calls Community Notes, rolling back protections that he claimed had been developed only in response to media and government pressure.

The move is raising alarm bells, and rightly so. Meta has left a trail of moderation controversies in its wake, from overmoderating images of breastfeeding women to undermoderating hate speech in Myanmar, contributing to the genocide of Rohingya Muslims. Meanwhile, ending professional fact-checking creates the potential for misinformation and hate to spread unchecked.

Enlisting volunteers is how moderation started on the Internet, long before social media giants realized that centralized efforts were necessary. And volunteer moderation can be successful, allowing for the development of bespoke regulations aligned with the needs of particular communities. But without significant commitment and oversight from Meta, such a system cannot contend with how much content is shared across the company’s platforms, and how fast. In fact, the jury is still out on how well it works at X, which is used by 21% of Americans (Meta’s platforms are significantly more popular—Facebook alone is used by 70% of Americans, according to Pew).

Community Notes, which started in 2021 as Birdwatch, is a community-driven moderation system on X that allows users who sign up for the program to add context to posts. Having regular users provide public fact-checking is relatively new, and so far results are mixed. For example, researchers have found that participants are more likely to challenge content they disagree with politically and that flagging content as false does not reduce engagement, but they have also found that the notes are typically accurate and can help reduce the spread of misleading posts.

I’m a community moderator who researches community moderation. Here’s what I’ve learned about the limitations of relying on volunteers for moderation—and what Meta needs to do to succeed: 

1. The system will miss falsehoods and could amplify hateful content

There is a real risk under this style of moderation that only posts about things that a lot of people know about will get flagged in a timely manner—or at all. Consider how a post with a picture of a death cap mushroom and the caption “Tasty” might be handled under Community Notes–style moderation. If an expert in mycology doesn’t see the post, or sees it only after it’s been widely shared, it may not get flagged as “Poisonous, do not eat”—at least not until it’s too late. Topic areas that are more esoteric will be undermoderated. This could have serious impacts on both individuals (who may eat a poisonous mushroom) and society (if a falsehood spreads widely). 

Crucially, X’s Community Notes aren’t visible to readers when they are first added. A note becomes visible to the wider user base only when enough contributors agree that it is accurate by voting for it. And not all votes count. If a note is rated only by people who tend to agree with each other, it won’t show up. X does not make a note visible until there’s agreement from people who have disagreed on previous ratings. This is an attempt to reduce bias, but it’s not foolproof. It still relies on people’s opinions about a note and not on actual facts. Often what’s needed is expertise.
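The bridging requirement described above can be illustrated with a toy example. X's production system is more sophisticated (it uses matrix factorization over the full rating history), but a minimal sketch of the underlying idea—only show a note when raters who usually disagree independently find it helpful—might look like this. The viewpoint scores, threshold, and function name here are hypothetical, chosen purely for illustration:

```python
# Simplified sketch of bridging-based note scoring. This is NOT X's actual
# algorithm; it only illustrates the core idea: a note becomes visible when
# raters from opposing "camps" both tend to rate it helpful.

def note_is_visible(ratings, threshold=0.7):
    """ratings: list of (viewpoint, helpful) pairs.

    viewpoint is a signed score summarizing a rater's past rating pattern
    (negative and positive roughly meaning opposite camps); helpful is a bool.
    """
    left = [helpful for viewpoint, helpful in ratings if viewpoint < 0]
    right = [helpful for viewpoint, helpful in ratings if viewpoint > 0]
    # Require ratings from both camps...
    if not left or not right:
        return False
    # ...and a high helpfulness rate within each camp independently.
    return (sum(left) / len(left) >= threshold and
            sum(right) / len(right) >= threshold)
```

Under this toy rule, a note rated helpful only by like-minded raters never surfaces, which is exactly the failure mode the text describes: agreement within one camp counts for nothing, and genuine cross-camp consensus—not expertise—is what unlocks visibility.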

I moderate a community on Reddit called r/AskHistorians. It’s a public history site with over 2 million members and is very strictly moderated. We see people get facts wrong all the time. Sometimes these are straightforward errors. But sometimes there is hateful content that takes experts to recognize. One time a question containing a Holocaust-denial dog whistle escaped review for hours and ended up amassing hundreds of upvotes before it was caught by an expert on our team. Hundreds of people—probably with very different voting patterns and very different opinions on a lot of topics—not only missed the problematic nature of the content but chose to promote it through upvotes. This happens with answers to questions, too. People who aren’t experts in history will upvote outdated, truthy-sounding answers that aren’t actually correct. Conversely, they will downvote good answers if they reflect viewpoints that are tough to swallow. 

r/AskHistorians works because most of its moderators are expert historians. If Meta wants its Community Notes–style program to work, it should make sure that the people with the knowledge to make assessments see the posts and that expertise is accounted for in voting, especially when there’s a misalignment between common understanding and expert knowledge. 

2. It won’t work without well-supported volunteers  

Meta’s paid content moderators review the worst of the worst—including gore, sexual abuse and exploitation, and violence. As a result, many have suffered severe trauma, leading to lawsuits and unionization efforts. When Meta cuts resources from its centralized moderation efforts, it will be increasingly up to unpaid volunteers to keep the platform safe. 

Community moderators don’t have an easy job. On top of exposure to horrific content, as identifiable members of their communities, they are also often subject to harassment and abuse—something we experience daily on r/AskHistorians. However, community moderators moderate only what they can handle. For example, while I routinely manage hate speech and violent language, as a moderator of a text-based community I am rarely exposed to violent imagery. Community moderators also work as a team. If I do get exposed to something I find upsetting or if someone is being abusive, my colleagues take over and provide emotional support. I also care deeply about the community I moderate. Care for community, supportive colleagues, and self-selection all help keep volunteer moderators’ morale high(ish). 

It’s unclear how Meta’s new moderation system will be structured. If volunteers choose what content they flag, will that replicate X’s problem, where partisanship affects which posts are flagged and how? It’s also unclear what kind of support the platform will provide. If volunteers are exposed to content they find upsetting, will Meta—the company that is currently being sued for damaging the mental health of its paid content moderators—provide social and psychological aid? To be successful, the company will need to ensure that volunteers have access to such resources and are able to choose the type of content they moderate (while also ensuring that this self-selection doesn’t unduly influence the notes).    

3. It can’t work without protections and guardrails 

Online communities can thrive when they are run by people who deeply care about them. However, volunteers can’t do it all on their own. Moderation isn’t just about making decisions on what’s “true” or “false.” It’s also about identifying and responding to other kinds of harmful content. Zuckerberg’s decision is coupled with other changes to Meta’s community standards that weaken rules around hateful content in particular. Community moderation is part of a broader ecosystem, and it becomes significantly harder to do it when that ecosystem gets poisoned by toxic content. 

I started moderating r/AskHistorians in 2020 as part of a research project to learn more about the behind-the-scenes experiences of volunteer moderators. While Reddit had started addressing some of the most extreme hate on its platform by occasionally banning entire communities, many communities promoting misogyny, racism, and all other forms of bigotry were permitted to thrive and grow. As a result, my early field notes are filled with examples of extreme hate speech, as well as harassment and abuse directed at moderators. It was hard to keep up with. 

But halfway through 2020, something happened. After a milquetoast statement about racism from CEO Steve Huffman, moderators on the site shut down their communities in protest. And to its credit, the platform listened. Reddit updated its community standards to explicitly prohibit hate speech and began to enforce the policy more actively. While hate is still an issue on Reddit, I see far less now than I did in 2020 and 2021. Community moderation needs robust support because volunteers can’t do it all on their own. It’s only one tool in the box. 

If Meta wants to ensure that its users are safe from scams, exploitation, and manipulation in addition to hate, it cannot rely solely on community fact-checking. But keeping the user base safe isn’t what this decision aims to do. It’s a political move to curry favor with the new administration. Meta could create the perfect community fact-checking program, but because this decision is coupled with weakening its wider moderation practices, things are going to get worse for its users rather than better. 

Sarah Gilbert is research director for the Citizens and Technology Lab at Cornell University.

AI’s energy obsession just got a reality check

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Just a week in, the AI sector has already seen its first battle of wits under the new Trump administration. The clash stems from two key pieces of news: the announcement of the Stargate project, which would spend $500 billion—more than the Apollo space program—on new AI data centers, and the release of a powerful new model from China. Together, they raise important questions the industry needs to answer about the extent to which the race for more data centers—with their heavy environmental toll—is really necessary.

A reminder about the first piece: OpenAI, Oracle, SoftBank, and an Abu Dhabi–based investment fund called MGX plan to spend up to $500 billion opening massive data centers around the US to build better AI. Much of the groundwork for this project was laid in 2024, when OpenAI increased its lobbying spending sevenfold (which we were first to report last week) and AI companies started pushing for policies that were less about controlling problems like deepfakes and misinformation, and more about securing more energy.

Still, Trump received credit for it from tech leaders when he announced the effort on his second day in office. “I think this will be the most important project of this era,” OpenAI’s Sam Altman said at the launch event, adding, “We wouldn’t be able to do this without you, Mr. President.”

It’s an incredible sum, just slightly less than the inflation-adjusted cost of building the US highway system over the course of more than 30 years. However, not everyone sees Stargate as having the same public benefit. Environmental groups say it could strain local grids and further drive up the cost of energy for the rest of us, who aren’t guzzling it to train and deploy AI models. Previous research has also shown that data centers tend to be built in areas that use much more carbon-intensive sources of energy, like coal, than the national average. It’s not clear how much, if at all, Stargate will rely on renewable energy. 

One of the loudest critics of Stargate, though, is Elon Musk. None of Musk’s companies are involved in the project, and he has publicly sown doubt that OpenAI and SoftBank have secured the money needed for the plan, claims that Altman disputed on X. Musk’s decision to publicly criticize the president’s initiative has irked people in Trump’s orbit, Politico reports, but it’s not clear if those people have expressed that to Musk directly. 

On to the second piece. On the day Trump was inaugurated, a Chinese startup released an AI model that started making a whole bunch of important people in Silicon Valley very worried about their competition. (This close timing is almost certainly not an accident.)

The model, called DeepSeek R1, is a reasoning model. These types of models are designed to excel at math, logic, pattern-finding, and decision-making. DeepSeek proved it could “reason” through complicated problems as well as one of OpenAI’s reasoning models, o1—and more efficiently. What’s more, DeepSeek isn’t a super-secret project kept behind lock and key like OpenAI’s. It was released for all to see.

DeepSeek was released as the US has made outcompeting China in the AI race a top priority. This goal was a driving force behind the 2022 CHIPS Act to make more chips domestically. It’s influenced the position of tech companies like OpenAI, which has embraced lending its models to national security work and has partnered with the defense-tech company Anduril to help the military take down drones. It’s led to export controls that limit what types of chips Nvidia can sell to China.

The success of DeepSeek signals that these efforts aren’t working as well as AI leaders in the US would like (though it’s worth noting that the impact of export controls for chips isn’t felt for a few years, so the policy wouldn’t be expected to have prevented a model like DeepSeek).  

Still, the model poses a threat to the bottom line of certain players in Big Tech. Why pay for an expensive model from OpenAI when you can get access to DeepSeek for free? Even other makers of open-source models, especially Meta, are panicking about the competition, according to The Information. The company has set up a number of “war rooms” to figure out how DeepSeek was made so efficient. (A couple of days after the Stargate announcement, Meta said it would increase its own capital investments by 70% to build more AI infrastructure.)

What does this all mean for the Stargate project? Let’s think about why OpenAI and its partners are willing to spend $500 billion on data centers to begin with. They believe that AI in its various forms—not just chatbots or generative video or even new AI agents, but also developments yet to be unveiled—will be the most lucrative tool humanity has ever built. They also believe that access to powerful chips inside massive data centers is the key to getting there. 

DeepSeek poked some holes in that approach. It didn’t train on yet-unreleased chips that are light-years ahead. It didn’t, to our knowledge, require the eye-watering amounts of computing power and energy behind the models from US companies that have made headlines. Its designers made clever decisions in the name of efficiency.

In theory, it could make a project like Stargate seem less urgent and less necessary. If, in dissecting DeepSeek, AI companies discover some lessons about how to make models use existing resources more effectively, perhaps constructing more and more data centers won’t be the only winning formula for better AI. That would be welcome to the many people affected by the problems data centers can bring, like lots of emissions, the loss of fresh, drinkable water used to cool them, and the strain on local power grids. 

Thus far, DeepSeek doesn’t seem to have sparked such a change in approach. OpenAI researcher Noam Brown wrote on X, “I have no doubt that with even more compute it would be an even more powerful model.”

If his logic wins out, the players with the most computing power will win, and getting it is apparently worth at least $500 billion to AI’s biggest companies. But let’s remember—announcing it is the easiest part.


Now read the rest of The Algorithm

Deeper Learning

What’s next for robots

Many of the big questions about AI—how it learns, how well it works, and where it should be deployed—are now applicable to robotics. In the year ahead, we will see humanoid robots being put to the test in warehouses and factories, robots learning in simulated worlds, and a rapid increase in the military’s adoption of autonomous drones, submarines, and more. 

Why it matters: Jensen Huang, the highly influential CEO of the chipmaker Nvidia, stated last month that the next advancement in AI will mean giving the technology a “body” of sorts in the physical world. This will come in the form of advanced robotics. Even with the caveat that robotics is full of futuristic promises that usually aren’t fulfilled by their deadlines, the marrying of AI methods with new advancements in robots means the field is changing quickly. Read more here.

Bits and Bytes

Leaked documents expose deep ties between Israeli army and Microsoft

Since the attacks of October 7, the Israeli military has relied heavily on cloud and AI services from Microsoft and its partner OpenAI, and the tech giant’s staff has embedded with different units to support rollout, a joint investigation reveals. (+972 Magazine)

The tech arsenal that could power Trump’s immigration crackdown

The effort by federal agencies to acquire powerful technology to identify and track migrants has been unfolding for years across multiple administrations. These technologies may be called upon more directly under President Trump. (The New York Times)

OpenAI launches Operator—an agent that can use a computer for you

Operator is a web app that can carry out simple online tasks in a browser, such as booking concert tickets or making an online grocery order. (MIT Technology Review)

The second wave of AI coding is here

A string of startups is racing to build models that can produce better and better software. But it’s not only AI’s increasingly powerful ability to write code that’s impressive. These companies claim coding is the shortest path to superintelligent AI. (MIT Technology Review)

Mice with two dads have been created using CRISPR

Mice with two fathers have been born—and have survived to adulthood—following a complex set of experiments by a team in China. 

Zhi-Kun Li at the Chinese Academy of Sciences in Beijing and his colleagues used CRISPR to create the mice, using a novel approach to target genes that normally need to be inherited from both male and female parents. They hope to use the same approach to create primates with two dads. 

Humans are off limits for now, but the work does help us better understand a strange biological phenomenon known as imprinting, which causes certain genes to be expressed differently depending on which parent they came from. For these genes, animals inherit part of a “dose” from each parent, and the two must work in harmony to create a healthy embryo. Without both doses, gene expression can go awry, and the resulting embryos can end up with abnormalities.

This is what researchers have found in previous attempts to create mice with two dads. In the 1980s, scientists in the UK tried injecting the DNA-containing nucleus of a sperm cell into a fertilized egg cell. The resulting embryos had DNA from two males (as well as a small amount of DNA from a female, in the cytoplasm of the egg).

But when these embryos were transferred to the uteruses of surrogate mouse mothers, none of them resulted in a healthy birth, seemingly because imprinted genes from both paternal and maternal genomes are needed for development. 

Li and his colleagues took a different approach. The team used gene editing to knock out imprinted genes altogether.

Around 200 of a mouse’s genes are imprinted, but Li’s team focused on 20 that are known to be important for the development of the embryo.

In an attempt to create healthy mice with DNA from two male “dads,” the team undertook a complicated set of experiments. To start, the team cultured cells with sperm DNA to collect stem cells in the lab. Then they used CRISPR to disrupt the 20 imprinted genes they were targeting.

These gene-edited cells were then injected, along with other sperm cells, into egg cells that had had their own nuclei removed. The result was embryonic cells with DNA from two male mice. These cells were then injected into a type of “embryo shell” used in research, which provides the cells required to make a placenta. The resulting embryos were transferred to the uteruses of female mice.

It worked—to some degree. Some of the embryos developed into live pups, and they even survived to adulthood. The findings were published in the journal Cell Stem Cell.

“It’s exciting,” says Kotaro Sasaki, a developmental biologist at the University of Pennsylvania, who was not involved in the work. Not only have Li and his team been able to avoid a set of imprinting defects, but their approach is the second way scientists have found to create mice using DNA from two males.

The finding builds on research by Katsuhiko Hayashi, now at Osaka University in Japan, and his colleagues. A couple of years ago, that team presented evidence that they had found a way to take cells from the tails of adult male mice and turn them into immature egg cells. These could be fertilized with sperm to create bi-paternal embryos. The mice born from those embryos can reach adulthood and have their own offspring, Hayashi has said.

Li’s team’s more complicated approach was less successful. Only a small fraction of the mice survived, for a start. The team transferred 164 gene-edited embryos, but only seven live pups were born. And those that were born weren’t entirely normal, either. They grew to be bigger than untreated mice, and their organs appeared enlarged. They didn’t live as long as normal mice, and they were infertile.

It would be unethical to do such risky research with human cells and embryos. “Editing 20 imprinted genes in humans would not be acceptable, and producing individuals who could not be healthy or viable is simply not an option,” says Li.

“There are numerous issues,” says Sasaki. For a start, a lot of the technical lab procedures the team used have not been established for human cells. But even if we had those, this approach would be dangerous—knocking out human genes could have untold health consequences. 

“There’s lots and lots of hurdles,” he says. “Human applications [are] still quite far.”

Despite that, the work might shed a little more light on the mysterious phenomenon of imprinting. Previous research has shown that mice with two moms appear smaller and live longer than expected, while the current study shows that mice with two dads are overgrown and die more quickly. Perhaps paternal imprinted genes support growth and maternal ones limit it, and animals need both to reach a healthy size, says Sasaki.

Useful quantum computing is inevitable—and increasingly imminent

On January 8, Nvidia CEO Jensen Huang jolted the stock market by saying that practical quantum computing is still 15 to 30 years away, at the same time suggesting those computers will need Nvidia GPUs in order to implement the necessary error correction. 

However, history shows that brilliant people are not immune to making mistakes. Huang’s predictions miss the mark, both on the timeline for useful quantum computing and on the role his company’s technology will play in that future.

I’ve been closely following developments in quantum computing as an investor, and it’s clear to me that it is rapidly converging on utility. Last year, Google’s Willow device demonstrated that there is a promising pathway to scaling up to bigger and bigger computers. It showed that errors can be reduced exponentially as the number of quantum bits, or qubits, increases. It also ran a benchmark test in under five minutes that would take one of today’s fastest supercomputers 10 septillion years. While too small to be commercially useful with known algorithms, Willow shows that quantum supremacy (executing a task that is effectively impossible for any classical computer to handle in a reasonable amount of time) and fault tolerance (correcting errors faster than they are made) are achievable.

For example, PsiQuantum, a startup my company is invested in, is set to break ground on two quantum computers that will enter commercial service before the end of this decade. The plan is for each one to be 10,000 times the size of Willow, big enough to tackle important questions about materials, drugs, and the quantum aspects of nature. These computers will not use GPUs to implement error correction. Rather, they will have custom hardware, operating at speeds that would be impossible with Nvidia hardware.

At the same time, quantum algorithms are improving far faster than hardware. A recent collaboration between the pharmaceutical giant Boehringer Ingelheim and PsiQuantum demonstrated a more than 200x improvement in algorithms to simulate important drugs and materials. Phasecraft, another company we have invested in, has improved the simulation performance for a wide variety of crystal materials and has published a quantum-enhanced version of a widely used materials science algorithm that is tantalizingly close to beating all classical implementations on existing hardware.

Advances like these lead me to believe that useful quantum computing is inevitable and increasingly imminent. And that’s good news, because the hope is that they will be able to perform calculations that no amount of AI or classical computation could ever achieve.

We should care about the prospect of useful quantum computers because today we don’t really know how to do chemistry. We lack knowledge about the mechanisms of action for many of our most important drugs. The catalysts that drive our industries are generally poorly understood, require expensive exotic materials, or both. Despite appearances, we have significant gaps in our agency over the physical world; our achievements belie the fact that we are, in many ways, stumbling around in the dark.

Nature operates on the principles of quantum mechanics. Our classical computational methods fail to accurately capture the quantum nature of reality, even though much of our high-performance computing resources are dedicated to this pursuit. Despite all the intellectual and financial capital expended, we still don’t understand why the painkiller acetaminophen works, how type-II superconductors function, or why a simple crystal of iron and nitrogen can produce a magnet with such incredible field strength. We search for compounds in Amazonian tree bark to cure cancer and other maladies, manually rummaging through a pitifully small subset of a design space encompassing 10⁶⁰ small molecules. It’s more than a little embarrassing.

We do, however, have some tools to work with. In industry, density functional theory (DFT) is the workhorse of computational chemistry and materials modeling, widely used to investigate the electronic structure of many-body systems—such as atoms, molecules, and solids. When DFT is applied to systems where electron-electron correlations are weak, it produces reasonable results. But it fails entirely on a broad class of interesting problems. 

Take, for example, the buzz in the summer of 2023 around the “room-temperature superconductor” LK-99. Many accomplished chemists turned to DFT to try to characterize the material and determine whether it was, indeed, a superconductor. Results were, to put it politely, mixed—so we abandoned our best computational methods, returning to mortar and pestle to try to make some of the stuff. Sadly, although LK-99 might have many novel characteristics, a room-temperature superconductor it isn’t. That’s unfortunate, as such a material could revolutionize energy generation, transmission, and storage, not to mention magnetic confinement for fusion reactors, particle accelerators, and more.

AI will certainly help with our understanding of materials, but it is no panacea. New AI techniques have emerged in the last few years, with some promising results. DeepMind’s Graph Networks for Materials Exploration (GNoME), for example, found 380,000 new potentially stable materials. At its core, though, GNoME depends on DFT, so its performance is only as good as DFT’s ability to produce good answers. 

The fundamental issue is that an AI model is only as good as the data it’s trained on. Training an LLM on the entire internet corpus, for instance, can yield a model that has a reasonable grasp of most human culture and can process language effectively. But if DFT fails for any non-trivially correlated quantum systems, how useful can a DFT-derived training set really be? We could also turn to synthesis and experimentation to create training data, but the number of physical samples we can realistically produce is minuscule relative to the vast design space, leaving a great deal of potential untapped. Only once we have reliable quantum simulations to produce sufficiently accurate training data will we be able to create AI models that answer quantum questions on classical hardware.

And that means that we need quantum computers. They afford us the opportunity to shift from a world of discovery to a world of design. Today’s iterative process of guessing, synthesizing, and testing materials is comically inadequate.

In a few tantalizing cases, we have stumbled on materials, like superconductors, with near-magical properties. How many more might these new tools reveal in the coming years? We will eventually have machines with millions of qubits that, when used to simulate crystalline materials, open up a vast new design space. It will be like waking up one day and finding a million new elements with fascinating properties on the periodic table.

Of course, building a million-qubit quantum computer is not for the faint of heart. Such machines will be the size of supercomputers, and require large amounts of capital, cryoplant, electricity, concrete, and steel. They also require silicon photonics components that perform well beyond anything in industry, error correction hardware that runs fast enough to chase photons, and single-photon detectors with unprecedented sensitivity. But after years of research and development, and more than a billion dollars of investment, the challenge is now moving from science and engineering to construction.

It is impossible to fully predict how quantum computing will affect our world, but a thought exercise might offer a mental model of some of the possibilities. 

Imagine our world without metal. We could have wooden houses built with stone tools, agriculture, wooden plows, movable type, printing, poetry, and even thoughtfully edited science periodicals. But we would have no inkling of phenomena like electricity or electromagnetism—no motors, generators, radio, MRI machines, silicon, or AI. We wouldn’t miss them, as we’d be oblivious to their existence. 

Today, we are living in a world without quantum materials, oblivious to the unrealized potential and abundance that lie just out of sight. With large-scale quantum computers on the horizon and advancements in quantum algorithms, we are poised to shift from discovery to design, entering an era of unprecedented dynamism in chemistry, materials science, and medicine. It will be a new age of mastery over the physical world.

Peter Barrett is a general partner at Playground Global, which invests in early-stage deep-tech companies including several in quantum computing, quantum algorithms, and quantum sensing: PsiQuantum, Phasecraft, NVision, and Ideon.

The US withdrawal from the WHO will hurt us all

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

On January 20, his first day in office, US president Donald Trump signed an executive order to withdraw the US from the World Health Organization. “Ooh, that’s a big one,” he said as he was handed the document.

The US is the biggest donor to the WHO, and the loss of this income is likely to have a significant impact on the organization, which develops international health guidelines, investigates disease outbreaks, and acts as an information-sharing hub for member states.

But the US will also lose out. “It’s a very tragic and sad event that could only hurt the United States in the long run,” says William Moss, an epidemiologist at Johns Hopkins Bloomberg School of Public Health in Baltimore.

Trump appears to take issue with the amount the US donates to the WHO. He points out that it makes a much bigger contribution than China, a country with a population four times that of the US. “It seems a little unfair to me,” he said as he prepared to sign the executive order.

It is true that the US is far and away the biggest financial supporter of the WHO. The US contributed $1.28 billion over the two-year period covering 2022 and 2023. By comparison, the second-largest donor, Germany, contributed $856 million in the same period. The US currently contributes 14.5% of the WHO’s total budget.

But it’s not as though the WHO sends a billion-dollar bill to the US. All member states are required to pay membership dues, which are calculated as a percentage of a country’s gross domestic product. For the US, this figure comes to $130 million. China pays $87.6 million. But the vast majority of the US’s contributions to the WHO are made on a voluntary basis—in recent years, the donations have been part of multibillion-dollar spending on global health by the US government. (Separately, the Bill and Melinda Gates Foundation contributed $830 million over 2022 and 2023.)

There’s a possibility that other member nations will increase their donations to help cover the shortfall left by the US’s withdrawal. But it is not clear who will step up—or what implications changing the structure of donations will have.

Martin McKee, a professor of European public health at the London School of Hygiene and Tropical Medicine, thinks it is unlikely that European members will increase their contributions by much. China, India, Brazil, South Africa, and the Gulf states, on the other hand, may be more likely to pay more. But again, it isn’t clear how this will pan out, or whether any of these countries will expect greater influence over global health policy decisions as a result of increasing their donations.

WHO funds are spent on a range of global health projects—programs to eradicate polio, rapidly respond to health emergencies, improve access to vaccines and medicines, develop pandemic prevention strategies, and more. The loss of US funding is likely to have a significant impact on at least some of these programs.

“Diseases don’t stick to national boundaries, hence this decision is not only concerning for the US, but in fact for every country in the world,” says Pauline Scheelbeek at the London School of Hygiene and Tropical Medicine. “With the US no longer reporting to the WHO nor funding part of this process, the evidence on which public health interventions and solutions should be based is incomplete.”

“It’s going to hurt global health,” adds Moss. “It’s going to come back to bite us.”

There’s more on how the withdrawal could affect health programs, vaccine coverage, and pandemic preparedness in this week’s coverage.


Now read the rest of The Checkup

Read more from MIT Technology Review’s archive

This isn’t the first time Donald Trump has signaled his desire for the US to leave the WHO. He proposed a withdrawal during his last term, in 2020. While the WHO is not perfect, it needs more power and funding, not less, Charles Kenny, director of technology and development at the Center for Global Development, argued at the time.

The move drew condemnation from those working in public health then, too. The editor in chief of the medical journal The Lancet called it “a crime against humanity,” as Charlotte Jee reported.

In 1974, the WHO launched an ambitious program to get lifesaving vaccines to all children around the world. Fifty years on, vaccines are thought to have averted 154 million deaths—including 146 million in children under the age of five. 

The WHO has also seen huge success in its efforts to eradicate polio. Today, wild forms of the virus have been eradicated in all but two countries. But vaccine-derived forms of the virus can still crop up around the world.

At the end of a round of discussions in September among WHO member states working on a pandemic agreement, director-general Tedros Adhanom Ghebreyesus remarked, “The next pandemic will not wait for us, whether from a flu virus like H5N1, another coronavirus, or another family of viruses we don’t yet know about.” The H5N1 virus has been circulating on US dairy farms for months now, and the US is preparing for potential human outbreaks.

From around the web

People with cancer paid $45,000 for an experimental blood-filtering treatment, delivered at a clinic in Antigua, after being misled about its effectiveness. Six of them have died since their treatments. (The New York Times)

The Trump administration has instructed federal health agencies to pause all external communications, such as health advisories, weekly scientific reports, updates to websites, and social media posts. (The Washington Post)

A new “virtual retina,” modeled on human retinas, has been developed to study the impact of retinal implants. The three-dimensional model simulates over 10,000 neurons. (Brain Stimulation)

Trump has signed an executive order stating that “it is the policy of the United States to recognize two sexes, male and female.” The document “defies decades of research into how human bodies grow and develop,” STAT reports, and represents “a dramatic failure to understand biology,” according to a neuroscientist who studies the development of sex. (STAT)

Attention, summer holiday planners: Biting sandflies in the Mediterranean region are transmitting Toscana virus at an increasing rate. The virus is a major cause of central nervous system disorders in the region. Italy saw a 2.6-fold increase in the number of reported infections between the 2016–21 period and 2022–23. (Eurosurveillance)

How a top Chinese AI model overcame US sanctions

The AI community is abuzz over DeepSeek R1, a new open-source reasoning model. 

The model was developed by the Chinese AI startup DeepSeek, which claims that R1 matches or even surpasses OpenAI’s ChatGPT o1 on multiple key benchmarks but operates at a fraction of the cost. 

“This could be a truly equalizing breakthrough that is great for researchers and developers with limited resources, especially those from the Global South,” says Hancheng Cao, an assistant professor in information systems at Emory University.

DeepSeek’s success is even more remarkable given the constraints facing Chinese AI companies in the form of increasing US export controls on cutting-edge chips. But early evidence shows that these measures are not working as intended. Rather than weakening China’s AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration.

To create R1, DeepSeek had to rework its training process to reduce the strain on its GPUs, a variety released by Nvidia for the Chinese market whose performance is capped at half the speed of its top products, according to Zihan Wang, a former DeepSeek employee and current PhD student in computer science at Northwestern University.

DeepSeek R1 has been praised by researchers for its ability to tackle complex reasoning tasks, particularly in mathematics and coding. The model employs a “chain of thought” approach similar to that used by ChatGPT o1, which lets it solve problems by processing queries step by step.

Dimitris Papailiopoulos, principal researcher at Microsoft’s AI Frontiers research lab, says what surprised him the most about R1 is its engineering simplicity. “DeepSeek aimed for accurate answers rather than detailing every logical step, significantly reducing computing time while maintaining a high level of effectiveness,” he says.

DeepSeek has also released six smaller versions of R1 that are small enough to run locally on laptops. It claims that one of them even outperforms OpenAI’s o1-mini on certain benchmarks. “DeepSeek has largely replicated o1-mini and has open sourced it,” tweeted Perplexity CEO Aravind Srinivas. DeepSeek did not reply to MIT Technology Review’s request for comment.

Despite the buzz around R1, DeepSeek remains relatively unknown. Based in Hangzhou, China, it was founded in July 2023 by Liang Wenfeng, an alumnus of Zhejiang University with a background in information and electronic engineering. It was incubated by High-Flyer, a hedge fund that Liang founded in 2015. Like Sam Altman of OpenAI, Liang aims to build artificial general intelligence (AGI), a form of AI that can match or even beat humans on a range of tasks.

Training large language models (LLMs) requires a team of highly trained researchers and substantial computing power. In a recent interview with the Chinese media outlet LatePost, Kai-Fu Lee, a veteran entrepreneur and former head of Google China, said that only “front-row players” typically engage in building foundation models such as ChatGPT, as it’s so resource-intensive. The situation is further complicated by the US export controls on advanced semiconductors.

High-Flyer’s decision to venture into AI is directly related to these constraints, however. Long before the anticipated sanctions, Liang acquired a substantial stockpile of Nvidia A100 chips, a type now banned from export to China. The Chinese media outlet 36Kr estimates that the company has over 10,000 units in stock, but Dylan Patel, founder of the AI research consultancy SemiAnalysis, estimates that it has at least 50,000. Recognizing the potential of this stockpile for AI training is what led Liang to establish DeepSeek, which was able to use the chips in combination with lower-power ones to develop its models.

Tech giants like Alibaba and ByteDance, as well as a handful of startups with deep-pocketed investors, dominate the Chinese AI space, making it challenging for small or medium-sized enterprises to compete. A company like DeepSeek, which has no plans to raise funds, is rare. 

Zihan Wang, the former DeepSeek employee, told MIT Technology Review that he had access to abundant computing resources and was given freedom to experiment when working at DeepSeek, “a luxury that few fresh graduates would get at any company.” 

In an interview with the Chinese media outlet 36Kr in July 2024, Liang said that an additional challenge Chinese companies face, on top of chip sanctions, is that their AI engineering techniques tend to be less efficient. “We [most Chinese companies] have to consume twice the computing power to achieve the same results. Combined with data efficiency gaps, this could mean needing up to four times more computing power. Our goal is to continuously close these gaps,” he said.

But DeepSeek found ways to reduce memory usage and speed up calculation without significantly sacrificing accuracy. “The team loves turning a hardware challenge into an opportunity for innovation,” says Wang.

Liang himself remains deeply involved in DeepSeek’s research process, running experiments alongside his team. “The whole team shares a collaborative culture and dedication to hardcore research,” Wang says.

As well as prioritizing efficiency, Chinese companies are increasingly embracing open-source principles. Alibaba Cloud has released over 100 new open-source AI models, supporting 29 languages and catering to various applications, including coding and mathematics. Similarly, startups like Minimax and 01.AI have open-sourced their models. 

According to a white paper released last year by the China Academy of Information and Communications Technology, a state-affiliated research institute, the number of AI large language models worldwide has reached 1,328, with 36% originating in China. This positions China as the second-largest contributor to AI, behind the United States. 

“This generation of young Chinese researchers identify strongly with open-source culture because they benefit so much from it,” says Thomas Qitong Cao, an assistant professor of technology policy at Tufts University.

“The US export control has essentially backed Chinese companies into a corner where they have to be far more efficient with their limited computing resources,” says Matt Sheehan, an AI researcher at the Carnegie Endowment for International Peace. “We are probably going to see a lot of consolidation in the future related to the lack of compute.”

That might already have started to happen. Two weeks ago, Alibaba Cloud announced that it has partnered with the Beijing-based startup 01.AI, founded by Kai-Fu Lee, to merge research teams and establish an “industrial large model laboratory.”

“It is energy-efficient and natural for some kind of division of labor to emerge in the AI industry,” says Cao, the Tufts professor. “The rapid evolution of AI demands agility from Chinese firms to survive.”

Why the next energy race is for underground hydrogen

It might sound like something straight out of the 19th century, but one of the most cutting-edge areas in energy today involves drilling deep underground to hunt for materials that can be burned for energy. The difference is that this time, instead of looking for fossil fuels, the race is on to find natural deposits of hydrogen.

Hydrogen is already a key ingredient in the chemical industry and could be used as a greener fuel in industries from aviation and transoceanic shipping to steelmaking. Today, the gas needs to be manufactured, but there’s some evidence that there are vast deposits underground.

I’ve been thinking about underground resources a lot this week, since I’ve been reporting a story about a new startup, Addis Energy. The company is looking to use subsurface rocks, and the conditions down there, to produce another useful chemical: ammonia. In an age of lab-produced breakthroughs, it feels like something of a regression to go digging for resources, but looking underground could help meet energy demand while also addressing climate change.

It’s rare that hydrogen turns up in oil and gas operations, and for decades, the conventional wisdom has been that there aren’t large deposits of the gas underground. Hydrogen molecules are tiny, after all, so even if the gas was forming there, the assumption was that it would just leak out.

However, there have been somewhat accidental discoveries of hydrogen over the decades, in abandoned mines or new well sites. There are reports of wells that spewed colorless gas, or flames that burned gold. And as people have looked more intentionally for hydrogen, they’ve started to find it.

As it turns out, hydrogen tends to build up in very different rocks from those that host oil and gas deposits. While fossil-fuel prospecting tends to focus on softer rocks, like organic-rich shale, hydrogen seems most plentiful in iron-rich rocks like olivine. The gas forms when chemical reactions at elevated temperature and pressure underground pull water apart. (There’s also likely another mechanism that forms hydrogen underground, called radiolysis, where radioactive elements emit radiation that can split water.)
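For readers who want the underlying chemistry, one commonly cited version of this water-splitting reaction is the serpentinization of fayalite, the iron-rich end member of olivine. The balanced equation below is a standard textbook form, included here for illustration rather than taken from the reporting above:

```latex
3\,\mathrm{Fe_2SiO_4} + 2\,\mathrm{H_2O} \longrightarrow 2\,\mathrm{Fe_3O_4} + 3\,\mathrm{SiO_2} + 2\,\mathrm{H_2}
```

Iron in the olivine is oxidized to magnetite, and the hydrogen in the water is reduced to hydrogen gas.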

Some research has put the potential amount of hydrogen available at around a trillion tons—plenty to feed our demand for centuries, even if we ramp up use of the gas.
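As a rough sanity check on that claim, here is the arithmetic in a few lines of Python. The trillion-ton figure comes from the research cited above; the annual-demand number (about 100 million tons a year) is an assumed round figure in line with recent industry estimates, not from this article:

```python
# Back-of-the-envelope: how long would a trillion tons of hydrogen last?
estimated_deposit_tons = 1e12   # ~1 trillion tons (research estimate above)
annual_demand_tons = 1e8        # assumed ~100 million tons/year of current use

years_of_supply = estimated_deposit_tons / annual_demand_tons
print(int(years_of_supply))  # 10000
```

Even if only a few percent of that hydrogen proves accessible and demand grows severalfold, the supply would still be measured in centuries, which is the article’s point.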

The past few years have seen companies spring up around the world to try to locate and tap these resources. There’s an influx in Australia, especially the southern part of the country, which seems to have conditions that are good for making hydrogen. One startup, Koloma, has raised over $350 million to aid its geologic hydrogen exploration.

There are so many open questions for this industry, including how much hydrogen is actually going to be accessible and economical to extract. It’s not even clear how best to look for the gas today; researchers and companies are borrowing techniques and tools from the oil and gas industry, but there could be better ways.

It’s also unknown how this could affect climate change. Hydrogen itself may not warm the planet, but it can contribute indirectly to global warming by extending the lifetime of other greenhouse gases. It’s also often found with methane, a super-powerful greenhouse gas that could do major harm if it leaks out of operations at a significant level.

There’s also the issue of transportation: Hydrogen isn’t very dense, and it can be difficult to store and move around. Deposits that are far away from the final customers could face high costs that might make the whole endeavor uneconomical.  

But this whole area is incredibly exciting, and researchers are working to better understand it. Some are looking to expand the potential pool of resources by pumping water underground to stimulate hydrogen production from rocks that wouldn’t naturally produce the gas.

There’s something fascinating to me about using the playbook of the oil and gas industry to develop an energy source that could actually help humanity combat climate change. It could be a strategic move to address energy demand, since a lot of expertise has accumulated over the roughly 150 years that we’ve been digging up fossil fuels.

After all, it’s not digging that’s the problem—it’s emissions.


Now read the rest of The Spark

Related reading

This story from Science, published in 2023, is a great deep dive into the world of so-called “gold hydrogen.” Give it a read for more on the history and geology here.

For more on commercial efforts, specifically Koloma, give this piece from Canary Media a read.   

And for all the details on geologic ammonia and Addis Energy, check out my latest story here.

Another thing

Donald Trump officially took office on Monday and signed a flurry of executive orders. Here are a few of the most significant ones for climate:  

Trump announced his intention to once again withdraw from the Paris agreement. After a one-year waiting period, the world’s largest economy will officially leave the major international climate treaty. (New York Times)

The president also signed an order that pauses lease sales for offshore wind power projects in federal waters. It’s not clear how much the office will be able to slow projects that already have their federal permits. (Associated Press)

Another executive order, titled “Unleashing American Energy,” broadly signals a wide range of climate and energy moves. 
→ One section ends the “EV mandate.” The US government doesn’t have any mandates around EVs, but this bit is a signal of the administration’s intent to roll back policies and funding that support adoption of these vehicles. There will almost certainly be court battles. (Wired)
→ Another section pauses the disbursement of tens of billions of dollars for climate and energy. The spending was designated by Congress in two of the landmark laws from the Biden administration, the Bipartisan Infrastructure Law and the Inflation Reduction Act. Again, experts say we can likely expect legal fights. (Canary Media)

Keeping up with climate

The Chinese automaker BYD built more electric vehicles in 2024 than Tesla did. The data signals a global shift to cheaper EVs and the continued dominance of China in the EV market. (Washington Post)

A pair of nuclear reactors in South Carolina could get a second chance at life. Construction halted at the VC Summer plant in 2017, $9 billion into the project. Now the site’s owner wants to sell. (Wall Street Journal)

→ Existing reactors are more in-demand than ever, as I covered in this story about what’s next for nuclear power. (MIT Technology Review)

In California, charging depots for electric trucks are increasingly choosing to cobble together their own power rather than waiting years to connect to the grid. These solar- and wind-powered microgrids could help handle broader electricity demand. (Canary Media)

Wildfires in Southern California are challenging even wildlife that have adapted to frequent blazes. As fires become more frequent and intense, biologists worry about animals like mountain lions. (Inside Climate News)

Experts warn that ash from the California wildfires could be toxic, containing materials like lead and arsenic. (Associated Press)

Burning wood for power isn’t necessary to help the UK meet its decarbonization goals, according to a new analysis. Biomass is a controversial green power source that critics say contributes to air pollution and harms forests. (The Guardian)

This is what might happen if the US withdraws from the WHO

On January 20, his first day in office, US president Donald Trump signed an executive order to withdraw the US from the World Health Organization. “Ooh, that’s a big one,” he said as he was handed the document.

The US is the biggest donor to the WHO, and the loss of this income is likely to have a significant impact on the organization, which develops international health guidelines, investigates disease outbreaks, and acts as an information-sharing hub for member states.

But the US will also lose out. “It’s a very tragic and sad event that could only hurt the United States in the long run,” says William Moss, an epidemiologist at Johns Hopkins Bloomberg School of Public Health in Baltimore.

A little unfair?

Trump appears to take issue with the amount the US donates to the WHO. He points out that it makes a much bigger contribution than China, a country with a population four times that of the US. “It seems a little unfair to me,” he said as he prepared to sign the executive order.

It is true that the US is far and away the biggest financial supporter of the WHO. The US contributed $1.28 billion over the two-year period covering 2022 and 2023. By comparison, the second-largest donor, Germany, contributed $856 million in the same period. The US currently contributes 14.5% of the WHO’s total budget.

But it’s not as though the WHO sends a billion-dollar bill to the US. All member states are required to pay membership dues, which are calculated as a percentage of a country’s gross domestic product. For the US, this figure comes to $130 million. China pays $87.6 million. But the vast majority of the US’s contributions to the WHO are made on a voluntary basis—in recent years, the donations have been part of multibillion-dollar spending on global health by the US government. (Separately, the Bill and Melinda Gates Foundation contributed $830 million over 2022 and 2023.)
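To make the split between mandatory dues and voluntary giving concrete, here is the arithmetic implied by the figures above, as a short Python sketch. One assumption to flag: the article quotes a single $130 million dues figure without specifying the period, so the sketch treats it as the assessed portion of the two-year total:

```python
# Decomposing the US contribution to the WHO, 2022-23 (US dollars)
total_contribution = 1.28e9   # total US contribution over the biennium
assessed_dues = 130e6         # mandatory membership dues (assumed to cover
                              # the same period; the article doesn't say)

voluntary = total_contribution - assessed_dues
share = voluntary / total_contribution
print(f"{voluntary / 1e9:.2f}B voluntary, {share:.0%} of the total")
# → 1.15B voluntary, 90% of the total
```

In other words, the part of US funding most likely to vanish quickly is not the dues but the roughly 90% given voluntarily.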

There’s a possibility that other member nations will increase their donations to help cover the shortfall left by the US’s withdrawal. But it is not clear who will step up—or what implications changing the structure of donations will have.

Martin McKee, a professor of European public health at the London School of Hygiene and Tropical Medicine, thinks it is unlikely that European members will increase their contributions by much. The Gulf states, China, India, Brazil, and South Africa, on the other hand, may be more likely to pay more. But again, it isn’t clear how this will pan out, or whether any of these countries will expect greater influence over global health policy decisions as a result of increasing their donations.

Deep impacts

WHO funds are spent on a range of global health projects—programs to eradicate polio, rapidly respond to health emergencies, improve access to vaccines and medicines, develop pandemic prevention strategies, and more. The loss of US funding is likely to have a significant impact on at least some of these programs.

It is not clear which programs will lose funding, or when they will be affected. The US is required to give 12 months’ notice to withdraw its membership, but voluntary contributions might stop before that time is up. 

For the last few years, WHO member states have been negotiating a pandemic agreement designed to improve collaboration on preparing for future pandemics. The agreement is set to be finalized in 2025. But these discussions will be disrupted by the US withdrawal, says McKee. It will “create confusion about how effective any agreement will be and what it will look like,” he says.

The agreement itself won’t make as big an impact without the US as a signatory, either, says Moss, who is also a member of a WHO vaccine advisory committee. The US would not be held to information-sharing standards that other countries could benefit from, and it might not be privy to important health information from other member nations. The global community might also lose out on the US’s resources and expertise. “Having a major country like the United States not be a part of that really undermines the value of any pandemic agreement,” he says.

McKee thinks that the loss of funding will also affect efforts to eradicate polio, and to control outbreaks of mpox in the Democratic Republic of Congo, Uganda, and Burundi, which continue to report hundreds of cases per week. The virus “has the potential to spread, including to the US,” he points out.

“Diseases don’t stick to national boundaries, hence this decision is not only concerning for the US, but in fact for every country in the world,” says Pauline Scheelbeek at the London School of Hygiene and Tropical Medicine. “With the US no longer reporting to the WHO nor funding part of this process, the evidence on which public health interventions and solutions should be based is incomplete.”

Moss is concerned about the potential for outbreaks of vaccine-preventable diseases. Robert F. Kennedy Jr., Trump’s pick to lead the Department of Health and Human Services, is a prominent antivaccine advocate, and Moss worries about potential changes to vaccination-based health policies in the US. That, combined with a weakening of the WHO’s ability to control outbreaks, could be a “double whammy,” he says: “We’re setting ourselves up for large measles disease outbreaks in the United States.”

At the same time, the US is up against another growing threat to public health: the circulation of bird flu on poultry and dairy farms. The US has seen outbreaks of the H5N1 virus on poultry farms in all states, and the virus has been detected in 928 dairy herds across 16 states, according to the US Centers for Disease Control and Prevention. There have been 67 reported human cases in the US, and one person has died. While we don’t yet have evidence that the virus can spread between people, the US and other countries are already preparing for potential outbreaks.

But this preparation relies on a thorough and clear understanding of what is happening on the ground. The WHO provides an important role in information sharing—countries report early signs of outbreaks to the agency, which then shares the information with its members. This kind of information not only allows countries to develop strategies to limit the spread of disease but can also allow them to share genetic sequences of viruses and develop vaccines. Member nations need to know what’s happening in the US, and the US needs to know what’s happening globally. “Both of those channels of communication would be hindered by this,” says Moss.

As if all of that weren’t enough, the US also stands to suffer in terms of its reputation as a leader in global public health. “By saying to the world ‘We don’t care about your health,’ it sends a message that is likely to reflect badly on it,” says McKee. “It’s a classic lose-lose situation.”

“It’s going to hurt global health,” says Moss. “It’s going to come back to bite us.”

Update: this article was amended to include commentary from Pauline Scheelbeek.

OpenAI launches Operator—an agent that can use a computer for you

After weeks of buzz, OpenAI has released Operator, its first AI agent. Operator is a web app that can carry out simple online tasks in a browser, such as booking concert tickets or filling an online grocery order. The app is powered by a new model called Computer-Using Agent—CUA (“coo-ah”), for short—built on top of OpenAI’s multimodal large language model GPT-4o.

Operator is available today at operator.chatgpt.com to people in the US signed up with ChatGPT Pro, OpenAI’s premium $200-a-month service. The company says it plans to roll the tool out to other users in the future.

OpenAI claims that Operator outperforms similar rival tools, including Anthropic’s Computer Use (a version of Claude 3.5 Sonnet that can carry out simple tasks on a computer) and Google DeepMind’s Mariner (a web-browsing agent built on top of Gemini 2.0).

The fact that three of the world’s top AI firms have converged on the same vision of what agent-based models could be makes one thing clear. The battle for AI supremacy has a new frontier—and it’s our computer screens.

“Moving from generating text and images to doing things is the right direction,” says Ali Farhadi, CEO of the Allen Institute for AI (AI2). “It unlocks business, solves new problems.”

Farhadi thinks that doing things on a computer screen is a natural first step for agents: “It is constrained enough that the current state of the technology can actually work,” he says. “At the same time, it’s impactful enough that people might use it.” (AI2 is working on its own computer-using agent, says Farhadi.)

Don’t believe the hype

OpenAI’s announcement also confirms one of two rumors that circulated on the internet this week. One predicted that OpenAI was about to reveal an agent-based app, after details about Operator were leaked on social media ahead of its release. The other predicted that OpenAI was about to reveal a new superintelligence—and that officials for newly inaugurated President Trump would be briefed on it.

Could the two rumors be linked? OpenAI superfans wanted to know.

Nope. OpenAI gave MIT Technology Review a preview of Operator in action yesterday. The tool is an exciting glimpse of large language models’ potential to do a lot more than answer questions. But Operator is an experimental work in progress. “It’s still early, it still makes mistakes,” says Yash Kumar, a researcher at OpenAI.

(As for the wild superintelligence rumors, let’s leave that to OpenAI CEO Sam Altman to address: “twitter hype is out of control again,” he posted on January 20. “pls chill and cut your expectations 100x!”)

Like Anthropic’s Computer Use and Google DeepMind’s Mariner, Operator takes screenshots of a computer screen and scans the pixels to figure out what actions it can take. CUA, the model behind it, is trained to interact with the same graphical user interfaces—buttons, text boxes, menus—that people use when they do things online. It scans the screen, takes an action, scans the screen again, takes another action, and so on. That lets the model carry out tasks on most websites that a person can use.
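The cycle described here is simple enough to sketch in code. The snippet below is a toy illustration of the pattern, not OpenAI’s implementation: a fake “browser” and a hard-coded “model” stand in for the real screenshot pipeline and the CUA model.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click" or "done"
    target: str = ""
    result: str = ""

class FakeBrowser:
    """Toy stand-in for the remote browser the agent drives."""
    def __init__(self):
        self.screen = "search_page"   # what a screenshot would show

    def screenshot(self):
        return self.screen

    def perform(self, action):
        # A single hard-coded transition: clicking the search box
        # moves the fake page to the results screen.
        if action.kind == "click" and action.target == "search_box":
            self.screen = "results_page"

def choose_action(task, screen):
    """Toy stand-in for the model: maps what it 'sees' to one GUI action."""
    if screen == "search_page":
        return Action(kind="click", target="search_box")
    return Action(kind="done", result=f"finished: {task}")

def run_agent(task, browser, max_steps=10):
    # The cycle the article describes: scan the screen, take an action,
    # scan the screen again, and so on, until the task is done.
    for _ in range(max_steps):
        screen = browser.screenshot()
        action = choose_action(task, screen)
        if action.kind == "done":
            return action.result
        browser.perform(action)
    return None  # gave up after max_steps

print(run_agent("book a table", FakeBrowser()))  # finished: book a table
```

The real systems replace `choose_action` with a multimodal model reading actual pixels, but the observe-act loop is the shared structure across Operator, Computer Use, and Mariner.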

“Traditionally the way models have used software is through specialized APIs,” says Reiichiro Nakano, a scientist at OpenAI. (An API, or application programming interface, is a piece of code that acts as a kind of connector, allowing different bits of software to be hooked up to one another.) That puts a lot of apps and most websites off limits, he says: “But if you create a model that can use the same interface that humans use on a daily basis, it opens up a whole new range of software that was previously inaccessible.”

CUA also breaks tasks down into smaller steps and tries to work through them one by one, backtracking when it gets stuck. OpenAI says CUA was trained with techniques similar to those used for its so-called reasoning models, o1 and o3. 

Operator can be instructed to search for campsites in Yosemite with good picnic tables. (Image: OpenAI)

OpenAI has tested CUA against a number of industry benchmarks designed to assess the ability of an agent to carry out tasks on a computer. The company claims that its model beats Computer Use and Mariner in all of them.

For example, on OSWorld, which tests how well an agent performs tasks such as merging PDF files or manipulating an image, CUA scores 38.1% to Computer Use’s 22.0%. In comparison, humans score 72.4%. On a benchmark called WebVoyager, which tests how well an agent performs tasks in a browser, CUA scores 87%, Mariner 83.5%, and Computer Use 56%. (Mariner can only carry out tasks in a browser and therefore does not score on OSWorld.)

For now, Operator, too, can only carry out tasks in a browser. OpenAI plans to make CUA’s wider abilities available in the future via an API that other developers can use to build their own apps, the same route Anthropic took when it released Computer Use in December.

OpenAI says it has tested CUA’s safety, using red teams to explore what happens when users ask it to do unacceptable tasks (such as research how to make a bioweapon), when websites contain hidden instructions designed to derail it, and when the model itself breaks down. “We’ve trained the model to stop and ask the user for information before doing anything with external side effects,” says Casey Chu, another researcher on the team.

Look! No hands

To use Operator, you simply type instructions into a text box. But instead of calling up the browser on your computer, Operator sends your instructions to a remote browser running on an OpenAI server. OpenAI claims that this makes the system more efficient. It’s another key difference between Operator and both Computer Use and Mariner (which runs inside Google’s Chrome browser on your own computer).

Because it’s running in the cloud, Operator can carry out multiple tasks at once, says Kumar. In the live demo, he asked Operator to use OpenTable to book him a table for two at 6:30 p.m. at a restaurant called Octavia in San Francisco. Straight away, Operator opened up OpenTable and started clicking through options. “As you can see, my hands are off the keyboard,” he said.

OpenAI is collaborating with a number of businesses, including OpenTable, StubHub, Instacart, DoorDash, and Uber. The nature of those collaborations is not exactly clear, but Operator appears to suggest preset websites to use for certain tasks.

While the tool navigated dropdowns on OpenTable, Kumar sent Operator off to find four tickets for a Kendrick Lamar show on StubHub. While it did that, he pasted a photo of a handwritten shopping list and asked Operator to add the items to his Instacart.

He waited, flicking between Operator’s tabs. “If it needs help or if it needs confirmations, it’ll come back to you with questions and you can answer it,” he said.

Kumar says he has been using Operator at home. It helps him stay on top of grocery shopping: “I can just quickly click a photo of a list and send it to work,” he says.

It’s also become a sidekick in his personal life. “I have a date night every Thursday,” says Kumar. So every Thursday morning, he instructs Operator to send him a list of five restaurants that have a table for two that evening. “Of course, I could do that, but it takes me 10 minutes,” he says. “And I often forget to do it. With Operator, I can run the task with one click. There’s no burden of booking.”

What’s next for robots

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

Jan Liphardt teaches bioengineering at Stanford, but to many strangers in Los Altos, California, he is a peculiar man they see walking a four-legged robotic dog down the street. 

Liphardt has been experimenting with building and modifying robots for years, and when he brings his “dog” out in public, he generally gets one of three reactions. Young children want to have one, their parents are creeped out, and baby boomers try to ignore it. “They’ll quickly walk by,” he says, “like, ‘What kind of dumb new stuff is going on here?’” 

In the many conversations I’ve had about robots, I’ve also found that most people tend to fall into these three camps, though I don’t see such a neat age division. Some are upbeat and vocally hopeful that a future is just around the corner in which machines can expertly handle much of what is currently done by humans, from cooking to surgery. Others are scared: of job losses, injuries, and whatever problems may come up as we try to live side by side. 

The final camp, which I think is the largest, is just unimpressed. We’ve been sold lots of promises that robots will transform society ever since the first robotic arm was installed on an assembly line at a General Motors plant in New Jersey in 1961. Few of those promises have panned out so far. 

But this year, there’s reason to think that even those staunchly in the “bored” camp will be intrigued by what’s happening in the robot races. Here’s a glimpse at what to keep an eye on. 

Humanoids are put to the test

The race to build humanoid robots is motivated by the idea that the world is set up for the human form, and that automating that form could mean a seismic shift for robotics. It is led by some particularly outspoken and optimistic entrepreneurs, including Brett Adcock, the founder of Figure AI, a company making such robots that’s valued at more than $2.6 billion (it’s begun testing its robots with BMW). Adcock recently told Time, “Eventually, physical labor will be optional.” Elon Musk, whose company Tesla is building a version called Optimus, has said humanoid robots will create “a future where there is no poverty.” A robotics company called Eliza Wakes Up is taking preorders for a $420,000 humanoid called, yes, Eliza.

In June 2024, Agility Robotics sent a fleet of its Digit humanoid robots to GXO Logistics, which moves products for companies ranging from Nike to Nestlé. The humanoids can handle most tasks that involve picking things up and moving them somewhere else, like unloading pallets or putting boxes on a conveyor. 

There have been hiccups: Highly polished concrete floors can cause robots to slip at first, and buildings need good Wi-Fi coverage for the robots to keep functioning. But charging is a bigger issue. Agility’s current version of Digit, with a 39-pound battery, can run for two to four hours before it needs to charge for one hour, so swapping out the robots for fresh ones is a common task on each shift. If there are a small number of charging docks installed, the robots can theoretically charge by shuffling among the docks themselves overnight when some facilities aren’t running, but moving around on their own can set off a building’s security system. “It’s a problem,” says CTO Melonee Wise.

Wise is cautious about whether humanoids will be widely adopted in workplaces. “I’ve always been a pessimist,” she says. That’s because getting robots to work well in a lab is one thing, but integrating them into a bustling warehouse full of people and forklifts moving goods on tight deadlines is another task entirely.

If 2024 was the year of unsettling humanoid product launch videos, this year we will see those humanoids put to the test, and we’ll find out whether they’ll be as productive for paying customers as promised. Now that Agility’s robots have been deployed in fast-paced customer facilities, it’s clear that small problems can really add up. 

Then there are issues with how robots and humans share spaces. In the GXO facility the two work in completely separate areas, Wise says, but there are cases where, for example, a human worker might accidentally leave something obstructing a charging station. That means Agility’s robots can’t return to the dock to charge, so they need to alert a human employee to move the obstruction out of the way, slowing operations down.  

It’s often said that robots don’t call out sick or need health care. But this year, as fleets of humanoids arrive on the job, we’ll begin to find out the limitations they do have.

Learning from imagination

The way we teach robots how to do things is changing rapidly. It used to be necessary to break their tasks down into steps with specifically coded instructions, but now, thanks to AI, those instructions can be gleaned from observation. Just as ChatGPT was taught to write through exposure to trillions of sentences rather than by explicitly learning the rules of grammar, robots are learning through videos and demonstrations. 

That poses a big question: Where do you get all these videos and demonstrations for robots to learn from?

Nvidia, the world’s most valuable company, has long aimed to meet that need with simulated worlds, drawing on its roots in the video-game industry. It creates worlds in which roboticists can expose digital replicas of their robots to new environments to learn. A self-driving car can drive millions of virtual miles, or a factory robot can learn how to navigate in different lighting conditions.

In December, the company went a step further, releasing what it’s calling a “world foundation model.” Called Cosmos, the model has learned from 20 million hours of video—the equivalent of watching YouTube nonstop since Rome was at war with Carthage—that can be used to generate synthetic training data.

Here’s an example of how this model could help in practice. Imagine you run a robotics company that wants to build a humanoid that cleans up hospitals. You can start building this robot’s “brain” with a model from Nvidia, which will give it a basic understanding of physics and how the world works, but then you need to help it figure out the specifics of how hospitals work. You could go out and take videos and images of the insides of hospitals, or pay people to wear sensors and cameras while they go about their work there.

“But those are expensive to create and time consuming, so you can only do a limited number of them,” says Rev Lebaredian, vice president of simulation technologies at Nvidia. Cosmos can instead take a handful of those examples and create a three-dimensional simulation of a hospital. It will then start making changes—different floor colors, different sizes of hospital beds—and create slightly different environments. “You’ll multiply that data that you captured in the real world millions of times,” Lebaredian says. In the process, the model will be fine-tuned to work well in that specific hospital setting. 
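The multiplication step Lebaredian describes is, in spirit, domain randomization: take a few captured scenes and perturb their parameters many times over. Here is a toy sketch of that idea; the scene fields and value ranges are invented for illustration and are not Cosmos’s actual interface.

```python
import random

# Toy domain-randomization sketch: expand a handful of captured scenes
# into many synthetic variants by perturbing scene parameters. The
# fields and value ranges are invented, not Nvidia's actual API.

FLOOR_COLORS = ["gray", "white", "blue"]
BED_SIZES_M = [1.9, 2.0, 2.1]

def randomize_scene(base_scene: dict, rng: random.Random) -> dict:
    variant = dict(base_scene)  # keep what was captured in the real world
    variant["floor_color"] = rng.choice(FLOOR_COLORS)
    variant["bed_size_m"] = rng.choice(BED_SIZES_M)
    variant["light_level"] = round(rng.uniform(0.3, 1.0), 2)
    return variant

def expand_dataset(real_scenes: list[dict], n_variants: int,
                   seed: int = 0) -> list[dict]:
    """Multiply a few real captures into many randomized variants."""
    rng = random.Random(seed)
    return [randomize_scene(rng.choice(real_scenes), rng)
            for _ in range(n_variants)]
```

A system like Cosmos does this with full 3-D simulation and learned video models rather than a parameter list, but the payoff is the same: a few expensive real captures become millions of training examples.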

It’s sort of like learning both from your experiences in the real world and from your own imagination (stipulating that your imagination is still bound by the rules of physics). 

Teaching robots through AI and simulations isn’t new, but it’s going to become much cheaper and more powerful in the years to come. 

A smarter brain gets a smarter body

Plenty of progress in robotics has to do with improving the way a robot senses and plans what to do—its “brain,” in other words. Those advancements can often happen faster than those that improve a robot’s “body,” which determines how well it can move through the physical world, especially in environments that are more chaotic and unpredictable than controlled assembly lines.

The military has always been keen on changing that and expanding the boundaries of what’s physically possible. The US Navy has been testing machines from a company called Gecko Robotics that can navigate up vertical walls (using magnets) to do things like infrastructure inspections, checking for cracks, flaws, and bad welding on aircraft carriers. 

There are also investments being made for the battlefield. While nimble and affordable drones have reshaped rural battlefields in Ukraine, new efforts are underway to bring those drone capabilities indoors. The defense manufacturer Xtend received an $8.8 million contract from the Pentagon in December 2024 for its drones, which can navigate in confined indoor spaces and urban environments. These so-called “loitering munitions” are one-way attack drones carrying explosives that detonate on impact.

“These systems are designed to overcome challenges like confined spaces, unpredictable layouts, and GPS-denied zones,” says Rubi Liani, cofounder and CTO at Xtend. Deliveries to the Pentagon should begin in the first few months of this year. 

Another initiative—sparked in part by the Replicator project, the Pentagon’s plan to spend more than $1 billion on small unmanned vehicles—aims to develop more autonomously controlled submarines and surface vehicles. This is particularly of interest as the Department of Defense focuses increasingly on the possibility of a future conflict in the Pacific between China and Taiwan. In such a conflict, the drones that have dominated the war in Ukraine would serve little use because battles would be waged almost entirely at sea, where small aerial drones would be limited by their range. Instead, undersea drones would play a larger role.

All these changes, taken together, point toward a future where robots are more flexible in how they learn, where they work, and how they move. 

Jan Liphardt from Stanford thinks the next frontier of this transformation will hinge on the ability to instruct robots through speech. Large language models’ ability to understand and generate text has already made them a sort of translator between Liphardt and his robot.

“We can take one of our quadrupeds and we can tell it, ‘Hey, you’re a dog,’ and the thing wants to sniff you and tries to bark,” he says. “Then we do one word change—‘You’re a cat.’ Then the thing meows and, you know, runs away from dogs. And we haven’t changed a single line of code.”

Correction: A previous version of this story incorrectly stated that the robotics company Eliza Wakes Up has ties to a16z.