Four bright spots in climate news in 2025

Climate news hasn’t been great in 2025. Global greenhouse-gas emissions hit record highs (again). This year is set to be either the second or third warmest on record. Climate-fueled disasters like wildfires in California and flooding in Indonesia and Pakistan devastated communities and caused billions in damage.

In addition to these worrying indicators of our continued contributions to climate change and their obvious effects, the world’s largest economy has made a sharp U-turn on climate policy this year. The US under the Trump administration withdrew from the Paris Agreement, cut funds for climate research, and scrapped billions of dollars in funding for climate tech projects.

We’re in a severe situation with climate change. But for those looking for bright spots, there was some good news in 2025. Here are a few of the positive stories our climate reporters noticed this year.

China’s flattening emissions


One of the most notable and encouraging signs of progress this year occurred in China. The world’s second-biggest economy and biggest climate polluter has managed to keep carbon dioxide emissions flat for the last year and a half, according to an analysis in Carbon Brief.

That’s happened before, but only when the nation’s economy was contracting, as it was in the midst of the covid-19 pandemic. Now, though, emissions are falling even as China’s economy is on track to grow about 5% this year and electricity demand continues to rise.

So what’s changed? China has now installed so much solar and wind, and put so many EVs on the road, that its economy can continue to expand without increasing the amount of carbon dioxide it’s pumping into the atmosphere, breaking the traditional link between emissions and growth.

Specifically, China added an astounding 240 gigawatts of solar power capacity and 61 gigawatts of wind power in the first nine months of the year, the Carbon Brief analysis noted. In other words, in just three quarters China installed nearly as much solar capacity as the US has built in its entire history.

It’s too early to say China’s emissions have peaked, but the country has said it will officially reach that benchmark before 2030.

To be clear, China still isn’t moving fast enough to keep the world on track for meeting relatively safe temperature targets. (Indeed, very few countries are.) But it’s now both producing most of the world’s clean energy technologies and curbing its emissions growth, providing a model for cleaning up industrial economies without sacrificing economic prosperity—and setting the stage for faster climate progress in the coming years.

Batteries on the grid


It’s hard to articulate just how quickly batteries for grid storage are coming online. These massive arrays of cells can soak up electricity when sources like solar are available and prices are low, and then discharge power back to the grid when it’s needed most.

Back in 2015, the US had only a fraction of a gigawatt of battery storage capacity installed. That year, the industry set a seemingly bold target of adding 35 gigawatts by 2035. The sector passed that goal a decade early this year and then hit 40 gigawatts a couple of months later.

Costs are still falling, which could help maintain the momentum for the technology’s deployment. This year, battery prices for EVs and stationary storage fell yet again, reaching a record low, according to data from BloombergNEF. Battery packs specifically used for grid storage saw prices fall even faster than the average; they cost 45% less than last year.

We’re starting to see what happens on grids with lots of battery capacity, too: in California and Texas, batteries are already helping meet demand in the evenings, reducing the need to run natural-gas plants. The result: a cleaner, more stable grid.

AI’s energy funding influx

Aerial view of a large Google data center under construction in Cheshunt, Hertfordshire, UK. GETTY IMAGES

The AI boom is complicated for our energy system, as we covered at length this year. Electricity demand is ticking up: the amount of power utilities supplied to US data centers jumped 22% this year and is projected to more than double by 2030.

But at least one positive shift is coming out of AI’s influence on energy: It’s driving renewed interest and investment in next-generation energy technologies.

In the near term, much of the energy needed for data centers, including those that power AI, will likely come from fossil fuels, especially new natural-gas power plants. But tech giants like Google, Microsoft, and Meta all have goals on the books to reduce their greenhouse-gas emissions, so they’re looking for alternatives.

Meta signed a deal with XGS Energy in June to purchase up to 150 megawatts of electricity from a geothermal plant. In October, Google signed an agreement that will help reopen Duane Arnold Energy Center in Iowa, a previously shuttered nuclear power plant.

Geothermal and nuclear could be key pieces of the grid of the future, as they can provide constant power in a way that wind and solar don’t. There’s a long way to go for many of the new versions of the tech, but more money and interest from big, powerful players can’t hurt.

Good news, bad news


Perhaps the strongest evidence of collective climate progress so far: We’ve already avoided the gravest dangers that scientists feared just a decade ago.

The world is on track for about 2.6 °C of warming over preindustrial conditions by 2100, according to Climate Action Tracker, an independent scientific effort to track the policy progress that nations have made toward their goals under the Paris climate agreement.

That’s a lot warmer than we want the planet to ever get. But it’s also a whole degree better than the 3.6 °C path that we were on a decade ago, just before nearly 200 countries signed the Paris deal.

That progress occurred because more and more nations passed emissions mandates, funded subsidies, and invested in research and development—and private industry got busy cranking out vast amounts of solar panels, wind turbines, batteries, and EVs. 

The bad news is that progress has stalled. Climate Action Tracker notes that its warming projections have remained stubbornly fixed for the last four years, as nations have largely failed to take the additional action needed to bend that curve closer to the 2 °C goal set out in the international agreement.

But having shaved off a degree of danger is still demonstrable proof that we can pull together in the face of a global threat and address a very, very hard problem. And it means we’ve done the difficult work of laying down the technical foundation for a society that can largely run without spewing ever more greenhouse gas into the atmosphere.

Hopefully, as cleantech continues to improve and climate change steadily worsens, the world will find the collective will to pick up the pace again soon.

AI coding is now everywhere. But not everyone is convinced.

Depending on who you ask, AI-powered coding is either giving software developers an unprecedented productivity boost or churning out masses of poorly designed code that saps their attention and sets software projects up for serious long-term maintenance problems.

The problem is that, right now, it’s not easy to know which is true.

As tech giants pour billions into large language models (LLMs), coding has been touted as the technology’s killer app. Both Microsoft CEO Satya Nadella and Google CEO Sundar Pichai have claimed that around a quarter of their companies’ code is now AI-generated. And in March, Anthropic’s CEO, Dario Amodei, predicted that within six months 90% of all code would be written by AI. It’s an appealing and obvious use case. Code is a form of language, we need lots of it, and it’s expensive to produce manually. It’s also easy to tell if it works—run a program and it’s immediately evident whether it’s functional.


Executives enamored with the potential to break through human bottlenecks are pushing engineers to lean into an AI-powered future. But after speaking to more than 30 developers, technology executives, analysts, and researchers, MIT Technology Review found that the picture is not as straightforward as it might seem.  

For some developers on the front lines, initial enthusiasm is waning as they bump up against the technology’s limitations. And as a growing body of research suggests that the claimed productivity gains may be illusory, some are questioning whether the emperor is wearing any clothes.

The pace of progress is complicating the picture, though. A steady drumbeat of new model releases means these tools’ capabilities and quirks are constantly evolving. And their utility often depends on the tasks they are applied to and the organizational structures built around them. All of this leaves developers navigating confusing gaps between expectation and reality.

Is it the best of times or the worst of times (to channel Dickens) for AI coding? Maybe both.

A fast-moving field

It’s hard to avoid AI coding tools these days. There is a dizzying array of products available, both from model developers like Anthropic, OpenAI, and Google and from companies like Cursor and Windsurf, which wrap these models in polished code-editing software. And according to Stack Overflow’s 2025 Developer Survey, they’re being adopted rapidly, with 65% of developers now using them at least weekly.

AI coding tools first emerged around 2016 but were supercharged with the arrival of LLMs. Early versions functioned as little more than autocomplete for programmers, suggesting what to type next. Today they can analyze entire code bases, edit across files, fix bugs, and even generate documentation explaining how the code works. All this is guided through natural-language prompts via a chat interface.

“Agents”—autonomous LLM-powered coding tools that can take a high-level plan and build entire programs independently—represent the latest frontier in AI coding. This leap was enabled by the latest reasoning models, which can tackle complex problems step by step and, crucially, access external tools to complete tasks. “This is how the model is able to code, as opposed to just talk about coding,” says Boris Cherny, head of Claude Code, Anthropic’s coding agent.
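To make that concrete, here is a minimal, generic sketch of the loop at the heart of such an agent: the model proposes a tool call, the harness executes it, and the result is fed back until the model says the job is done. This illustrates the general pattern only; it is not Claude Code or any vendor’s actual implementation, and `call_model`, the tool names, and the message format are placeholders.

```python
# Generic coding-agent loop (illustrative only; call_model is a placeholder
# for whatever LLM API an agent uses, not any specific vendor's).
import json
import subprocess
from pathlib import Path

def run_tool(name: str, args: dict) -> str:
    """Execute one of the tools the agent is allowed to use."""
    if name == "read_file":
        return Path(args["path"]).read_text()
    if name == "write_file":
        Path(args["path"]).write_text(args["content"])
        return "ok"
    if name == "run_tests":
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return result.stdout + result.stderr
    return f"unknown tool: {name}"

def agent_loop(task: str, call_model, max_steps: int = 20) -> str:
    """Drive the model step by step until it reports the task is finished."""
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(transcript)  # expected to return JSON: {"tool": ..., "args": ...} or {"done": true}
        transcript.append({"role": "assistant", "content": reply})
        action = json.loads(reply)
        if action.get("done"):
            return action.get("summary", "finished")
        observation = run_tool(action["tool"], action["args"])
        transcript.append({"role": "tool", "content": observation})
    return "gave up after max_steps"
```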

These agents have made impressive progress on software engineering benchmarks—standardized tests that measure model performance. When OpenAI introduced the SWE-bench Verified benchmark in August 2024, offering a way to evaluate agents’ success at fixing real bugs in open-source repositories, the top model solved just 33% of issues. A year later, leading models consistently score above 70%.

In February, Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, coined the term “vibe coding”—meaning an approach where people describe software in natural language and let AI write, refine, and debug the code. Social media abounds with developers who have bought into this vision, claiming massive productivity boosts.

But while some developers and companies report such productivity gains, the hard evidence is more mixed. Early studies from GitHub, Google, and Microsoft—all vendors of AI tools—found developers completing tasks 20% to 55% faster. But a September report from the consultancy Bain & Company described real-world savings as “unremarkable.”

Data from the developer analytics firm GitClear shows that most engineers are producing roughly 10% more durable code—code that isn’t deleted or rewritten within weeks—since 2022, likely thanks to AI. But that gain has come with sharp declines in several measures of code quality. Stack Overflow’s survey also found trust and positive sentiment toward AI tools falling significantly for the first time. And most provocatively, a July study by the nonprofit research organization Model Evaluation & Threat Research (METR) showed that while experienced developers believed AI made them 20% faster, objective tests showed they were actually 19% slower.

Growing disillusionment

For Mike Judge, principal developer at the software consultancy Substantial, the METR study struck a nerve. He was an enthusiastic early adopter of AI tools, but over time he grew frustrated with their limitations and the modest boost they brought to his productivity. “I was complaining to people because I was like, ‘It’s helping me but I can’t figure out how to make it really help me a lot,’” he says. “I kept feeling like the AI was really dumb, but maybe I could trick it into being smart if I found the right magic incantation.”

When a friend asked, Judge had estimated the tools were giving him a roughly 25% speedup. So when he saw similar estimates attributed to developers in the METR study, he decided to run his own test. For six weeks, he guessed how long a task would take, flipped a coin to decide whether to use AI or code manually, and timed himself. To his surprise, AI slowed him down by a median of 21%—mirroring the METR results.
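For readers who want to try the same thing, here is a rough sketch of how such a self-experiment might be logged and scored. The field names, numbers, and analysis choices below are illustrative assumptions, not Judge’s or METR’s actual data or methodology.

```python
# Sketch of a coin-flip self-experiment: estimate a task, randomly assign the
# condition, time the work, then compare overruns. All data here is made up.
import random
import statistics

def pick_condition() -> str:
    """Coin flip: use AI on this task, or code it manually."""
    return "ai" if random.random() < 0.5 else "manual"

# Each record: (condition, estimated_minutes, actual_minutes)
log = [
    ("ai", 60, 80), ("manual", 45, 42), ("ai", 30, 35),
    ("manual", 90, 88), ("ai", 120, 150), ("manual", 20, 19),
]

def median_overrun(records, condition: str) -> float:
    """Median ratio of actual to estimated time for one condition."""
    return statistics.median(
        actual / estimate for cond, estimate, actual in records if cond == condition
    )

# Positive means AI-assisted tasks overran their estimates more than manual ones.
slowdown = median_overrun(log, "ai") / median_overrun(log, "manual") - 1
print(f"AI tasks ran {slowdown:+.0%} relative to manual tasks")
```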

This got Judge crunching the numbers. If these tools were really speeding developers up, he reasoned, you should see a massive boom in new apps, website registrations, video games, and projects on GitHub. He spent hours and several hundred dollars analyzing all the publicly available data and found flat lines everywhere.

“Shouldn’t this be going up and to the right?” says Judge. “Where’s the hockey stick on any of these graphs? I thought everybody was so extraordinarily productive.” The obvious conclusion, he says, is that AI tools provide little productivity boost for most developers. 

Developers interviewed by MIT Technology Review generally agree on where AI tools excel: producing “boilerplate code” (reusable chunks of code repeated in multiple places with little modification), writing tests, fixing bugs, and explaining unfamiliar code to new developers. Several noted that AI helps overcome the “blank page problem” by offering an imperfect first stab to get a developer’s creative juices flowing. It can also let nontechnical colleagues quickly prototype software features, easing the load on already overworked engineers.

These tasks can be tedious, and developers are typically  glad to hand them off. But they represent only a small part of an experienced engineer’s workload. For the more complex problems where engineers really earn their bread, many developers told MIT Technology Review, the tools face significant hurdles.

Perhaps the biggest problem is that LLMs can hold only a limited amount of information in their “context window”—essentially their working memory. This means they struggle to parse large code bases and are prone to forgetting what they’re doing on longer tasks. “It gets really nearsighted—it’ll only look at the thing that’s right in front of it,” says Judge. “And if you tell it to do a dozen things, it’ll do 11 of them and just forget that last one.”

LLMs’ myopia can lead to headaches for human coders. While an LLM-generated response to a problem may work in isolation, software is made up of hundreds of interconnected modules. If these aren’t built with consideration for other parts of the software, it can quickly lead to a tangled, inconsistent code base that’s hard for humans to parse and, more important, to maintain.

Developers have traditionally addressed this by following conventions—loosely defined coding guidelines that differ widely between projects and teams. “AI has this overwhelming tendency to not understand what the existing conventions are within a repository,” says Bill Harding, the CEO of GitClear. “And so it is very likely to come up with its own slightly different version of how to solve a problem.”

The models also just get things wrong. Like all LLMs, coding models are prone to “hallucinating”—it’s an issue built into how they work. But because the code they output looks so polished, errors can be difficult to detect, says James Liu, director of software engineering at the advertising technology company Mediaocean. Put all these flaws together, and using these tools can feel a lot like pulling a lever on a one-armed bandit. “Some projects you get a 20x improvement in terms of speed or efficiency,” says Liu. “On other things, it just falls flat on its face, and you spend all this time trying to coax it into granting you the wish that you wanted and it’s just not going to.”

Judge suspects this is why engineers often overestimate productivity gains. “You remember the jackpots. You don’t remember sitting there plugging tokens into the slot machine for two hours,” he says.

And it can be particularly pernicious if the developer is unfamiliar with the task. Judge remembers getting AI to help set up a Microsoft cloud service called Azure Functions, which he’d never used before. He thought it would take about two hours, but nine hours later he threw in the towel. “It kept leading me down these rabbit holes and I didn’t know enough about the topic to be able to tell it ‘Hey, this is nonsensical,’” he says.

The debt begins to mount up

Developers constantly make trade-offs between speed of development and the maintainability of their code—creating what’s known as “technical debt,” says Geoffrey G. Parker, professor of engineering innovation at Dartmouth College. Each shortcut adds complexity and makes the code base harder to manage, accruing “interest” that must eventually be repaid by restructuring the code. As this debt piles up, adding new features and maintaining the software becomes slower and more difficult.

Accumulating technical debt is inevitable in most projects, but AI tools make it much easier for time-pressured engineers to cut corners, says GitClear’s Harding. And GitClear’s data suggests this is happening at scale. Since 2020, the company has seen a significant rise in the amount of copy-pasted code—an indicator that developers are reusing more code snippets, most likely based on AI suggestions—and an even bigger decline in the amount of code moved from one place to another, which happens when developers clean up their code base.

And as models improve, the code they produce is becoming increasingly verbose and complex, says Tariq Shaukat, CEO of Sonar, which makes tools for checking code quality. This is driving down the number of obvious bugs and security vulnerabilities, he says, but at the cost of increasing the number of “code smells”—harder-to-pinpoint flaws that lead to maintenance problems and technical debt. 

Recent research by Sonar found that these make up more than 90% of the issues found in code generated by leading AI models. “Issues that are easy to spot are disappearing, and what’s left are much more complex issues that take a while to find,” says Shaukat. “That’s what worries us about this space at the moment. You’re almost being lulled into a false sense of security.”

If AI tools make it increasingly difficult to maintain code, that could have significant security implications, says Jessica Ji, a security researcher at Georgetown University. “The harder it is to update things and fix things, the more likely a code base or any given chunk of code is to become insecure over time,” says Ji.

There are also more specific security concerns, she says. Researchers have discovered a worrying class of hallucinations where models reference nonexistent software packages in their code. Attackers can exploit this by creating packages with those names that harbor vulnerabilities, which the model or developer may then unwittingly incorporate into software. 
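One simple defense, offered here purely as an illustration, is to check that a suggested package actually exists on PyPI (and look at its metadata) before installing it. The package names below are placeholders, and a real review would also consider the package’s age, maintainers, and download history.

```python
# Check whether a package an LLM suggested actually exists on PyPI before
# installing it. Uses PyPI's public JSON API; names below are placeholders.
import json
import urllib.error
import urllib.request

def pypi_metadata(package: str) -> dict | None:
    """Return PyPI metadata for a package, or None if it doesn't exist."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError:
        return None  # 404: the suggested package name may be hallucinated

for name in ["requests", "definitely-not-a-real-package-12345"]:
    meta = pypi_metadata(name)
    if meta is None:
        print(f"{name}: not on PyPI -- treat any suggestion to install it as suspect")
    else:
        print(f"{name}: exists, latest version {meta['info']['version']}")
```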

LLMs are also vulnerable to “data-poisoning attacks,” where hackers seed the publicly available data sets models train on with data that alters the model’s behavior in undesirable ways, such as generating insecure code when triggered by specific phrases. In October, research by Anthropic found that as few as 250 malicious documents can introduce this kind of back door into an LLM regardless of its size.

The converted

Despite these issues, though, there’s probably no turning back. “Odds are that writing every line of code on a keyboard by hand—those days are quickly slipping behind us,” says Kyle Daigle, chief operating officer at the Microsoft-owned code-hosting platform GitHub, which produces a popular AI-powered tool called Copilot (not to be confused with the Microsoft product of the same name).

The Stack Overflow report found that despite growing distrust in the technology, usage has increased rapidly and consistently over the past three years. Erin Yepis, a senior analyst at Stack Overflow, says this suggests that engineers are taking advantage of the tools with a clear-eyed view of the risks. The report also found that frequent users tend to be more enthusiastic, and that more than half of developers are not yet using the latest coding agents, which may help explain why many remain underwhelmed by the technology.

Those latest tools can be a revelation. Trevor Dilley, CTO at the software development agency Twenty20 Ideas, says he had found some value in AI editors’ autocomplete functions, but when he tried anything more complex it would “fail catastrophically.” Then in March, while on vacation with his family, he set the newly released Claude Code to work on one of his hobby projects. It completed a four-hour task in two minutes, and the code was better than what he would have written.

“I was like, Whoa,” he says. “That, for me, was the moment, really. There’s no going back from here.” Dilley has since cofounded a startup called DevSwarm, which is creating software that can marshal multiple agents to work in parallel on a piece of software.

The challenge, says Armin Ronacher, a prominent open-source developer, is that the learning curve for these tools is shallow but long. Until March he’d remained unimpressed by AI tools, but after leaving his job at the software company Sentry in April to launch a startup, he started experimenting with agents. “I basically spent a lot of months doing nothing but this,” he says. “Now, 90% of the code that I write is AI-generated.”

Getting to that point involved extensive trial and error, to figure out which problems tend to trip the tools up and which they can handle efficiently. Today’s models can tackle most coding tasks with the right guardrails, says Ronacher, but these can be very task and project specific.

To get the most out of these tools, developers must surrender control over individual lines of code and focus on the overall software architecture, says Nico Westerdale, chief technology officer at the veterinary staffing company IndeVets. He recently built a data science platform spanning some 100,000 lines of code almost exclusively by prompting models rather than writing the code himself.

Westerdale’s process starts with an extended conversation with the agent to develop a detailed plan for what to build and how. He then guides it through each step. It rarely gets things right on the first try and needs constant wrangling, but if you force it to stick to well-defined design patterns, it can produce high-quality, easily maintainable code, says Westerdale. He reviews every line, and the code is as good as anything he’s ever produced, he says: “I’ve just found it absolutely revolutionary. It’s also frustrating, difficult, a different way of thinking, and we’re only just getting used to it.”

But while individual developers are learning how to use these tools effectively, getting consistent results across a large engineering team is significantly harder. AI tools amplify both the good and bad aspects of your engineering culture, says Ryan J. Salva, senior director of product management at Google. With strong processes, clear coding patterns, and well-defined best practices, these tools can shine. 

But if your development process is disorganized, they’ll only magnify the problems. It’s also essential to codify that institutional knowledge so the models can draw on it effectively. “A lot of work needs to be done to help build up context and get the tribal knowledge out of our heads,” he says.

The cryptocurrency exchange Coinbase has been vocal about its adoption of AI tools. CEO Brian Armstrong made headlines in August when he revealed that the company had fired staff unwilling to adopt AI tools. But Coinbase’s head of platform, Rob Witoff, tells MIT Technology Review that while they’ve seen massive productivity gains in some areas, the impact has been patchy. For simpler tasks like restructuring the code base and writing tests, AI-powered workflows have achieved speedups of up to 90%. But gains are more modest for other tasks, and the disruption caused by overhauling existing processes often counteracts the increased coding speed, says Witoff.

One factor is that AI tools let junior developers produce far more code. As in almost all engineering teams, this code has to be reviewed by others, normally more senior developers, to catch bugs and ensure it meets quality standards. But the sheer volume of code now being churned out is quickly saturating the ability of midlevel staff to review changes. “This is the cycle we’re going through almost every month, where we automate a new thing lower down in the stack, which brings more pressure higher up in the stack,” he says. “Then we’re looking at applying automation to that higher-up piece.”

Developers also spend only 20% to 40% of their time coding, says Jue Wang, a partner at Bain, so even a significant speedup there often translates to more modest overall gains. Developers spend the rest of their time analyzing software problems and dealing with customer feedback, product strategy, and administrative tasks. To get significant efficiency boosts, companies may need to apply generative AI to all these other processes too, says Wang, and that is still in the works.

Rapid evolution

Programming with agents is a dramatic departure from previous working practices, though, so it’s not surprising companies are facing some teething issues. These are also very new products that are changing by the day. “Every couple months the model improves, and there’s a big step change in the model’s coding capabilities and you have to get recalibrated,” says Anthropic’s Cherny.

For example, in June Anthropic introduced a built-in planning mode to Claude; it has since been replicated by other providers. In October, the company also enabled Claude to ask users questions when it needs more context or faces multiple possible solutions, which Cherny says helps it avoid the tendency to simply assume which path is the best way forward.

Most significant, Anthropic has added features that make Claude better at managing its own context. When it nears the limits of its working memory, it summarizes key details and uses them to start a new context window, effectively giving it an “infinite” one, says Cherny. Claude can also invoke sub-agents to work on smaller tasks, so it no longer has to hold all aspects of the project in its own head. The company claims that its latest model, Claude Sonnet 4.5, can now code autonomously for more than 30 hours without major performance degradation.
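The compaction idea can be sketched generically: when the transcript approaches the context limit, fold older messages into a summary and keep going. The code below is only an illustration of that general technique, not Anthropic’s implementation; the threshold, `count_tokens`, and `summarize` are stand-ins for model- and vendor-specific pieces.

```python
# Generic sketch of context compaction for a long-running agent transcript.
# The limit, token counter, and summarizer below are illustrative placeholders.
CONTEXT_LIMIT_TOKENS = 200_000
COMPACTION_THRESHOLD = 0.8  # compact once 80% of the window is used

def count_tokens(messages: list[dict]) -> int:
    """Placeholder: a real agent would use its model's tokenizer here."""
    return sum(len(m["content"].split()) for m in messages)

def summarize(messages: list[dict]) -> str:
    """Placeholder: a real agent would ask the model to condense key details."""
    return "Summary of earlier work: " + " | ".join(m["content"][:40] for m in messages)

def maybe_compact(transcript: list[dict]) -> list[dict]:
    """Keep recent messages verbatim; fold everything older into a summary."""
    if count_tokens(transcript) < CONTEXT_LIMIT_TOKENS * COMPACTION_THRESHOLD:
        return transcript
    keep_recent = transcript[-10:]          # recent turns stay intact
    summary = summarize(transcript[:-10])   # older turns get condensed
    return [{"role": "system", "content": summary}] + keep_recent
```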

Novel approaches to software development could also sidestep coding agents’ other flaws. MIT professor Max Tegmark has introduced something he calls “vericoding,” which could allow agents to produce entirely bug-free code from a natural-language description. It builds on an approach known as “formal verification,” where developers create a mathematical model of their software that can prove incontrovertibly that it functions correctly. This approach is used in high-stakes areas like flight-control systems and cryptographic libraries, but it remains costly and time-consuming, limiting its broader use.

Rapid improvements in LLMs’ mathematical capabilities have opened up the tantalizing possibility of models that produce not only software but the mathematical proof that it’s bug free, says Tegmark. “You just give the specification, and the AI comes back with provably correct code,” he says. “You don’t have to touch the code. You don’t even have to ever look at the code.”

When tested on about 2,000 vericoding problems in Dafny—a language designed for formal verification—the best LLMs solved over 60%, according to non-peer-reviewed research by Tegmark’s group. This was achieved with off-the-shelf LLMs, and Tegmark expects that training specifically for vericoding could improve scores rapidly.
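To give a flavor of what “provably correct code” means, here is a tiny example written for the Lean proof assistant rather than Dafny (and not drawn from Tegmark’s study): the function and machine-checked proofs of its specification sit side by side, and the file only compiles if the proofs actually hold.

```lean
-- A miniature example of code plus machine-checked specification.
def myMax (a b : Nat) : Nat :=
  if a ≤ b then b else a

-- Specification: the result is at least as large as either input.
theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax
  split <;> omega

theorem myMax_ge_right (a b : Nat) : b ≤ myMax a b := by
  unfold myMax
  split <;> omega
```

In the vericoding vision, an AI would generate both halves, code and proof, from a plain-language specification, and a human would only need to check that the specification says what they meant.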

And counterintuitively, the speed at which AI generates code could also ease maintainability concerns. Alex Worden, principal engineer at the business software giant Intuit, notes that maintenance is often difficult because engineers reuse components across projects, creating a tangle of dependencies where one change triggers cascading effects across the code base. Reusing code used to save developers time, but in a world where AI can produce hundreds of lines of code in seconds, that imperative has gone, says Worden.

Instead, he advocates for “disposable code,” where each component is generated independently by AI without regard for whether it follows design patterns or conventions. They are then connected via APIs—sets of rules that let components request information or services from each other. Each component’s inner workings are not dependent on other parts of the code base, making it possible to rip them out and replace them without wider impact, says Worden. 
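As a rough illustration of that pattern (not Intuit’s actual architecture), each component can hide behind a narrow interface so its internals can be regenerated or thrown away without touching its neighbors. The interface and class names below are invented for the example.

```python
# "Disposable code" in miniature: callers depend only on a small interface,
# so any implementation behind it can be regenerated or replaced wholesale.
# The Protocol and classes here are illustrative, not any company's design.
from typing import Protocol

class TaxEstimator(Protocol):
    def estimate(self, income: float) -> float: ...

class FlatRateEstimator:
    """One generated implementation; its internals don't matter to callers."""
    def estimate(self, income: float) -> float:
        return income * 0.2

class BracketEstimator:
    """A replacement can be dropped in as long as it honors the same interface."""
    def estimate(self, income: float) -> float:
        return income * 0.1 if income < 50_000 else income * 0.3

def report(estimator: TaxEstimator, income: float) -> str:
    # The caller never reaches into the component's internals.
    return f"Estimated tax on {income:,.0f}: {estimator.estimate(income):,.0f}"

print(report(FlatRateEstimator(), 80_000))
print(report(BracketEstimator(), 80_000))
```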

“The industry is still concerned about humans maintaining AI-generated code,” he says. “I question how long humans will look at or care about code.”

A narrowing talent pipeline

For the foreseeable future, though, humans will still need to understand and maintain the code that underpins their projects. And one of the most pernicious side effects of AI tools may be a shrinking pool of people capable of doing so. 

Early evidence suggests that fears around the job-destroying effects of AI may be justified. A recent Stanford University study found that employment among software developers aged 22 to 25 fell nearly 20% between 2022 and 2025, coinciding with the rise of AI-powered coding tools.

Experienced developers could face difficulties too. Luciano Nooijen, an engineer at the video-game infrastructure developer Companion Group, used AI tools heavily in his day job, where they were provided for free. But when he began a side project without access to those tools, he found himself struggling with tasks that previously came naturally. “I was feeling so stupid because things that used to be instinct became manual, sometimes even cumbersome,” says Nooijen.

Just as athletes still perform basic drills, he thinks the only way to maintain an instinct for coding is to regularly practice the grunt work. That’s why he’s largely abandoned AI tools, though he admits that deeper motivations are also at play. 

Part of the reason Nooijen and other developers MIT Technology Review spoke to are pushing back against AI tools is a sense that they are hollowing out the parts of their jobs that they love. “I got into software engineering because I like working with computers. I like making machines do things that I want,” Nooijen says. “It’s just not fun sitting there with my work being done for me.”

AI might not be coming for lawyers’ jobs anytime soon

When the generative AI boom took off in 2022, Rudi Miller and her law school classmates were suddenly gripped with anxiety. “Before graduating, there was discussion about what the job market would look like for us if AI became adopted,” she recalls. 

So when it came time to choose a specialty, Miller—now a junior associate at the law firm Orrick—decided to become a litigator, the kind of lawyer who represents clients in court. She hoped the courtroom would be the last human stage. “Judges haven’t allowed ChatGPT-enabled robots to argue in court yet,” she says.


She had reason to be worried. The artificial-intelligence job apocalypse seemed to be coming for lawyers. In March 2023, researchers reported that GPT-4 had smashed the Uniform Bar Exam. That same month, an industry report predicted that 44% of legal work could be automated. The legal tech industry entered a boom as law firms began adopting generative AI to mine mountains of documents and draft contracts, work ordinarily done by junior associates. Last month, the law firm Clifford Chance axed 10% of its staff in London, citing increased use of AI as a reason.

But for all the hype, LLMs are still far from thinking like lawyers—let alone replacing them. The models continue to hallucinate case citations, struggle to navigate gray areas of the law and reason about novel questions, and stumble when they attempt to synthesize information scattered across statutes, regulations, and court cases. And there are deeper institutional reasons to think the models could struggle to supplant legal jobs. While AI is reshaping the grunt work of the profession, the end of lawyers may not be arriving anytime soon.

The big experiment

The legal industry has long been defined by long hours and grueling workloads, so the promise of superhuman efficiency is appealing. Law firms are experimenting with general-purpose tools like ChatGPT and Microsoft Copilot and specialized legal tools like Harvey and Thomson Reuters’ CoCounsel, with some building their own in-house tools on top of frontier models. They’re rolling out AI boot camps and letting associates bill hundreds of hours to AI experimentation. As of 2024, 47.8% of attorneys at law firms employing 500 or more lawyers used AI, according to the American Bar Association. 

But lawyers say that LLMs are a long way from reasoning well enough to replace them. Lucas Hale, a junior associate at McDermott Will & Schulte, has been embracing AI for many routine chores. He uses Relativity to sift through long documents and Microsoft Copilot for drafting legal citations. But when he turns to ChatGPT with a complex legal question, he finds the chatbot spewing hallucinations, rambling off topic, or drawing a blank.

“In the case where we have a very narrow question or a question of first impression for the court,” he says, referring to a novel legal question that a court has never decided before, “that’s the kind of thinking that the tool can’t do.”

Much of Hale’s work involves creatively applying the law to new fact patterns. “Right now, I don’t think very much of the work that litigators do, at least not the work that I do, can be outsourced to an AI utility,” he says.

Allison Douglis, a senior associate at Jenner & Block, uses an LLM to kick off her legal research. But the tools only take her so far. “When it comes to actually fleshing out and developing an argument as a litigator, I don’t think they’re there,” she says. She has watched the models hallucinate case citations and fumble through ambiguous areas of the law.

“Right now, I would much rather work with a junior associate than an AI tool,” she says. “Unless they get extraordinarily good very quickly, I can’t imagine that changing in the near future.”

Beyond the bar

The legal industry has seemed ripe for an AI takeover ever since ChatGPT’s triumph on the bar exam. But passing a standardized test isn’t the same as practicing law. The exam tests whether people can memorize legal rules and apply them to hypothetical situations—not whether they can exercise strategic judgment in complicated realities or craft arguments in uncharted legal territory. And models can be trained to ace benchmarks without genuinely improving their reasoning.

But new benchmarks are aiming to better measure the models’ ability to do legal work in the real world. The Professional Reasoning Benchmark, published by Scale AI in November, evaluated leading LLMs on legal and financial tasks designed by professionals in the field. The study found that the models have critical gaps in their reliability for professional adoption, with the best-performing model scoring only 37% on the most difficult legal problems, meaning it met just over a third of possible points on the evaluation criteria. The models frequently made inaccurate legal judgments, and if they did reach correct conclusions, they did so through incomplete or opaque reasoning processes.

“The tools actually are not there to basically substitute [for] your lawyer,” says Afra Feyza Akyurek, the lead author of the paper. “Even though a lot of people think that LLMs have a good grasp of the law, it’s still lagging behind.” 

The paper builds on other benchmarks measuring the models’ performance on economically valuable work. The AI Productivity Index, published by the data firm Mercor in September and updated in December, found that the models have “substantial limitations” in performing legal work. The best-performing model scored 77.9% on legal tasks, meaning it satisfied roughly four out of five evaluation criteria. A model with such a score might generate substantial economic value in some industries, but in fields where errors are costly, it may not be useful at all, the early version of the study noted.  

Professional benchmarks are a big step forward in evaluating the LLMs’ real-world capabilities, but they may still not capture what lawyers actually do. “These questions, although more challenging than those in past benchmarks, still don’t fully reflect the kinds of subjective, extremely challenging questions lawyers tackle in real life,” says Jon Choi, a law professor at the University of Washington School of Law, who coauthored a study on legal benchmarks in 2023. 

Unlike math or coding, in which LLMs have made significant progress, legal reasoning may be challenging for the models to learn. The law deals with messy real-world problems, riddled with ambiguity and subjectivity, that often have no right answer, says Choi. Making matters worse, a lot of legal work isn’t recorded in ways that can be used to train the models, he says. When it is, documents can span hundreds of pages, scattered across statutes, regulations, and court cases that exist in a complex hierarchy.  

But a more fundamental limitation might be that LLMs are simply not trained to think like lawyers. “The reasoning models still don’t fully reason about problems like we humans do,” says Julian Nyarko, a law professor at Stanford Law School. The models may lack a mental model of the world—the ability to simulate a scenario and predict what will happen—and that capability could be at the heart of complex legal reasoning, he says. It’s possible that the current paradigm of LLMs trained on next-word prediction gets us only so far.  

The jobs remain

Despite early signs that AI is beginning to affect entry-level workers, labor statistics have yet to show that lawyers are being displaced. Some 93.4% of 2024 law school graduates were employed within 10 months of graduation—the highest rate on record—according to the National Association for Law Placement. The number of graduates working in law firms rose by 13% from 2023 to 2024.

For now, law firms are slow to shrink their ranks. “We’re not reducing headcounts at this point,” said Amy Ross, the chief of attorney talent at the law firm Ropes & Gray. 

Even looking ahead, the effects could be incremental. “I will expect some impact on the legal profession’s labor market, but not major,” says Mert Demirer, an economist at MIT. “AI is going to be very useful in terms of information discovery and summary,” he says, but for complex legal tasks, “the law’s low risk tolerance, plus the current capabilities of AI, are going to make that case less automatable at this point.” Capabilities may evolve over time, but that’s a big unknown.

It’s not just that the models themselves are not ready to replace junior lawyers. Institutional barriers may also shape how AI is deployed. Higher productivity reduces billable hours, challenging the dominant business model of law firms. Liability looms large for lawyers, and clients may still want a human on the hook. Regulations could also constrain how lawyers use the technology.

Still, as AI takes on some associate work, law firms may need to reinvent their training system. “When junior work dries up, you have to have a more formal way of teaching than hoping that an apprenticeship works,” says Ethan Mollick, a management professor at the Wharton School of the University of Pennsylvania.

Zach Couger, a junior associate at McDermott Will & Schulte, leans on ChatGPT to comb through piles of contracts he once slogged through by hand. He can’t imagine going back to doing the job himself, but he wonders what he’s missing. 

“I’m worried that I’m not getting the same reps that senior attorneys got,” he says, referring to the repetitive training that has long defined the early experiences of lawyers. “On the other hand, it is very nice to have a semi–knowledge expert to just ask questions to that’s not a partner who’s also very busy.” 

Even though an AI job apocalypse looks distant, the uncertainty sticks with him. Lately, Couger finds himself staying up late, wondering if he could be part of the last class of associates at big law firms: “I may be the last plane out.”

Solar geoengineering startups are getting serious

Solar geoengineering aims to manipulate the climate by bouncing sunlight back into space. In theory, it could ease global warming. But as interest in the idea grows, so do concerns about potential consequences.

A startup called Stardust Solutions recently raised a $60 million funding round, the largest known to date for a geoengineering startup. My colleague James Temple has a new story out about the company, and how its emergence is making some researchers nervous.

So far, the field has been limited to debates, proposed academic research, and—sure—a few fringe actors to keep an eye on. Now things are getting more serious. What does it mean for geoengineering, and for the climate?

Researchers have considered the possibility of addressing planetary warming this way for decades. We already know that volcanic eruptions, which spew sulfur dioxide into the atmosphere, can reduce temperatures. The thought is that we could mimic that natural process by spraying particles up there ourselves.

The prospect is a controversial one, to put it lightly. Many have concerns about unintended consequences and uneven benefits. Even public research led by top institutions has faced barriers—one famous Harvard research program was officially canceled last year after years of debate.

One of the difficulties of geoengineering is that in theory a single entity, like a startup company, could make decisions that have a widespread effect on the planet. And in the last few years, we’ve seen more interest in geoengineering from the private sector. 

Three years ago, James broke the story that Make Sunsets, a California-based company, was already releasing particles into the atmosphere in an effort to tweak the climate.

The company’s CEO Luke Iseman went to Baja California in Mexico, stuck some sulfur dioxide into a weather balloon, and sent it skyward. The amount of material was tiny, and it’s not clear that it even made it into the right part of the atmosphere to reflect any sunlight.

But fears that this group or others could go rogue and do their own geoengineering led to widespread backlash. Mexico announced plans to restrict geoengineering experiments in the country a few weeks after that news broke.

You can still buy cooling credits from Make Sunsets, and the company was just granted a patent for its system. But the startup is seen as something of a fringe actor.

Enter Stardust Solutions. The company has been working under the radar for a few years, but it has started talking about its work more publicly this year. In October, it announced a significant funding round, led by some top names in climate investing. “Stardust is serious, and now it’s raised serious money from serious people,” as James puts it in his new story.

That’s making some experts nervous. Even those who believe we should be researching geoengineering are concerned about what it means for private companies to do so.

“Adding business interests, profit motives, and rich investors into this situation just creates more cause for concern, complicating the ability of responsible scientists and engineers to carry out the work needed to advance our understanding,” write David Keith and Daniele Visioni, two leading figures in geoengineering research, in a recent opinion piece for MIT Technology Review.

Stardust insists that it won’t move forward with any geoengineering until and unless it’s commissioned to do so by governments and there are rules and bodies in place to govern use of the technology.

But there’s no telling how financial pressure might change that, down the road. And we’re already seeing some of the challenges faced by a private company in this space: the need to keep trade secrets.

Stardust is currently not sharing information about the particles it intends to release into the sky, though it says it plans to do so once it secures a patent, which could happen as soon as next year. The company argues that its proprietary particles will be safe, cheap to manufacture, and easier to track than the already abundant sulfur dioxide. But at this point, there’s no way for external experts to evaluate those claims.

As Keith and Visioni put it: “Research won’t be useful unless it’s trusted, and trust depends on transparency.”

How one controversial startup hopes to cool the planet

Stardust Solutions believes that it can solve climate change—for a price.

The Israel-based geoengineering startup has said it expects  nations will soon pay it more than a billion dollars a year to launch specially equipped aircraft into the stratosphere. Once they’ve reached the necessary altitude, those planes will disperse particles engineered to reflect away enough sunlight to cool down the planet, purportedly without causing environmental side effects. 

The proprietary (and still secret) particles could counteract all the greenhouse gases the world has emitted over the last 150 years, the company stated in a 2023 pitch deck it presented to venture capital firms. In fact, it’s the “only technologically feasible solution” to climate change, the company said.

The company disclosed it raised $60 million in funding in October, marking by far the largest known funding round to date for a startup working on solar geoengineering.

Stardust is, in a sense, the embodiment of Silicon Valley’s simmering frustration with the pace of academic research on the technology. It’s a multimillion-dollar bet that a startup mindset can advance research and development that has crept along amid scientific caution and public queasiness.

But numerous researchers focused on solar geoengineering are deeply skeptical that Stardust will line up the government customers it would need to carry out a global deployment as early as 2035, the plan described in its earlier investor materials—and aghast at the suggestion that it ever expected to move that fast. They’re also highly critical of the idea that a company would take on the high-stakes task of setting the global temperature, rather than leaving it to publicly funded research programs.

“They’ve ignored every recommendation from everyone and think they can turn a profit in this field,” says Douglas MacMartin, an associate professor at Cornell University who studies solar geoengineering. “I think it’s going to backfire. Their investors are going to be dumping their money down the drain, and it will set back the field.”

The company has finally emerged from stealth mode after completing its funding round, and its CEO, Yanai Yedvab, agreed to conduct one of the company’s first extensive interviews with MIT Technology Review for this story.

Yedvab walked back those ambitious projections a little, stressing that the actual timing of any stratospheric experiments, demonstrations, or deployments will be determined by when governments decide it’s appropriate to carry them out. Stardust has stated clearly that it will move ahead with solar geoengineering only if nations pay it to proceed, and only once there are established rules and bodies guiding the use of the technology.

That decision, he says, will likely be dictated by how bad climate change becomes in the coming years.

“It could be a situation where we are at the place we are now, which is definitely not great,” he says. “But it could be much worse. We’re saying we’d better be ready.”

“It’s not for us to decide, and I’ll say humbly, it’s not for these researchers to decide,” he adds. “It’s the sense of urgency that will dictate how this will evolve.”

The building blocks

No one is questioning the scientific credentials of Stardust. The company was founded in 2023 by a trio of prominent researchers, including Yedvab, who served as deputy chief scientist at the Israeli Atomic Energy Commission. The company’s lead scientist, Eli Waxman, is the head of the department of particle physics and astrophysics at the Weizmann Institute of Science. Amyad Spector, the chief product officer, was previously a nuclear physicist at Israel’s secretive Negev Nuclear Research Center.

Stardust CEO Yanai Yedvab (right) and Chief Product Officer Amyad Spector (left) at the company’s facility in Israel.
ROBY YAHAV, STARDUST

Stardust says it employs 25 scientists, engineers, and academics. The company is based in Ness Ziona, Israel, and plans to open a US headquarters soon. 

Yedvab says the motivation for starting Stardust was simply to help develop an effective means of addressing climate change. 

“Maybe something in our experience, in the tool set that we bring, can help us in contributing to solving one of the greatest problems humanity faces,” he says.

Lowercarbon Capital, the climate-tech-focused investment firm  cofounded by the prominent tech investor Chris Sacca, led the $60 million investment round. Future Positive, Future Ventures, and Never Lift Ventures, among others, participated as well.

AWZ Ventures, a firm focused on security and intelligence technologies, co-led the company’s earlier seed round, which totaled $15 million.

Yedvab says the company will use that money to advance research, development, and testing for the three components of its system, which are also described in the pitch deck: safe particles that could be affordably manufactured; aircraft dispersion systems; and a means of tracking particles and monitoring their effects.

“Essentially, the idea is to develop all these building blocks and to upgrade them to a level that will allow us to give governments the tool set and all the required information to make decisions about whether and how to deploy this solution,” he says. 

The company is, in many ways, the opposite of Make Sunsets, the first company that came along offering to send particles into the stratosphere—for a fee—by pumping sulfur dioxide into weather balloons and hand-releasing them into the sky. Many researchers viewed it as a provocative, unscientific, and irresponsible exercise in attention-gathering. 

But Stardust is serious, and now it’s raised serious money from serious people—all of which raises the stakes for the solar geoengineering field and, some fear, increases the odds that the world will eventually put the technology to use.

“That marks a turning point in that these types of actors are not only possible, but are real,” says Shuchi Talati, executive director of the Alliance for Just Deliberation on Solar Geoengineering, a nonprofit that strives to ensure that developing nations are included in the global debate over such climate interventions. “We’re in a more dangerous era now.”

Many scientists studying solar geoengineering argue strongly that universities, governments, and transparent nonprofits should lead the work in the field, given the potential dangers and deep public concerns surrounding a tool with the power to alter the climate of the planet. 

It’s essential to carry out the research with appropriate oversight, explore the potential downsides of these approaches, and publicly publish the results “to ensure there’s no bias in the findings and no ulterior motives in pushing one way or another on deployment or not,” MacMartin says. “[It] shouldn’t be foisted upon people without proper and adequate information.”

He criticized, for instance, the company’s claims to have developed what he described as their “magic aerosol particle,” arguing that the assertion that it is perfectly safe and inert can’t be trusted without published findings. Other scientists have also disputed those scientific claims.

Plenty of other academics say solar geoengineering shouldn’t be studied at all, fearing that merely investigating it starts the world down a slippery slope toward its use and diminishes the pressures to cut greenhouse-gas emissions. In 2022, hundreds of them signed an open letter calling for a global ban on the development and use of the technology, adding the concern that there is no conceivable way for the world’s nations to pull together to establish rules or make collective decisions ensuring that it would be used in “a fair, inclusive, and effective manner.”

“Solar geoengineering is not necessary,” the authors wrote. “Neither is it desirable, ethical, or politically governable in the current context.”

The for-profit decision 

Stardust says it’s important to pursue the possibility of solar geoengineering because the dangers of climate change are accelerating faster than the world’s ability to respond to it, requiring a new “class of solution … that buys us time and protects us from overheating.”

Yedvab says he and his colleagues thought hard about the right structure for the organization, finally deciding that for-profits working in parallel with academic researchers have delivered “most of the groundbreaking technologies” in recent decades. He cited advances in genome sequencing, space exploration, and drug development, as well as the restoration of the ozone layer.

He added that a for-profit structure was also required to raise funds and attract the necessary talent.

“There is no way we could, unfortunately, raise even a small portion of this amount by philanthropic resources or grants these days,” he says.

He adds that while academics have conducted lots of basic science in solar geoengineering, they’ve done very little in terms of building the technological capacities. Their geoengineering research is also primarily focused on the potential use of sulfur dioxide, because it is known to help reduce global temperatures after volcanic eruptions blast massive amounts of it into the stratosphere. But it has well-documented downsides as well, including harm to the protective ozone layer.

“It seems natural that we need better options, and this is why we started Stardust: to develop this safe, practical, and responsible solution,” the company said in a follow-up email. “Eventually, policymakers will need to evaluate and compare these options, and we’re confident that our option will be superior over sulfuric acid primarily in terms of safety and practicability.”

Public trust can be won not by excluding private companies, but by setting up regulations and organizations to oversee this space, much as the US Food and Drug Administration does for pharmaceuticals, Yedvab says.

“There is no way this field could move forward if you don’t have this governance framework, if you don’t have external validation, if you don’t have clear regulation,” he says.

Meanwhile, the company says it intends to operate transparently, pledging to publish its findings whether they’re favorable or not.

That will include finally revealing details about the particles it has developed, Yedvab says. 

Early next year, the company and its collaborators will begin publishing data or evidence “substantiating all the claims and disclosing all the information,” he says, “so that everyone in the scientific community can actually check whether we checked all these boxes.”

In the follow-up email, the company acknowledged that solar geoengineering isn’t a “silver bullet” but said it is “the only tool that will enable us to cool the planet in the short term, as part of a larger arsenal of technologies.”

“The only way governments could be in a position to consider [solar geoengineering] is if the work has been done to research, de-risk, and engineer safe and responsible solutions—which is what we see as our role,” the company added later. “We are hopeful that research will continue not just from us, but also from academic institutions, nonprofits, and other responsible companies that may emerge in the future.”

Ambitious projections

Stardust’s earlier pitch deck stated that the company expected to conduct its first “stratospheric aerial experiments” last year, though those did not move ahead (more on that in a moment).

On another slide, the company said it expected to carry out a “large-scale demonstration” around 2030 and proceed to a “global full-scale deployment” by about 2035. It said it expected to bring in roughly $200 million and $1.5 billion in annual revenue by those periods, respectively.

Every researcher interviewed for this story was adamant that such a deployment should not happen so quickly.

Given the global but uneven and unpredictable impacts of solar geoengineering, any decision to use the technology should be reached through an inclusive, global agreement, not through the unilateral decisions of individual nations, Talati argues. 

“We won’t have any sort of international agreement by that point given where we are right now,” she says.

A global agreement, to be clear, is a big step beyond setting up rules and oversight bodies—and some believe that such an agreement on a technology so divisive could never be achieved.

There is also a vast amount of research still to be done to better understand the negative side effects of solar geoengineering generally, and any ecological impacts of Stardust’s materials specifically, adds Holly Buck, an associate professor at the University at Buffalo and author of After Geoengineering.

“It is irresponsible to talk about deploying stratospheric aerosol injection without fundamental research about its impacts,” Buck wrote in an email.

She says the timelines are also “unrealistic” because there are profound public concerns about the technology. Her polling work found that a significant fraction of the US public opposes even research (though polling varies widely). 

Meanwhile, most academic efforts to move ahead with even small-scale outdoor experiments have sparked fierce backlash. That includes the years-long effort by researchers then at Harvard to carry out a basic equipment test for their SCoPEx experiment. The high-altitude balloon would have launched from a flight center in Sweden, but the test was ultimately scratched amid objections from environmentalists and Indigenous groups.

Given this baseline of public distrust, Stardust’s for-profit proposals only threaten to further inflame public fears, Buck says.

“I find the whole proposal incredibly socially naive,” she says. “We actually could use serious research in this field, but proposals like this diminish the chances of that happening.”

Those public fears, which cross the political divide, also mean politicians will see little to no political upside to paying Stardust to move ahead, MacMartin says.

“If you don’t have the constituency for research, it seems implausible to me that you’d turn around and give money to an Israeli company to deploy it,” he says.

An added risk is that if one nation or a small coalition forges ahead without broader agreement, it could provoke geopolitical conflicts. 

“What if Russia wants it a couple of degrees warmer, and India a couple of degrees cooler?” asked Alan Robock, a professor at Rutgers University, in the Bulletin of the Atomic Scientists in 2008. “Should global climate be reset to preindustrial temperature or kept constant at today’s reading? Would it be possible to tailor the climate of each region of the planet independently without affecting the others? If we proceed with geoengineering, will we provoke future climate wars?”

Revised plans

Yedvab says the pitch deck reflected Stardust’s strategy at a “very early stage in our work,” adding that their thinking has “evolved,” partly in response to consultations with experts in the field.

He says that the company will have the technological capacity to move ahead with demonstrations and deployments on the timelines it laid out but adds, “That’s a necessary but not sufficient condition.”

“Governments will need to decide where they want to take it, if at all,” he says. “It could be a case that they will say ‘We want to move forward.’ It could be a case that they will say ‘We want to wait a few years.’”

“It’s for them to make these decisions,” he says.

Yedvab acknowledges that the company has conducted flights in the lower atmosphere to test its monitoring system, using white smoke as a simulant for its particles, as the Wall Street Journal reported last year. It’s also done indoor tests of the dispersion system and its particles in a wind tunnel set up within its facility.

But in response to criticisms like the ones above, Yedvab says the company hasn’t conducted outdoor particle experiments and won’t move forward with them until it has approval from governments. 

“Eventually, there will be a need to conduct outdoor testing,” he says. “There is no way you can validate any solution without outdoor testing.” But such testing of sunlight reflection technology, he says, “should be done only working together with government and under these supervisions.”

Generating returns  

Stardust may be willing to wait for governments to be ready to deploy its system, but there’s no guarantee that its investors will have the same patience. In accepting tens of millions in venture capital, Stardust may now face financial pressures that could “drive the timelines,” says Gernot Wagner, a climate economist at Columbia University. 

And that raises a different set of concerns.

Obliged to deliver returns, the company might feel it must strive to convince government leaders that they should pay for its services, Talati says. 

“The whole point of having companies and investors is you want your thing to be used,” she says. “There’s a massive incentive to lobby countries to use it, and that’s the whole danger of having for-profit companies here.”

She argues those financial incentives threaten to accelerate the use of solar geoengineering ahead of broader international agreements and elevate business interests above the broader public good.

Stardust has “quietly begun lobbying on Capitol Hill” and has hired the law firm Holland & Knight, according to Politico.

It has also worked with Red Duke Strategies, a consulting firm based in McLean, Virginia, to develop “strategic relationships and communications that promote understanding and enable scientific testing,” according to a case study on the consulting firm’s website.

“The company needed to secure both buy-in and support from the United States government and other influential stakeholders to move forward,” Red Duke states. “This effort demanded a well-connected and authoritative partner who could introduce Stardust to a group of experts able to research, validate, deploy, and regulate its SRM technology.”

Red Duke didn’t respond to an inquiry from MIT Technology Review. Stardust says its work with the consulting firm was not a government lobbying effort.

Yedvab acknowledges that the company is meeting with government leaders in the US, Europe, its own region, and the Global South. But he stresses that it’s not asking any country to contribute funding or to sign off on deployments at this stage. Instead, it’s making the case for nations to begin crafting policies to regulate solar geoengineering.

“When we speak to policymakers—and we speak to policymakers; we don’t hide it—essentially, what we tell them is ‘Listen, there is a solution,’” he says. “‘It’s not decades away—it’s a few years away. And it’s your role as policymakers to set the rules of this field.’”

“Any solution needs checks and balances,” he says. “This is how we see the checks and balances.”

He says the best-case scenario is still a rollout of clean energy technologies that accelerates rapidly enough to drive down emissions and curb climate change.

“We are perfectly fine with building an option that will sit on the shelf,” he says. “We’ll go and do something else. We have a great team and are confident that we can find also other problems to work with.”

He says the company’s investors are aware of and comfortable with that possibility, supportive of the principles that will guide Stardust’s work, and willing to wait for regulations and government contracts.

Lowercarbon Capital didn’t respond to an inquiry from MIT Technology Review.

‘Sentiment of hope’

Others have certainly imagined the alternative scenario Yedvab raises: that nations will increasingly support the idea of geoengineering in the face of mounting climate catastrophes. 

In Kim Stanley Robinson’s 2020 novel, The Ministry for the Future, India unilaterally forges ahead with solar geoengineering following a heat wave that kills millions of people. 

Wagner sketched a variation on that scenario in his 2021 book, Geoengineering: The Gamble, speculating that a small coalition of nations might kick-start a rapid research and deployment program as an emergency response to escalating humanitarian crises. In his version, the Philippines offers to serve as the launch site after a series of super-cyclones batter the island nation, forcing millions from their homes. 

It’s impossible to know today how the world will react if one nation or a few go it alone, or whether nations could come to agreement on where the global temperature should be set. 

But the lure of solar geoengineering could become increasingly enticing as more and more nations endure mass suffering, starvation, displacement, and death.

“We understand that probably it will not be perfect,” Yedvab says. “We understand all the obstacles, but there is this sentiment of hope, or cautious hope, that we have a way out of this dark corridor we are currently in.”

“I think that this sentiment of hope is something that gives us a lot of energy to move on forward,” he adds.

The ads that sell the sizzle of genetic trait discrimination

One day this fall, I watched an electronic sign outside the Broadway-Lafayette subway station in Manhattan switch seamlessly between an ad for makeup and one promoting the website Pickyourbaby.com, which promises a way for potential parents to use genetic tests to influence their baby’s traits, including eye color, hair color, and IQ.

Inside the station, every surface was wrapped with more ads—babies on turnstiles, on staircases, on banners overhead. “Think about it. Makeup and then genetic optimization,” exulted Kian Sadeghi, the 26-year-old founder of Nucleus Genomics, the startup running the ads. To his mind, one should be as accessible as the other. 

Nucleus is a young, attention-seeking genetic software company that says it can analyze genetic tests on IVF embryos to score them for 2,000 traits and disease risks, letting parents pick some and reject others. This is possible because of how our DNA shapes us, sometimes powerfully. As one of the subway banners reminded New York riders: “Height is 80% genetic.”

The day after the campaign launched, Sadeghi and I had briefly sparred online. He’d been on X showing off a phone app where parents can click through traits like eye color and hair color. I snapped back that all this sounded a lot like Uber Eats—another crappy, frictionless future invented by entrepreneurs, but this time you’d click for a baby.

I agreed to meet Sadeghi that night in the station under a banner that read, “IQ is 50% genetic.” He appeared in a puffer jacket and told me the campaign would soon spread to 1,000 train cars. Not long ago, this was a secretive technology to whisper about at Silicon Valley dinner parties. But now? “Look at the stairs. The entire subway is genetic optimization. We’re bringing it mainstream,” he said. “I mean, like, we are normalizing it, right?”

Normalizing what, exactly? The ability to choose embryos on the basis of predicted traits could lead to healthier people. But the traits mentioned in the subway—height and IQ—focus the public’s mind on cosmetic choices and even naked discrimination. “I think people are going to read this and start realizing: Wow, it is now an option that I can pick. I can have a taller, smarter, healthier baby,” says Sadeghi.

Entrepreneur Kian Sadeghi stands under an advertising banner in the Broadway-Lafayette subway station in Manhattan, part of a campaign called “Have Your Best Baby.”
COURTESY OF THE AUTHOR

Nucleus got its seed funding from Founders Fund, an investment firm known for its love of contrarian bets. And embryo scoring fits right in—it’s an unpopular concept, and professional groups say the genetic predictions aren’t reliable. So far, leading IVF clinics still refuse to offer these tests. Doctors worry, among other things, that they’ll create unrealistic parental expectations. What if little Johnny doesn’t do as well on the SAT as his embryo score predicted?

The ad blitz is a way to end-run such gatekeepers: If a clinic won’t agree to order the test, would-be parents can take their business elsewhere. Another embryo testing company, Orchid, notes that high consumer demand emboldened Uber’s early incursions into regulated taxi markets. “Doctors are essentially being shoved in the direction of using it, not because they want to, but because they will lose patients if they don’t,” Orchid founder Noor Siddiqui said during an online event this past August.

Sadeghi prefers to compare his startup to Airbnb. He hopes it can link customers to clinics, becoming a digital “funnel” offering a “better experience” for everyone. He notes that Nucleus ads don’t mention DNA or any details of how the scoring technique works. That’s not the point. In advertising, you sell the sizzle, not the steak. And in Nucleus’s ad copy, what sizzles is height, smarts, and light-colored eyes.

It makes you wonder if the ads should be permitted. Indeed, I learned from Sadeghi that the Metropolitan Transportation Authority had objected to parts of the campaign. The metro agency, for instance, did not let Nucleus run ads saying “Have a girl” and “Have a boy,” even though it’s very easy to identify the sex of an embryo using a genetic test. The reason was an MTA policy that forbids using government-owned infrastructure to promote “invidious discrimination” against protected classes, which include race, religion, and biological sex.

Since 2023, New York City has also included height and weight in its anti-discrimination law, the idea being to “root out bias” related to body size in housing and in public spaces. So I’m not sure why the MTA let Nucleus declare that height is 80% genetic. (The MTA advertising department didn’t respond to questions.) Perhaps it’s because the statement is a factual claim, not an explicit call to action. But we all know what to do: Pick the tall one and leave shorty in the IVF freezer, never to be born.

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

How AI is uncovering hidden geothermal energy resources

Sometimes geothermal hot spots are obvious, marked by geysers and hot springs on the planet’s surface. But in other places, they’re obscured thousands of feet underground. Now AI could help uncover these hidden pockets of potential power.

A startup called Zanskar announced today that it has used AI and other advanced computational methods to uncover a blind geothermal system—meaning there aren’t signs of it on the surface—in the western Nevada desert. The company says it’s the first blind system that’s been identified and confirmed to be a commercial prospect in over 30 years. 

Historically, finding new sites for geothermal power was a matter of brute force. Companies spent a lot of time and money drilling deep wells, looking for places where it made sense to build a plant.

Zanskar’s approach is more precise. With advancements in AI, the company aims to “solve this problem that had been unsolvable for decades, and go and finally find those resources and prove that they’re way bigger than previously thought,” says Carl Hoiland, the company’s cofounder and CEO.  

To support a successful geothermal power plant, a site needs high temperatures at an accessible depth and space for fluid to move through the rock and deliver heat. In the case of the new site, which the company calls Big Blind, the prize is a reservoir that reaches 250 °F at about 2,700 feet below the surface.

As electricity demand rises around the world, geothermal systems like this one could provide a source of constant power without emitting the greenhouse gases that cause climate change. 

The company has used its technology to identify many potential hot spots. “We have dozens of sites that look just like this,” says Joel Edwards, Zanskar’s cofounder and CTO. But for Big Blind, the team has done the fieldwork to confirm its model’s predictions.

The first step to identifying a new site is to use regional AI models to search large areas. The team trains models on known hot spots and on simulations it creates. Then it feeds in geological, satellite, and other types of data, including information about fault lines. The models can then predict where potential hot spots might be.
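To picture what that pattern looks like in practice, here is a minimal, hypothetical sketch: a classifier trained on map cells labeled with known geothermal systems, then used to rank cells in an unexplored region. The features, labels, and choice of model are illustrative assumptions only; Zanskar has not published the details of its actual pipeline.

```python
# Illustrative sketch only: not Zanskar's real models, features, or data.
# The general pattern: train on cells labeled with known hot spots, then
# score the cells of an unexplored region.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: one row per map cell, columns standing in for
# geological/satellite features (e.g., distance to nearest fault, heat-flow
# estimate, gravity anomaly).
X_train = rng.normal(size=(5_000, 3))
y_train = (X_train[:, 0] < -0.5).astype(int)  # stand-in labels: 1 = known geothermal system

model = GradientBoostingClassifier().fit(X_train, y_train)

# Score a new region's grid cells and flag the most promising ones for a
# field crew to check with shallow temperature-gradient holes.
X_region = rng.normal(size=(1_000, 3))
hot_spot_prob = model.predict_proba(X_region)[:, 1]
candidates = np.argsort(hot_spot_prob)[::-1][:10]
print("Top candidate cells:", candidates)
```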

One strength of using AI for this task is that it can handle the immense complexity of the information at hand. “If there’s something learnable in the earth, even if it’s a very complex phenomenon that’s hard for us humans to understand, neural nets are capable of learning that, if given enough data,” Hoiland says. 

Once the models identify a potential hot spot, a field crew heads to the site, which might cover roughly 100 square miles, and collects additional information through techniques that include drilling shallow holes to look for elevated underground temperatures.

In the case of Big Blind, this prospecting information gave the company enough confidence to purchase a federal lease, allowing it to develop a geothermal plant. With that lease secured, the team returned with large drill rigs and drilled thousands of feet down in July and August. The workers found the hot, permeable rock they expected.

Next they must secure permits to build and connect to the grid and line up the investments needed to build the plant. The team will also continue testing at the site, including long-term testing to track heat and water flow.

“There’s a tremendous need for methodology that can look for large-scale features,” says John McLennan, technical lead for resource management at Utah FORGE, a national lab field site for geothermal energy funded by the US Department of Energy. The new discovery is “promising,” McLennan adds.

Big Blind is Zanskar’s first confirmed discovery that wasn’t previously explored or developed, but the company has used its tools for other geothermal exploration projects. Earlier this year, it announced a discovery at a site that had previously been explored by the industry but not developed. The company also purchased and revived a geothermal power plant in New Mexico.

And this could be just the beginning for Zanskar. As Edwards puts it, “This is the start of a wave of new, naturally occurring geothermal systems that will have enough heat in place to support power plants.”

The State of AI: Welcome to the economic singularity

Welcome back to The State of AI, a new collaboration between the Financial Times and MIT Technology Review. Every Monday for the next two weeks, writers from both publications will debate one aspect of the generative AI revolution reshaping global power.

This week, Richard Waters, FT columnist and former West Coast editor, talks with MIT Technology Review’s editor at large David Rotman about the true impact of AI on the job market.

Bonus: If you’re an MIT Technology Review subscriber, you can join David and Richard, alongside MIT Technology Review’s editor in chief, Mat Honan, for an exclusive conversation live on Tuesday, December 9 at 1pm ET about this topic. Sign up to be a part here.

Richard Waters writes:

Any far-reaching new technology is always uneven in its adoption, but few have been more uneven than generative AI. That makes it hard to assess its likely impact on individual businesses, let alone on productivity across the economy as a whole.

At one extreme, AI coding assistants have revolutionized the work of software developers. Mark Zuckerberg recently predicted that half of Meta’s code would be written by AI within a year. At the other extreme, most companies are seeing little if any benefit from their initial investments. A widely cited study from MIT found that so far, 95% of gen AI projects produce zero return.

That has provided fuel for the skeptics who maintain that—by its very nature as a probabilistic technology prone to hallucinating—generative AI will never have a deep impact on business.

To many students of tech history, though, the lack of immediate impact is just the normal lag associated with transformative new technologies. Erik Brynjolfsson, then an assistant professor at MIT, first described what he called the “productivity paradox of IT” in the early 1990s. Despite plenty of anecdotal evidence that technology was changing the way people worked, it wasn’t showing up in the aggregate data in the form of higher productivity growth. Brynjolfsson’s conclusion was that it just took time for businesses to adapt.

Big investments in IT finally showed through with a notable rebound in US productivity growth starting in the mid-1990s. But that tailed off a decade later and was followed by a second lull.

Richard Waters and David Rotman

FT/MIT TECHNOLOGY REVIEW | ADOBE STOCK

In the case of AI, companies need to build new infrastructure (particularly data platforms), redesign core business processes, and retrain workers before they can expect to see results. If a lag effect explains the slow results, there may at least be reasons for optimism: Much of the cloud computing infrastructure needed to bring generative AI to a wider business audience is already in place.

The opportunities and the challenges are both enormous. An executive at one Fortune 500 company says his organization has carried out a comprehensive review of its use of analytics and concluded that its workers, overall, add little or no value. Rooting out the old software and replacing that inefficient human labor with AI might yield significant results. But, as this person says, such an overhaul would require big changes to existing processes and take years to carry out.

There are some early encouraging signs. US productivity growth, stuck at 1% to 1.5% for more than a decade and a half, rebounded to more than 2% last year. It probably hit the same level in the first nine months of this year, though the lack of official data due to the recent US government shutdown makes this impossible to confirm.

It is impossible to tell, though, how durable this rebound will be or how much can be attributed to AI. The effects of new technologies are seldom felt in isolation. Instead, the benefits compound. AI is riding earlier investments in cloud and mobile computing. In the same way, the latest AI boom may only be the precursor to breakthroughs in fields that have a wider impact on the economy, such as robotics. ChatGPT might have caught the popular imagination, but OpenAI’s chatbot is unlikely to have the final word.

David Rotman replies: 

This is my favorite discussion these days when it comes to artificial intelligence. How will AI affect overall economic productivity? Forget about the mesmerizing videos, the promise of companionship, and the prospect of agents to do tedious everyday tasks—the bottom line will be whether AI can grow the economy, and that means increasing productivity. 

But, as you say, it’s hard to pin down just how AI is affecting such growth or how it will do so in the future. Erik Brynjolfsson predicts that, like other so-called general purpose technologies, AI will follow a J curve in which initially there is a slow, even negative, effect on productivity as companies invest heavily in the technology before finally reaping the rewards. And then the boom. 

But there is a counterexample undermining the just-be-patient argument. Productivity growth from IT picked up in the mid-1990s but since the mid-2000s has been relatively dismal. Despite smartphones and social media and apps like Slack and Uber, digital technologies have done little to produce robust economic growth. A strong productivity boost never came.

Daron Acemoglu, an economist at MIT and a 2024 Nobel Prize winner, argues that the productivity gains from generative AI will be far smaller and take far longer than AI optimists think. The reason is that though the technology is impressive in many ways, the field is too narrowly focused on products that have little relevance to the largest business sectors.

The statistic you cite that 95% of AI projects lack business benefits is telling. 

Take manufacturing. No question, some version of AI could help; imagine a worker on the factory floor snapping a picture of a problem and asking an AI agent for advice. The problem is that the big tech companies creating AI aren’t really interested in solving such mundane tasks, and their large foundation models, mostly trained on the internet, aren’t all that helpful. 

It’s easy to blame the lack of productivity impact from AI so far on business practices and poorly trained workers. Your example of the executive of the Fortune 500 company sounds all too familiar. But it’s more useful to ask how AI can be trained and fine-tuned to give workers, like nurses and teachers and those on the factory floor, more capabilities and make them more productive at their jobs. 

The distinction matters. Some companies announcing large layoffs recently cited AI as the reason. The worry, however, is that it’s just a short-term cost-saving scheme. As economists like Brynjolfsson and Acemoglu agree, the productivity boost from AI will come when it’s used to create new types of jobs and augment the abilities of workers, not when it is used just to slash jobs to reduce costs. 

Richard Waters responds: 

I see we’re both feeling pretty cautious, David, so I’ll try to end on a positive note. 

Some analyses assume that a much greater share of existing work is within the reach of today’s AI. McKinsey reckons 60% (versus 20% for Acemoglu) and puts annual productivity gains across the economy at as much as 3.4%. Also, calculations like these are based on automation of existing tasks; any new uses of AI that enhance existing jobs would, as you suggest, be a bonus (and not just in economic terms).

Cost-cutting always seems to be the first order of business with any new technology. But we’re still in the early stages and AI is moving fast, so we can always hope.

Further reading

FT chief economics commentator Martin Wolf has been skeptical about whether tech investment boosts productivity but says AI might prove him wrong. The downside: Job losses and wealth concentration might lead to “techno-feudalism.”

The FT’s Robert Armstrong argues that the boom in data center investment need not turn to bust. The biggest risk is that debt financing will come to play too big a role in the buildout.

Last year, David Rotman wrote for MIT Technology Review about how we can make sure AI works for us in boosting productivity, and what course corrections will be required.

David also wrote this piece about how we can best measure the impact of basic R&D funding on economic growth, and why it can often be bigger than you might think.

What’s next for AlphaFold: A conversation with a Google DeepMind Nobel laureate


In 2017, fresh off a PhD in theoretical chemistry, John Jumper heard rumors that Google DeepMind had moved on from building AI that played games with superhuman skill and was starting up a secret project to predict the structures of proteins. He applied for a job.

Just three years later, Jumper celebrated a stunning win that few had seen coming. With CEO Demis Hassabis, he had co-led the development of an AI system called AlphaFold 2 that was able to predict the structures of proteins to within the width of an atom, matching the accuracy of painstaking techniques used in the lab, and doing it many times faster—returning results in hours instead of months.

AlphaFold 2 had cracked a 50-year-old grand challenge in biology. “This is the reason I started DeepMind,” Hassabis told me a few years ago. “In fact, it’s why I’ve worked my whole career in AI.” In 2024, Jumper and Hassabis shared a Nobel Prize in chemistry.

It was five years ago this week that AlphaFold 2’s debut took scientists by surprise. Now that the hype has died down, what impact has AlphaFold really had? How are scientists using it? And what’s next? I talked to Jumper (as well as a few other scientists) to find out.

“It’s been an extraordinary five years,” Jumper says, laughing: “It’s hard to remember a time before I knew tremendous numbers of journalists.”

AlphaFold 2 was followed by AlphaFold Multimer, which could predict structures that contained more than one protein, and then AlphaFold 3, the fastest version yet. Google DeepMind also let AlphaFold loose on UniProt, a vast protein database used and updated by millions of researchers around the world. It has now predicted the structures of some 200 million proteins, almost all that are known to science.

Despite his success, Jumper remains modest about AlphaFold’s achievements. “That doesn’t mean that we’re certain of everything in there,” he says. “It’s a database of predictions, and it comes with all the caveats of predictions.”

A hard problem

Proteins are the biological machines that make living things work. They form muscles, horns, and feathers; they carry oxygen around the body and ferry messages between cells; they fire neurons, digest food, power the immune system; and so much more. But understanding exactly what a protein does (and what role it might play in various diseases or treatments) involves figuring out its structure—and that’s hard.

Proteins are made from strings of amino acids that chemical forces twist up into complex knots. An untwisted string gives few clues about the structure it will form. In theory, most proteins could take on an astronomical number of possible shapes. The task is to predict the correct one.
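To get a feel for “astronomical,” a classic Levinthal-style back-of-the-envelope estimate helps. The “three conformations per residue” figure below is a standard simplification for illustration, not a number from the article.

```python
# Rough illustration of why brute-force search is hopeless (Levinthal-style
# estimate; the per-residue figure is a deliberate simplification).
residues = 100                    # a modest-size protein
conformations_per_residue = 3     # coarse assumption
total_shapes = conformations_per_residue ** residues
print(f"{total_shapes:.1e}")      # ~5.2e47 candidate shapes
```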

Jumper and his team built AlphaFold 2 using a type of neural network called a transformer, the same technology that underpins large language models. Transformers are very good at paying attention to specific parts of a larger puzzle.

But Jumper puts a lot of the success down to making a prototype model that they could test quickly. “We got a system that would give wrong answers at incredible speed,” he says. “That made it easy to start becoming very adventurous with the ideas you try.”

They stuffed the neural network with as much information about protein structures as they could, such as how proteins across certain species have evolved similar shapes. And it worked even better than they expected. “We were sure we had made a breakthrough,” says Jumper. “We were sure that this was an incredible advance in ideas.”

What he hadn’t foreseen was that researchers would download his software and start using it straight away for so many different things. Normally, it’s the thing a few iterations down the line that has the real impact, once the kinks have been ironed out, he says: “I’ve been shocked at how responsibly scientists have used it, in terms of interpreting it, and using it in practice about as much as it should be trusted in my view, neither too much nor too little.”

Do any projects stand out in particular? 

Honeybee science

Jumper brings up a research group that uses AlphaFold to study disease resistance in honeybees. “They wanted to understand this particular protein as they look at things like colony collapse,” he says. “I never would have said, ‘You know, of course AlphaFold will be used for honeybee science.’”

He also highlights a few examples of what he calls off-label uses of AlphaFold, “in the sense that it wasn’t guaranteed to work,” where the ability to predict protein structures has opened up new research techniques. “The first is very obviously the advances in protein design,” he says. “David Baker and others have absolutely run with this technology.”

Baker, a computational biologist at the University of Washington, was a co-winner of last year’s chemistry Nobel, alongside Jumper and Hassabis, for his work on creating synthetic proteins to perform specific tasks—such as treating disease or breaking down plastics—better than natural proteins can.

Baker and his colleagues have developed their own tool based on AlphaFold, called RoseTTAFold. But they have also experimented with AlphaFold Multimer to predict which of their designs for potential synthetic proteins will work.    

“Basically, if AlphaFold confidently agrees with the structure you were trying to design [and] then you make it and if AlphaFold says ‘I don’t know,’ you don’t make it. That alone was an enormous improvement.” It can make the design process 10 times faster, says Jumper.

Another off-label use that Jumper highlights: Turning AlphaFold into a kind of search engine. He mentions two separate research groups that were trying to understand exactly how human sperm cells hooked up with eggs during fertilization. They knew one of the proteins involved but not the other, he says: “And so they took a known egg protein and ran all 2,000 human sperm surface proteins, and they found one that AlphaFold was very sure stuck against the egg.” They were then able to confirm this in the lab.

“This notion that you can use AlphaFold to do something you couldn’t do before—you would never do 2,000 structures looking for one answer,” he says. “This kind of thing I think is really extraordinary.”
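The “search engine” pattern is simple enough to sketch in a few lines. The wrapper function below is hypothetical, standing in for a tool such as AlphaFold-Multimer that returns a confidence score for a predicted two-protein complex; the point is the ranking loop, not any real API.

```python
# Minimal sketch of the "search engine" pattern. predict_complex_confidence()
# is a hypothetical placeholder for a structure-prediction call that returns
# a confidence score for a two-protein complex.
from typing import Callable

def screen_partners(
    bait_sequence: str,
    candidate_sequences: dict[str, str],
    predict_complex_confidence: Callable[[str, str], float],
    top_n: int = 10,
) -> list[tuple[str, float]]:
    """Rank candidate proteins by predicted confidence of binding the bait."""
    scores = {
        name: predict_complex_confidence(bait_sequence, seq)
        for name, seq in candidate_sequences.items()
    }
    # The highest-confidence candidates go to the lab for experimental validation.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```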

Five years on

When AlphaFold 2 came out, I asked a handful of early adopters what they made of it. Reviews were good, but the technology was too new to know for sure what long-term impact it might have. I caught up with one of those people to hear his thoughts five years on.

Kliment Verba is a molecular biologist who runs a lab at the University of California, San Francisco. “It’s an incredibly useful technology, there’s no question about it,” he tells me. “We use it every day, all the time.”

But it’s far from perfect. A lot of scientists use AlphaFold to study pathogens or to develop drugs. This involves looking at interactions between multiple proteins or between proteins and even smaller molecules in the body. But AlphaFold is known to be less accurate at making predictions about multiple proteins or their interaction over time.

Verba says he and his colleagues have been using AlphaFold long enough to get used to its limitations. “There are many cases where you get a prediction and you have to kind of scratch your head,” he says. “Is this real or is this not? It’s not entirely clear—it’s sort of borderline.”

“It’s sort of the same thing as ChatGPT,” he adds. “You know—it will bullshit you with the same confidence as it would give a true answer.”

Still, Verba’s team uses AlphaFold (both 2 and 3, because they have different strengths, he says) to run virtual versions of their experiments before running them in the lab. Using AlphaFold’s results, they can narrow down the focus of an experiment—or decide that it’s not worth doing.

It can really save time, he says: “It hasn’t really replaced any experiments, but it’s augmented them quite a bit.”

New wave  

AlphaFold was designed to be used for a range of purposes. Now multiple startups and university labs are building on its success to develop a new wave of tools more tailored to drug discovery. This year, a collaboration between MIT researchers and the AI drug company Recursion produced a model called Boltz-2, which predicts not only the structure of proteins but also how well potential drug molecules will bind to their target.  

Last month, the startup Genesis Molecular AI released another structure prediction model called Pearl, which the firm claims is more accurate than AlphaFold 3 for certain queries that are important for drug development. Pearl is interactive, so that drug developers can feed any additional data they may have to the model to guide its predictions.

AlphaFold was a major leap, but there’s more to do, says Evan Feinberg, Genesis Molecular AI’s CEO: “We’re still fundamentally innovating, just with a better starting point than before.”

Genesis Molecular AI is pushing margins of error down from less than two angstroms, the de facto industry standard set by AlphaFold, to less than one angstrom—one 10-millionth of a millimeter, or the width of a single hydrogen atom.

“Small errors can be catastrophic for predicting how well a drug will actually bind to its target,” says Michael LeVine, vice president of modeling and simulation at the firm. That’s because chemical forces that interact at one angstrom can stop doing so at two. “It can go from ‘They will never interact’ to ‘They will,’” he says.

With so much activity in this space, how soon should we expect new types of drugs to hit the market? Jumper is pragmatic. Protein structure prediction is just one step of many, he says: “This was not the only problem in biology. It’s not like we were one protein structure away from curing any diseases.”

Think of it this way, he says. Finding a protein’s structure might previously have cost $100,000 in the lab: “If we were only a hundred thousand dollars away from doing a thing, it would already be done.”

At the same time, researchers are looking for ways to do as much as they can with this technology, says Jumper: “We’re trying to figure out how to make structure prediction an even bigger part of the problem, because we have a nice big hammer to hit it with.”

In other words, they want to make everything into nails? “Yeah, let’s make things into nails,” he says. “How do we make this thing that we made a million times faster a bigger part of our process?”

What’s next?

Jumper’s next act? He wants to fuse the deep but narrow power of AlphaFold with the broad sweep of LLMs.  

“We have machines that can read science. They can do some scientific reasoning,” he says. “And we can build amazing, superhuman systems for protein structure prediction. How do you get these two technologies to work together?”

That makes me think of a system called AlphaEvolve, which is being built by another team at Google DeepMind. AlphaEvolve uses an LLM to generate possible solutions to a problem and a second model to check them, filtering out the trash. Researchers have already used AlphaEvolve to make a handful of practical discoveries in math and computer science.    

Is that what Jumper has in mind? “I won’t say too much on methods, but I’ll be shocked if we don’t see more and more LLM impact on science,” he says. “I think that’s the exciting open question that I’ll say almost nothing about. This is all speculation, of course.”

Jumper was 39 when he won his Nobel Prize. What’s next for him?

“It worries me,” he says. “I believe I’m the youngest chemistry laureate in 75 years.” 

He adds: “I’m at the midpoint of my career, roughly. I guess my approach to this is to try to do smaller things, little ideas that you keep pulling on. The next thing I announce doesn’t have to be, you know, my second shot at a Nobel. I think that’s the trap.”

The State of AI: Chatbot companions and the future of our privacy

Welcome back to The State of AI, a new collaboration between the Financial Times and MIT Technology Review. Every Monday, writers from both publications debate one aspect of the generative AI revolution reshaping global power.

In this week’s conversation, MIT Technology Review’s senior reporter for features and investigations, Eileen Guo, and FT tech correspondent Melissa Heikkilä discuss the privacy implications of our new reliance on chatbots.

Eileen Guo writes:

Even if you don’t have an AI friend yourself, you probably know someone who does. A recent study found that one of the top uses of generative AI is companionship: On platforms like Character.AI, Replika, or Meta AI, people can create personalized chatbots to pose as the ideal friend, romantic partner, parent, therapist, or any other persona they can dream up. 

It’s wild how easily people say these relationships can develop. And multiple studies have found that the more conversational and human-like an AI chatbot is, the more likely it is that we’ll trust it and be influenced by it. This can be dangerous, and the chatbots have been accused of pushing some people toward harmful behaviors—including, in a few extreme examples, suicide. 

Some state governments are taking notice and starting to regulate companion AI. New York requires AI companion companies to create safeguards and report expressions of suicidal ideation, and last month California passed a more detailed bill requiring AI companion companies to protect children and other vulnerable groups. 

But tellingly, one area the laws fail to address is user privacy.

This is despite the fact that AI companions, even more so than other types of generative AI, depend on people to share deeply personal information—from their day-to-day routines and innermost thoughts to questions they might not feel comfortable asking real people.

After all, the more users tell their AI companions, the better the bots become at keeping them engaged. This is what MIT researchers Robert Mahari and Pat Pataranutaporn called “addictive intelligence” in an op-ed we published last year, warning that the developers of AI companions make “deliberate design choices … to maximize user engagement.” 

Ultimately, this provides AI companies with something incredibly powerful, not to mention lucrative: a treasure trove of conversational data that can be used to further improve their LLMs. Consider how the venture capital firm Andreessen Horowitz explained it in 2023: 

“Apps such as Character.AI, which both control their models and own the end customer relationship, have a tremendous opportunity to generate market value in the emerging AI value stack. In a world where data is limited, companies that can create a magical data feedback loop by connecting user engagement back into their underlying model to continuously improve their product will be among the biggest winners that emerge from this ecosystem.”

This personal information is also incredibly valuable to marketers and data brokers. Meta recently announced that it will deliver ads through its AI chatbots. And research conducted this year by the security company Surfshark found that four out of the five AI companion apps it looked at in the Apple App Store were collecting data such as user or device IDs, which can be combined with third-party data to create profiles for targeted ads. (The only one that said it did not collect data for tracking services was Nomi, which told me earlier this year that it would not “censor” chatbots from giving explicit suicide instructions.) 

All of this means that the privacy risks posed by these AI companions are, in a sense, required: They are a feature, not a bug. And we haven’t even talked about the additional security risks presented by the way AI chatbots collect and store so much personal information in one place.

So, is it possible to have prosocial and privacy-protecting AI companions? That’s an open question. 

What do you think, Melissa, and what is top of mind for you when it comes to privacy risks from AI companions? And do things look any different in Europe? 

Melissa Heikkilä replies:

Thanks, Eileen. I agree with you. If social media was a privacy nightmare, then AI chatbots put the problem on steroids. 

In many ways, an AI chatbot creates what feels like a much more intimate interaction than a Facebook page. The conversations we have are only with our computers, so there is little risk of your uncle or your crush ever seeing what you write. The AI companies building the models, on the other hand, see everything. 

Companies are optimizing their AI models for engagement by designing them to be as human-like as possible. But AI developers have several other ways to keep us hooked. The first is sycophancy, or the tendency for chatbots to be overly agreeable. 

This feature stems from the way the language model behind the chatbots is trained using reinforcement learning. Human data labelers rate the answers generated by the model as either acceptable or not. This teaches the model how to behave. 

Because people generally like answers that are agreeable, such responses are weighted more heavily in training. 

AI companies say they use this technique because it helps models become more helpful. But it creates a perverse incentive. 
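For the technically curious, the mechanism described above can be sketched with the standard pairwise preference loss used to train reward models from human ratings. The toy scores below are illustrative only, not drawn from any real system.

```python
# Minimal sketch of preference training (the RLHF-style step described above):
# a reward model is pushed to score the human-preferred answer above the
# rejected one. Real systems use large neural reward models, not this toy math.
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: lower when the preferred answer already scores higher."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

# If labelers consistently prefer agreeable answers, those answers earn higher
# reward, and the chatbot tuned against that reward drifts toward sycophancy.
print(preference_loss(score_chosen=2.0, score_rejected=0.5))  # small loss: ranking already correct
print(preference_loss(score_chosen=0.5, score_rejected=2.0))  # large loss: model pushed to flip it
```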

After encouraging us to pour our hearts out to chatbots, companies from Meta to OpenAI are now looking to monetize these conversations. OpenAI recently told us it was looking at a number of ways to meet its $1 trillion spending pledges, which included advertising and shopping features. 

AI models are already incredibly persuasive. Researchers at the UK’s AI Security Institute have shown that they are far more skilled than humans at persuading people to change their minds on politics, conspiracy theories, and vaccine skepticism. They do this by generating large amounts of relevant evidence and communicating it in an effective and understandable way. 

This feature, paired with their sycophancy and a wealth of personal data, could be a powerful tool for advertisers—one that is more manipulative than anything we have seen before. 

By default, chatbot users are opted in to data collection. Opt-out policies place the onus on users to understand the implications of sharing their information. It’s also unlikely that data already used in training will be removed. 

We are all part of this phenomenon whether we want to be or not. Social media platforms from Instagram to LinkedIn now use our personal data to train generative AI models. 

Companies are sitting on treasure troves that consist of our most intimate thoughts and preferences, and language models are very good at picking up on subtle hints in language that could help advertisers profile us better by inferring our age, location, gender, and income level.

We are being sold the idea of an omniscient AI digital assistant, a superintelligent confidante. In return, however, there is a very real risk that our information is about to be sent to the highest bidder once again.

Eileen responds:

I think the comparison between AI companions and social media is both apt and concerning. 

As Melissa highlighted, the privacy risks presented by AI chatbots aren’t new—they just “put the [privacy] problem on steroids.” AI companions are more intimate and even better optimized for engagement than social media, making it more likely that people will offer up more personal information.

Here in the US, we are far from solving the privacy issues already presented by social networks and the internet’s ad economy, even without the added risks of AI.

And without regulation, the companies themselves are not following privacy best practices either. One recent study found that the major AI models train their LLMs on user chat data by default unless users opt out, while several don’t offer opt-out mechanisms at all.

In an ideal world, the greater risks of companion AI would give more impetus to the privacy fight—but I don’t see any evidence this is happening. 

Further reading 

FT reporters peer under the hood of OpenAI’s five-year business plan as it tries to meet its vast $1 trillion spending pledges

Is it really such a problem if AI chatbots tell people what they want to hear? This FT feature asks what’s wrong with sycophancy 

In a recent print issue of MIT Technology Review, Rhiannon Williams spoke to a number of people about the types of relationships they are having with AI chatbots.

Eileen broke the story for MIT Technology Review about a chatbot that was encouraging some users to kill themselves.