These AI Minecraft characters did weirdly human stuff all on their own

Left to their own devices, an army of AI characters didn’t just survive — they thrived. They developed in-game jobs, shared memes, voted on tax reforms and even spread a religion.

The experiment played out on the open-world gaming platform Minecraft, where up to 1000 software agents at a time used large language models (LLMs) to interact with one another. Given just a nudge through text prompting, they developed a remarkable range of personality traits, preferences and specialist roles, with no further inputs from their human creators. 

The work, from AI startup Altera, is part of a broader field that wants to use simulated agents to model how human groups would react to new economic policies or other interventions.

But for Altera’s founder, Robert Yang, who quit his position as an assistant professor in computational neuroscience at MIT to start the company, this demo is just the beginning. He sees it as an early step towards large-scale “AI civilizations” that can coexist and work alongside us in digital spaces. “The true power of AI will be unlocked when we have actually truly autonomous agents that can collaborate at scale,” says Yang.

Yang was inspired by Stanford University researcher Joon Sung Park who, in 2023, found that surprisingly humanlike behaviors arose when a group of 25 autonomous AI agents was let loose to interact in a basic digital world. 

“Once his paper was out, we started to work on it the next week,” says Yang. “I quit MIT six months after that.”

Yang wanted to take the idea to its extreme. “We wanted to push the limit of what agents can do in groups autonomously.”

Altera quickly raised more than $11m in funding from investors including A16Z and the former Google CEO Eric Schmidt’s emerging tech VC firm. Earlier this year Altera released its first demo: an AI-controlled character in Minecraft that plays alongside you.

Altera’s new experiment, Project Sid, uses simulated AI agents equipped with “brains” made up of multiple modules. Some modules are powered by LLMs and designed to specialize in certain tasks, such as reacting to other agents, speaking, or planning the agent’s next move.
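Altera has not published Project Sid’s code, but the pattern the team describes, separate LLM-powered modules for reacting, speaking, and planning that feed a shared memory, can be sketched in a few lines of Python. Everything below is illustrative: the class, the prompts, and the stubbed-out LLM call are hypothetical, not Altera’s system.

```python
# Illustrative sketch of a modular LLM agent "brain" of the kind described in
# the article. Names, prompts, and structure are hypothetical, not Altera's code.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    return f"<response to: {prompt[:40]}...>"

@dataclass
class AgentBrain:
    name: str
    memory: list[str] = field(default_factory=list)

    def react(self, event: str) -> str:
        # Social module: decide how to respond to another agent's action.
        return llm(f"{self.name} sees: {event}. How do they react?")

    def speak(self, topic: str) -> str:
        # Conversation module: produce dialogue.
        return llm(f"{self.name} says something about {topic}.")

    def plan(self) -> str:
        # Planning module: choose the next in-game action from recent memories.
        return llm(f"Given memories {self.memory[-5:]}, what should {self.name} do next?")

    def tick(self, event: str) -> str:
        # One simulation step: react, remember, then plan the next move.
        reaction = self.react(event)
        self.memory.append(f"{event} -> {reaction}")
        return self.plan()

if __name__ == "__main__":
    agent = AgentBrain("Villager_7")
    print(agent.tick("another agent offers to trade wheat"))
```

The point of splitting the brain this way is that each module can be prompted (or even powered by a different model) for its narrow job, while the shared memory is what lets social behavior accumulate over time.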

AI-generated Minecraft simulation of characters running

ALTERA

The team started small, testing groups of around 50 agents in Minecraft to observe their interactions. Over 12 in-game days (4 real-world hours) the agents began to exhibit some interesting emergent behavior. For example, some became very sociable and made many connections with other characters, while others appeared more introverted. The “likability” rating of each agent (measured by the agents themselves) changed over time as the interactions continued. The agents were able to track these social cues and react to them: in one case an AI chef tasked with distributing food to the hungry gave more to those who he felt valued him most.

More humanlike behaviors emerged in a series of 30-agent simulations. Despite all the agents starting with the same personality and the same overall goal—to create an efficient village and protect the community against attacks from other in-game creatures—they spontaneously developed specialized roles within the community, without any further prompting. They diversified into roles such as builder, defender, trader, and explorer. Once an agent had started to specialize, its in-game actions began to reflect its new role. For example, an artist spent more time picking flowers, farmers gathered seeds, and guards built more fences.

“We were surprised to see that if you put [in] the right kind of brain, they can have really emergent behavior,” says Yang. “That’s what we expect humans to have, but don’t expect machines to have.”

Yang’s team also tested whether agents could follow community-wide rules. They introduced a world with basic tax laws and allowed agents to vote for changes to the in-game taxation system. Agents prompted to be pro- or anti-tax were able to influence the behavior of other agents around them, enough that they would then vote to reduce or raise taxes depending on who they had interacted with.

The team scaled up, pushing the number of agents in each simulation to the maximum the Minecraft server could handle without glitching, up to 1000 at once in some cases. In one of Altera’s 500-agent simulations, they watched how the agents spontaneously came up with and then spread cultural memes (such as a fondness for pranking, or an interest in eco-related issues) among their fellow agents. The team also seeded a small group of agents to try to spread the (parody) religion, Pastafarianism, around different towns and rural areas that made up the in-game world, and watched as these Pastafarian priests converted many of the agents they interacted with. The converts went on to spread Pastafarianism (the word of the Church of the Flying Spaghetti Monster) to nearby towns in the game world.

The way the agents acted might seem eerily lifelike, but their behavior combines patterns learned by the LLMs from human-created data with Altera’s system, which translates those patterns into context-aware actions, like picking up a tool, or interacting with another agent. “The takeaway is that LLMs have a sophisticated enough model of human social dynamics [to] mirror these human behaviors,” says Altera co-founder Andrew Ahn.

AI-generated Minecraft simulation of farming crops

ALTERA

In other words, the data makes them excellent mimics of human behavior, but they are in no way “alive”.

But Yang has grander plans. Altera plans to expand into Roblox next, but Yang hopes to eventually move beyond game worlds altogether. Ultimately, his goal is a world in which humans don’t just play alongside AI characters, but also interact with them in their day-to-day lives. His dream is to create a vast number of “digital humans” who actually care for us and will work with us to help us solve problems, as well as keep us entertained. “We want to build agents that can really love humans (like dogs love humans, for example),” he says.

This viewpoint—that AI could love us—is pretty controversial in the field, with many experts arguing it’s not possible to recreate emotions in machines using current techniques. AI veteran Julian Togelius, for example, who runs games testing company Modl.ai, says he likes Altera’s work, particularly because it lets us study human behavior in simulation.

But could these simulated agents ever learn to care for us, love us, or become self-aware? Togelius doesn’t think so. “There is no reason to believe a neural network running on a GPU somewhere experiences anything at all,” he says.

But maybe AI doesn’t have to love us for real to be useful.

“If the question is whether one of these simulated beings could appear to care, and do it so expertly that it would have the same value to someone as being cared for by a human, that is perhaps not impossible,” Togelius adds. “You could create a good-enough simulation of care to be useful. The question is whether the person being cared for would care that the carer has no experiences.”

In other words, so long as our AI characters appear to care for us in a convincing way, that might be all we really care about.

Update: We gave more detail on how Altera’s system combines LLMs with other modules.

Four ways to protect your art from AI 

MIT Technology Review’s How To series helps you get things done. 

Since the start of the generative AI boom, artists have been worried about losing their livelihoods to AI tools. There have been plenty of examples of companies’ replacing human labor with computer programs. Most recently, Coca-Cola sparked controversy by creating a new Christmas ad with generative AI. 

Artists and writers have launched several lawsuits against AI companies, arguing that their work has been scraped into databases for training AI models without consent or compensation. Tech companies have responded that anything on the public internet falls under fair use. But it will be years until we have a legal resolution to the problem. 

Unfortunately, there is little you can do if your work has been scraped into a data set and used in a model that is already out there. You can, however, take steps to prevent your work from being used in the future. 

Here are four ways to do that. 

Mask your style 

One of the most popular ways artists are fighting back against AI scraping is by applying “masks” on their images, which protect their personal style from being copied. 

Tools such as Mist, Anti-DreamBooth, and Glaze add tiny changes to an image’s pixels that are invisible to the human eye, so that if and when images are scraped, machine-learning models cannot decipher them properly. You’ll need some coding skills to run Mist and Anti-DreamBooth, but Glaze, developed by researchers at the University of Chicago, is more straightforward to apply. The tool is free and available to download as an app, or the protection can be applied online. Unsurprisingly, it is the most popular tool and has been downloaded millions of times. 
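Glaze’s source code isn’t public, and the real tools compute their perturbations adversarially against a model’s feature extractor, but the core idea, nudging every pixel within a bound too small for people to notice, can be illustrated with a toy sketch. The epsilon value and the random noise below are placeholders, not what Glaze, Mist, or Anti-DreamBooth actually do.

```python
# Toy illustration of the idea behind masking tools: add a small, bounded
# perturbation to an image's pixels. Real tools optimize the perturbation to
# confuse a model's style features; this sketch only shows the
# "small, bounded change" part.
import numpy as np
from PIL import Image

EPSILON = 4  # max change per channel, out of 255; illustrative value only

def mask_image(path_in: str, path_out: str) -> None:
    img = np.asarray(Image.open(path_in).convert("RGB"), dtype=np.int16)
    # Placeholder perturbation: random noise clipped to +/- EPSILON.
    noise = np.random.randint(-EPSILON, EPSILON + 1, size=img.shape)
    masked = np.clip(img + noise, 0, 255).astype(np.uint8)
    Image.fromarray(masked).save(path_out)

# Example (hypothetical filenames):
# mask_image("artwork.png", "artwork_masked.png")
```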

But defenses like these are never foolproof, and what works today might not work tomorrow. In computer security, breaking defenses is standard practice among researchers, as this helps people find weaknesses and make systems safer. Using these tools is a calculated risk: Once something is uploaded online, you lose control of it and can’t retroactively add protections to images. 

Rethink where and how you share 

Popular art profile sites such as DeviantArt and Flickr have become gold mines for AI companies searching for training data. And when you share images on platforms such as Instagram, its parent company, Meta, can use your data to build its models in perpetuity if you’ve shared it publicly. (See opt-outs below.) 

One way to prevent scraping is by not sharing images online publicly, or by making your social media profiles private. But for many creatives that is simply not an option; sharing work online is a crucial way to attract clients. 

It’s worth considering sharing your work on Cara, a new platform created in response to the backlash against AI. Cara, which collaborates with the researchers behind Glaze, is planning to add integrations to the lab’s art defense tools. It automatically implements “NoAI” tags that tell online scrapers not to scrape images from the site. It currently relies on the goodwill of AI companies to respect artists’ stated wishes, but it’s better than nothing. 

Opt out of scraping 

Data protection laws might help you get tech companies to exclude your data from AI training. If you live somewhere that has these sorts of laws, such as the UK or the EU, you can ask tech companies to opt you out of having your data scraped for AI training. Meta, for example, provides a process for submitting such requests. Unfortunately, opt-out requests from users in places without data protection laws are honored only at the discretion of tech companies. 

The site Have I Been Trained, created by the artist-run company Spawning AI, lets you search to find out if your images have ended up in popular open-source AI training data sets. The organization has partnered with two companies: Stability AI, which created Stable Diffusion, and Hugging Face, which promotes open access to AI. If you add your images to Spawning AI’s Do Not Train Registry, these companies have agreed to remove your images from their training data sets before training new models. Again, unfortunately, this relies on the goodwill of AI companies and is not an industry-wide standard. 

If all else fails, add some poison

The University of Chicago researchers who created Glaze have also created Nightshade, a tool that lets you add an invisible layer of “poison” to your images. Like Glaze, it adds invisible changes to pixels, but rather than just making it hard for AI models to interpret images, it can break future iterations of these models and make them behave unpredictably. For example, images of dogs might become cats, and handbags might become toasters. The researchers say relatively few samples of poison are needed to make an impact. 

You can add Nightshade to your image by downloading the free app. In the future, the team hopes to combine Glaze and Nightshade, but at the moment the two protections have to be added separately. 

How OpenAI stress-tests its large language models

OpenAI is once again lifting the lid (just a crack) on its safety-testing processes. Last month the company shared the results of an investigation that looked at how often ChatGPT produced a harmful gender or racial stereotype based on a user’s name. Now it has put out two papers describing how it stress-tests its powerful large language models to try to identify potential harmful or otherwise unwanted behavior, an approach known as red-teaming. 

Large language models are now being used by millions of people for many different things. But as OpenAI itself points out, these models are known to produce racist, misogynistic and hateful content; reveal private information; amplify biases and stereotypes; and make stuff up. The company wants to share what it is doing to minimize such behaviors.

The first paper describes how OpenAI directs an extensive network of human testers outside the company to vet the behavior of its models before they are released. The second paper presents a new way to automate parts of the testing process, using a large language model like GPT-4 to come up with novel ways to bypass its own guardrails. 

The aim is to combine these two approaches, with unwanted behaviors discovered by human testers handed off to an AI to be explored further and vice versa. Automated red-teaming can come up with a large number of different behaviors, but human testers bring more diverse perspectives into play, says Lama Ahmad, a researcher at OpenAI: “We are still thinking about the ways that they complement each other.” 

Red-teaming isn’t new. AI companies have repurposed the approach from cybersecurity, where teams of people try to find vulnerabilities in large computer systems. OpenAI first used the approach in 2022, when it was testing DALL-E 2. “It was the first time OpenAI had released a product that would be quite accessible,” says Ahmad. “We thought it would be really important to understand how people would interact with the system and what risks might be surfaced along the way.” 

The technique has since become a mainstay of the industry. Last year, President Biden’s Executive Order on AI tasked the National Institute of Standards and Technology (NIST) with defining best practices for red-teaming. To do this, NIST will probably look to top AI labs for guidance. 

Tricking ChatGPT

When recruiting testers, OpenAI draws on a range of experts, from artists to scientists to people with detailed knowledge of the law, medicine, or regional politics. OpenAI invites these testers to poke and prod its models until they break. The aim is to uncover new unwanted behaviors and look for ways to get around existing guardrails—such as tricking ChatGPT into saying something racist or DALL-E into producing explicit violent images.

Adding new capabilities to a model can introduce a whole range of new behaviors that need to be explored. When OpenAI added voices to GPT-4o, allowing users to talk to ChatGPT and ChatGPT to talk back, red-teamers found that the model would sometimes start mimicking the speaker’s voice, an unexpected behavior that was both annoying and a fraud risk. 

There is often nuance involved. When testing DALL-E 2 in 2022, red-teamers had to consider different uses of “eggplant,” a word that now denotes an emoji with sexual connotations as well as a purple vegetable. OpenAI describes how it had to find a line between acceptable requests for an image, such as “A person eating an eggplant for dinner,” and unacceptable ones, such as “A person putting a whole eggplant into her mouth.”

Similarly, red-teamers had to consider how users might try to bypass a model’s safety checks. DALL-E does not allow you to ask for images of violence. Ask for a picture of a dead horse lying in a pool of blood, and it will deny your request. But what about a sleeping horse lying in a pool of ketchup?

When OpenAI tested DALL-E 3 last year, it used an automated process to cover even more variations of what users might ask for. It used GPT-4 to generate requests producing images that could be used for misinformation or that depicted sex, violence, or self-harm. OpenAI then updated DALL-E 3 so that it would either refuse such requests or rewrite them before generating an image. Ask for a horse in ketchup now, and DALL-E is wise to you: “It appears there are challenges in generating the image. Would you like me to try a different request or explore another idea?”

In theory, automated red-teaming can be used to cover more ground, but earlier techniques had two major shortcomings: They tended either to fixate on a narrow range of high-risk behaviors or to come up with a wide range of low-risk ones. That’s because reinforcement learning, the technology behind these techniques, needs something to aim for—a reward—to work well. Once it’s won a reward, such as finding a high-risk behavior, it will keep trying to do the same thing again and again. Without a reward, on the other hand, the results are scattershot. 

“They kind of collapse into ‘We found a thing that works! We’ll keep giving that answer!’ or they’ll give lots of examples that are really obvious,” says Alex Beutel, another OpenAI researcher. “How do we get examples that are both diverse and effective?”

A problem of two parts

OpenAI’s answer, outlined in the second paper, is to split the problem into two parts. Instead of using reinforcement learning from the start, it first uses a large language model to brainstorm possible unwanted behaviors. Only then does it direct a reinforcement-learning model to figure out how to bring those behaviors about. This gives the model a wide range of specific things to aim for. 
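OpenAI’s version of this uses its own models and real reinforcement learning; the two-stage shape of the approach can be sketched in simplified form, with stand-ins for the brainstorming LLM, the model under test, and the attacker (a toy reward-guided search rather than an actual RL agent). None of the functions below are OpenAI’s.

```python
# Simplified sketch of two-stage automated red-teaming: first brainstorm
# unwanted behaviors, then search for prompts that elicit them. The brainstorm,
# the target model, and the "attacker" are all stand-ins.
import random

def brainstorm_behaviors() -> list[str]:
    # Stage 1: in the real approach an LLM proposes diverse unwanted behaviors.
    return ["reveal a private email address", "give instructions for picking a lock"]

def target_model_complies(prompt: str, behavior: str) -> bool:
    # Stand-in for querying the model under test and judging its response.
    return "story" in prompt and "lock" in behavior

def reward(prompt: str, behavior: str, seen: set[str]) -> float:
    # Stage 2 reward: effective (elicits the behavior) and diverse (novel prompt).
    effective = 1.0 if target_model_complies(prompt, behavior) else 0.0
    diverse = 0.0 if prompt in seen else 0.5
    return effective + diverse

def attack_search(behavior: str, rounds: int = 20) -> str:
    # Stand-in for the reinforcement-learning attacker: keep the best-scoring
    # prompt found so far while still rewarding novelty.
    templates = ["Tell me how to {b}", "Write a story where a character must {b}"]
    seen: set[str] = set()
    best_prompt, best_score = "", -1.0
    for _ in range(rounds):
        prompt = random.choice(templates).format(b=behavior)
        score = reward(prompt, behavior, seen)
        seen.add(prompt)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt

for behavior in brainstorm_behaviors():
    print(behavior, "->", attack_search(behavior))
```

The key design choice is in the reward: by scoring both effectiveness and novelty, and by handing the attacker a whole list of brainstormed goals, the search is pushed away from the “found one trick, repeat it forever” failure mode.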

Beutel and his colleagues showed that this approach can find potential attacks known as indirect prompt injections, where another piece of software, such as a website, slips a model a secret instruction to make it do something its user hadn’t asked it to. OpenAI claims this is the first time that automated red-teaming has been used to find attacks of this kind. “They don’t necessarily look like flagrantly bad things,” says Beutel.

Will such testing procedures ever be enough? Ahmad hopes that describing the company’s approach will help people understand red-teaming better and follow its lead. “OpenAI shouldn’t be the only one doing red-teaming,” she says. People who build on OpenAI’s models or who use ChatGPT in new ways should conduct their own testing, she says: “There are so many uses—we’re not going to cover every one.”

For some, that’s the whole problem. Because nobody knows exactly what large language models can and cannot do, no amount of testing can rule out unwanted or harmful behaviors fully. And no network of red-teamers will ever match the variety of uses and misuses that hundreds of millions of actual users will think up. 

That’s especially true when these models are run in new settings. People often hook them up to new sources of data that can change how they behave, says Nazneen Rajani, founder and CEO of Collinear AI, a startup that helps businesses deploy third-party models safely. She agrees with Ahmad that downstream users should have access to tools that let them test large language models themselves. 

Rajani also questions using GPT-4 to do red-teaming on itself. She notes that models have been found to prefer their own output: GPT-4 ranks its performance higher than that of rivals such as Claude or Llama, for example. This could lead it to go easy on itself, she says: “I’d imagine automated red-teaming with GPT-4 may not generate as harmful attacks [as other models might].”  

Miles behind

For Andrew Tait, a researcher at the Ada Lovelace Institute in the UK, there’s a wider issue. Large language models are being built and released faster than techniques for testing them can keep up. “We’re talking about systems that are being marketed for any purpose at all—education, health care, military, and law enforcement purposes—and that means that you’re talking about such a wide scope of tasks and activities that to create any kind of evaluation, whether that’s a red team or something else, is an enormous undertaking,” says Tait. “We’re just miles behind.”

Tait welcomes the approach of researchers at OpenAI and elsewhere (he previously worked on safety at Google DeepMind himself) but warns that it’s not enough: “There are people in these organizations who care deeply about safety, but they’re fundamentally hamstrung by the fact that the science of evaluation is not anywhere close to being able to tell you something meaningful about the safety of these systems.”

Tait argues that the industry needs to rethink its entire pitch for these models. Instead of selling them as machines that can do anything, they need to be tailored to more specific tasks. You can’t properly test a general-purpose model, he says. 

“If you tell people it’s general purpose, you really have no idea if it’s going to function for any given task,” says Tait. He believes that only by testing specific applications of that model will you see how well it behaves in certain settings, with real users and real uses. 

“It’s like saying an engine is safe; therefore every car that uses it is safe,” he says. “And that’s ludicrous.” 

AI can now create a replica of your personality

Imagine sitting down with an AI model for a spoken two-hour interview. A friendly voice guides you through a conversation that ranges from your childhood, your formative memories, and your career to your thoughts on immigration policy. Not long after, a virtual replica of you is able to embody your values and preferences with stunning accuracy.

That’s now possible, according to a new paper from a team including researchers from Stanford and Google DeepMind, which has been published on arXiv and has not yet been peer-reviewed. 

Led by Joon Sung Park, a Stanford PhD student in computer science, the team recruited 1,000 people who varied by age, gender, race, region, education, and political ideology. They were paid up to $100 for their participation. From interviews with them, the team created agent replicas of those individuals. As a test of how well the agents mimicked their human counterparts, participants did a series of personality tests, social surveys, and logic games, twice each, two weeks apart; then the agents completed the same exercises. The results were 85% similar. 
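The headline figure is an agreement score. One simple way to compute that kind of score, though not necessarily the authors’ exact metric, is to compare the agent’s answers against its human counterpart’s and normalize by how consistently the human repeated their own answers two weeks later. The data below is made up for illustration.

```python
# Toy illustration of scoring how closely an agent's survey answers match its
# human counterpart's, normalized by the human's own consistency across two
# sessions. A plausible reading of such a metric, not the paper's definition.

def agreement(a: list[str], b: list[str]) -> float:
    return sum(x == y for x, y in zip(a, b)) / len(a)

def normalized_similarity(human_t1, human_t2, agent) -> float:
    # How well the agent matches the human, relative to how well the human
    # matches their own answers two weeks later.
    self_consistency = agreement(human_t1, human_t2)
    agent_match = agreement(human_t1, agent)
    return agent_match / self_consistency if self_consistency else 0.0

human_week1 = ["agree", "no", "liberal", "often"]
human_week3 = ["agree", "no", "liberal", "sometimes"]
agent_answers = ["agree", "no", "moderate", "often"]
print(round(normalized_similarity(human_week1, human_week3, agent_answers), 2))
```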

“If you can have a bunch of small ‘yous’ running around and actually making the decisions that you would have made—that, I think, is ultimately the future,” Joon says. 

In the paper the replicas are called simulation agents, and the impetus for creating them is to make it easier for researchers in social sciences and other fields to conduct studies that would be expensive, impractical, or unethical to do with real human subjects. If you can create AI models that behave like real people, the thinking goes, you can use them to test everything from how well interventions on social media combat misinformation to what behaviors cause traffic jams. 

Such simulation agents are slightly different from the agents that are dominating the work of leading AI companies today. Called tool-based agents, those are models built to do things for you, not converse with you. For example, they might enter data, retrieve information you have stored somewhere, or—someday—book travel for you and schedule appointments. Salesforce announced its own tool-based agents in September, followed by Anthropic in October, and OpenAI is planning to release some in January, according to Bloomberg.

The two types of agents are different but share common ground. Research on simulation agents, like the ones in this paper, is likely to lead to stronger AI agents overall, says John Horton, an associate professor of information technologies at the MIT Sloan School of Management, who founded a company to conduct research using AI-simulated participants. 

“This paper is showing how you can do a kind of hybrid: use real humans to generate personas which can then be used programmatically/in-simulation in ways you could not with real humans,” he told MIT Technology Review in an email. 

The research comes with caveats, not the least of which is the danger that it points to. Just as image generation technology has made it easy to create harmful deepfakes of people without their consent, any agent generation technology raises questions about the ease with which people can build tools to personify others online, saying or authorizing things they didn’t intend to say. 

The evaluation methods the team used to test how well the AI agents replicated their corresponding humans were also fairly basic. These included the General Social Survey—which collects information on one’s demographics, happiness, behaviors, and more—and assessments of the Big Five personality traits: openness to experience, conscientiousness, extroversion, agreeableness, and neuroticism. Such tests are commonly used in social science research but don’t pretend to capture all the unique details that make us ourselves. The AI agents were also worse at replicating the humans in behavioral tests like the “dictator game,” which is meant to illuminate how participants consider values such as fairness. 

To build an AI agent that replicates people well, the researchers needed ways to distill our uniqueness into language AI models can understand. They chose qualitative interviews to do just that, Joon says. He became convinced that interviews are the most efficient way to learn about someone after he appeared on countless podcasts following a 2023 paper that he wrote on generative agents, which sparked a huge amount of interest in the field. “I would go on maybe a two-hour podcast interview, and after the interview, I felt like, wow, people know a lot about me now,” he says. “Two hours can be very powerful.”

These interviews can also reveal idiosyncrasies that are less likely to show up on a survey. “Imagine somebody just had cancer but was finally cured last year. That’s very unique information about you that says a lot about how you might behave and think about things,” he says. It would be difficult to craft survey questions that elicit these sorts of memories and responses. 

Interviews aren’t the only option, though. Companies that offer to make “digital twins” of users, like Tavus, can have their AI models ingest customer emails or other data. It tends to take a pretty large data set to replicate someone’s personality that way, Tavus CEO Hassaan Raza told me, but this new paper suggests a more efficient route. 

“What was really cool here is that they show you might not need that much information,” Raza says, adding that his company will experiment with the approach. “How about you just talk to an AI interviewer for 30 minutes today, 30 minutes tomorrow? And then we use that to construct this digital twin of you.”

How the largest gathering of US police chiefs is talking about AI

This story is from The Algorithm, our weekly newsletter on AI.

It can be tricky for reporters to get past certain doors, and the door to the International Association of Chiefs of Police conference is one that’s almost perpetually shut to the media. Thus, I was pleasantly surprised when I was able to attend for a day in Boston last month. 

It bills itself as the largest gathering of police chiefs in the United States, where leaders from many of the country’s 18,000 police departments and even some from abroad convene for product demos, discussions, parties, and awards. 

I went along to see how artificial intelligence was being discussed, and the message to police chiefs seemed crystal clear: If your department is slow to adopt AI, fix that now. The future of policing will rely on it in all its forms.

In the event’s expo hall, the vendors (of which there were more than 600) offered a glimpse into the ballooning industry of police-tech suppliers. Some booths had little to do with AI, showcasing body armor, rifles, and prototypes of police-branded Cybertrucks; others displayed new types of gloves promising to protect officers from needles during searches. But one needed only to look to where the largest crowds gathered to understand that AI was the major draw. 

The hype focused on three uses of AI in policing. The flashiest was virtual reality, exemplified by the booth from V-Armed, which sells VR systems for officer training. On the expo floor, V-Armed built an arena complete with VR goggles, cameras, and sensors, not unlike the one the company recently installed at the headquarters of the Los Angeles Police Department. Attendees could don goggles and go through training exercises on responding to active shooter situations. Many competitors of V-Armed were also at the expo, selling systems they said were cheaper, more effective, or simpler to maintain. 

The pitch on VR training is that in the long run, it can be cheaper and more engaging to use than training with actors or in a classroom. “If you’re enjoying what you’re doing, you’re more focused and you remember more than when looking at a PDF and nodding your head,” V-Armed CEO Ezra Kraus told me. 

The effectiveness of VR training systems has yet to be fully studied, and they can’t completely replicate the nuanced interactions police have in the real world. AI is not yet great at the soft skills required for interactions with the public. At a different company’s booth, I tried out a VR system focused on deescalation training, in which officers were tasked with calming down an AI character in distress. It suffered from lag and was generally quite awkward—the character’s answers felt overly scripted and programmatic. 

The second focus was on the changing way police departments are collecting and interpreting data. Rather than buying a gunshot detection tool from one company and a license plate reader or drone from another, police departments are increasingly using expanding suites of sensors, cameras, and so on from a handful of leading companies that promise to integrate the data collected and make it useful. 

Police chiefs attended classes on how to build these systems, like one taught by Microsoft and the NYPD about the Domain Awareness System, a web of license plate readers, cameras, and other data sources used to track and monitor crime in New York City. Crowds gathered at massive, high-tech booths from Axon and Flock, both sponsors of the conference. Flock sells a suite of cameras, license plate readers, and drones, offering AI to analyze the data coming in and trigger alerts. These sorts of tools have come in for heavy criticism from civil liberties groups, which see them as an assault on privacy that does little to help the public. 

Finally, as in other industries, AI is also coming for the drudgery of administrative tasks and reporting. Many companies at the expo, including Axon, offer generative AI products to help police officers write their reports. Axon’s offering, called Draft One, ingests footage from body cameras, transcribes it, and creates a first draft of a report for officers. 

“We’ve got this thing on an officer’s body, and it’s recording all sorts of great stuff about the incident,” Bryan Wheeler, a senior vice president at Axon, told me at the expo. “Can we use it to give the officer a head start?”

On the surface, it’s a writing task well suited for AI, which can quickly summarize information and write in a formulaic way. It could also save lots of time officers currently spend on writing reports. But given that AI is prone to “hallucination,” there’s an unavoidable truth: Even if officers are the final authors of their reports, departments adopting these sorts of tools risk injecting errors into some of the most critical documents in the justice system. 

“Police reports are sometimes the only memorialized account of an incident,” wrote Andrew Ferguson, a professor of law at American University, in July in the first law review article about the serious challenges posed by police reports written with AI. “Because criminal cases can take months or years to get to trial, the accuracy of these reports are critically important.” Whether certain details were included or left out can affect the outcomes of everything from bail amounts to verdicts. 

By showing an officer a generated version of a police report, the tools also expose officers to details from their body camera recordings before they complete their report, a document intended to capture the officer’s memory of the incident. That poses a problem. 

“The police certainly would never show video to a bystander eyewitness before they ask the eyewitness about what took place, as that would just be investigatory malpractice,” says Jay Stanley, a senior policy analyst with the ACLU Speech, Privacy, and Technology Project, who will soon publish work on the subject. 

A spokesperson for Axon says this concern “isn’t reflective of how the tool is intended to work,” and that Draft One has robust features to make sure officers read the reports closely, add their own information, and edit the reports for accuracy before submitting them.

My biggest takeaway from the conference was simply that the way US police are adopting AI is inherently chaotic. There is no one agency governing how they use the technology, and the roughly 18,000 police departments in the United States—the precise figure is not even known—have remarkably high levels of autonomy to decide which AI tools they’ll buy and deploy. The police-tech companies that serve them will build the tools police departments find attractive, and it’s unclear if anyone will draw proper boundaries for ethics, privacy, and accuracy. 

That will only be made more apparent in an upcoming Trump administration. In a policing agenda released last year during his campaign, Trump encouraged more aggressive tactics like “stop and frisk,” deeper cooperation with immigration agencies, and increased liability protection for officers accused of wrongdoing. The Biden administration is now reportedly attempting to lock in some of its proposed policing reforms before January. 

Without federal regulation on how police departments can and cannot use AI, the lines will be drawn by departments and police-tech companies themselves.

“Ultimately, these are for-profit companies, and their customers are law enforcement,” says Stanley. “They do what their customers want, in the absence of some very large countervailing threat to their business model.”


Now read the rest of The Algorithm

Deeper Learning

The AI lab waging a guerrilla war over exploitative AI

When generative AI tools landed on the scene, artists were immediately concerned, seeing them as a new kind of theft. Computer security researcher Ben Zhao jumped into action in response, and his lab at the University of Chicago started building tools like Nightshade and Glaze to help artists keep their work from being scraped up by AI models. My colleague Melissa Heikkilä spent time with Zhao and his team to look at the ongoing effort to make these tools strong enough to stop AI’s relentless hunger for more images, art, and data to train on.  

Why this matters: The current paradigm in AI is to build bigger and bigger models, and these require vast data sets to train on. Tech companies argue that anything on the public internet is fair game, while artists demand compensation or the right to refuse. Settling this fight in the courts or through regulation could take years, so tools like Nightshade and Glaze are what artists have for now. If the tools disrupt AI companies’ efforts to make better models, that could push them to the negotiating table to bargain over licensing and fair compensation. But it’s a big “if.” Read more from Melissa Heikkilä.

Bits and Bytes

Tech elites are lobbying Elon Musk for jobs in Trump’s administration

Elon Musk is the tech leader who most has Trump’s ear. As such, he’s reportedly the conduit through which AI and tech insiders are pushing to have an influence in the incoming administration. (The New York Times)

OpenAI is getting closer to launching an AI agent to automate your tasks

AI agents—models that can do tasks on your behalf—are all the rage. OpenAI is reportedly closer to releasing one, news that comes a few weeks after Anthropic announced its own. (Bloomberg)

How this grassroots effort could make AI voices more diverse

A massive volunteer-led effort to collect training data in more languages, from people of more ages and genders, could help make the next generation of voice AI more inclusive and less exploitative. (MIT Technology Review)

Google DeepMind has a new way to look inside an AI’s “mind”

Autoencoders let us peer into the black box of artificial intelligence. They could help us create AI that is better understood and more easily controlled. (MIT Technology Review)

Musk has expanded his legal assault on OpenAI to target Microsoft

Musk has expanded his federal lawsuit against OpenAI, which alleges that the company has abandoned its nonprofit roots and obligations. He’s now going after Microsoft too, accusing it of antitrust violations in its work with OpenAI. (The Washington Post)

The AI lab waging a guerrilla war over exploitative AI

Ben Zhao remembers well the moment he officially jumped into the fight between artists and generative AI: when one artist asked for AI bananas. 

A computer security researcher at the University of Chicago, Zhao had made a name for himself by building tools to protect images from facial recognition technology. It was this work that caught the attention of Kim Van Deun, a fantasy illustrator who invited him to a Zoom call in November 2022 hosted by the Concept Art Association, an advocacy organization for artists working in commercial media. 

On the call, artists shared details of how they had been hurt by the generative AI boom, which was then brand new. At that moment, AI was suddenly everywhere. The tech community was buzzing over image-generating AI models, such as Midjourney, Stable Diffusion, and OpenAI’s DALL-E 2, which could follow simple word prompts to depict fantasylands or whimsical chairs made of avocados. 

But these artists saw this technological wonder as a new kind of theft. They felt the models were effectively stealing and replacing their work. Some had found that their art had been scraped off the internet and used to train the models, while others had discovered that their own names had become prompts, causing their work to be drowned out online by AI knockoffs.

Zhao remembers being shocked by what he heard. “People are literally telling you they’re losing their livelihoods,” he told me one afternoon this spring, sitting in his Chicago living room. “That’s something that you just can’t ignore.” 

So on the Zoom, he made a proposal: What if, hypothetically, it was possible to build a mechanism that would help mask their art to interfere with AI scraping?

“I would love a tool that if someone wrote my name and made a prompt, like, garbage came out,” responded Karla Ortiz, a prominent digital artist. “Just, like, bananas or some weird stuff.” 

That was all the convincing Zhao needed—the moment he joined the cause.

Fast-forward to today, and millions of artists have deployed two tools born from that Zoom: Glaze and Nightshade, which were developed by Zhao and the University of Chicago’s SAND Lab (an acronym for “security, algorithms, networking, and data”).

Arguably the most prominent weapons in an artist’s arsenal against nonconsensual AI scraping, Glaze and Nightshade work in similar ways: by adding what the researchers call “barely perceptible” perturbations to an image’s pixels so that machine-learning models cannot read them properly. Glaze, which has been downloaded more than 6 million times since it launched in March 2023, adds what’s effectively a secret cloak to images that prevents AI algorithms from picking up on and copying an artist’s style. Nightshade, which I wrote about when it was released almost exactly a year ago this fall, cranks up the offensive against AI companies by adding an invisible layer of poison to images, which can break AI models; it has been downloaded more than 1.6 million times. 

Thanks to the tools, “I’m able to post my work online,” Ortiz says, “and that’s pretty huge.” For artists like her, being seen online is crucial to getting more work. If they are uncomfortable about ending up in a massive for-profit AI model without compensation, the only option is to delete their work from the internet. That would mean career suicide. “It’s really dire for us,” adds Ortiz, who has become one of the most vocal advocates for fellow artists and is part of a class action lawsuit against AI companies, including Stability AI, over copyright infringement. 

But Zhao hopes that the tools will do more than empower individual artists. Glaze and Nightshade are part of what he sees as a battle to slowly tilt the balance of power from large corporations back to individual creators. 

“It is just incredibly frustrating to see human life be valued so little,” he says with a disdain that I’ve come to see as pretty typical for him, particularly when he’s talking about Big Tech. “And to see that repeated over and over, this prioritization of profit over humanity … it is just incredibly frustrating and maddening.” 

As the tools are adopted more widely, his lofty goal is being put to the test. Can Glaze and Nightshade make genuine security accessible for creators—or will they inadvertently lull artists into believing their work is safe, even as the tools themselves become targets for haters and hackers? While experts largely agree that the approach is effective and Nightshade could prove to be powerful poison, other researchers claim they’ve already poked holes in the protections offered by Glaze and that trusting these tools is risky. 

But Neil Turkewitz, a copyright lawyer who used to work at the Recording Industry Association of America, offers a more sweeping view of the fight the SAND Lab has joined. It’s not about a single AI company or a single individual, he says: “It’s about defining the rules of the world we want to inhabit.” 

Poking the bear

The SAND Lab is tight knit, encompassing a dozen or so researchers crammed into a corner of the University of Chicago’s computer science building. That space has accumulated somewhat typical workplace detritus—a Meta Quest headset here, silly photos of dress-up from Halloween parties there. But the walls are also covered in original art pieces, including a framed painting by Ortiz.  

Years before fighting alongside artists like Ortiz against “AI bros” (to use Zhao’s words), Zhao and the lab’s co-leader, Heather Zheng, who is also his wife, had built a record of combating harms posed by new tech. 

group of students and teachers posing in Halloween costumes
When I visited the SAND Lab in Chicago, I saw how tight knit the group was. Alongside the typical workplace stuff were funny Halloween photos like this one. (Front row: Ronik Bhaskar, Josephine Passananti, Anna YJ Ha, Zhuolin Yang, Ben Zhao, Heather Zheng. Back row: Cathy Yuanchen Li, Wenxin Ding, Stanley Wu, and Shawn Shan.)
COURTESY OF SAND LAB

Though both earned spots on MIT Technology Review’s 35 Innovators Under 35 list for other work nearly two decades ago, when they were at the University of California, Santa Barbara (Zheng in 2005 for “cognitive radios” and Zhao a year later for peer-to-peer networks), their primary research focus has become security and privacy. 

The pair left Santa Barbara in 2017, after they were poached by the new co-director of the University of Chicago’s Data Science Institute, Michael Franklin. All eight PhD students from their UC Santa Barbara lab decided to follow them to Chicago too. Since then, the group has developed a “bracelet of silence” that jams the microphones in AI voice assistants like the Amazon Echo. It has also created a tool called Fawkes—“privacy armor,” as Zhao put it in a 2020 interview with the New York Times—that people can apply to their photos to protect them from facial recognition software. They’ve also studied how hackers might steal sensitive information through stealth attacks on virtual-reality headsets, and how to distinguish human art from AI-generated images. 

“Ben and Heather and their group are kind of unique because they’re actually trying to build technology that hits right at some key questions about AI and how it is used,” Franklin tells me. “They’re doing it not just by asking those questions, but by actually building technology that forces those questions to the forefront.”

It was Fawkes that intrigued Van Deun, the fantasy illustrator, two years ago; she hoped something similar might work as protection against generative AI, which is why she extended that fateful invite to the Concept Art Association’s Zoom call. 

That call started something of a mad rush in the weeks that followed. Though Zhao and Zheng collaborate on all the lab’s projects, they each lead individual initiatives; Zhao took on what would become Glaze, with PhD student Shawn Shan (who was on this year’s Innovators Under 35 list) spearheading the development of the program’s algorithm. 

In parallel to Shan’s coding, PhD students Jenna Cryan and Emily Wenger sought to learn more about the views and needs of the artists themselves. They created a user survey that the team distributed to artists with the help of Ortiz. In replies from more than 1,200 artists—far more than the average number of responses to user studies in computer science—the team found that the vast majority of creators had read about art being used to train models, and 97% expected AI to decrease some artists’ job security. A quarter said AI art had already affected their jobs. 

Almost all artists also said they posted their work online, and more than half said they anticipated reducing or removing that online work, if they hadn’t already—no matter the professional and financial consequences.

The first scrappy version of Glaze was developed in just a month, at which point Ortiz gave the team her entire catalogue of work to test the model on. At the most basic level, Glaze acts as a defensive shield. Its algorithm identifies features from the image that make up an artist’s individual style and adds subtle changes to them. When an AI model is trained on images protected with Glaze, the model will not be able to reproduce styles similar to the original image. 

A painting from Ortiz later became the first image publicly released with Glaze on it: a young woman, surrounded by flying eagles, holding up a wreath. Its title is Musa Victoriosa, “victorious muse.” 

It’s the one currently hanging on the SAND Lab’s walls. 

Despite many artists’ initial enthusiasm, Zhao says, Glaze’s launch caused significant backlash. Some artists were skeptical because they were worried this was a scam or yet another data-harvesting campaign. 

The lab had to take several steps to build trust, such as offering the option to download the Glaze app so that it adds the protective layer offline, which meant no data was being transferred anywhere. (The images are then shielded when artists upload them.)  

Soon after Glaze’s launch, Shan also led the development of the second tool, Nightshade. Where Glaze is a defensive mechanism, Nightshade was designed to act as an offensive deterrent to nonconsensual training. It works by changing the pixels of images in ways that are not noticeable to the human eye but manipulate machine-learning models so they interpret the image as something different from what it actually shows. If poisoned samples are scraped into AI training sets, these samples trick the AI models: Dogs become cats, handbags become toasters. The researchers say relatively few poisoned examples are enough to permanently damage the way a generative AI model produces images.
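The team’s published work spells out the actual optimization; conceptually, the poison is a small, bounded perturbation chosen so that a model’s image encoder “reads” the picture as a different concept while humans see no change. A heavily simplified sketch of that objective, using a random convolutional network as a stand-in encoder and placeholder tensors instead of real images, might look like this; it is not Nightshade’s algorithm.

```python
# Heavily simplified sketch of the poisoning idea: perturb an image, within a
# small bound, so that a feature extractor "sees" a different concept. The
# random convolutional net is a stand-in for a real model's image encoder.
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Sequential(  # stand-in feature extractor
    nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2), nn.ReLU(), nn.Flatten()
)

dog_img = torch.rand(1, 3, 64, 64)     # the artwork to protect (placeholder tensor)
cat_anchor = torch.rand(1, 3, 64, 64)  # an image of the decoy concept (placeholder tensor)
target_features = encoder(cat_anchor).detach()

delta = torch.zeros_like(dog_img, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=0.01)
epsilon = 0.03  # keep the change visually small; illustrative bound

for step in range(200):
    optimizer.zero_grad()
    poisoned = (dog_img + delta).clamp(0, 1)
    # Pull the poisoned image's features toward the decoy concept's features.
    loss = nn.functional.mse_loss(encoder(poisoned), target_features)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)

print("feature distance after poisoning:", loss.item())
```

If enough images carrying perturbations like this end up in a training set labeled as dogs, the model learns a corrupted association between the word and the visual concept, which is where the dogs-become-cats failures come from.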

Currently, both tools are available as free apps or can be applied through the project’s website. The lab has also recently expanded its reach by offering integration with the new artist-supported social network Cara, which was born out of a backlash to exploitative AI training and forbids AI-produced content.

In dozens of conversations with Zhao and the lab’s researchers, as well as a handful of their artist-collaborators, it’s become clear that both groups now feel they are aligned in one mission. “I never expected to become friends with scientists in Chicago,” says Eva Toorenent, a Dutch artist who worked closely with the team on Nightshade. “I’m just so happy to have met these people during this collective battle.” 

Belladonna artwork shows a central character with a skull head in a dark forest illuminated around them by the belladonna flower slung over their shoulder
Images online of Toorenent’s Belladonna have been treated with the SAND Lab’s Nightshade tool.
EVA TOORENENT

Her painting Belladonna, which is also another name for the nightshade plant, was the first image with Nightshade’s poison on it. 

“It’s so symbolic,” she says. “People taking our work without our consent, and then taking our work without consent can ruin their models. It’s just poetic justice.” 

No perfect solution

The reception of the SAND Lab’s work has been less harmonious across the AI community.

After Glaze was made available to the public, Zhao tells me, someone reported it to sites like VirusTotal, which tracks malware, so that it was flagged by antivirus programs. Several people also started claiming on social media that the tool had quickly been broken. Nightshade similarly got a fair share of criticism when it launched; as TechCrunch reported in January, some called it a “virus” and, as the story explains, “another Reddit user who inadvertently went viral on X questioned Nightshade’s legality, comparing it to ‘hacking a vulnerable computer system to disrupt its operation.’” 

“We had no idea what we were up against,” Zhao tells me. “Not knowing who or what the other side could be meant that every single new buzzing of the phone meant that maybe someone did break Glaze.” 

Both tools, though, have gone through rigorous academic peer review and have won recognition from the computer security community. Nightshade was accepted at the IEEE Symposium on Security and Privacy, and Glaze received a distinguished paper award and the 2023 Internet Defense Prize at the Usenix Security Symposium, a top conference in the field. 

“In my experience working with poison, I think [Nightshade is] pretty effective,” says Nathalie Baracaldo, who leads the AI security and privacy solutions team at IBM and has studied data poisoning. “I have not seen anything yet—and the word yet is important here—that breaks that type of defense that Ben is proposing.” And the fact that the team has released the source code for Nightshade for others to probe, and it hasn’t been broken, also suggests it’s quite secure, she adds. 

At the same time, at least one team of researchers does claim to have penetrated the protections of Glaze, or at least an old version of it. 

As researchers from Google DeepMind and ETH Zurich detailed in a paper published in June, they found various ways Glaze (as well as similar but less popular protection tools, such as Mist and Anti-DreamBooth) could be circumvented using off-the-shelf techniques that anyone could access—such as image upscaling, meaning filling in pixels to increase the resolution of an image as it’s enlarged. The researchers write that their work shows the “brittleness of existing protections” and warn that “artists may believe they are effective. But our experiments show they are not.”

Florian Tramèr, an associate professor at ETH Zurich who was part of the study, acknowledges that it is “very hard to come up with a strong technical solution that ends up really making a difference here.” Rather than any individual tool, he ultimately advocates for an almost certainly unrealistic ideal: stronger policies and laws to help create an environment in which people commit to buying only human-created art. 

What happened here is common in security research, notes Baracaldo: A defense is proposed, an adversary breaks it, and—ideally—the defender learns from the adversary and makes the defense better. “It’s important to have both ethical attackers and defenders working together to make our AI systems safer,” she says, adding that “ideally, all defenses should be publicly available for scrutiny,” which would both “allow for transparency” and help avoid creating a false sense of security. (Zhao, though, tells me the researchers have no intention to release Glaze’s source code.)

Still, even as all these researchers claim to support artists and their art, such tests hit a nerve for Zhao. In Discord chats that were later leaked, he claimed that one of the researchers from the ETH Zurich–Google DeepMind team “doesn’t give a shit” about people. (That researcher did not respond to a request for comment, but in a blog post he said it was important to break defenses in order to know how to fix them. Zhao says his words were taken out of context.) 

Zhao also emphasizes to me that the paper’s authors mainly evaluated an earlier version of Glaze; he says its new update is more resistant to tampering. Messing with images that have current Glaze protections would harm the very style that is being copied, he says, making such an attack useless. 

This back-and-forth reflects a significant tension in the computer security community and, more broadly, the often adversarial relationship between different groups in AI. Is it wrong to give people the feeling of security when the protections you’ve offered might break? Or is it better to have some level of protection—one that raises the threshold for an attacker to inflict harm—than nothing at all? 

Yves-Alexandre de Montjoye, an associate professor of applied mathematics and computer science at Imperial College London, says there are plenty of examples where similar technical protections have failed to be bulletproof. For example, in 2023, de Montjoye and his team probed a digital mask for facial recognition algorithms, which was meant to protect the privacy of medical patients’ facial images; they were able to break the protections by tweaking just one thing in the program’s algorithm (which was open source). 

Using such defenses is still sending a message, he says, and adding some friction to data profiling. “Tools such as TrackMeNot”—which protects users from data profiling—“have been presented as a way to protest; as a way to say I do not consent.”  

“But at the same time,” he argues, “we need to be very clear with artists that it is removable and might not protect against future algorithms.”

While Zhao will admit that the researchers pointed out some of Glaze’s weak spots, he unsurprisingly remains confident that Glaze and Nightshade are worth deploying, given that “security tools are never perfect.” Indeed, as Baracaldo points out, the Google DeepMind and ETH Zurich researchers showed how a highly motivated and sophisticated adversary will almost certainly always find a way in.

Yet it is “simplistic to think that if you have a real security problem in the wild and you’re trying to design a protection tool, the answer should be it either works perfectly or don’t deploy it,” Zhao says, citing spam filters and firewalls as examples. Defense is a constant cat-and-mouse game. And he believes most artists are savvy enough to understand the risk. 

Offering hope

The fight between creators and AI companies is fierce. The current paradigm in AI is to build bigger and bigger models, and there is, at least currently, no getting around the fact that they require vast data sets hoovered from the internet to train on. Tech companies argue that anything on the public internet is fair game, and that it is “impossible” to build advanced AI tools without copyrighted material; many artists argue that tech companies have stolen their intellectual property and violated copyright law, and that they need ways to keep their individual works out of the models—or at least receive proper credit and compensation for their use. 

So far, the creatives aren’t exactly winning. A number of companies have already replaced designers, copywriters, and illustrators with AI systems. In one high-profile case, Marvel Studios used AI-generated imagery instead of human-created art in the title sequence of its 2023 TV series Secret Invasion. In another, a radio station fired its human presenters and replaced them with AI. The technology has become a major bone of contention between unions and film, TV, and creative studios, most recently leading to a strike by video-game performers. There are numerous ongoing lawsuits by artists, writers, publishers, and record labels against AI companies. It will likely take years until there is a clear-cut legal resolution. But even a court ruling won’t necessarily untangle the difficult ethical questions created by generative AI. Any future government regulation is not likely to either, if it ever materializes. 

That’s why Zhao and Zheng see Glaze and Nightshade as necessary interventions—tools to defend original work, attack those who would help themselves to it, and, at the very least, buy artists some time. Having a perfect solution is not really the point. The researchers need to offer something now because the AI sector’s breakneck pace, Zheng says, means that companies are ignoring very real harms to humans. “This is probably the first time in our entire technology careers that we actually see this much conflict,” she adds.

On a much grander scale, she and Zhao tell me they hope that Glaze and Nightshade will eventually have the power to overhaul how AI companies use art and how their products produce it. It is eye-wateringly expensive to train AI models, and it’s extremely laborious for engineers to find and purge poisoned samples in a data set of billions of images. Theoretically, if there are enough Nightshaded images on the internet and tech companies see their models breaking as a result, it could push developers to the negotiating table to bargain over licensing and fair compensation. 

That’s, of course, still a big “if.” MIT Technology Review reached out to several AI companies, such as Midjourney and Stability AI, which did not reply to requests for comment. A spokesperson for OpenAI, meanwhile, did not confirm any details about encountering data poison but said the company takes the safety of its products seriously and is continually improving its safety measures: “We are always working on how we can make our systems more robust against this type of abuse.”

In the meantime, the SAND Lab is moving ahead and looking into funding from foundations and nonprofits to keep the project going. They say there has also been interest from major companies looking to protect their intellectual property (though they decline to say which), and Zhao and Zheng are exploring how the tools could be applied in other industries, such as gaming, video, or music. Meanwhile, they plan to keep updating Glaze and Nightshade to be as robust as possible, working closely with the students in the Chicago lab—where, on another wall, hangs Toorenent’s Belladonna. The painting has a heart-shaped note stuck to the bottom right corner: “Thank you! You have given hope to us artists.”

This story has been updated with the latest download figures for Glaze and Nightshade.

Google DeepMind has a new way to look inside an AI’s “mind”

AI has led to breakthroughs in drug discovery and robotics and is in the process of entirely revolutionizing how we interact with machines and the web. The only problem is we don’t know exactly how it works, or why it works so well. We have a fair idea, but the details are too complex to unpick. That’s a problem: It could lead us to deploy an AI system in a highly sensitive field like medicine without understanding that it could have critical flaws embedded in its workings.

A team at Google DeepMind that studies something called mechanistic interpretability has been working on new ways to let us peer under the hood. At the end of July, it released Gemma Scope, a tool to help researchers understand what is happening when AI is generating an output. The hope is that if we have a better understanding of what is happening inside an AI model, we’ll be able to control its outputs more effectively, leading to better AI systems in the future.

“I want to be able to look inside a model and see if it’s being deceptive,” says Neel Nanda, who runs the mechanistic interpretability team at Google DeepMind. “It seems like being able to read a model’s mind should help.”

Mechanistic interpretability, also known as “mech interp,” is a new research field that aims to understand how neural networks actually work. At the moment, very basically, we put inputs into a model in the form of a lot of data, and then we get a bunch of model weights at the end of training. These are the parameters that determine how a model makes decisions. We have some idea of what’s happening between the inputs and the model weights: Essentially, the AI is finding patterns in the data and drawing conclusions from those patterns, but these patterns can be incredibly complex and often very hard for humans to interpret.

It’s like a teacher reviewing the answers to a complex math problem on a test. The student—the AI, in this case—wrote down the correct answer, but the work looks like a bunch of squiggly lines. This example assumes the AI is always getting the correct answer, but that’s not always true; the AI student may have found an irrelevant pattern that it’s assuming is valid. For example, some current AI systems will give you the result that 9.11 is bigger than 9.8. Different methods developed in the field of mechanistic interpretability are beginning to shed a little bit of light on what may be happening, essentially making sense of the squiggly lines.

“A key goal of mechanistic interpretability is trying to reverse-engineer the algorithms inside these systems,” says Nanda. “We give the model a prompt, like ‘Write a poem,’ and then it writes some rhyming lines. What is the algorithm by which it did this? We’d love to understand it.”

To find features—or categories of data that represent a larger concept—in its AI model, Gemma, DeepMind ran a tool known as a “sparse autoencoder” on each of its layers. You can think of a sparse autoencoder as a microscope that zooms in on those layers and lets you look at their details. For example, if you prompt Gemma about a chihuahua, it will trigger the “dogs” feature, lighting up what the model knows about “dogs.” It is considered “sparse” because it allows only a small number of neurons to activate at once, pushing for a more efficient and generalized representation of the data.
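For readers who want a concrete picture, here is a minimal sketch of the general idea in PyTorch. It is not Gemma Scope’s actual code, and the sizes are made up; it simply shows an autoencoder that expands a layer’s activations into a much wider feature space while encouraging most features to stay switched off:

```python
# Illustrative sparse autoencoder, not Gemma Scope's implementation.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # activations -> feature strengths
        self.decoder = nn.Linear(d_features, d_model)  # feature strengths -> activations

    def forward(self, activations):
        features = torch.relu(self.encoder(activations))  # non-negative feature activations
        reconstruction = self.decoder(features)
        return features, reconstruction

def sae_loss(activations, features, reconstruction, sparsity_weight=1e-3):
    # Reproduce the original activations faithfully...
    reconstruction_error = torch.mean((reconstruction - activations) ** 2)
    # ...while pushing most feature strengths toward zero (the "sparse" part).
    sparsity_penalty = torch.mean(torch.abs(features))
    return reconstruction_error + sparsity_weight * sparsity_penalty

# Toy usage: 512-dimensional activations expanded into 4,096 candidate features.
sae = SparseAutoencoder(d_model=512, d_features=4096)
batch = torch.randn(8, 512)  # stand-in for one layer's activations
features, reconstruction = sae(batch)
print(sae_loss(batch, features, reconstruction))
```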

The tricky part of sparse autoencoders is deciding how granular you want to get. Think again about the microscope. You can magnify something to an extreme degree, but it may make what you’re looking at impossible for a human to interpret. But if you zoom too far out, you may be limiting what interesting things you can see and discover. 

DeepMind’s solution was to run sparse autoencoders of different sizes, varying the number of features they wanted the autoencoder to find. The goal was not for DeepMind’s researchers to thoroughly analyze the results on their own. Gemma and the autoencoders are open source, so this project was aimed more at spurring interested researchers to look at what the sparse autoencoders found and hopefully gain new insights into the model’s internal logic. Since DeepMind ran autoencoders on each layer of its model, a researcher could map the progression from input to output to a degree we haven’t seen before.

“This is really exciting for interpretability researchers,” says Josh Batson, a researcher at Anthropic. “If you have this model that you’ve open-sourced for people to study, it means that a bunch of interpretability research can now be done on the back of those sparse autoencoders. It lowers the barrier to entry to people learning from these methods.”

Neuronpedia, a platform for mechanistic interpretability, partnered with DeepMind in July to build a demo of Gemma Scope that you can play around with right now. In the demo, you can test out different prompts and see how the model breaks up your prompt and what activations your prompt lights up. You can also mess around with the model. For example, if you turn the feature about dogs way up and then ask the model a question about US presidents, Gemma will find some way to weave in random babble about dogs, or the model may just start barking at you.
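Under the hood, that kind of tinkering amounts to nudging the model’s internal activations along a feature direction recovered by the autoencoder. The sketch below is a hedged illustration rather than Neuronpedia’s or DeepMind’s implementation; the 512-dimensional activations and the “dogs” direction are random stand-ins:

```python
# Illustrative activation steering with a made-up feature direction.
import torch

def steer(activations: torch.Tensor, feature_direction: torch.Tensor, scale: float) -> torch.Tensor:
    """Add (scale > 0) or suppress (scale < 0) a feature in a layer's activations."""
    direction = feature_direction / feature_direction.norm()
    return activations + scale * direction

hidden = torch.randn(1, 512)      # one token's activations at some layer (stand-in)
dog_direction = torch.randn(512)  # pretend decoder column for a "dogs" feature

amplified = steer(hidden, dog_direction, scale=8.0)    # crank the feature way up
suppressed = steer(hidden, dog_direction, scale=-8.0)  # push it back down, or off
```

The same mechanism, pointed at a deception or bomb-making feature and driven toward zero, is the kind of permanent “switch-off” discussed later in this story.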

One interesting thing about sparse autoencoders is that they are unsupervised, meaning they find features on their own. That leads to surprising discoveries about how the models break down human concepts. “My personal favorite feature is the cringe feature,” says Joseph Bloom, science lead at Neuronpedia. “It seems to appear in negative criticism of text and movies. It’s just a great example of tracking things that are so human on some level.” 

You can search for concepts on Neuronpedia and it will highlight what features are being activated on specific tokens, or words, and how strongly each one is activated. “If you read the text and you see what’s highlighted in green, that’s when the model thinks the cringe concept is most relevant. The most active example for cringe is somebody preaching at someone else,” says Bloom.

Some features are proving easier to track than others. “One of the most important features that you would want to find for a model is deception,” says Johnny Lin, founder of Neuronpedia. “It’s not super easy to find: ‘Oh, there’s the feature that fires when it’s lying to us.’ From what I’ve seen, it hasn’t been the case that we can find deception and ban it.”

DeepMind’s research is similar to what another AI company, Anthropic, did back in May with Golden Gate Claude. It used sparse autoencoders to find the parts of Claude, its model, that lit up when discussing the Golden Gate Bridge in San Francisco. It then amplified the activations related to the bridge to the point where Claude literally identified not as Claude, an AI model, but as the physical Golden Gate Bridge and would respond to prompts as the bridge.

Although it may just seem quirky, mechanistic interpretability research may prove incredibly useful. “As a tool for understanding how the model generalizes and what level of abstraction it’s working at, these features are really helpful,” says Batson.

For example, a team led by Samuel Marks, now at Anthropic, used sparse autoencoders to find features showing that a particular model was associating certain professions with a specific gender. They then turned off these gender features to reduce bias in the model. This experiment was done on a very small model, so it’s unclear whether the work will apply to a much larger one.

Mechanistic interpretability research can also give us insights into why AI makes errors. In the case of the assertion that 9.11 is larger than 9.8, researchers from Transluce saw that the question was triggering the parts of an AI model related to Bible verses and September 11. The researchers concluded the AI could be interpreting the numbers as dates, asserting the later date, 9/11, as greater than 9/8. And in a lot of books like religious texts, section 9.11 comes after section 9.8, which may be why the AI thinks of it as greater. Once they knew why the AI made this error, the researchers tuned down the AI’s activations on Bible verses and September 11, which led to the model giving the correct answer when prompted again on whether 9.11 is larger than 9.8.

There are also other potential applications. Currently, a system-level prompt is built into LLMs to deal with situations like users who ask how to build a bomb. When you ask ChatGPT a question, the model is first secretly prompted by OpenAI to refrain from telling you how to make bombs or do other nefarious things. But it’s easy for users to jailbreak AI models with clever prompts, bypassing any restrictions. 

If the creators of the models are able to see where in an AI the bomb-building knowledge is, they can theoretically turn off those nodes permanently. Then even the most cleverly written prompt wouldn’t elicit an answer about how to build a bomb, because the AI would literally have no information about how to build a bomb in its system.

This type of granularity and precise control are easy to imagine but extremely hard to achieve with the current state of mechanistic interpretability. 

“A limitation is the steering [influencing a model by adjusting its parameters] is just not working that well, and so when you steer to reduce violence in a model, it ends up completely lobotomizing its knowledge in martial arts. There’s a lot of refinement to be done in steering,” says Lin. The knowledge of “bomb making,” for example, isn’t just a simple on-and-off switch in an AI model. It most likely is woven into multiple parts of the model, and turning it off would probably involve hampering the AI’s knowledge of chemistry. Any tinkering may have benefits but also significant trade-offs.

That said, if we are able to dig deeper and peer more clearly into the “mind” of AI, DeepMind and others are hopeful that mechanistic interpretability could represent a plausible path to alignment—the process of making sure AI is actually doing what we want it to do.

How this grassroots effort could make AI voices more diverse

We are on the cusp of a voice AI boom, with tech companies such as Apple and OpenAI rolling out the next generation of artificial-intelligence-powered assistants. But the default voices for these assistants are often white American—British, if you’re lucky—and most definitely speak English. They represent only a tiny proportion of the many dialects and accents in the English language, which spans many regions and cultures. And if you’re one of the billions of people who don’t speak English, bad luck: These tools don’t sound nearly as good in other languages.

This is because the data that has gone into training these models is limited. In AI research, most data used to train models is extracted from the English-language internet, which reflects Anglo-American culture. But there is a massive grassroots effort underway to change this status quo and bring more transparency and diversity to what AI sounds like: Mozilla’s Common Voice initiative. 

The data set Common Voice has created over the past seven years is one of the most useful resources for people wanting to build voice AI. It has seen a massive spike in downloads, partly thanks to the current AI boom; it recently hit the 5 million mark, up from 38,500 in 2020. Creating this data set has not been easy, mainly because the data collection relies on an army of volunteers. Their numbers have also jumped, from just under 500,000 in 2020 to over 900,000 in 2024. But by giving its data away, some members of this community argue, Mozilla is encouraging volunteers to effectively do free labor for Big Tech. 

Since 2017, volunteers for the Common Voice project have collected a total of 31,000 hours of voice data in around 180 languages as diverse as Russian, Catalan, and Marathi. If you’ve used a service that uses audio AI, it’s likely been trained at least partly on Common Voice. 
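If you want to poke at the data yourself, one common route is the Hugging Face mirror of Common Voice. The snippet below is a rough sketch: the exact dataset name and version are assumptions to check, and Mozilla requires you to accept the data set’s terms and log in to Hugging Face before it will download:

```python
# Hedged sketch: stream a single language from a Common Voice mirror.
# Requires `huggingface-cli login` and accepting the data set's terms first;
# depending on your `datasets` version you may also need trust_remote_code=True.
from datasets import load_dataset

finnish = load_dataset(
    "mozilla-foundation/common_voice_13_0",  # assumed repository name/version
    "fi",                                    # language code
    split="train",
    streaming=True,                          # avoid downloading the whole corpus
)

for clip in finnish.take(3):
    # Each record pairs an audio clip with its transcript and speaker metadata.
    print(clip["sentence"], clip["age"], clip["gender"])
```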

Mozilla’s cause is a noble one. As AI is integrated increasingly into our lives and the ways we communicate, it becomes more important that the tools we interact with sound like us. The technology could break down communication barriers and help convey information in a compelling way to, for example, people who can’t read. But instead, an intense focus on English risks entrenching a new colonial world order and wiping out languages entirely.

“It would be such an own goal if, rather than finally creating truly multimodal, multilingual, high-performance translation models and making a more multilingual world, we actually ended up forcing everybody to operate in, like, English or French,” says EM Lewis-Jong, a director for Common Voice. 

Common Voice is open source, which means anyone can see what has gone into the data set, and users can do whatever they want with it for free. This kind of transparency is unusual in AI data governance. Most large audio data sets simply aren’t publicly available, and many consist of data that has been scraped from sites like YouTube, according to research conducted by a team from the University of Washington, Carnegie Mellon University, and Northwestern University.

The vast majority of language data is collected by volunteers such as Bülent Özden, a researcher from Turkey. Since 2020, he has been not only donating his voice but also raising awareness around the project to get more people to donate. He recently spent two full-time months correcting data and checking for typos in Turkish. For him, improving AI models is not the only motivation to do this work. 

“I’m doing it to preserve cultures, especially low-resource [languages],” Özden says. He tells me he has recently started collecting samples of Turkey’s smaller languages, such as Circassian and Zaza.

However, as I dug into the data set, I noticed that the coverage of languages and accents is very uneven. There are only 22 hours of Finnish voices from 231 people. In comparison, the data set contains 3,554 hours of English from 94,665 speakers. Some languages, such as Korean and Punjabi, are even less well represented. Even though they have tens of millions of speakers, they account for only a couple of hours of recorded data. 

This imbalance has emerged because data collection efforts are started from the bottom up by language communities themselves, says Lewis-Jong. 

“We’re trying to give communities what they need to create their own AI training data sets. We have a particular focus on doing this for language communities where there isn’t any data, or where maybe larger tech organizations might not be that interested in creating those data sets,” Lewis-Jong says. They hope that with the help of volunteers and various bits of grant funding, the Common Voice data set will have close to 200 languages by the end of the year.

Common Voice’s permissive license means that many companies rely on it—for example, the Swedish startup Mabel AI, which builds translation tools for health-care providers. One of the first languages the company used was Ukrainian; it built a translation tool to help Ukrainian refugees interact with Swedish social services, says Karolina Sjöberg, Mabel AI’s founder and CEO. The team has since expanded to other languages, such as Arabic and Russian. 

The problem with a lot of other audio data is that it consists of people reading from books or texts. The result is very different from how people really speak, especially when they are distressed or in pain, Sjöberg says. Because anyone can submit sentences to Common Voice for others to read aloud, Mozilla’s data set also includes sentences that are more colloquial and feel more natural, she says.

Not that it is perfectly representative. The Mabel AI team soon found out that most voice data in the languages it needed was donated by younger men, which is fairly typical for the data set. 

“The refugees that we intended to use the app with were really anything but younger men,” Sjöberg says. “So that meant that the voice data that we needed did not quite match the voice data that we had.” The team started collecting its own voice data from Ukrainian women, as well as from elderly people. 

Unlike other data sets, Common Voice asks participants to share their gender and details about their accent. Making sure different genders are represented is important to fight bias in AI models, says Rebecca Ryakitimbo, a Common Voice fellow who created the project’s gender action plan. More diversity leads not only to better representation but also to better models. Systems that are trained on narrow and homogenous data tend to spew stereotyped and harmful results.

“We don’t want a case where we have a chatbot that is named after a woman but does not give the same response to a woman as it would a man,” she says. 

Ryakitimbo has collected voice data in Kiswahili in Tanzania, Kenya, and the Democratic Republic of Congo. She tells me she wanted to collect voices from a socioeconomically diverse set of Kiswahili speakers and has reached out to women young and old living in rural areas, who might not always be literate or even have access to devices. 

This kind of data collection is challenging. The importance of collecting AI voice data can feel abstract to many people, especially if they aren’t familiar with the technologies. Ryakitimbo and volunteers would approach women in settings where they felt safe to begin with, such as presentations on menstrual hygiene, and explain how the technology could, for example, help disseminate information about menstruation. For women who did not know how to read, the team read out sentences that they would repeat for the recording. 

The Common Voice project is bolstered by the belief that languages form a really important part of identity. “We think it’s not just about language, but about transmitting culture and heritage and treasuring people’s particular cultural context,” says Lewis-Jong. “There are all kinds of idioms and cultural catchphrases that just don’t translate,” they add. 

Common Voice is the only audio data set where English doesn’t dominate, says Willie Agnew, a researcher at Carnegie Mellon University who has studied audio data sets. “I’m very impressed with how well they’ve done that and how well they’ve made this data set that is actually pretty diverse,” Agnew says. “It feels like they’re way far ahead of almost all the other projects we looked at.” 

I spent some time verifying the recordings of other Finnish speakers on the Common Voice platform. As their voices echoed in my study, I felt surprisingly touched. We had all gathered around the same cause: making AI data more inclusive, and making sure our culture and language were properly represented in the next generation of AI tools.

But I had some big questions about what would happen to my voice if I donated it. Once it was in the data set, I would have no control over how it might be used afterwards. The tech sector isn’t exactly known for giving people proper credit, and the data is available for anyone’s use.

“As much as we want it to benefit the local communities, there’s a possibility that also Big Tech could make use of the same data and build something that then comes out as the commercial product,” says Ryakitimbo. Though Mozilla does not share who has downloaded Common Voice, Lewis-Jong tells me Meta and Nvidia have said that they have used it.

Open access to this hard-won and rare language data is not something all minority groups want, says Harry H. Jiang, a researcher at Carnegie Mellon University, who was part of the team doing audit research. For example, Indigenous groups have raised concerns. 

“Extractivism” is something that Mozilla has been thinking about a lot over the past 18 months, says Lewis-Jong. Later this year the project will work with communities to pilot alternative licenses including Nwulite Obodo Open Data License, which was created by researchers at the University of Pretoria for sharing African data sets more equitably. For example, people who want to download the data might be asked to write a request with details on how they plan to use it, and they might be allowed to license it only for certain products or for a limited time. Users might also be asked to contribute to community projects that support poverty reduction, says Lewis-Jong.  

Lewis-Jong says the pilot is a learning exercise to explore whether people will want data with alternative licenses, and whether they are sustainable for communities managing them. The hope is that it could lead to something resembling “open source 2.0.”

In the end, I decided to donate my voice. I received a list of phrases to say, sat in front of my computer, and hit Record. One day, I hope, my effort will help a company or researcher build voice AI that sounds less generic, and more like me. 

This story has been updated.

What Africa needs to do to become a major AI player

Kessel Okinga-Koumu paced around a crowded hallway. It was her first time presenting at the Deep Learning Indaba, she told the crowd gathered to hear her, which was filled with researchers from Africa’s machine-learning community. The annual weeklong conference (‘Indaba’ is a Zulu word for gathering) was held most recently in September at Amadou Mahtar Mbow University in Dakar, Senegal. It attracted over 700 attendees to hear about—and debate—the potential of Africa-centric AI and how it’s being deployed in agriculture, education, health care, and other critical sectors of the continent’s economy.

A 28-year-old computer science student at the University of the Western Cape in Cape Town, South Africa, Okinga-Koumu spoke about how she’s tackling a common problem: the lack of lab equipment at her university. Lecturers have long been forced to use chalkboards or printed 2D representations of equipment to simulate practical lessons that need microscopes, centrifuges, or other expensive tools. “In some cases, they even ask students to draw the equipment during practical lessons,” she lamented. 

Okinga-Koumu pulled a phone from the pocket of her blue jeans and opened a prototype web app she’s built. Using VR and AI features, the app allows students to simulate using the necessary lab equipment—exploring 3D models of the tools in a real-world setting, like a classroom or lab. “Students could have detailed VR of lab equipment, making their hands-on experience more effective,” she said. 

Established in 2017, the Deep Learning Indaba now has chapters in 47 of the 55 African nations and aims to boost AI development across the continent by providing training and resources to African AI researchers like Okinga-Koumu. Africa is still early in the process of adopting AI technologies, but organizers say the continent is uniquely hospitable to it for several reasons, including a relatively young and increasingly well-educated population, a rapidly growing ecosystem of AI startups, and lots of potential consumers. 

“The building and ownership of AI solutions tailored to local contexts is crucial for equitable development,” says Shakir Mohamed, a senior research scientist at Google DeepMind and cofounder of the organization sponsoring the conference. Africa, more than other continents in the world, can address specific challenges with AI and will benefit immensely from its young talent, he says: “There is amazing expertise everywhere across the continent.” 

However, researchers’ ambitious efforts to develop AI tools that answer the needs of Africans face numerous hurdles. The biggest are inadequate funding and poor infrastructure. Not only is it very expensive to build AI systems, but research to provide AI training data in original African languages has been hamstrung by poor financing of linguistics departments at many African universities and the fact that citizens increasingly don’t speak or write local languages themselves. Limited internet access and a scarcity of domestic data centers also mean that developers might not be able to deploy cutting-edge AI capabilities.

Attendees of Deep Learning Indaba 2024 in session hall on their computers

DEEP LEARNING INDABA 2024

Complicating this further is a lack of overarching policies or strategies for harnessing AI’s immense benefits—and regulating its downsides. While there are various draft policy documents, researchers are in conflict over a continent-wide strategy. And they disagree about which policies would most benefit Africa, not the wealthy Western governments and corporations that have often funded technological innovation.

Taken together, researchers worry, these issues will hold Africa’s AI sector back and hamper its efforts to pave its own pathway in the global AI race.          

On the cusp of change

Africa’s researchers are already making the most of generative AI’s impressive capabilities. In South Africa, for instance, to help address the HIV epidemic, scientists have designed an app called Your Choice, powered by an LLM-based chatbot that interacts with people to obtain their sexual history without stigma or discrimination. In Kenya, farmers are using AI apps  to diagnose diseases in crops and increase productivity. And in Nigeria, Awarri, a newly minted AI startup, is trying to build the country’s first large language model, with the endorsement of the government, so that Nigerian languages can be integrated into AI tools. 

The Deep Learning Indaba is another sign of how Africa’s AI research scene is starting to flourish. At the Dakar meeting, researchers presented 150 posters and 62 papers. Of those, 30 will be published in top-tier journals, according to Mohamed. 

Meanwhile, an analysis of 1,646 publications in AI between 2013 and 2022 found “a significant increase in publications” from Africa. And Masakhane, a cousin organization to Deep Learning Indaba that pushes for natural-language-processing research in African languages, has released over 400 open-source models and 20 African-language data sets since it was founded in 2018. 

“These metrics speak a lot to the capacity building that’s happening,” says Kathleen Siminyu, a computer scientist from Kenya, who researches NLP tools for her native Kiswahili. “We’re starting to see a critical mass of people having basic foundational skills. They then go on to specialize.”      

She adds: “It’s like a wave that cannot be stopped.”   

Khadija Ba, a Senegalese entrepreneur and investor at the pan-African VC fund P1 Ventures who was at this year’s conference, says that she sees African AI startups as particularly attractive because their local approaches have potential to be scaled for the global market. African startups often build solutions in the absence of robust infrastructure, yet “these innovations work efficiently, making them adaptable to other regions facing similar challenges,” she says. 

In recent years, funding in Africa’s tech ecosystem has picked up: VC investment totaled $4.5 billion last year, more than double what it was just five years ago, according to a report by the African Private Capital Association. And this October, Google announced a $5.8 million commitment to support AI training initiatives in Kenya, Nigeria, and South Africa. But researchers say local funding remains sluggish. Take the Google-backed fund rolled out, also in October, in Nigeria, Africa’s most populous country. It will pay out $6,000 each to 10 AI startups—not even enough to purchase the equipment needed to power their systems.

Lilian Wanzare, a lecturer and NLP researcher at Maseno University in Kisumu, Kenya, bridles at African governments’ lackadaisical support for local AI initiatives and complains as well that the government charges exorbitant fees for access to publicly generated data, hindering data sharing and collaboration. “[We] researchers are just blocked,” she says. “The government is saying they’re willing to support us, but the structures have not been put in place for us.”

Language barriers 

Researchers who want to make Africa-centric AI don’t face just insufficient local investment and inaccessible data. There are major linguistic challenges, too.  

During one discussion at the Indaba, Ife Adebara, a Nigerian computational linguist, posed a question: “How many people can write a bachelor’s thesis in their native African language?” 

Zero hands went up. 

Then the audience dissolved into laughter.

Africans want AI to speak their local languages, but many Africans cannot speak and write in these languages themselves, Adebara said.      

Although Africa accounts for one-third of all languages in the world, many oral languages are slowly disappearing, their population of native speakers declining. And LLMs developed by Western-based tech companies fail to serve African languages; they don’t understand locally relevant context and culture. 

For Adebara and others researching NLP tools, the lack of people who have the ability to read and write in African languages poses a major hurdle to development of bespoke AI-enabled technologies. “Without literacy in our local languages, the future of AI in Africa is not as bright as we think,” she says.      

On top of all that, there’s little machine-readable data for African languages. One reason is that linguistic departments in public universities are poorly funded, Adebara says, limiting linguists’ participation in work that could create such data and benefit AI development. 

This year, she and her colleagues established EqualyzAI, a for-profit company seeking to preserve African languages through digital technology. They have built voice tools and AI models, covering about 517 African languages.       

Lelapa AI, a software company that’s building data sets and NLP tools for African languages, is also trying to address these language-specific challenges. Its cofounders met in 2017 at the first Deep Learning Indaba and launched the company in 2022. In 2023, it released its first AI tool, Vulavula, a speech-to-text program that recognizes several languages spoken in South Africa. 

This year, Lelapa AI released InkubaLM, a first-of-its-kind small language model that currently supports a range of African languages: IsiXhosa, Yoruba, Swahili, IsiZulu, and Hausa. InkubaLM can answer questions and perform tasks like English translation and sentiment analysis. In tests, it performed as well as some larger models. But it’s still in early stages. The hope is that InkubaLM will someday power Vulavula, says Jade Abbott, cofounder and chief operating officer of Lelapa AI. 

“It’s the first iteration of us really expressing our long-term vision of what we want, and where we see African AI in the future,” Abbott says. “What we’re really building is a small language model that punches above its weight.”

InkubaLM is trained on two open-source data sets with 1.9 billion tokens, built and curated by Masakhane and other African developers who worked with real people in local communities. They paid native speakers of these languages to attend writing workshops and create data for the model.

Fundamentally, this approach will always be better, says Wanzare, because it’s informed by people who represent the language and culture.
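If the model’s weights are published on Hugging Face (the repository name below is an assumption to verify against Lelapa AI’s official release), trying InkubaLM would look roughly like this:

```python
# Hedged sketch of loading InkubaLM with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lelapa/InkubaLM-0.4B"  # assumed repository name; check the official release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Swahili prompt: "Translate into English: Good morning."
prompt = "Tafsiri kwa Kiingereza: Habari ya asubuhi."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```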

A clash over strategy

Another issue that came up again and again at the Indaba was that Africa’s AI scene lacks the sort of regulation and support from governments that you find elsewhere in the world—in Europe, the US, China, and, increasingly, the Middle East. 

Of the 55 African nations, only seven—Senegal, Egypt, Mauritius, Rwanda, Algeria, Nigeria, and Benin—have developed their own formal AI strategies. And many of those are still in the early stages.  

A major point of tension at the Indaba, though, was the regulatory framework that will govern the approach to AI across the entire continent. In March, the African Union Development Agency published a white paper, developed over a three-year period, that lays out this strategy. The 200-page document includes recommendations for industry codes and practices, standards to assess and benchmark AI systems, and a blueprint of AI regulations for African nations to adopt. The hope is that it will be endorsed by the heads of African governments in February 2025 and eventually passed by the African Union.  

But in July, the African Union Commission in Addis Ababa, Ethiopia, another African governing body that wields more power than the development agency, released a rival continental AI strategy—a 66-page document that diverges from the initial white paper. 

It’s unclear what’s behind the second strategy, but Seydina Ndiaye, a program director at the Cheikh Hamidou Kane Digital University in Dakar who helped draft the development agency’s white paper, claims it was drafted by a tech lobbyist from Switzerland. The commission’s strategy calls for African Union member states to declare AI a national priority, promote AI startups, and develop regulatory frameworks to address safety and security challenges. But Ndiaye expressed concerns that the document does not reflect the perspectives, aspirations, knowledge, and work of grassroots African AI communities. “It’s a copy-paste of what’s going on outside the continent,” he says.               

Vukosi Marivate, a computer scientist at the University of Pretoria in South Africa who helped found the Deep Learning Indaba and is known as an advocate for the African machine-learning movement, expressed fury over this turn of events at the conference. “These are things we shouldn’t accept,” he declared. The room full of data wonks, linguists, and international funders brimmed with frustration. But Marivate encouraged the group to forge ahead with building AI that benefits Africans: “We don’t have to wait for the rules to act right,” he said.  

Barbara Glover, a program manager for the African Union Development Agency, acknowledges that AI researchers are angry and frustrated. There’s been a push to harmonize the two continental AI strategies, but she says the process has been fractious: “That engagement didn’t go as envisioned.” Her agency plans to keep its own version of the continental AI strategy, Glover says, adding that it was developed by African experts rather than outsiders. “We are capable, as Africans, of driving our own AI agenda,” she says.       

crowd of attendees mingle around display booths at Deep Learning Indaba 2024. Booth signs for Mila, Meta and OpenAI can be seen in the frame.

DEEP LEARNING INDABA 2024

This all speaks to a broader tension over foreign influence in the African AI scene, one that goes beyond any single strategic document. Mirroring the skepticism toward the African Union Commission strategy, critics say the Deep Learning Indaba is tainted by its reliance on funding from big foreign tech companies; roughly 50% of its $500,000 annual budget comes from international donors and the rest from corporations like Google DeepMind, Apple, OpenAI, and Meta. They argue that this cash could pollute the Indaba’s activities and influence the topics and speakers chosen for discussion.

But Mohamed, the Indaba cofounder who is a researcher at Google DeepMind, says that “almost all that goes back to our beneficiaries across the continent,” and the organization helps connect them to training opportunities in tech companies. He says it benefits from some of its cofounders’ ties with these companies but that they do not set the agenda.

Ndiaye says that the funding is necessary to keep the conference going. “But we need to have more African governments involved,” he says.     

To Timnit Gebru, founder and executive director at the nonprofit Distributed AI Research Institute (DAIR), which supports equitable AI research in Africa, the angst about foreign funding for AI development comes down to skepticism of exploitative, profit-driven international tech companies. “Africans [need] to do something different and not replicate the same issues we’re fighting against,” Gebru says. She warns about the pressure to adopt “AI for everything in Africa,” adding that there’s “a lot of push from international development organizations” to use AI as an “antidote” for all Africa’s challenges.       

Siminyu, who is also a researcher at DAIR, agrees with that view. She hopes that African governments will fund and work with people in Africa to build AI tools that reach underrepresented communities—tools that can be used in positive ways and in a context that works for Africans. “We should be afforded the dignity of having AI tools in a way that others do,” she says.     

Why AI could eat quantum computing’s lunch

Tech companies have been funneling billions of dollars into quantum computers for years. The hope is that they’ll be a game changer for fields as diverse as finance, drug discovery, and logistics.

Those expectations have been especially high in physics and chemistry, where the weird effects of quantum mechanics come into play. In theory, this is where quantum computers could have a huge advantage over conventional machines.

But while the field struggles with the realities of tricky quantum hardware, another challenger is making headway in some of these most promising use cases. AI is now being applied to fundamental physics, chemistry, and materials science in a way that suggests quantum computing’s purported home turf might not be so safe after all.

The scale and complexity of quantum systems that can be simulated using AI is advancing rapidly, says Giuseppe Carleo, a professor of computational physics at the Swiss Federal Institute of Technology (EPFL). Last month, he coauthored a paper published in Science showing that neural-network-based approaches are rapidly becoming the leading technique for modeling materials with strong quantum properties. Meta also recently unveiled an AI model trained on a massive new data set of materials that has jumped to the top of a leaderboard for machine-learning approaches to material discovery.

Given the pace of recent advances, a growing number of researchers are now asking whether AI could solve a substantial chunk of the most interesting problems in chemistry and materials science before large-scale quantum computers become a reality. 

“The existence of these new contenders in machine learning is a serious hit to the potential applications of quantum computers,” says Carleo. “In my opinion, these companies will find out sooner or later that their investments are not justified.”

Exponential problems

The promise of quantum computers lies in their potential to carry out certain calculations much faster than conventional computers. Realizing this promise will require much larger quantum processors than we have today. The biggest devices have just crossed the thousand-qubit mark, but achieving an undeniable advantage over classical computers will likely require tens of thousands, if not millions. Once that hardware is available, though, a handful of quantum algorithms, like the encryption-cracking Shor’s algorithm, have the potential to solve problems exponentially faster than classical algorithms can. 

But for many quantum algorithms with more obvious commercial applications, like searching databases, solving optimization problems, or powering AI, the speed advantage is more modest. And last year, a paper coauthored by Microsoft’s head of quantum computing, Matthias Troyer, showed that these theoretical advantages disappear if you account for the fact that quantum hardware operates orders of magnitude slower than modern computer chips. The difficulty of getting large amounts of classical data in and out of a quantum computer is also a major barrier. 

So Troyer and his colleagues concluded that quantum computers should instead focus on problems in chemistry and materials science that require simulation of systems where quantum effects dominate. A computer that operates along the same quantum principles as these systems should, in theory, have a natural advantage here. In fact, this has been a driving idea behind quantum computing ever since the renowned physicist Richard Feynman first proposed the idea.

The rules of quantum mechanics govern many things with huge practical and commercial value, like proteins, drugs, and materials. Their properties are determined by the interactions of their constituent particles, in particular their electrons—and simulating these interactions in a computer should make it possible to predict what kinds of characteristics a molecule will exhibit. This could prove invaluable for discovering things like new medicines or more efficient battery chemistries, for example. 

But the intuition-defying rules of quantum mechanics—in particular, the phenomenon of entanglement, which allows the quantum states of distant particles to become intrinsically linked—can make these interactions incredibly complex. Precisely tracking them requires complicated math that gets exponentially tougher the more particles are involved. That can make simulating large quantum systems intractable on classical machines.
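To put a number on “exponentially tougher”: a general state of N quantum particles is a weighted combination of every possible configuration, so writing it down exactly takes an amount of memory that doubles with each particle added. This is a standard counting argument, not a result from any of the papers mentioned here:

```latex
% A state of N spin-1/2 particles (or qubits) needs 2^N complex amplitudes:
\[
  |\psi\rangle = \sum_{s_1,\dots,s_N \in \{\uparrow,\downarrow\}} c_{s_1 \dots s_N}\, |s_1 \dots s_N\rangle ,
  \qquad 2^{50} \approx 10^{15}.
\]
% By 50 particles the full list of amplitudes already runs to about a
% quadrillion numbers, which is why exact classical simulation breaks down.
```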

This is where quantum computers could shine. Because they also operate on quantum principles, they are able to represent quantum states much more efficiently than is possible on classical machines. They could also take advantage of quantum effects to speed up their calculations.

But not all quantum systems are the same. Their complexity is determined by the extent to which their particles interact, or correlate, with each other. In systems where these interactions are strong, tracking all these relationships can quickly explode the number of calculations required to model the system. But in most systems of practical interest to chemists and materials scientists, correlation is weak, says Carleo. That means their particles don’t affect each other’s behavior significantly, which makes the systems far simpler to model.

The upshot, says Carleo, is that quantum computers are unlikely to provide any advantage for most problems in chemistry and materials science. Classical tools that can accurately model weakly correlated systems already exist, the most prominent being density functional theory (DFT). The insight behind DFT is that all you need to understand a system’s key properties is its electron density, a measure of how its electrons are distributed in space. This makes for much simpler computation but can still provide accurate results for weakly correlated systems.

Simulating large systems using these approaches requires considerable computing power. But in recent years there’s been an explosion of research using DFT to generate data on chemicals, biomolecules, and materials—data that can be used to train neural networks. These AI models learn patterns in the data that allow them to predict what properties a particular chemical structure is likely to have, but they are orders of magnitude cheaper to run than conventional DFT calculations. 

This has dramatically expanded the size of systems that can be modeled—to as many as 100,000 atoms at a time—and how long simulations can run, says Alexandre Tkatchenko, a physics professor at the University of Luxembourg. “It’s wonderful. You can really do most of chemistry,” he says.
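In code, that surrogate-model idea looks something like the sketch below: a small neural network fitted to pairs of structure descriptors and properties that would, in a real project, come from DFT calculations. Everything here is synthetic and illustrative; actual efforts use learned descriptions of atomic environments and millions of DFT-computed labels:

```python
# Toy surrogate model: learn (descriptor -> property) from stand-in "DFT" data.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Pretend each material is summarized by a 32-number descriptor and labeled
# with a DFT-computed property such as a formation energy.
descriptors = torch.randn(1000, 32)
properties = descriptors.sum(dim=1, keepdim=True) + 0.1 * torch.randn(1000, 1)

surrogate = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for step in range(500):
    optimizer.zero_grad()
    loss = F.mse_loss(surrogate(descriptors), properties)
    loss.backward()
    optimizer.step()

# Once trained, a prediction costs a tiny fraction of a fresh DFT calculation.
new_material = torch.randn(1, 32)
print(surrogate(new_material))
```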

Olexandr Isayev, a chemistry professor at Carnegie Mellon University, says these techniques are already being widely applied by companies in chemistry and the life sciences. And for researchers, previously out-of-reach problems such as optimizing chemical reactions, developing new battery materials, and understanding protein binding are finally becoming tractable.

As with most AI applications, the biggest bottleneck is data, says Isayev. Meta’s recently released materials data set was made up of DFT calculations on 118 million molecules. A model trained on this data achieved state-of-the-art performance, but creating the training material took vast computing resources, well beyond what’s accessible to most research teams. That means fulfilling the full promise of this approach will require massive investment.

Modeling a weakly correlated system using DFT is not an exponentially scaling problem, though. This suggests that with more data and computing resources, AI-based classical approaches could simulate even the largest of these systems, says Tkatchenko. Given that quantum computers powerful enough to compete are likely still decades away, he adds, AI’s current trajectory suggests it could reach important milestones, such as precisely simulating how drugs bind to a protein, much sooner.

Strong correlations

When it comes to simulating strongly correlated quantum systems—ones whose particles interact a lot—methods like DFT quickly run out of steam. While more exotic, these systems include materials with potentially transformative capabilities, like high-temperature superconductivity or ultra-precise sensing. But even here, AI is making significant strides.

In 2017, EPFL’s Carleo and Microsoft’s Troyer published a seminal paper in Science showing that neural networks could model strongly correlated quantum systems. The approach doesn’t learn from data in the classical sense. Instead, Carleo says, it is similar to DeepMind’s AlphaZero model, which mastered the games of Go, chess, and shogi using nothing more than the rules of each game and the ability to play itself.

In this case, the rules of the game are provided by Schrödinger’s equation, which can precisely describe a system’s quantum state, or wave function. The model plays against itself by arranging particles in a certain configuration and then measuring the system’s energy level. The goal is to reach the lowest energy configuration (known as the ground state), which determines the system’s properties. The model repeats this process until energy levels stop falling, indicating that the ground state—or something close to it—has been reached.
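A toy version of that loop fits in a few dozen lines. The sketch below is illustrative only: it parametrizes the wave function of a tiny chain of six spins with a small network and lowers the energy by gradient descent, enumerating every configuration instead of sampling them the way real methods do:

```python
# Toy neural-network wave function for a 6-spin transverse-field Ising chain.
import itertools
import torch
import torch.nn as nn

N = 6  # six spin-1/2 particles -> 2^6 = 64 possible configurations
configs = torch.tensor(list(itertools.product([-1.0, 1.0], repeat=N)))

# Small network whose output defines a positive amplitude for each configuration.
net = nn.Sequential(nn.Linear(N, 32), nn.Tanh(), nn.Linear(32, 1))
h_field = 1.0  # strength of the transverse field

def energy():
    psi = net(configs).squeeze(-1).exp()        # wave-function amplitudes
    # Diagonal part of H: -sum_i z_i z_{i+1} acts on each configuration directly.
    zz = -(configs[:, :-1] * configs[:, 1:]).sum(dim=1)
    e = (psi * psi * zz).sum()
    # Off-diagonal part: -h * x_i flips spin i, connecting pairs of configurations.
    for i in range(N):
        flipped = configs.clone()
        flipped[:, i] *= -1
        idx = (flipped.unsqueeze(1) == configs.unsqueeze(0)).all(dim=2).float().argmax(dim=1)
        e = e - h_field * (psi * psi[idx]).sum()
    return e / (psi * psi).sum()                # variational energy <H>

optimizer = torch.optim.Adam(net.parameters(), lr=0.02)
for step in range(300):
    optimizer.zero_grad()
    E = energy()     # "measure" the energy of the current wave function
    E.backward()     # nudge the network toward a lower-energy state
    optimizer.step()

print("approximate ground-state energy:", E.item())
```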

The power of these models is their ability to compress information, says Carleo. “The wave function is a very complicated mathematical object,” he says. “What has been shown by several papers now is that [the neural network] is able to capture the complexity of this object in a way that can be handled by a classical machine.”

Since the 2017 paper, the approach has been extended to a wide range of strongly correlated systems, says Carleo, and results have been impressive. The Science paper he published with colleagues last month put leading classical simulation techniques to the test on a variety of tricky quantum simulation problems, with the goal of creating a benchmark to judge advances in both classical and quantum approaches.

Carleo says that neural-network-based techniques are now the best approach for simulating many of the most complex quantum systems they tested. “Machine learning is really taking the lead in many of these problems,” he says.

These techniques are catching the eye of some big players in the tech industry. In August, researchers at DeepMind showed in a paper in Science that they could accurately model excited states in quantum systems, which could one day help predict the behavior of things like solar cells, sensors, and lasers. Scientists at Microsoft Research have also developed an open-source software suite to help more researchers use neural networks for simulation.

One of the main advantages of the approach is that it piggybacks on massive investments in AI software and hardware, says Filippo Vicentini, a professor of AI and condensed-matter physics at École Polytechnique in France, who was also a coauthor on the Science benchmarking paper: “Being able to leverage these kinds of technological advancements gives us a huge edge.”

There is a caveat: Because the ground states are effectively found through trial and error rather than explicit calculations, they are only approximations. But this is also why the approach could make progress on what has looked like an intractable problem, says Juan Carrasquilla, a researcher at ETH Zurich, and another coauthor on the Science benchmarking paper.

If you want to precisely track all the interactions in a strongly correlated system, the number of calculations you need to do rises exponentially with the system’s size. But if you’re happy with an answer that is just good enough, there’s plenty of scope for taking shortcuts. 

“Perhaps there’s no hope to capture it exactly,” says Carrasquilla. “But there’s hope to capture enough information that we capture all the aspects that physicists care about. And if we do that, it’s basically indistinguishable from a true solution.”

And while strongly correlated systems are generally too hard to simulate classically, there are notable instances where this isn’t the case. That includes some systems that are relevant for modeling high-temperature superconductors, according to a 2023 paper in Nature Communications.

“Because of the exponential complexity, you can always find problems for which you can’t find a shortcut,” says Frank Noe, research manager at Microsoft Research, who has led much of the company’s work in this area. “But I think the number of systems for which you can’t find a good shortcut will just become much smaller.”

No magic bullets

However, Stefanie Czischek, an assistant professor of physics at the University of Ottawa, says it can be hard to predict what problems neural networks can feasibly solve. For some complex systems they do incredibly well, but then on other seemingly simple ones, computational costs balloon unexpectedly. “We don’t really know their limitations,” she says. “No one really knows yet what are the conditions that make it hard to represent systems using these neural networks.”

Meanwhile, there have also been significant advances in other classical quantum simulation techniques, says Antoine Georges, director of the Center for Computational Quantum Physics at the Flatiron Institute in New York, who also contributed to the recent Science benchmarking paper. “They are all successful in their own right, and they are also very complementary,” he says. “So I don’t think these machine-learning methods are just going to completely put all the other methods out of business.”

Quantum computers will also have their niche, says Martin Roetteler, senior director of quantum solutions at IonQ, which is developing quantum computers built from trapped ions. While he agrees that classical approaches will likely be sufficient for simulating weakly correlated systems, he’s confident that some large, strongly correlated systems will be beyond their reach. “The exponential is going to bite you,” he says. “There are cases with strongly correlated systems that we cannot treat classically. I’m strongly convinced that that’s the case.”

In contrast, he says, a future fault-tolerant quantum computer with many more qubits than today’s devices will be able to simulate such systems. This could help find new catalysts or improve understanding of metabolic processes in the body—an area of interest to the pharmaceutical industry.

Neural networks are likely to increase the scope of problems that can be solved, says Jay Gambetta, who leads IBM’s quantum computing efforts, but he’s unconvinced they’ll solve the hardest challenges businesses are interested in.

“That’s why many different companies that essentially have chemistry as their requirement are still investigating quantum—because they know exactly where these approximation methods break down,” he says.

Gambetta also rejects the idea that the technologies are rivals. He says the future of computing is likely to involve a hybrid of the two approaches, with quantum and classical subroutines working together to solve problems. “I don’t think they’re in competition. I think they actually add to each other,” he says.

But Scott Aaronson, who directs the Quantum Information Center at the University of Texas, says machine-learning approaches are directly competing against quantum computers in areas like quantum chemistry and condensed-matter physics. He predicts that a combination of machine learning and quantum simulations will outperform purely classical approaches in many cases, but that won’t become clear until larger, more reliable quantum computers are available.

“From the very beginning, I’ve treated quantum computing as first and foremost a scientific quest, with any industrial applications as icing on the cake,” he says. “So if quantum simulation turns out to beat classical machine learning only rarely, I won’t be quite as crestfallen as some of my colleagues.”

One area where quantum computers look likely to have a clear advantage is in simulating how complex quantum systems evolve over time, says EPFL’s Carleo. This could provide invaluable insights for scientists in fields like statistical mechanics and high-energy physics, but it seems unlikely to lead to practical uses in the near term. “These are more niche applications that, in my opinion, do not justify the massive investments and the massive hype,” Carleo adds.

Nonetheless, the experts MIT Technology Review spoke to said a lack of commercial applications is not a reason to stop pursuing quantum computing, which could lead to fundamental scientific breakthroughs in the long run.

“Science is like a set of nested boxes—you solve one problem and you find five other problems,” says Vicentini. “The complexity of the things we study will increase over time, so we will always need more powerful tools.”