The AI lab waging a guerrilla war over exploitative AI

Ben Zhao remembers well the moment he officially jumped into the fight between artists and generative AI: when one artist asked for AI bananas. 

A computer security researcher at the University of Chicago, Zhao had made a name for himself by building tools to protect images from facial recognition technology. It was this work that caught the attention of Kim Van Deun, a fantasy illustrator who invited him to a Zoom call in November 2022 hosted by the Concept Art Association, an advocacy organization for artists working in commercial media. 

On the call, artists shared details of how they had been hurt by the generative AI boom, which was then brand new. At that moment, AI was suddenly everywhere. The tech community was buzzing over image-generating AI models, such as Midjourney, Stable Diffusion, and OpenAI’s DALL-E 2, which could follow simple word prompts to depict fantasylands or whimsical chairs made of avocados. 

But these artists saw this technological wonder as a new kind of theft. They felt the models were effectively stealing and replacing their work. Some had found that their art had been scraped off the internet and used to train the models, while others had discovered that their own names had become prompts, causing their work to be drowned out online by AI knockoffs.

Zhao remembers being shocked by what he heard. “People are literally telling you they’re losing their livelihoods,” he told me one afternoon this spring, sitting in his Chicago living room. “That’s something that you just can’t ignore.” 

So on the Zoom, he made a proposal: What if, hypothetically, it was possible to build a mechanism that would help mask their art to interfere with AI scraping?

“I would love a tool that if someone wrote my name and made a prompt, like, garbage came out,” responded Karla Ortiz, a prominent digital artist. “Just, like, bananas or some weird stuff.” 

That was all the convincing Zhao needed—the moment he joined the cause.

Fast-forward to today, and millions of artists have deployed two tools born from that Zoom: Glaze and Nightshade, which were developed by Zhao and the University of Chicago’s SAND Lab (an acronym for “security, algorithms, networking, and data”).

Arguably the most prominent weapons in an artist’s arsenal against nonconsensual AI scraping, Glaze and Nightshade work in similar ways: by adding what the researchers call “barely perceptible” perturbations to an image’s pixels so that machine-learning models cannot read them properly. Glaze, which has been downloaded more than 6 million times since it launched in March 2023, adds what’s effectively a secret cloak to images that prevents AI algorithms from picking up on and copying an artist’s style. Nightshade, which I wrote about when it was released almost exactly a year ago this fall, cranks up the offensive against AI companies by adding an invisible layer of poison to images, which can break AI models; it has been downloaded more than 1.6 million times. 

Thanks to the tools, “I’m able to post my work online,” Ortiz says, “and that’s pretty huge.” For artists like her, being seen online is crucial to getting more work. If they are uncomfortable about ending up in a massive for-profit AI model without compensation, the only option is to delete their work from the internet. That would mean career suicide. “It’s really dire for us,” adds Ortiz, who has become one of the most vocal advocates for fellow artists and is part of a class action lawsuit against AI companies, including Stability AI, over copyright infringement. 

But Zhao hopes that the tools will do more than empower individual artists. Glaze and Nightshade are part of what he sees as a battle to slowly tilt the balance of power from large corporations back to individual creators. 

“It is just incredibly frustrating to see human life be valued so little,” he says with a disdain that I’ve come to see as pretty typical for him, particularly when he’s talking about Big Tech. “And to see that repeated over and over, this prioritization of profit over humanity … it is just incredibly frustrating and maddening.” 

As the tools are adopted more widely, his lofty goal is being put to the test. Can Glaze and Nightshade make genuine security accessible for creators—or will they inadvertently lull artists into believing their work is safe, even as the tools themselves become targets for haters and hackers? While experts largely agree that the approach is effective and Nightshade could prove to be powerful poison, other researchers claim they’ve already poked holes in the protections offered by Glaze and that trusting these tools is risky. 

But Neil Turkewitz, a copyright lawyer who used to work at the Recording Industry Association of America, offers a more sweeping view of the fight the SAND Lab has joined. It’s not about a single AI company or a single individual, he says: “It’s about defining the rules of the world we want to inhabit.” 

Poking the bear

The SAND Lab is tight knit, encompassing a dozen or so researchers crammed into a corner of the University of Chicago’s computer science building. That space has accumulated somewhat typical workplace detritus—a Meta Quest headset here, silly photos of dress-up from Halloween parties there. But the walls are also covered in original art pieces, including a framed painting by Ortiz.  

Years before fighting alongside artists like Ortiz against “AI bros” (to use Zhao’s words), Zhao and the lab’s co-leader, Heather Zheng, who is also his wife, had built a record of combating harms posed by new tech. 

When I visited the SAND Lab in Chicago, I saw how tight knit the group was. Alongside the typical workplace stuff were funny Halloween photos like this one. (Front row: Ronik Bhaskar, Josephine Passananti, Anna YJ Ha, Zhuolin Yang, Ben Zhao, Heather Zheng. Back row: Cathy Yuanchen Li, Wenxin Ding, Stanley Wu, and Shawn Shan.)
COURTESY OF SAND LAB

Though both earned spots on MIT Technology Review’s 35 Innovators Under 35 list for other work nearly two decades ago, when they were at the University of California, Santa Barbara (Zheng in 2005 for “cognitive radios” and Zhao a year later for peer-to-peer networks), their primary research focus has become security and privacy. 

The pair left Santa Barbara in 2017, after they were poached by the new co-director of the University of Chicago’s Data Science Institute, Michael Franklin. All eight PhD students from their UC Santa Barbara lab decided to follow them to Chicago too. Since then, the group has developed a “bracelet of silence” that jams the microphones in AI voice assistants like the Amazon Echo. It has also created a tool called Fawkes—“privacy armor,” as Zhao put it in a 2020 interview with the New York Times—that people can apply to their photos to protect them from facial recognition software. They’ve also studied how hackers might steal sensitive information through stealth attacks on virtual-reality headsets, and how to distinguish human art from AI-generated images. 

“Ben and Heather and their group are kind of unique because they’re actually trying to build technology that hits right at some key questions about AI and how it is used,” Franklin tells me. “They’re doing it not just by asking those questions, but by actually building technology that forces those questions to the forefront.”

It was Fawkes that intrigued Van Deun, the fantasy illustrator, two years ago; she hoped something similar might work as protection against generative AI, which is why she extended that fateful invite to the Concept Art Association’s Zoom call. 

That call started something of a mad rush in the weeks that followed. Though Zhao and Zheng collaborate on all the lab’s projects, they each lead individual initiatives; Zhao took on what would become Glaze, with PhD student Shawn Shan (who was on this year’s Innovators Under 35 list) spearheading the development of the program’s algorithm. 

In parallel to Shan’s coding, PhD students Jenna Cryan and Emily Wenger sought to learn more about the views and needs of the artists themselves. They created a user survey that the team distributed to artists with the help of Ortiz. In replies from more than 1,200 artists—far more than the average number of responses to user studies in computer science—the team found that the vast majority of creators had read about art being used to train models, and 97% expected AI to decrease some artists’ job security. A quarter said AI art had already affected their jobs. 

Almost all artists also said they posted their work online, and more than half said they anticipated reducing or removing that online work, if they hadn’t already—no matter the professional and financial consequences.

The first scrappy version of Glaze was developed in just a month, at which point Ortiz gave the team her entire catalogue of work to test the model on. At the most basic level, Glaze acts as a defensive shield. Its algorithm identifies features from the image that make up an artist’s individual style and adds subtle changes to them. When an AI model is trained on images protected with Glaze, the model will not be able to reproduce styles similar to the original image. 
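
The idea of a style "cloak" can be illustrated with a toy sketch in Python. It is not Glaze's algorithm. The real tool works on full images with a learned feature extractor and a perceptual budget. This example instead assumes a made-up linear feature extractor `W`, a four-pixel "image," and a simple per-pixel budget `eps`. The hypothetical function `cloak` uses projected gradient descent to shift the image's extracted features toward those of a different target style, while changing no pixel by more than `eps`.

```python
def features(W, x):
    """Hypothetical linear style-feature extractor: returns W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cloak(x, target_feats, W, eps=0.05, lr=0.1, steps=200):
    """Projected gradient descent on ||features(x + delta) - target_feats||^2.

    Every pixel change stays within [-eps, eps], and every pixel stays
    within [0, 1].
    """
    delta = [0.0] * len(x)
    for _ in range(steps):
        shifted = [xi + di for xi, di in zip(x, delta)]
        residual = [f - t for f, t in zip(features(W, shifted), target_feats)]
        # Gradient of the squared error with respect to delta: 2 * W^T @ residual.
        grad = [2 * sum(W[k][j] * residual[k] for k in range(len(W)))
                for j in range(len(x))]
        for j in range(len(x)):
            d = delta[j] - lr * grad[j]
            d = max(-eps, min(eps, d))            # perceptibility budget
            d = max(-x[j], min(1.0 - x[j], d))    # keep the pixel in [0, 1]
            delta[j] = d
    return [xi + di for xi, di in zip(x, delta)]


# Usage with invented numbers: two style features over a four-pixel image.
W = [[1.0, -1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, -1.0]]
image = [0.6, 0.4, 0.5, 0.5]
target_style = [0.0, 0.2]   # features of some other, hypothetical style

cloaked = cloak(image, target_style, W)
print(features(W, image), features(W, cloaked))          # features move toward the target
print(max(abs(a - b) for a, b in zip(image, cloaked)))   # never more than eps
```

The important design choice is the clipping step. The optimizer may push the features as far as it likes, but every pixel change is forced back inside the budget. That constraint is what keeps the perturbation small enough to be hard to see.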

A painting from Ortiz later became the first image publicly released with Glaze on it: a young woman, surrounded by flying eagles, holding up a wreath. Its title is Musa Victoriosa, “victorious muse.” 

It’s the one currently hanging on the SAND Lab’s walls. 

Despite many artists’ initial enthusiasm, Zhao says, Glaze’s launch caused significant backlash. Some artists were skeptical because they were worried this was a scam or yet another data-harvesting campaign. 

The lab had to take several steps to build trust, such as offering the option to download the Glaze app so that it adds the protective layer offline, which meant no data was being transferred anywhere. (The images are then shielded when artists upload them.)  

Soon after Glaze’s launch, Shan also led the development of the second tool, Nightshade. Where Glaze is a defensive mechanism, Nightshade was designed to act as an offensive deterrent to nonconsensual training. It works by changing the pixels of images in ways that are not noticeable to the human eye but manipulate machine-learning models so they interpret the image as something different from what it actually shows. If poisoned samples are scraped into AI training sets, these samples trick the AI models: Dogs become cats, handbags become toasters. The researchers say only a relatively few examples are enough to permanently damage the way a generative AI model produces images.
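
Here is a deliberately crude sketch of concept poisoning. It assumes a toy "model" that learns each caption word as the average of the image features paired with it. The feature vectors, captions, and function names are all hypothetical. The poisoned pairs are captioned "dog" but carry cat-like features, so the learned "dog" concept drifts toward cats. In this toy, the poison has to outnumber the clean dog images. The researchers argue that Nightshade's optimized samples need far fewer, and this sketch makes no attempt to reproduce that.

```python
def train_concepts(samples):
    """Toy 'model': learns each caption concept as the mean of its image features."""
    sums, counts = {}, {}
    for caption, feats in samples:
        acc = sums.setdefault(caption, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[caption] = counts.get(caption, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def nearest(concept_vec, references):
    """Which reference concept does the learned concept look most like?"""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(references, key=lambda name: sqdist(concept_vec, references[name]))


DOG, CAT = [1.0, 0.0], [0.0, 1.0]   # hypothetical 2-D "image features"
clean = [("dog", DOG)] * 10 + [("cat", CAT)] * 10
# Poisoned samples: captioned "dog," but with pixels perturbed so that a
# feature extractor reads them as cat-like.
poison = [("dog", CAT)] * 15

model = train_concepts(clean + poison)
print(nearest(model["dog"], {"dog": DOG, "cat": CAT}))  # cat
```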

Currently, both tools are available as free apps or can be applied through the project’s website. The lab has also recently expanded its reach by offering integration with the new artist-supported social network Cara, which was born out of a backlash to exploitative AI training and forbids AI-produced content.

In dozens of conversations with Zhao and the lab’s researchers, as well as a handful of their artist-collaborators, it’s become clear that both groups now feel they are aligned in one mission. “I never expected to become friends with scientists in Chicago,” says Eva Toorenent, a Dutch artist who worked closely with the team on Nightshade. “I’m just so happy to have met these people during this collective battle.” 

Images online of Toorenent’s Belladonna have been treated with the SAND Lab’s Nightshade tool.
EVA TOORENENT

Her painting Belladonna, which is also another name for the nightshade plant, was the first image with Nightshade’s poison on it. 

“It’s so symbolic,” she says. “People taking our work without our consent, and then taking our work without consent can ruin their models. It’s just poetic justice.” 

No perfect solution

The reception of the SAND Lab’s work has been less harmonious across the AI community.

After Glaze was made available to the public, Zhao tells me, someone reported it to sites like VirusTotal, which tracks malware, so that it was flagged by antivirus programs. Several people also started claiming on social media that the tool had quickly been broken. Nightshade similarly got a fair share of criticism when it launched; as TechCrunch reported in January, some called it a “virus” and, as the story explains, “another Reddit user who inadvertently went viral on X questioned Nightshade’s legality, comparing it to ‘hacking a vulnerable computer system to disrupt its operation.’” 

“We had no idea what we were up against,” Zhao tells me. “Not knowing who or what the other side could be meant that every single new buzzing of the phone meant that maybe someone did break Glaze.” 

Both tools, though, have gone through rigorous academic peer review and have won recognition from the computer security community. Nightshade was accepted at the IEEE Symposium on Security and Privacy, and Glaze received a distinguished paper award and the 2023 Internet Defense Prize at the Usenix Security Symposium, a top conference in the field. 

“In my experience working with poison, I think [Nightshade is] pretty effective,” says Nathalie Baracaldo, who leads the AI security and privacy solutions team at IBM and has studied data poisoning. “I have not seen anything yet—and the word yet is important here—that breaks that type of defense that Ben is proposing.” And the fact that the team has released the source code for Nightshade for others to probe, and it hasn’t been broken, also suggests it’s quite secure, she adds. 

At the same time, at least one team of researchers does claim to have penetrated the protections of Glaze, or at least an old version of it. 

As researchers from Google DeepMind and ETH Zurich detailed in a paper published in June, they found various ways Glaze (as well as similar but less popular protection tools, such as Mist and Anti-DreamBooth) could be circumvented using off-the-shelf techniques that anyone could access—such as image upscaling, meaning filling in pixels to increase the resolution of an image as it’s enlarged. The researchers write that their work shows the “brittleness of existing protections” and warn that “artists may believe they are effective. But our experiments show they are not.”
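
Why can off-the-shelf image processing undo such protections? The sketch below is a simplified illustration, not the attack described in the paper. If a protective perturbation lives mostly in fine, pixel-to-pixel detail, then shrinking an image and enlarging it again can average that detail away. The row of pixel values and the alternating ±`eps` pattern are invented. The clean row is also contrived to be constant over each pixel pair, so resampling loses nothing else. Real images would lose some genuine detail as well.

```python
def downsample(row):
    """Average neighbouring pixel pairs (halves the resolution)."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def upsample(row):
    """Nearest-neighbour enlargement back to the original resolution."""
    return [v for v in row for _ in range(2)]


clean = [0.2, 0.2, 0.5, 0.5, 0.8, 0.8, 0.4, 0.4]
eps = 0.03
# Hypothetical high-frequency perturbation: +eps, -eps, +eps, -eps, ...
protected = [p + (eps if i % 2 == 0 else -eps) for i, p in enumerate(clean)]
purified = upsample(downsample(protected))

print(max(abs(a - b) for a, b in zip(protected, clean)))  # the full perturbation, eps
print(max(abs(a - b) for a, b in zip(purified, clean)))   # near zero: the +/- pattern averages out
```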

Florian Tramèr, an associate professor at ETH Zurich who was part of the study, acknowledges that it is “very hard to come up with a strong technical solution that ends up really making a difference here.” Rather than any individual tool, he ultimately advocates for an almost certainly unrealistic ideal: stronger policies and laws to help create an environment in which people commit to buying only human-created art. 

What happened here is common in security research, notes Baracaldo: A defense is proposed, an adversary breaks it, and—ideally—the defender learns from the adversary and makes the defense better. “It’s important to have both ethical attackers and defenders working together to make our AI systems safer,” she says, adding that “ideally, all defenses should be publicly available for scrutiny,” which would both “allow for transparency” and help avoid creating a false sense of security. (Zhao, though, tells me the researchers have no intention to release Glaze’s source code.)

Still, even as all these researchers claim to support artists and their art, such tests hit a nerve for Zhao. In Discord chats that were later leaked, he claimed that one of the researchers from the ETH Zurich–Google DeepMind team “doesn’t give a shit” about people. (That researcher did not respond to a request for comment, but in a blog post he said it was important to break defenses in order to know how to fix them. Zhao says his words were taken out of context.) 

Zhao also emphasizes to me that the paper’s authors mainly evaluated an earlier version of Glaze; he says its new update is more resistant to tampering. Messing with images that have current Glaze protections would harm the very style that is being copied, he says, making such an attack useless. 

This back-and-forth reflects a significant tension in the computer security community and, more broadly, the often adversarial relationship between different groups in AI. Is it wrong to give people the feeling of security when the protections you’ve offered might break? Or is it better to have some level of protection—one that raises the threshold for an attacker to inflict harm—than nothing at all? 

Yves-Alexandre de Montjoye, an associate professor of applied mathematics and computer science at Imperial College London, says there are plenty of examples where similar technical protections have failed to be bulletproof. For example, in 2023, de Montjoye and his team probed a digital mask for facial recognition algorithms, which was meant to protect the privacy of medical patients’ facial images; they were able to break the protections by tweaking just one thing in the program’s algorithm (which was open source). 

Using such defenses is still sending a message, he says, and adding some friction to data profiling. “Tools such as TrackMeNot”—which protects users from data profiling—“have been presented as a way to protest; as a way to say I do not consent.”  

“But at the same time,” he argues, “we need to be very clear with artists that it is removable and might not protect against future algorithms.”

While Zhao will admit that the researchers pointed out some of Glaze’s weak spots, he unsurprisingly remains confident that Glaze and Nightshade are worth deploying, given that “security tools are never perfect.” Indeed, as Baracaldo points out, the Google DeepMind and ETH Zurich researchers showed how a highly motivated and sophisticated adversary will almost certainly always find a way in.

Yet it is “simplistic to think that if you have a real security problem in the wild and you’re trying to design a protection tool, the answer should be it either works perfectly or don’t deploy it,” Zhao says, citing spam filters and firewalls as examples. Defense is a constant cat-and-mouse game. And he believes most artists are savvy enough to understand the risk. 

Offering hope

The fight between creators and AI companies is fierce. The current paradigm in AI is to build bigger and bigger models, and there is, at least currently, no getting around the fact that they require vast data sets hoovered from the internet to train on. Tech companies argue that anything on the public internet is fair game, and that it is “impossible” to build advanced AI tools without copyrighted material; many artists argue that tech companies have stolen their intellectual property and violated copyright law, and that they need ways to keep their individual works out of the models—or at least receive proper credit and compensation for their use. 

So far, the creatives aren’t exactly winning. A number of companies have already replaced designers, copywriters, and illustrators with AI systems. In one high-profile case, Marvel Studios used AI-generated imagery instead of human-created art in the title sequence of its 2023 TV series Secret Invasion. In another, a radio station fired its human presenters and replaced them with AI. The technology has become a major bone of contention between unions and film, TV, and creative studios, most recently leading to a strike by video-game performers. There are numerous ongoing lawsuits by artists, writers, publishers, and record labels against AI companies. It will likely take years until there is a clear-cut legal resolution. But even a court ruling won’t necessarily untangle the difficult ethical questions created by generative AI. Any future government regulation is not likely to either, if it ever materializes. 

That’s why Zhao and Zheng see Glaze and Nightshade as necessary interventions: tools to defend original work, attack those who would help themselves to it, and, at the very least, buy artists some time. Having a perfect solution is not really the point. The researchers need to offer something now, Zheng says, because the AI sector moves at breakneck speed, which means companies are ignoring very real harms to humans. “This is probably the first time in our entire technology careers that we actually see this much conflict,” she adds.

On a much grander scale, she and Zhao tell me they hope that Glaze and Nightshade will eventually have the power to overhaul how AI companies use art and how their products produce it. It is eye-wateringly expensive to train AI models, and it’s extremely laborious for engineers to find and purge poisoned samples in a data set of billions of images. Theoretically, if there are enough Nightshaded images on the internet and tech companies see their models breaking as a result, it could push developers to the negotiating table to bargain over licensing and fair compensation. 

That’s, of course, still a big “if.” MIT Technology Review reached out to several AI companies, such as Midjourney and Stability AI, which did not reply to requests for comment. A spokesperson for OpenAI, meanwhile, did not confirm any details about encountering data poison but said the company takes the safety of its products seriously and is continually improving its safety measures: “We are always working on how we can make our systems more robust against this type of abuse.”

In the meantime, the SAND Lab is moving ahead and looking into funding from foundations and nonprofits to keep the project going. The researchers say there has also been interest from major companies looking to protect their intellectual property (though they decline to say which), and Zhao and Zheng are exploring how the tools could be applied in other industries, such as gaming, videos, or music. Meanwhile, they plan to keep updating Glaze and Nightshade to be as robust as possible, working closely with the students in the Chicago lab, where Toorenent’s Belladonna hangs on another wall. The painting has a heart-shaped note stuck to the bottom right corner: “Thank you! You have given hope to us artists.”

This story has been updated with the latest download figures for Glaze and Nightshade.

Google DeepMind has a new way to look inside an AI’s “mind”

AI has led to breakthroughs in drug discovery and robotics and is in the process of entirely revolutionizing how we interact with machines and the web. The only problem is we don’t know exactly how it works, or why it works so well. We have a fair idea, but the details are too complex to unpick. That’s a problem: It could lead us to deploy an AI system in a highly sensitive field like medicine without understanding that it could have critical flaws embedded in its workings.

A team at Google DeepMind that studies something called mechanistic interpretability has been working on new ways to let us peer under the hood. At the end of July, it released Gemma Scope, a tool to help researchers understand what is happening when AI is generating an output. The hope is that if we have a better understanding of what is happening inside an AI model, we’ll be able to control its outputs more effectively, leading to better AI systems in the future.

“I want to be able to look inside a model and see if it’s being deceptive,” says Neel Nanda, who runs the mechanistic interpretability team at Google DeepMind. “It seems like being able to read a model’s mind should help.”

Mechanistic interpretability, also known as “mech interp,” is a new research field that aims to understand how neural networks actually work. At the moment, very basically, we put inputs into a model in the form of a lot of data, and then we get a bunch of model weights at the end of training. These are the parameters that determine how a model makes decisions. We have some idea of what’s happening between the inputs and the model weights: Essentially, the AI is finding patterns in the data and drawing conclusions from those patterns, but these patterns can be incredibly complex and often very hard for humans to interpret.

It’s like a teacher reviewing the answers to a complex math problem on a test. The student—the AI, in this case—wrote down the correct answer, but the work looks like a bunch of squiggly lines. This example assumes the AI is always getting the correct answer, but that’s not always true; the AI student may have found an irrelevant pattern that it’s assuming is valid. For example, some current AI systems will give you the result that 9.11 is bigger than 9.8. Different methods developed in the field of mechanistic interpretability are beginning to shed a little bit of light on what may be happening, essentially making sense of the squiggly lines.

“A key goal of mechanistic interpretability is trying to reverse-engineer the algorithms inside these systems,” says Nanda. “We give the model a prompt, like ‘Write a poem,’ and then it writes some rhyming lines. What is the algorithm by which it did this? We’d love to understand it.”

To find features (categories of data that represent a larger concept) in its AI model, Gemma, DeepMind ran a tool known as a “sparse autoencoder” on each of its layers. You can think of a sparse autoencoder as a microscope that zooms in on those layers and lets you look at their details. For example, if you prompt Gemma about a chihuahua, it will trigger the “dogs” feature, lighting up what the model knows about “dogs.” The reason it is considered “sparse” is that it limits how many of its neurons can activate for any given input, basically pushing for a more efficient and generalized representation of the data.
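
For readers who want to see the "microscope" itself, here is a minimal sparse autoencoder forward pass in plain Python. It follows a common textbook recipe: a ReLU encoder, a linear decoder, and an L1 penalty that pushes most feature activations to zero. It is not necessarily the exact architecture DeepMind used for Gemma Scope. The tiny weights, and the idea that feature 0 means "dogs," are hypothetical. The number of rows in `W_enc` is the number of features, which is the knob that sets how many features the autoencoder can find.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation vector into (ideally few) active features,
    then reconstruct the activation from those features."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    f = relu([a + b for a, b in zip(matvec(W_enc, centered), b_enc)])
    x_hat = [a + b for a, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat


def sae_loss(x, f, x_hat, l1_coeff=0.01):
    """Reconstruction error plus an L1 penalty that pushes most features to zero."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in f)
    return recon + l1_coeff * sparsity


# Hypothetical tiny SAE: 2-dimensional activations, 3 features.
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]   # column j is feature j's direction
b_dec = [0.0, 0.0]

x = [0.9, 0.0]   # e.g. an activation produced by a prompt about a chihuahua
f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
print(f)       # [0.9, 0.0, 0.0]: only feature 0 fires
print(x_hat)   # [0.9, 0.0]
```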

The tricky part of sparse autoencoders is deciding how granular you want to get. Think again about the microscope. You can magnify something to an extreme degree, but it may make what you’re looking at impossible for a human to interpret. But if you zoom too far out, you may be limiting what interesting things you can see and discover. 

DeepMind’s solution was to run sparse autoencoders of different sizes, varying the number of features they want the autoencoder to find. The goal was not for DeepMind’s researchers to thoroughly analyze the results on their own. Gemma and the autoencoders are open-source, so this project was aimed more at spurring interested researchers to look at what the sparse autoencoders found and, hopefully, gain new insights into the model’s internal logic. Since DeepMind ran autoencoders on each layer of its model, a researcher could map the progression from input to output to a degree we haven’t seen before.

“This is really exciting for interpretability researchers,” says Josh Batson, a researcher at Anthropic. “If you have this model that you’ve open-sourced for people to study, it means that a bunch of interpretability research can now be done on the back of those sparse autoencoders. It lowers the barrier to entry to people learning from these methods.”

Neuronpedia, a platform for mechanistic interpretability, partnered with DeepMind in July to build a demo of Gemma Scope that you can play around with right now. In the demo, you can test out different prompts and see how the model breaks up your prompt and what activations your prompt lights up. You can also mess around with the model. For example, if you turn the feature about dogs way up and then ask the model a question about US presidents, Gemma will find some way to weave in random babble about dogs, or the model may just start barking at you.
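
Turning a feature "way up" can be sketched as adding a scaled copy of that feature's direction to the model's internal activation. This is one common steering technique, shown here with invented two-dimensional vectors. It is not necessarily how the Neuronpedia demo implements its controls.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def steer(activation, feature_direction, strength):
    """Add `strength` copies of a feature's direction to an activation."""
    return [a + strength * d for a, d in zip(activation, feature_direction)]


dog_direction = [1.0, 0.0]   # hypothetical direction for a "dogs" feature
activation = [0.5, 0.25]     # hypothetical activation for a question about US presidents

steered = steer(activation, dog_direction, strength=4.0)
print(steered)                                                      # [4.5, 0.25]
print(dot(activation, dog_direction), dot(steered, dog_direction))  # 0.5 4.5
```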

One interesting thing about sparse autoencoders is that they are unsupervised, meaning they find features on their own. That leads to surprising discoveries about how the models break down human concepts. “My personal favorite feature is the cringe feature,” says Joseph Bloom, science lead at Neuronpedia. “It seems to appear in negative criticism of text and movies. It’s just a great example of tracking things that are so human on some level.” 

You can search for concepts on Neuronpedia and it will highlight what features are being activated on specific tokens, or words, and how strongly each one is activated. “If you read the text and you see what’s highlighted in green, that’s when the model thinks the cringe concept is most relevant. The most active example for cringe is somebody preaching at someone else,” says Bloom.

Some features are proving easier to track than others. “One of the most important features that you would want to find for a model is deception,” says Johnny Lin, founder of Neuronpedia. “It’s not super easy to find: ‘Oh, there’s the feature that fires when it’s lying to us.’ From what I’ve seen, it hasn’t been the case that we can find deception and ban it.”

DeepMind’s research is similar to what another AI company, Anthropic, did back in May with Golden Gate Claude. Anthropic used sparse autoencoders to find the parts of its model, Claude, that lit up when discussing the Golden Gate Bridge in San Francisco. It then amplified the activations related to the bridge to the point where Claude literally identified not as Claude, an AI model, but as the physical Golden Gate Bridge and would respond to prompts as the bridge.

Although it may just seem quirky, mechanistic interpretability research may prove incredibly useful. “As a tool for understanding how the model generalizes and what level of abstraction it’s working at, these features are really helpful,” says Batson.

For example, a team led by Samuel Marks, now at Anthropic, used sparse autoencoders to find features that showed a particular model was associating certain professions with a specific gender. The researchers then turned off these gender features to reduce bias in the model. This experiment was done on a very small model, so it’s unclear whether the work will apply to a much larger one.
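
The "turning off" step can be sketched in the same way: subtract the contribution of the unwanted features from an activation and leave everything else alone. The vectors and the "profession" and "gender" labels below are hypothetical. The published work used a more involved procedure, and this sketch shows only the core idea.

```python
def ablate_features(activation, feature_acts, decoder_dirs, ablate):
    """Subtract feature_acts[i] * decoder_dirs[i] for every feature i in `ablate`.

    Everything else in the activation, including whatever the autoencoder
    failed to reconstruct, is left untouched.
    """
    out = list(activation)
    for i in ablate:
        out = [o - feature_acts[i] * d for o, d in zip(out, decoder_dirs[i])]
    return out


# Hypothetical features: 0 = "a profession", 1 = "a gender".
decoder_dirs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
feature_acts = [0.8, 0.5]
activation = [0.8, 0.5, 0.25]   # the last entry is left-over detail the SAE missed

debiased = ablate_features(activation, feature_acts, decoder_dirs, ablate=[1])
print(debiased)  # [0.8, 0.0, 0.25]: the profession survives, the gender signal is gone
```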

Mechanistic interpretability research can also give us insights into why AI makes errors. In the case of the assertion that 9.11 is larger than 9.8, researchers from Transluce saw that the question was triggering the parts of an AI model related to Bible verses and September 11. The researchers concluded the AI could be interpreting the numbers as dates, asserting the later date, 9/11, as greater than 9/8. And in a lot of books like religious texts, section 9.11 comes after section 9.8, which may be why the AI thinks of it as greater. Once they knew why the AI made this error, the researchers tuned down the AI’s activations on Bible verses and September 11, which led to the model giving the correct answer when prompted again on whether 9.11 is larger than 9.8.
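
The two readings the researchers describe are easy to reproduce outside a language model. Compared as decimal numbers, 9.8 is larger. Compared the way chapter-and-verse references or month/day dates are ordered, 9.11 comes later. The helper `as_section` is a hypothetical name used only for this illustration.

```python
# As decimal numbers, 9.8 (that is, 9.80) is larger than 9.11.
print(9.11 > 9.8)  # False


def as_section(label):
    """Read '9.11' the way a verse reference or a month/day date is read:
    as the pair (9, 11), compared part by part."""
    major, minor = label.split(".")
    return (int(major), int(minor))


# Under that reading, 11 > 8 in the second position, so 9.11 comes "later".
print(as_section("9.11") > as_section("9.8"))  # True
```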

There are also other potential applications. Currently, a system-level prompt is built into LLMs to deal with situations like users who ask how to build a bomb. When you ask ChatGPT a question, the model is first secretly prompted by OpenAI to refrain from telling you how to make bombs or do other nefarious things. But it’s easy for users to jailbreak AI models with clever prompts, bypassing any restrictions. 

If the creators of the models are able to see where in an AI the bomb-building knowledge is, they can theoretically turn off those nodes permanently. Then even the most cleverly written prompt wouldn’t elicit an answer about how to build a bomb, because the AI would literally have no information about how to build a bomb in its system.

This kind of granular, precise control is easy to imagine but extremely hard to achieve with the current state of mechanistic interpretability.

“A limitation is the steering [influencing a model’s behavior by adjusting its internal activations] is just not working that well, and so when you steer to reduce violence in a model, it ends up completely lobotomizing its knowledge in martial arts. There’s a lot of refinement to be done in steering,” says Lin. The knowledge of “bomb making,” for example, isn’t just a simple on-and-off switch in an AI model. It is most likely woven into multiple parts of the model, and turning it off would probably involve hampering the AI’s knowledge of chemistry. Any tinkering may have benefits but also significant trade-offs.
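
Lin's point about entanglement can be illustrated with a little linear algebra. The sketch assumes, hypothetically, that concepts are stored as directions in an activation space and that "violence" and "martial arts" share part of their direction. Erasing the violence direction, a technique sometimes called directional ablation, then also shrinks the martial-arts representation. The three-dimensional vectors are made up.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def project_out(x, direction):
    """Remove the component of x that lies along `direction`."""
    unit = [d / norm(direction) for d in direction]
    coeff = dot(x, unit)
    return [xi - coeff * u for xi, u in zip(x, unit)]


violence = [1.0, 1.0, 0.0]       # hypothetical "violence" direction
martial_arts = [1.0, 0.0, 1.0]   # hypothetical "martial arts" direction, overlapping with violence

after = project_out(martial_arts, violence)
print(norm(martial_arts), norm(after))  # the martial-arts representation shrinks
```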

That said, if we are able to dig deeper and peer more clearly into the “mind” of AI, DeepMind and others are hopeful that mechanistic interpretability could represent a plausible path to alignment—the process of making sure AI is actually doing what we want it to do.

How this grassroots effort could make AI voices more diverse

We are on the cusp of a voice AI boom, with tech companies such as Apple and OpenAI rolling out the next generation of artificial-intelligence-powered assistants. But the default voices for these assistants are often white American—British, if you’re lucky—and most definitely speak English. They represent only a tiny proportion of the many dialects and accents in the English language, which spans many regions and cultures. And if you’re one of the billions of people who don’t speak English, bad luck: These tools don’t sound nearly as good in other languages.

This is because the data that has gone into training these models is limited. In AI research, most data used to train models is extracted from the English-language internet, which reflects Anglo-American culture. But there is a massive grassroots effort underway to change this status quo and bring more transparency and diversity to what AI sounds like: Mozilla’s Common Voice initiative. 

The data set Common Voice has created over the past seven years is one of the most useful resources for people wanting to build voice AI. It has seen a massive spike in downloads, partly thanks to the current AI boom; it recently hit the 5 million mark, up from 38,500 in 2020. Creating this data set has not been easy, mainly because the data collection relies on an army of volunteers. Their numbers have also jumped, from just under 500,000 in 2020 to over 900,000 in 2024. But by giving its data away, some members of this community argue, Mozilla is encouraging volunteers to effectively do free labor for Big Tech. 

Since 2017, volunteers for the Common Voice project have collected a total of 31,000 hours of voice data in around 180 languages as diverse as Russian, Catalan, and Marathi. If you’ve used a service that uses audio AI, it’s likely been trained at least partly on Common Voice. 

Mozilla’s cause is a noble one. As AI is integrated increasingly into our lives and the ways we communicate, it becomes more important that the tools we interact with sound like us. The technology could break down communication barriers and help convey information in a compelling way to, for example, people who can’t read. But instead, an intense focus on English risks entrenching a new colonial world order and wiping out languages entirely.

“It would be such an own goal if, rather than finally creating truly multimodal, multilingual, high-performance translation models and making a more multilingual world, we actually ended up forcing everybody to operate in, like, English or French,” says EM Lewis-Jong, a director for Common Voice. 

Common Voice is open source, which means anyone can see what has gone into the data set, and users can do whatever they want with it for free. This kind of transparency is unusual in AI data governance. Most large audio data sets simply aren’t publicly available, and many consist of data that has been scraped from sites like YouTube, according to research conducted by a team from the University of Washington, Carnegie Mellon University, and Northwestern University.

The vast majority of language data is collected by volunteers such as Bülent Özden, a researcher from Turkey. Since 2020, he has been not only donating his voice but also raising awareness around the project to get more people to donate. He recently spent two months working full-time on correcting data and checking for typos in Turkish. For him, improving AI models is not the only motivation to do this work.

“I’m doing it to preserve cultures, especially low-resource [languages],” Özden says. He tells me he has recently started collecting samples of Turkey’s smaller languages, such as Circassian and Zaza.

However, as I dug into the data set, I noticed that the coverage of languages and accents is very uneven. There are only 22 hours of Finnish voices from 231 people. In comparison, the data set contains 3,554 hours of English from 94,665 speakers. Some languages, such as Korean and Punjabi, are even less well represented. Even though they have tens of millions of speakers, they account for only a couple of hours of recorded data. 
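A quick calculation using only the figures above puts the gap in proportion. English has roughly 160 times as many hours as Finnish. On average, though, each Finnish contributor has recorded more audio than each English one.

```python
# Hours and speaker counts quoted above for the Common Voice data set.
corpora = {
    "Finnish": {"hours": 22, "speakers": 231},
    "English": {"hours": 3554, "speakers": 94665},
}


def minutes_per_speaker(hours: float, speakers: int) -> float:
    """Average recorded minutes contributed per speaker."""
    return hours * 60 / speakers


for language, c in corpora.items():
    print(f"{language}: {minutes_per_speaker(c['hours'], c['speakers']):.1f} min per speaker")

ratio = corpora["English"]["hours"] / corpora["Finnish"]["hours"]
print(f"English has about {ratio:.0f}x as many hours as Finnish")
```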

This imbalance has emerged because data collection efforts are started from the bottom up by language communities themselves, says Lewis-Jong. 

“We’re trying to give communities what they need to create their own AI training data sets. We have a particular focus on doing this for language communities where there isn’t any data, or where maybe larger tech organizations might not be that interested in creating those data sets,” Lewis-Jong says. They hope that with the help of volunteers and various bits of grant funding, the Common Voice data set will have close to 200 languages by the end of the year.

Common Voice’s permissive license means that many companies rely on it—for example, the Swedish startup Mabel AI, which builds translation tools for health-care providers. One of the first languages the company used was Ukrainian; it built a translation tool to help Ukrainian refugees interact with Swedish social services, says Karolina Sjöberg, Mabel AI’s founder and CEO. The team has since expanded to other languages, such as Arabic and Russian. 

The problem with a lot of other audio data is that it consists of people reading from books or texts. The result is very different from how people really speak, especially when they are distressed or in pain, Sjöberg says. Because anyone can submit sentences to Common Voice for others to read aloud, Mozilla’s data set also includes sentences that are more colloquial and feel more natural, she says.

Not that it is perfectly representative. The Mabel AI team soon found out that most voice data in the languages it needed was donated by younger men, which is fairly typical for the data set. 

“The refugees that we intended to use the app with were really anything but younger men,” Sjöberg says. “So that meant that the voice data that we needed did not quite match the voice data that we had.” The team started collecting its own voice data from Ukrainian women, as well as from elderly people. 

Unlike other data sets, Common Voice asks participants to share their gender and details about their accent. Making sure different genders are represented is important to fight bias in AI models, says Rebecca Ryakitimbo, a Common Voice fellow who created the project’s gender action plan. More diversity leads not only to better representation but also to better models. Systems that are trained on narrow and homogeneous data tend to spew stereotyped and harmful results.

“We don’t want a case where we have a chatbot that is named after a woman but does not give the same response to a woman as it would a man,” she says. 

Ryakitimbo has collected voice data in Kiswahili in Tanzania, Kenya, and the Democratic Republic of Congo. She tells me she wanted to collect voices from a socioeconomically diverse set of Kiswahili speakers and has reached out to women young and old living in rural areas, who might not always be literate or even have access to devices. 

This kind of data collection is challenging. The importance of collecting AI voice data can feel abstract to many people, especially if they aren’t familiar with the technologies. Ryakitimbo and volunteers would approach women in settings where they felt safe to begin with, such as presentations on menstrual hygiene, and explain how the technology could, for example, help disseminate information about menstruation. For women who did not know how to read, the team read out sentences that they would repeat for the recording. 

The Common Voice project is bolstered by the belief that languages form a really important part of identity. “We think it’s not just about language, but about transmitting culture and heritage and treasuring people’s particular cultural context,” says Lewis-Jong. “There are all kinds of idioms and cultural catchphrases that just don’t translate,” they add. 

Common Voice is the only audio data set where English doesn’t dominate, says Willie Agnew, a researcher at Carnegie Mellon University who has studied audio data sets. “I’m very impressed with how well they’ve done that and how well they’ve made this data set that is actually pretty diverse,” Agnew says. “It feels like they’re way far ahead of almost all the other projects we looked at.” 

I spent some time verifying the recordings of other Finnish speakers on the Common Voice platform. As their voices echoed in my study, I felt surprisingly touched. We had all gathered around the same cause: making AI data more inclusive, and making sure our culture and language were properly represented in the next generation of AI tools.

But I had some big questions about what would happen to my voice if I donated it. Once it was in the data set, I would have no control over how it might be used afterwards. The tech sector isn’t exactly known for giving people proper credit, and the data is available for anyone’s use.

“As much as we want it to benefit the local communities, there’s a possibility that also Big Tech could make use of the same data and build something that then comes out as the commercial product,” says Ryakitimbo. Though Mozilla does not share who has downloaded Common Voice, Lewis-Jong tells me Meta and Nvidia have said that they have used it.

Open access to this hard-won and rare language data is not something all minority groups want, says Harry H. Jiang, a researcher at Carnegie Mellon University, who was part of the team doing audit research. For example, Indigenous groups have raised concerns. 

“Extractivism” is something that Mozilla has been thinking about a lot over the past 18 months, says Lewis-Jong. Later this year the project will work with communities to pilot alternative licenses, including the Nwulite Obodo Open Data License, which was created by researchers at the University of Pretoria for sharing African data sets more equitably. For example, people who want to download the data might be asked to write a request with details on how they plan to use it, and they might be allowed to license it only for certain products or for a limited time. Users might also be asked to contribute to community projects that support poverty reduction, says Lewis-Jong.
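Conditions like these are more restrictive than an open license, but they are simple to state and check. The sketch below is hypothetical. It does not reflect the actual terms of the Nwulite Obodo license or any Mozilla system, and the class, field, and licensee names are invented. It shows one way a grant could record a stated purpose, a list of allowed products, and an expiry date. The community-contribution requirement is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class LicenseGrant:
    """Hypothetical record of a conditional data license (illustration only)."""
    licensee: str
    stated_purpose: str                      # the written request describing intended use
    allowed_products: set[str] = field(default_factory=set)
    expires: date | None = None              # None means no time limit

    def permits(self, product: str, on: date) -> bool:
        """True only if the product is covered and the grant has not expired."""
        if product not in self.allowed_products:
            return False
        if self.expires is not None and on > self.expires:
            return False
        return True


grant = LicenseGrant(
    licensee="Example Health Startup",        # hypothetical licensee
    stated_purpose="Speech recognition for a clinic intake app",
    allowed_products={"clinic-intake-app"},
    expires=date(2026, 12, 31),
)
print(grant.permits("clinic-intake-app", date(2025, 6, 1)))  # True
print(grant.permits("ad-targeting", date(2025, 6, 1)))       # False: product not covered
print(grant.permits("clinic-intake-app", date(2027, 1, 1)))  # False: grant expired
```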

Lewis-Jong says the pilot is a learning exercise to explore whether people will want data with alternative licenses, and whether they are sustainable for communities managing them. The hope is that it could lead to something resembling “open source 2.0.”

In the end, I decided to donate my voice. I received a list of phrases to say, sat in front of my computer, and hit Record. One day, I hope, my effort will help a company or researcher build voice AI that sounds less generic, and more like me. 

This story has been updated.

What Africa needs to do to become a major AI player

Kessel Okinga-Koumu paced around a crowded hallway. It was her first time presenting at the Deep Learning Indaba, she told the crowd that had gathered to hear her, which was filled with researchers from Africa’s machine-learning community. The annual weeklong conference (‘Indaba’ is a Zulu word for ‘gathering’) was held most recently in September at Amadou Mahtar Mbow University in Dakar, Senegal. It attracted over 700 attendees who came to hear about, and debate, the potential of Africa-centric AI and how it’s being deployed in agriculture, education, health care, and other critical sectors of the continent’s economy.

A 28-year-old computer science student at the University of the Western Cape in Cape Town, South Africa, Okinga-Koumu spoke about how she’s tackling a common problem: the lack of lab equipment at her university. Lecturers have long been forced to use chalkboards or printed 2D representations of equipment to simulate practical lessons that need microscopes, centrifuges, or other expensive tools. “In some cases, they even ask students to draw the equipment during practical lessons,” she lamented. 

Okinga-Koumu pulled a phone from the pocket of her blue jeans and opened a prototype web app she’s built. Using VR and AI features, the app allows students to simulate using the necessary lab equipment—exploring 3D models of the tools in a real-world setting, like a classroom or lab. “Students could have detailed VR of lab equipment, making their hands-on experience more effective,” she said. 

Established in 2017, the Deep Learning Indaba now has chapters in 47 of the 55 African nations and aims to boost AI development across the continent by providing training and resources to African AI researchers like Okinga-Koumu. Africa is still early in the process of adopting AI technologies, but organizers say the continent is uniquely hospitable to it for several reasons, including a relatively young and increasingly well-educated population, a rapidly growing ecosystem of AI startups, and lots of potential consumers. 

“The building and ownership of AI solutions tailored to local contexts is crucial for equitable development,” says Shakir Mohamed, a senior research scientist at Google DeepMind and cofounder of the organization sponsoring the conference. Africa, more than other continents in the world, can address specific challenges with AI and will benefit immensely from its young talent, he says: “There is amazing expertise everywhere across the continent.” 

However, researchers’ ambitious efforts to develop AI tools that answer the needs of Africans face numerous hurdles. The biggest are inadequate funding and poor infrastructure. Not only is it very expensive to build AI systems, but research to provide AI training data in original African languages has been hamstrung by poor financing of linguistics departments at many African universities and the fact that citizens increasingly don’t speak or write local languages themselves. Limited internet access and a scarcity of domestic data centers also mean that developers might not be able to deploy cutting-edge AI capabilities.

Photo: Attendees of Deep Learning Indaba 2024 on their computers in a session hall (Deep Learning Indaba 2024)

Complicating this further is a lack of overarching policies or strategies for harnessing AI’s immense benefits—and regulating its downsides. While there are various draft policy documents, researchers are in conflict over a continent-wide strategy. And they disagree about which policies would most benefit Africa, not the wealthy Western governments and corporations that have often funded technological innovation.

Taken together, researchers worry, these issues will hold Africa’s AI sector back and hamper its efforts to pave its own pathway in the global AI race.          

On the cusp of change

Africa’s researchers are already making the most of generative AI’s impressive capabilities. In South Africa, for instance, to help address the HIV epidemic, scientists have designed an app called Your Choice, powered by an LLM-based chatbot that interacts with people to obtain their sexual history without stigma or discrimination. In Kenya, farmers are using AI apps to diagnose diseases in crops and increase productivity. And in Nigeria, Awarri, a newly minted AI startup, is trying to build the country’s first large language model, with the endorsement of the government, so that Nigerian languages can be integrated into AI tools.

The Deep Learning Indaba is another sign of how Africa’s AI research scene is starting to flourish. At the Dakar meeting, researchers presented 150 posters and 62 papers. Of those, 30 will be published in top-tier journals, according to Mohamed. 

Meanwhile, an analysis of 1,646 publications in AI between 2013 and 2022 found “a significant increase in publications” from Africa. And Masakhane, a cousin organization to Deep Learning Indaba that pushes for natural-language-processing research in African languages, has released over 400 open-source models and 20 African-language data sets since it was founded in 2018. 

“These metrics speak a lot to the capacity building that’s happening,” says Kathleen Siminyu, a computer scientist from Kenya, who researches NLP tools for her native Kiswahili. “We’re starting to see a critical mass of people having basic foundational skills. They then go on to specialize.”      

She adds: “It’s like a wave that cannot be stopped.”   

Khadija Ba, a Senegalese entrepreneur and investor at the pan-African VC fund P1 Ventures who was at this year’s conference, says that she sees African AI startups as particularly attractive because their local approaches have potential to be scaled for the global market. African startups often build solutions in the absence of robust infrastructure, yet “these innovations work efficiently, making them adaptable to other regions facing similar challenges,” she says. 

In recent years, funding in Africa’s tech ecosystem has picked up: VC investment totaled $4.5 billion last year, more than double what it was just five years ago, according to a report by the African Private Capital Association. And this October, Google announced a $5.8 million commitment to support AI training initiatives in Kenya, Nigeria, and South Africa. But researchers say local funding remains sluggish. Take the Google-backed fund rolled out, also in October, in Nigeria, Africa’s most populous country. It will pay out $6,000 each to 10 AI startups—not even enough to purchase the equipment needed to power their systems.
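For scale, here is the Nigerian fund’s total payout set against Google’s separate $5.8 million training commitment. This is simple arithmetic on the figures above. The two programs serve different purposes, so the ratio only indicates relative size.

```latex
\[
10 \times \$6{,}000 = \$60{,}000,
\qquad
\frac{\$60{,}000}{\$5{,}800{,}000} \approx 0.010 \quad (\text{about } 1\%)
\]
```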

Lilian Wanzare, a lecturer and NLP researcher at Maseno University in Kisumu, Kenya, bridles at African governments’ lackadaisical support for local AI initiatives and complains as well that the government charges exorbitant fees for access to publicly generated data, hindering data sharing and collaboration. “[We] researchers are just blocked,” she says. “The government is saying they’re willing to support us, but the structures have not been put in place for us.”

Language barriers 

Researchers who want to make Africa-centric AI don’t face just insufficient local investment and inaccessible data. There are major linguistic challenges, too.  

During one discussion at the Indaba, Ife Adebara, a Nigerian computational linguist, posed a question: “How many people can write a bachelor’s thesis in their native African language?” 

Zero hands went up. 

Then the audience dissolved into laughter.

Africans want AI to speak their local languages, but many Africans cannot speak and write in these languages themselves, Adebara said.      

Although Africa accounts for one-third of all languages in the world, many oral languages are slowly disappearing, their population of native speakers declining. And LLMs developed by Western-based tech companies fail to serve African languages; they don’t understand locally relevant context and culture. 

For Adebara and others researching NLP tools, the lack of people who have the ability to read and write in African languages poses a major hurdle to development of bespoke AI-enabled technologies. “Without literacy in our local languages, the future of AI in Africa is not as bright as we think,” she says.      

On top of all that, there’s little machine-readable data for African languages. One reason is that linguistics departments in public universities are poorly funded, Adebara says, limiting linguists’ participation in work that could create such data and benefit AI development.

This year, she and her colleagues established EqualyzAI, a for-profit company seeking to preserve African languages through digital technology. They have built voice tools and AI models, covering about 517 African languages.       

Lelapa AI, a software company that’s building data sets and NLP tools for African languages, is also trying to address these language-specific challenges. Its cofounders met in 2017 at the first Deep Learning Indaba and launched the company in 2022. In 2023, it released its first AI tool, Vulavula, a speech-to-text program that recognizes several languages spoken in South Africa. 

This year, Lelapa AI released InkubaLM, a first-of-its-kind small language model that currently supports a range of African languages: IsiXhosa, Yoruba, Swahili, IsiZulu, and Hausa. InkubaLM can answer questions and perform tasks like English translation and sentiment analysis. In tests, it performed as well as some larger models. But it’s still in early stages. The hope is that InkubaLM will someday power Vulavula, says Jade Abbott, cofounder and chief operating officer of Lelapa AI. 

“It’s the first iteration of us really expressing our long-term vision of what we want, and where we see African AI in the future,” Abbott says. “What we’re really building is a small language model that punches above its weight.”

InkubaLM is trained on two open-source data sets with 1.9 billion tokens, built and curated by Masakhane and other African developers who worked with real people in local communities. They paid native speakers of languages to attend writing workshops to create data for their model.

Fundamentally, this approach will always be better, says Wanzare, because it’s informed by people who represent the language and culture.

A clash over strategy

Another issue that came up again and again at the Indaba was that Africa’s AI scene lacks the sort of regulation and support from governments that you find elsewhere in the world—in Europe, the US, China, and, increasingly, the Middle East. 

Of the 55 African nations, only seven—Senegal, Egypt, Mauritius, Rwanda, Algeria, Nigeria, and Benin—have developed their own formal AI strategies. And many of those are still in the early stages.  

A major point of tension at the Indaba, though, was the regulatory framework that will govern the approach to AI across the entire continent. In March, the African Union Development Agency published a white paper, developed over a three-year period, that lays out this strategy. The 200-page document includes recommendations for industry codes and practices, standards to assess and benchmark AI systems, and a blueprint of AI regulations for African nations to adopt. The hope is that it will be endorsed by the heads of African governments in February 2025 and eventually passed by the African Union.  

But in July, the African Union Commission in Addis Ababa, Ethiopia, another African governing body that wields more power than the development agency, released a rival continental AI strategy—a 66-page document that diverges from the initial white paper. 

It’s unclear what’s behind the second strategy, but Seydina Ndiaye, a program director at the Cheikh Hamidou Kane Digital University in Dakar who helped draft the development agency’s white paper, claims it was drafted by a tech lobbyist from Switzerland. The commission’s strategy calls for African Union member states to declare AI a national priority, promote AI startups, and develop regulatory frameworks to address safety and security challenges. But Ndiaye expressed concerns that the document does not reflect the perspectives, aspirations, knowledge, and work of grassroots African AI communities. “It’s a copy-paste of what’s going on outside the continent,” he says.               

Vukosi Marivate, a computer scientist at the University of Pretoria in South Africa who helped found the Deep Learning Indaba and is known as an advocate for the African machine-learning movement, expressed fury over this turn of events at the conference. “These are things we shouldn’t accept,” he declared. The room full of data wonks, linguists, and international funders brimmed with frustration. But Marivate encouraged the group to forge ahead with building AI that benefits Africans: “We don’t have to wait for the rules to act right,” he said.  

Barbara Glover, a program manager for the African Union Development Agency, acknowledges that AI researchers are angry and frustrated. There’s been a push to harmonize the two continental AI strategies, but she says the process has been fractious: “That engagement didn’t go as envisioned.” Her agency plans to keep its own version of the continental AI strategy, Glover says, adding that it was developed by African experts rather than outsiders. “We are capable, as Africans, of driving our own AI agenda,” she says.       

Photo: A crowd of attendees mingles around display booths at Deep Learning Indaba 2024; booth signs for Mila, Meta, and OpenAI are visible (Deep Learning Indaba 2024)

This all speaks to a broader tension over foreign influence in the African AI scene, one that goes beyond any single strategic document. Mirroring the skepticism toward the African Union Commission strategy, critics say the Deep Learning Indaba is tainted by its reliance on funding from big foreign tech companies; roughly 50% of its $500,000 annual budget comes from international donors and the rest from corporations like Google DeepMind, Apple, OpenAI, and Meta. They argue that this cash could pollute the Indaba’s activities and influence the topics and speakers chosen for discussion.

But Mohamed, the Indaba cofounder who is a researcher at Google DeepMind, says that “almost all that goes back to our beneficiaries across the continent,” and the organization helps connect them to training opportunities in tech companies. He says it benefits from some of its cofounders’ ties with these companies but that they do not set the agenda.

Ndiaye says that the funding is necessary to keep the conference going. “But we need to have more African governments involved,” he says.     

To Timnit Gebru, founder and executive director at the nonprofit Distributed AI Research Institute (DAIR), which supports equitable AI research in Africa, the angst about foreign funding for AI development comes down to skepticism of exploitative, profit-driven international tech companies. “Africans [need] to do something different and not replicate the same issues we’re fighting against,” Gebru says. She warns about the pressure to adopt “AI for everything in Africa,” adding that there’s “a lot of push from international development organizations” to use AI as an “antidote” for all Africa’s challenges.       

Siminyu, who is also a researcher at DAIR, agrees with that view. She hopes that African governments will fund and work with people in Africa to build AI tools that reach underrepresented communities—tools that can be used in positive ways and in a context that works for Africans. “We should be afforded the dignity of having AI tools in a way that others do,” she says.     

Why AI could eat quantum computing’s lunch

Tech companies have been funneling billions of dollars into quantum computers for years. The hope is that they’ll be a game changer for fields as diverse as finance, drug discovery, and logistics.

Those expectations have been especially high in physics and chemistry, where the weird effects of quantum mechanics come into play. In theory, this is where quantum computers could have a huge advantage over conventional machines.

But while the field struggles with the realities of tricky quantum hardware, another challenger is making headway in some of these most promising use cases. AI is now being applied to fundamental physics, chemistry, and materials science in a way that suggests quantum computing’s purported home turf might not be so safe after all.

The scale and complexity of quantum systems that can be simulated using AI is advancing rapidly, says Giuseppe Carleo, a professor of computational physics at the Swiss Federal Institute of Technology (EPFL). Last month, he coauthored a paper published in Science showing that neural-network-based approaches are rapidly becoming the leading technique for modeling materials with strong quantum properties. Meta also recently unveiled an AI model trained on a massive new data set of materials that has jumped to the top of a leaderboard for machine-learning approaches to material discovery.

Given the pace of recent advances, a growing number of researchers are now asking whether AI could solve a substantial chunk of the most interesting problems in chemistry and materials science before large-scale quantum computers become a reality. 

“The existence of these new contenders in machine learning is a serious hit to the potential applications of quantum computers,” says Carleo. “In my opinion, these companies will find out sooner or later that their investments are not justified.”

Exponential problems

The promise of quantum computers lies in their potential to carry out certain calculations much faster than conventional computers. Realizing this promise will require much larger quantum processors than we have today. The biggest devices have just crossed the thousand-qubit mark, but achieving an undeniable advantage over classical computers will likely require tens of thousands, if not millions. Once that hardware is available, though, a handful of quantum algorithms, like the encryption-cracking Shor’s algorithm, have the potential to solve problems exponentially faster than classical algorithms can. 

But for many quantum algorithms with more obvious commercial applications, like searching databases, solving optimization problems, or powering AI, the speed advantage is more modest. And last year, a paper coauthored by Microsoft’s head of quantum computing, Matthias Troyer, showed that these theoretical advantages disappear if you account for the fact that quantum hardware operates orders of magnitude slower than modern computer chips. The difficulty of getting large amounts of classical data in and out of a quantum computer is also a major barrier. 
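A toy cost model shows why slow hardware erodes a modest speedup. This is our own simplification, not the analysis in Troyer’s paper. Suppose a classical machine needs N steps and a quantum algorithm with a quadratic speedup needs √N steps, but each quantum step is `slowdown` times slower. The quantum machine only wins once √N · slowdown < N, that is, once N exceeds slowdown². An exponential speedup, in contrast, overtakes even a huge per-step slowdown at small input sizes. The slowdown factor, the 1-nanosecond classical step, and the cubic quantum cost are all hypothetical. The model also ignores classical parallelism, error-correction overhead, and data loading.

```python
def quadratic_break_even(slowdown: int) -> int:
    """Smallest problem size N (in steps) at which sqrt(N) quantum steps, each
    `slowdown` times slower, cost no more than N classical steps.
    Solving sqrt(N) * slowdown = N gives N = slowdown**2."""
    return slowdown ** 2


def exponential_break_even(slowdown: int, degree: int = 3, max_n: int = 1000) -> int:
    """Smallest input size n at which a classical cost of 2**n steps exceeds a
    quantum cost of n**degree steps that are each `slowdown` times slower."""
    for n in range(1, max_n + 1):
        if 2 ** n > slowdown * n ** degree:
            return n
    raise ValueError("no break-even point found below max_n")


slowdown = 10 ** 8          # hypothetical: each quantum step 100 million times slower
classical_step_s = 1e-9     # hypothetical: 1 ns per classical step

n_quad = quadratic_break_even(slowdown)
print(f"quadratic speedup: break-even at N = {n_quad:.0e} steps, "
      f"about {n_quad * classical_step_s / 86400:.0f} days of classical runtime")
print(f"exponential speedup: break-even at n = {exponential_break_even(slowdown)}")
```

Under these assumptions, the quadratic speedup only pays off for problems that would already keep a classical machine busy for months. The exponential case pulls ahead at an input size of a few dozen.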

So Troyer and his colleagues concluded that quantum computers should instead focus on problems in chemistry and materials science that require simulation of systems where quantum effects dominate. A computer that operates along the same quantum principles as these systems should, in theory, have a natural advantage here. In fact, this has been a driving idea behind quantum computing ever since the renowned physicist Richard Feynman first proposed it.

The rules of quantum mechanics govern many things with huge practical and commercial value, like proteins, drugs, and materials. Their properties are determined by the interactions of their constituent particles, in particular their electrons—and simulating these interactions in a computer should make it possible to predict what kinds of characteristics a molecule will exhibit. This could prove invaluable for discovering things like new medicines or more efficient battery chemistries, for example. 

But the intuition-defying rules of quantum mechanics—in particular, the phenomenon of entanglement, which allows the quantum states of distant particles to become intrinsically linked—can make these interactions incredibly complex. Precisely tracking them requires complicated math that gets exponentially tougher the more particles are involved. That can make simulating large quantum systems intractable on classical machines.
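“Exponentially tougher” can be made concrete. Storing the complete quantum state of n particles, each with two possible states (spin up or down, say), takes 2ⁿ complex amplitudes. The sketch below assumes each amplitude is a 16-byte double-precision complex number and that no structure in the state is exploited. Under those assumptions, the memory needed doubles with every particle added.

```python
BYTES_PER_AMPLITUDE = 16  # one complex number stored as two 64-bit floats


def state_vector_bytes(n_particles: int) -> int:
    """Memory for the full state vector of n two-level particles."""
    return BYTES_PER_AMPLITUDE * 2 ** n_particles


def human_readable(nbytes: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    size = float(nbytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.0f} {unit}"
        size /= 1024
    raise AssertionError("unreachable")


for n in (10, 30, 50):
    print(f"{n} particles: {human_readable(state_vector_bytes(n))}")
```

Ten particles fit in 16 KiB, and 30 need 16 GiB. Fifty already need 16 PiB, which is far beyond any single machine, and real molecules involve many more interacting electrons than that.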

This is where quantum computers could shine. Because they also operate on quantum principles, they are able to represent quantum states much more efficiently than is possible on classical machines. They could also take advantage of quantum effects to speed up their calculations.

But not all quantum systems are the same. Their complexity is determined by the extent to which their particles interact, or correlate, with each other. In systems where these interactions are strong, tracking all these relationships can quickly explode the number of calculations required to model the system. But in most systems of practical interest to chemists and materials scientists, correlation is weak, says Carleo. That means their particles don’t affect each other’s behavior significantly, which makes the systems far simpler to model.

The upshot, says Carleo, is that quantum computers are unlikely to provide any advantage for most problems in chemistry and materials science. Classical tools that can accurately model weakly correlated systems already exist, the most prominent being density functional theory (DFT). The insight behind DFT is that all you need to understand a system’s key properties is its electron density, a measure of how its electrons are distributed in space. This makes for much simpler computation but can still provide accurate results for weakly correlated systems.
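Why the electron density is so much cheaper to work with can be seen by counting grid points. This rough Python sketch assumes both quantities are sampled on the same uniform real-space grid and ignores spin: the many-electron wave function depends on three coordinates per electron, whereas the density depends on a single point in space no matter how many electrons there are. It illustrates the bookkeeping only; it is not how DFT codes actually store or solve anything.

```python
def wavefunction_values(n_electrons: int, points_per_axis: int) -> int:
    """Grid values for a wave function psi(r1, ..., rN): three spatial
    coordinates per electron, so 3N grid dimensions in total (spin ignored)."""
    return points_per_axis ** (3 * n_electrons)


def density_values(points_per_axis: int) -> int:
    """Grid values for the electron density n(r): a function of one 3D point,
    independent of how many electrons the system has."""
    return points_per_axis ** 3


for n in (1, 2, 5, 10):
    print(f"{n:>2} electrons: wave function {wavefunction_values(n, 10):.1e} values, "
          f"density {density_values(10):,} values")
```

With ten grid points per axis, the density always needs 1,000 values, while the wave function for just two electrons already needs a million.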

Simulating large systems using these approaches requires considerable computing power. But in recent years there’s been an explosion of research using DFT to generate data on chemicals, biomolecules, and materials—data that can be used to train neural networks. These AI models learn patterns in the data that allow them to predict what properties a particular chemical structure is likely to have, but they are orders of magnitude cheaper to run than conventional DFT calculations. 

This has dramatically expanded the size of systems that can be modeled—to as many as 100,000 atoms at a time—and how long simulations can run, says Alexandre Tkatchenko, a physics professor at the University of Luxembourg. “It’s wonderful. You can really do most of chemistry,” he says.

Olexandr Isayev, a chemistry professor at Carnegie Mellon University, says these techniques are already being widely applied by companies in chemistry and life sciences. And for researchers, previously out-of-reach problems such as optimizing chemical reactions, developing new battery materials, and understanding protein binding are finally becoming tractable.

As with most AI applications, the biggest bottleneck is data, says Isayev. Meta’s recently released materials data set was made up of DFT calculations on 118 million molecules. A model trained on this data achieved state-of-the-art performance, but creating the training material took vast computing resources, well beyond what’s accessible to most research teams. That means fulfilling the full promise of this approach will require massive investment.

Modeling a weakly correlated system using DFT is not an exponentially scaling problem, though. This suggests that with more data and computing resources, AI-based classical approaches could simulate even the largest of these systems, says Tkatchenko. Given that quantum computers powerful enough to compete are likely still decades away, he adds, AI’s current trajectory suggests it could reach important milestones, such as precisely simulating how drugs bind to a protein, much sooner.

Strong correlations

When it comes to simulating strongly correlated quantum systems—ones whose particles interact a lot—methods like DFT quickly run out of steam. While more exotic, these systems include materials with potentially transformative capabilities, like high-temperature superconductivity or ultra-precise sensing. But even here, AI is making significant strides.

In 2017, EPFL’s Carleo and Microsoft’s Troyer published a seminal paper in Science showing that neural networks could model strongly correlated quantum systems. The approach doesn’t learn from data in the classical sense. Instead, Carleo says, it is similar to DeepMind’s AlphaZero model, which mastered the games of Go, chess, and shogi using nothing more than the rules of each game and the ability to play itself.

In this case, the rules of the game are provided by Schrödinger’s equation, which can precisely describe a system’s quantum state, or wave function. The model plays against itself by arranging particles in a certain configuration and then measuring the system’s energy level. The goal is to reach the lowest energy configuration (known as the ground state), which determines the system’s properties. The model repeats this process until energy levels stop falling, indicating that the ground state—or something close to it—has been reached.
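The loop described above is a variational search: propose a trial wave function, compute its energy, adjust it, and stop once the energy no longer drops. The minimal Python sketch below runs that loop on the smallest possible example, two interacting spins in a transverse field (H = -Z1Z2 - X1 - X2, an illustrative choice not taken from the papers discussed here). To keep it short and checkable, it swaps in several simplifications: the neural network is replaced by a hypothetical one-parameter trial wave function psi(s) = exp(w * s1 * s2), the Monte Carlo sampling used in practice is replaced by exact enumeration of all four spin configurations, and the gradient is estimated by finite differences. For this toy system the exact ground-state energy is -√5 ≈ -2.236, so the result can be checked.

```python
import math
from itertools import product

J = 1.0        # illustrative spin-spin coupling
H_FIELD = 1.0  # illustrative transverse-field strength
CONFIGS = list(product((1, -1), repeat=2))  # all configurations of two spins


def amplitude(config, w):
    """Hypothetical one-parameter trial wave function psi(s) = exp(w * s1 * s2)."""
    s1, s2 = config
    return math.exp(w * s1 * s2)


def energy(w):
    """Variational energy <psi|H|psi> / <psi|psi> for H = -J Z1Z2 - h (X1 + X2),
    summed exactly over every configuration (no Monte Carlo sampling)."""
    numerator = 0.0
    norm = 0.0
    for s in CONFIGS:
        psi_s = amplitude(s, w)
        norm += psi_s ** 2
        # Diagonal term: the Z1Z2 coupling depends only on the current configuration.
        numerator += psi_s * (-J * s[0] * s[1]) * psi_s
        # Off-diagonal terms: each X_i flips spin i, with matrix element -h.
        for i in range(2):
            flipped = list(s)
            flipped[i] = -flipped[i]
            numerator += psi_s * (-H_FIELD) * amplitude(tuple(flipped), w)
    return numerator / norm


def find_ground_state(w=0.0, learning_rate=0.1, tol=1e-12, max_steps=10_000):
    """Gradient descent on w, stopping once the energy stops falling."""
    e = energy(w)
    for _ in range(max_steps):
        # Central finite-difference estimate of dE/dw.
        grad = (energy(w + 1e-6) - energy(w - 1e-6)) / 2e-6
        w -= learning_rate * grad
        new_e = energy(w)
        if e - new_e < tol:  # energy no longer decreasing: treat as converged
            return w, new_e
        e = new_e
    return w, e


w_opt, e_opt = find_ground_state()
print(f"w = {w_opt:.4f}, E = {e_opt:.6f}, exact = {-math.sqrt(5):.6f}")
```

Starting from a uniform wave function (energy -2), the loop lowers the energy step by step and settles at the exact value, because this particular trial function happens to be flexible enough to represent the true ground state. Larger systems offer no such guarantee, which is why the result is generally an approximation.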

The power of these models is their ability to compress information, says Carleo. “The wave function is a very complicated mathematical object,” he says. “What has been shown by several papers now is that [the neural network] is able to capture the complexity of this object in a way that can be handled by a classical machine.”
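One rough way to see the compression Carleo describes is to count parameters. The Python sketch below assumes a fully connected restricted Boltzmann machine, the network type used in the 2017 Science paper, with n visible units (one per spin), alpha × n hidden units, and a bias on every unit, and compares its parameter count with the 2**n amplitudes of the exact wave function for spin-1/2 particles. The count says nothing about whether a given network actually represents a given ground state accurately; that has to be checked case by case.

```python
def rbm_parameters(n_spins: int, alpha: int = 1) -> int:
    """Parameters in a fully connected RBM with n visible units, alpha * n
    hidden units, one bias per unit, and one weight per visible-hidden pair."""
    n_hidden = alpha * n_spins
    return n_spins + n_hidden + n_spins * n_hidden


def exact_amplitudes(n_spins: int) -> int:
    """Amplitudes in the full wave function of n spin-1/2 particles."""
    return 2 ** n_spins


for n in (10, 50, 100):
    print(f"{n:>3} spins: RBM (alpha=2) {rbm_parameters(n, alpha=2):,} parameters "
          f"vs {exact_amplitudes(n):.2e} exact amplitudes")
```

The network grows roughly with the square of the number of spins, while the exact description doubles with every spin added.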

Since the 2017 paper, the approach has been extended to a wide range of strongly correlated systems, says Carleo, and results have been impressive. The Science paper he published with colleagues last month put leading classical simulation techniques to the test on a variety of tricky quantum simulation problems, with the goal of creating a benchmark to judge advances in both classical and quantum approaches.

Carleo says that neural-network-based techniques are now the best approach for simulating many of the most complex quantum systems they tested. “Machine learning is really taking the lead in many of these problems,” he says.

These techniques are catching the eye of some big players in the tech industry. In August, researchers at DeepMind showed in a paper in Science that they could accurately model excited states in quantum systems, which could one day help predict the behavior of things like solar cells, sensors, and lasers. Scientists at Microsoft Research have also developed an open-source software suite to help more researchers use neural networks for simulation.

One of the main advantages of the approach is that it piggybacks on massive investments in AI software and hardware, says Filippo Vicentini, a professor of AI and condensed-matter physics at École Polytechnique in France, who was also a coauthor on the Science benchmarking paper: “Being able to leverage these kinds of technological advancements gives us a huge edge.”

There is a caveat: Because the ground states are effectively found through trial and error rather than explicit calculations, they are only approximations. But this is also why the approach could make progress on what has looked like an intractable problem, says Juan Carrasquilla, a researcher at ETH Zurich, and another coauthor on the Science benchmarking paper.

If you want to precisely track all the interactions in a strongly correlated system, the number of calculations you need to do rises exponentially with the system’s size. But if you’re happy with an answer that is just good enough, there’s plenty of scope for taking shortcuts. 

“Perhaps there’s no hope to capture it exactly,” says Carrasquilla. “But there’s hope to capture enough information that we capture all the aspects that physicists care about. And if we do that, it’s basically indistinguishable from a true solution.”

And while strongly correlated systems are generally too hard to simulate classically, there are notable instances where this isn’t the case. That includes some systems that are relevant for modeling high-temperature superconductors, according to a 2023 paper in Nature Communications.

“Because of the exponential complexity, you can always find problems for which you can’t find a shortcut,” says Frank Noé, research manager at Microsoft Research, who has led much of the company’s work in this area. “But I think the number of systems for which you can’t find a good shortcut will just become much smaller.”

No magic bullets

However, Stefanie Czischek, an assistant professor of physics at the University of Ottawa, says it can be hard to predict what problems neural networks can feasibly solve. For some complex systems they do incredibly well, but then on other seemingly simple ones, computational costs balloon unexpectedly. “We don’t really know their limitations,” she says. “No one really knows yet what are the conditions that make it hard to represent systems using these neural networks.”

Meanwhile, there have also been significant advances in other classical quantum simulation techniques, says Antoine Georges, director of the Center for Computational Quantum Physics at the Flatiron Institute in New York, who also contributed to the recent Science benchmarking paper. “They are all successful in their own right, and they are also very complementary,” he says. “So I don’t think these machine-learning methods are just going to completely put all the other methods out of business.”

Quantum computers will also have their niche, says Martin Roetteler, senior director of quantum solutions at IonQ, which is developing quantum computers built from trapped ions. While he agrees that classical approaches will likely be sufficient for simulating weakly correlated systems, he’s confident that some large, strongly correlated systems will be beyond their reach. “The exponential is going to bite you,” he says. “There are cases with strongly correlated systems that we cannot treat classically. I’m strongly convinced that that’s the case.”

In contrast, he says, a future fault-tolerant quantum computer with many more qubits than today’s devices will be able to simulate such systems. This could help find new catalysts or improve understanding of metabolic processes in the body—an area of interest to the pharmaceutical industry.

Neural networks are likely to increase the scope of problems that can be solved, says Jay Gambetta, who leads IBM’s quantum computing efforts, but he’s unconvinced they’ll solve the hardest challenges businesses are interested in.

“That’s why many different companies that essentially have chemistry as their requirement are still investigating quantum—because they know exactly where these approximation methods break down,” he says.

Gambetta also rejects the idea that the technologies are rivals. He says the future of computing is likely to involve a hybrid of the two approaches, with quantum and classical subroutines working together to solve problems. “I don’t think they’re in competition. I think they actually add to each other,” he says.

But Scott Aaronson, who directs the Quantum Information Center at the University of Texas, says machine-learning approaches are directly competing against quantum computers in areas like quantum chemistry and condensed-matter physics. He predicts that a combination of machine learning and quantum simulations will outperform purely classical approaches in many cases, but that won’t become clear until larger, more reliable quantum computers are available.

“From the very beginning, I’ve treated quantum computing as first and foremost a scientific quest, with any industrial applications as icing on the cake,” he says. “So if quantum simulation turns out to beat classical machine learning only rarely, I won’t be quite as crestfallen as some of my colleagues.”

One area where quantum computers look likely to have a clear advantage is in simulating how complex quantum systems evolve over time, says EPFL’s Carleo. This could provide invaluable insights for scientists in fields like statistical mechanics and high-energy physics, but it seems unlikely to lead to practical uses in the near term. “These are more niche applications that, in my opinion, do not justify the massive investments and the massive hype,” Carleo adds.

Nonetheless, the experts MIT Technology Review spoke to said a lack of commercial applications is not a reason to stop pursuing quantum computing, which could lead to fundamental scientific breakthroughs in the long run.

“Science is like a set of nested boxes—you solve one problem and you find five other problems,” says Vicentini. “The complexity of the things we study will increase over time, so we will always need more powerful tools.”

How ChatGPT search paves the way for AI agents

This story originally appeared in The Algorithm, our weekly newsletter on AI.

OpenAI’s Olivier Godement, head of product for its platform, and Romain Huet, head of developer experience, are on a whistle-stop tour around the world. Last week, I sat down with the pair in London before DevDay, the company’s annual developer conference. London’s DevDay is the first one for the company outside San Francisco. Godement and Huet are heading to Singapore next. 

It’s been a busy few weeks for the company. In London, OpenAI announced updates to its new Realtime API platform, which allows developers to build voice features into their applications. The company is rolling out new voices and a function that lets developers generate prompts, which will allow them to build apps and more helpful voice assistants more quickly. Meanwhile for consumers, OpenAI announced it was launching ChatGPT search, which allows users to search the internet using the chatbot.

Both developments pave the way for the next big thing in AI: agents. These are AI assistants that can complete complex chains of tasks, such as booking flights.

“Fast-forward a few years—every human on Earth, every business, has an agent. That agent knows you extremely well. It knows your preferences,” Godement says. The agent will have access to your emails, apps, and calendars and will act like a chief of staff, interacting with each of these tools and even working on long-term problems, such as writing a paper on a particular topic, he says. 

OpenAI’s strategy is to both build agents itself and allow developers to use its software to build their own agents, says Godement. Voice will play an important role in what agents will look and feel like. 

“At the moment most of the apps are chat based … which is cool, but not suitable for all use cases. There are some use cases where you’re not typing, not even looking at the screen, and so voice essentially has a much better modality for that,” he says. 

But there are two big hurdles that need to be overcome before agents can become a reality, Godement says. 

The first is reasoning. Building AI agents requires us to be able to trust that they will be able to complete complex tasks and do the right things, says Huet. That’s where OpenAI’s “reasoning” feature comes in. Introduced in OpenAI’s o1 model last month, it uses reinforcement learning to teach the model how to process information using “chain of thought.” Giving the model more time to generate answers allows it to recognize and correct mistakes, break down problems into smaller ones, and try different approaches to answering questions, Godement says.

But OpenAI’s claims about reasoning should be taken with a pinch of salt, says Chirag Shah, a computer science professor at the University of Washington. Large language models are not exhibiting true reasoning. It’s most likely that they have picked up what looks like logic from something they’ve seen in their training data.

“These models sometimes seem to be really amazing at reasoning, but it’s just like they’re really good at pretending, and it only takes a little bit of picking at them to break them,” he says.

There is still much more work to be done, Godement admits. In the short term, AI models such as o1 need to be much more reliable, faster, and cheaper. In the long term, the company needs to apply its chain-of-thought technique to a wider pool of use cases. OpenAI has focused on science, coding, and math. Now it wants to address other fields, such as law, accounting, and economics, he says. 

Second on the to-do list is the ability to connect different tools, Godement says. An AI model’s capabilities will be limited if it has to rely on its training data alone. It needs to be able to surf the web and look for up-to-date information. ChatGPT search is one powerful way OpenAI’s new tools can now do that. 

These tools need to be able not only to retrieve information but to take actions in the real world. Competitor Anthropic announced a new feature where its Claude chatbot can “use” a computer by interacting with its interface to click on things, for example. This is an important feature for agents if they are going to be able to execute tasks like booking flights. Godement says o1 can “sort of” use tools, though not very reliably, and that research on tool use is a “promising development.” 

In the next year, Godement says, he expects the adoption of AI for customer support and other assistant-based tasks to grow. However, he says that it can be hard to predict how people will adopt and use OpenAI’s technology.

“Frankly, looking back every year, I’m surprised by use cases that popped up that I did not even anticipate,” he says. “I expect there will be quite a few surprises that, you know, none of us could predict.”


Now read the rest of The Algorithm

Deeper Learning

This AI-generated version of Minecraft may represent the future of real-time video generation

When you walk around in a version of the video game Minecraft from the AI companies Decart and Etched, it feels a little off. Sure, you can move forward, cut down a tree, and lay down a dirt block, just like in the real thing. If you turn around, though, the dirt block you just placed may have morphed into a totally new environment. That doesn’t happen in Minecraft. But this new version is entirely AI-generated, so it’s prone to hallucinations. Not a single line of code was written.

Ready, set, go: This version of Minecraft is generated in real time, using a technique known as next-frame prediction. The AI companies behind it did this by training their model, Oasis, on millions of hours of Minecraft game play and recordings of the corresponding actions a user would take in the game. The AI is able to sort out the physics, environments, and controls of Minecraft from this data alone. Read more from Scott J. Mulligan.

Bits and Bytes

AI search could break the web
At its best, AI search can better infer a user’s intent, amplify quality content, and synthesize information from diverse sources. But if AI search becomes our primary portal to the web, it threatens to disrupt an already precarious digital economy, argues Benjamin Brooks, a fellow at the Berkman Klein Center at Harvard University, who used to lead public policy for Stability AI. (MIT Technology Review)

AI will add to the e-waste problem. Here’s what we can do about it.
Equipment used to train and run generative AI models could produce up to 5 million tons of e-waste by 2030, a relatively small but significant fraction of the global total. (MIT Technology Review)

How an “interview” with a dead luminary exposed the pitfalls of AI
A state-funded radio station in Poland fired its on-air talent and brought in AI-generated presenters. But the experiment caused an outcry and was stopped when one of them “interviewed” a dead Nobel laureate. (The New York Times)

Meta says yes, please, to more AI-generated slop
In Meta’s latest earnings call, CEO Mark Zuckerberg said we’re likely to see “a whole new category of content, which is AI generated or AI summarized content or kind of existing content pulled together by AI in some way.” Zuckerberg added that he thinks “that’s going to be just very exciting.” (404 Media)

Chasing AI’s value in life sciences

Inspired by an unprecedented opportunity, the life sciences sector has gone all in on AI. For example, in 2023, Pfizer introduced an internal generative AI platform expected to deliver $750 million to $1 billion in value. And Moderna partnered with OpenAI in April 2024, scaling its AI efforts to deploy ChatGPT Enterprise, embedding the tool’s capabilities across business functions from legal to research.

In drug development, German pharmaceutical company Merck KGaA has partnered with several AI companies for drug discovery and development. And Exscientia, a pioneer in using AI in drug discovery, is taking more steps toward integrating generative AI drug design with robotic lab automation in collaboration with Amazon Web Services (AWS).

Given rising competition, higher customer expectations, and growing regulatory challenges, these investments are crucial. But to maximize their value, leaders must carefully consider how to balance the key factors of scope, scale, speed, and human-AI collaboration.

The early promise of connecting data

The common refrain from data leaders across all industries, but especially from those within data-rich life sciences organizations, is “I have vast amounts of data all over my organization, but the people who need it can’t find it,” says Dan Sheeran, general manager of health care and life sciences for AWS. And in a complex healthcare ecosystem, data can come from multiple sources, including hospitals, pharmacies, insurers, and patients.

“Addressing this challenge,” says Sheeran, “means applying metadata to all existing data and then creating tools to find it, mimicking the ease of a search engine. Until generative AI came along, though, creating that metadata was extremely time consuming.”
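The “metadata plus search” pattern Sheeran describes can be sketched as a simple inverted index. In this minimal Python example, the dataset names and tags are invented for illustration, and the tags are typed in by hand; in the workflow Sheeran describes, generative AI would be the part that produces them. A production system would add free-text search, ranking, and access controls.

```python
from collections import defaultdict

# Hypothetical catalog: dataset IDs mapped to metadata tags (invented examples).
catalog = {
    "trial_042_results": {"oncology", "clinical-trial", "phase-2"},
    "assay_screen_2023": {"oncology", "high-throughput", "assay"},
    "claims_extract_q1": {"insurer", "claims", "real-world-data"},
}


def build_index(catalog):
    """Inverted index: each tag maps to the set of datasets carrying it."""
    index = defaultdict(set)
    for dataset_id, tags in catalog.items():
        for tag in tags:
            index[tag].add(dataset_id)
    return index


def search(index, *tags):
    """Return the datasets that carry every requested tag, sorted by name."""
    if not tags:
        return []
    matches = set.intersection(*(index.get(tag, set()) for tag in tags))
    return sorted(matches)


index = build_index(catalog)
print(search(index, "oncology"))
print(search(index, "oncology", "assay"))
```

Once the tags exist, finding data is a cheap lookup; the expensive step Sheeran points to is creating the metadata in the first place.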

Mahmood Majeed, ZS’s global head of the digital and technology practice, notes that his teams regularly work on connected data programs, because “connecting data to enable connected decisions across the enterprise gives you the ability to create differentiated experiences.”

Majeed points to Sanofi’s well-publicized example of connecting data with its analytics app, plai, which streamlines research and automates time-consuming data tasks. With this investment, Sanofi reports reducing research processes from weeks to hours and the potential to improve target identification in therapeutic areas like immunology, oncology, or neurology by 20% to 30%.

Achieving the payoff of personalization

Connected data also allows companies to focus on personalized last-mile experiences. This involves tailoring interactions with healthcare providers and understanding patients’ individual motivations, needs, and behaviors.

Early efforts around personalization have relied on “next best action” or “next best engagement” models to do this. These traditional machine learning (ML) models suggest the most appropriate information for field teams to share with healthcare providers, based on predetermined guidelines.

Compared with generative AI models, more traditional machine learning models can be inflexible and unable to adapt to individual provider needs, and they often struggle to connect with other data sources that could provide meaningful context. As a result, their insights can be helpful but limited.

Sheeran notes that companies have a real opportunity to improve their access to connected data for better decision-making: “Because the technology is generative, it can create context based on signals. How does this healthcare provider like to receive information? What insights can we draw about the questions they’re asking? Can their professional history or past prescribing behavior help us provide a more contextualized answer? This is exactly what generative AI is great for.”

Beyond this, pharmaceutical companies spend millions of dollars annually to customize marketing materials. They must ensure the content is translated, tailored to the audience, and consistent with regulations in each location where they offer products and services. Developing individual assets usually takes weeks, which makes the process a perfect use case for generative copy and imagery. With generative AI, the process shrinks from weeks to minutes, creating a competitive advantage through lower costs per asset, Sheeran says.

Accelerating drug discovery with AI, one step at a time

Perhaps the greatest hope for AI in life sciences is its ability to generate insights and intellectual property using biology-specific foundation models. Sheeran says, “our customers have seen the potential for very, very large models to greatly accelerate certain discrete steps in the drug discovery and development processes.” He continues, “Now we have a much broader range of models available, and an even larger set of models coming that tackle other discrete steps.”

By Sheeran’s count, there are approximately six major categories of biology-specific models, each containing five to 25 models under development or already available from universities and commercial organizations.

The intellectual property generated by biology-specific models is a significant consideration. Services such as Amazon Bedrock address it by ensuring that customers retain control over their data, with transparency and safeguards to prevent unauthorized retention and misuse.

Finding differentiation in life sciences with scope, scale, and speed

Organizations can differentiate with scope, scale, and speed, while determining how AI can best augment human ingenuity and judgment. “Technology has become so easy to access. It’s omnipresent. What that means is that it’s no longer a differentiator on its own,” says Majeed. He suggests that life sciences leaders consider:

Scope: Have we zeroed in on the right problem? By clearly articulating the problem relative to the few critical things that could drive advantage, organizations can identify technology and business collaborators and set standards for measuring success and driving tangible results.

Scale: What happens when we implement a technology solution on a large scale? The highest-priority AI solutions should be the ones with the most potential for results. Scale determines whether an AI initiative will have a broader, more widespread impact on a business, which provides the window for a greater return on investment, says Majeed.

By thinking through the implications of scale from the beginning, organizations can be clear on the magnitude of change they expect and how bold they need to be to achieve it. The boldest commitment to scale is when companies go all in on AI, as Sanofi is doing, setting goals to transform the entire value chain and setting the tone from the very top.

Speed: Are we set up to quickly learn and correct course? Organizations that can rapidly learn from their data and AI experiments, adjust based on those learnings, and continuously iterate are the ones that will see the most success. Majeed emphasizes, “Don’t underestimate this component; it’s where most of the work happens. A good partner will set you up for quick wins, keeping your teams learning and maintaining momentum.”

Sheeran adds, “ZS has become a trusted partner for AWS because our customers trust that they have the right domain expertise. A company like ZS has the ability to focus on the right uses of AI because they’re in the field and on the ground with medical professionals, giving them the ability to constantly stay ahead of the curve by exploring the best ways to improve their current workflows.”

Human-AI collaboration at the heart

Despite the allure of generative AI, the human element is the ultimate determinant of how it’s used. In certain cases, traditional technologies outperform it, with less risk, so understanding what it’s good for is key. By cultivating broad technology and AI fluency throughout the organization, leaders can teach their people to find the most powerful combinations of human-AI collaboration for technology solutions that work. After all, as Majeed says, “it’s all about people—whether it’s customers, patients, or our own employees’ and users’ experiences.”

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

OpenAI brings a new web search tool to ChatGPT

ChatGPT can now search the web for up-to-date answers to a user’s queries, OpenAI announced today. 

Until now, ChatGPT was mostly restricted to generating answers from its training data, which is current up to October 2023 for GPT-4o, and had limited web search capabilities. Searches about generalized topics will still draw on this information from the model itself, but now ChatGPT will automatically search the web in response to queries about recent information such as sports, stocks, or news of the day, and can deliver rich multi-media results. Users can also manually trigger a web search, but for the most part, the chatbot will make its own decision about when an answer would benefit from information taken from the web, says Adam Fry, OpenAI’s product lead for search.

“Our goal is to make ChatGPT the smartest assistant, and now we’re really enhancing its capabilities in terms of what it has access to from the web,” Fry tells MIT Technology Review. The feature is available today for the chatbot’s paying users. 

ChatGPT triggers a web search when the user asks about local restaurants in this example

While ChatGPT search, as it is known, is initially available to paying customers, OpenAI intends to make it available for free later, even when people are logged out. The company also plans to combine search with its voice features and Canvas, its interactive platform for coding and writing, although these capabilities will not be available in today’s initial launch.

The company unveiled a standalone prototype of web search in July. Those capabilities are now built directly into the chatbot. OpenAI says it has “brought the best of the SearchGPT experience into ChatGPT.” 

OpenAI is the latest tech company to debut an AI-powered search assistant, challenging similar tools from competitors such as Google, Microsoft, and startup Perplexity. Meta, too, is reportedly developing its own AI search engine. As with Perplexity’s interface, users of ChatGPT search can interact with the chatbot in natural language, and it will offer an AI-generated answer with sources and links to further reading. In contrast, Google’s AI Overviews offer a short AI-generated summary at the top of the search results page, as well as a traditional list of indexed links.

These new tools could eventually challenge Google’s 90% market share in online search. AI search is a very important way to draw more users, says Chirag Shah, a professor at the University of Washington, who specializes in online search. But he says it is unlikely to chip away at Google’s search dominance. Microsoft’s high-profile attempt with Bing barely made a dent in the market, Shah says. 

Instead, OpenAI is trying to create a new market for more powerful and interactive AI agents, which can take complex actions in the real world, Shah says. 

The new search function in ChatGPT is a step toward these agents. 

It can also deliver highly contextualized responses that take advantage of chat histories, allowing users to go deeper in a search. Currently, ChatGPT search is able to recall conversation histories and continue the conversation with questions on the same topic. 

ChatGPT itself can also remember things about users that it can use later; sometimes it does this automatically, or you can ask it to remember something. Those “long-term” memories affect how it responds to chats. Search doesn’t have this yet (a new web search starts from scratch), but it should get this capability in the “next couple of quarters,” says Fry. When it does, OpenAI says, search will be able to deliver far more personalized results based on what it knows about the user.

“Those might be persistent memories, like ‘I’m a vegetarian,’ or it might be contextual, like ‘I’m going to New York in the next few days,’” says Fry. “If you say ‘I’m going to New York in four days,’ it can remember that fact and the nuance of that point,” he adds. 

To help develop ChatGPT’s web search, OpenAI says it leveraged its partnerships with news organizations such as Reuters, the Atlantic, Le Monde, the Financial Times, Axel Springer, Condé Nast, and Time. However, its results include information not only from these publishers, but any other source online that does not actively block its search crawler.   

It’s a positive development that ChatGPT will now be able to retrieve information from these reputable online sources and generate answers based on them, says Suzan Verberne, a professor of natural-language processing at Leiden University, who has studied information retrieval. It also allows users to ask follow-up questions.

But despite the enhanced ability to search the web and cross-check sources, the tool is not immune from the persistent tendency of AI language models to make things up or get it wrong. When MIT Technology Review tested the new search function and asked it for vacation destination ideas, ChatGPT suggested “luxury European destinations” such as Japan, Dubai, the Caribbean islands, Bali, the Seychelles, and Thailand. It offered as a source an article from the Times, a British newspaper, which listed these locations as well as those in Europe as luxury holiday options.

“Especially when you ask about untrue facts or events that never happened, the engine might still try to formulate a plausible response that is not necessarily correct,” says Verberne. There is also a risk that misinformation might seep into ChatGPT’s answers from the internet if the company has not filtered its sources well enough, she adds. 

Another risk is that the current push to access the web through AI search will disrupt the internet’s digital economy, argues Benjamin Brooks, a fellow at Harvard University’s Berkman Klein Center, who previously led public policy for Stability AI, in an op-ed published by MIT Technology Review today.

“By shielding the web behind an all-knowing chatbot, AI search could deprive creators of the visits and ‘eyeballs’ they need to survive,” Brooks writes.

This AI-generated version of Minecraft may represent the future of real-time video generation

When you walk around in a version of the video game Minecraft from the AI companies Decart and Etched, it feels a little off. Sure, you can move forward, cut down a tree, and lay down a dirt block, just like in the real thing. If you turn around, though, the dirt block you just placed may have morphed into a totally new environment. That doesn’t happen in Minecraft. But this new version is entirely AI-generated, so it’s prone to hallucinations. Not a single line of code was written.

For Decart and Etched, this demo is a proof of concept. They imagine that the technology could be used for real-time generation of videos or video games more generally. “Your screen can turn into a portal—into some imaginary world that doesn’t need to be coded, that can be changed on the fly. And that’s really what we’re trying to target here,” says Dean Leitersdorf, cofounder and CEO of Decart, which came out of stealth this week.

Their version of Minecraft is generated in real time, in a technique known as next-frame prediction. They did this by training their model, Oasis, on millions of hours of Minecraft gameplay and recordings of the corresponding actions a user would take in the game. The AI is able to sort out the physics, environments, and controls of Minecraft from this data alone. 

The companies acknowledge that their version of Minecraft is a little wonky. The resolution is quite low, you can only play for minutes at a time, and it’s prone to hallucinations like the one described above. But they believe that with innovations in chip design and further improvements, there’s no reason they can’t develop a high-fidelity version of Minecraft, or really any game. 

“What if you could say ‘Hey, add a flying unicorn here’? Literally, talk to the model. Or ‘Turn everything here into medieval ages,’ and then, boom, it’s all medieval ages. Or ‘Turn this into Star Wars,’ and it’s all Star Wars,” says Leitersdorf.

A major limitation right now is hardware. They relied on Nvidia cards for their current demo, but in the future, they plan to use Sohu, a new card that Etched has in development, which the firm claims will improve performance by a factor of 10. This gain would significantly cut down on the cost and energy needed to produce real-time interactive video. It would let Decart and Etched build a better version of their current demo, one in which the game runs longer, with fewer hallucinations, and at higher resolution. They say the new chip would also make it possible for more players to use the model at once.

“Custom chips for AI hold the potential to unlock significant performance gains and energy efficiency gains,” says Siddharth Garg, a professor of electrical and computer engineering at NYU Tandon, who is not associated with Etched or Decart.

Etched says that its gains come from designing its cards specifically for AI development. For example, the chip uses a single core, which it says makes it possible to handle complicated mathematical operations with more efficiency. The chip also focuses on inference (where an AI makes predictions) over training (where an AI learns from data).

“We are building something much more specialized than all of the chips out on the market today,” says Robert Wachen, cofounder and COO of Etched. They plan to run projects on the new card next year. Until the chip is deployed or its capabilities are verified, Etched’s claims are yet to be substantiated. And given the extent of AI specialization already in the top GPUs on the market, Garg is “very skeptical about a 10x improvement just from smarter or more specialized design.”

But the two companies have big ambitions. If the efficiency gains are close to what Etched claims, they believe, they will be able to generate real-time virtual doctors or tutors. “All of that is coming down the pipe, and it comes from having a better architecture and better hardware to power it. So that’s what we’re really trying to get people to realize with the proof of concept here,” says Wachen.

For the time being, you can try out the demo of their version of Minecraft here.

Palmer Luckey’s vision for the future of mixed reality

This story originally appeared in The Algorithm, our weekly newsletter on AI.

War is a catalyst for change, an expert in AI and warfare told me in 2022. At the time, the war in Ukraine had just started, and the military AI business was booming. Two years later, things have only ramped up as geopolitical tensions continue to rise.

Silicon Valley players are poised to benefit. One of them is Palmer Luckey, the founder of the virtual-reality headset company Oculus, which he sold to Facebook for $2 billion. After Luckey’s highly public ousting from Meta, he founded Anduril, which focuses on drones, cruise missiles, and other AI-enhanced technologies for the US Department of Defense. The company is now valued at $14 billion. My colleague James O’Donnell interviewed Luckey about his new pet project: headsets for the military. 

Luckey is increasingly convinced that the military, not consumers, will see the value of mixed-reality hardware first: “You’re going to see an AR headset on every soldier, long before you see it on every civilian,” he says. In the consumer world, any headset company is competing with the ubiquity and ease of the smartphone, but he sees entirely different trade-offs in defense. Read the interview here.

The use of AI for military purposes is controversial. Back in 2018, Google pulled out of the Pentagon’s Project Maven, an attempt to build image recognition systems to improve drone strikes, following staff walkouts over the ethics of the technology. (Google has since returned to offering services for the defense sector.) There has been a long-standing campaign to ban autonomous weapons, also known as “killer robots,” a ban that powerful militaries such as the US have refused to agree to.

But the voices that boom even louder belong to an influential faction in Silicon Valley, including Google’s former CEO Eric Schmidt, who has called for the military to adopt and invest more in AI to get an edge over adversaries. Militaries all over the world have been very receptive to this message.

That’s good news for the tech sector. Military contracts are long and lucrative, for a start. Most recently, the Pentagon purchased services from Microsoft and OpenAI to do search, natural-language processing, machine learning, and data processing, reports The Intercept. In the interview with James, Palmer Luckey says the military is a perfect testing ground for new technologies. Soldiers do as they are told and aren’t as picky as consumers, he explains. They’re also less price-sensitive: Militaries don’t mind spending a premium to get the latest version of a technology.

But there are serious dangers in adopting powerful technologies prematurely in such high-risk areas. Foundation models pose serious national security and privacy threats by, for example, leaking sensitive information, argue researchers at the AI Now Institute and Meredith Whittaker, president of the communication privacy organization Signal, in a new paper. Whittaker, who was a core organizer of the Project Maven protests, has said that the push to militarize AI is really more about enriching tech companies than improving military operations. 

Despite calls for stricter rules around transparency, we are unlikely to see governments restrict their defense sectors in any meaningful way beyond voluntary ethical commitments. We are in the age of AI experimentation, and militaries are playing with the highest stakes of all. And because of the military’s secretive nature, tech companies can experiment with the technology without the need for transparency or even much accountability. That suits Silicon Valley just fine. 


Now read the rest of The Algorithm

Deeper Learning

How Wayve’s driverless cars will meet one of their biggest challenges yet

The UK driverless-car startup Wayve is headed west. The firm’s cars learned to drive on the streets of London. But Wayve has announced that it will begin testing its tech in and around San Francisco as well. And that brings a new challenge: Its AI will need to switch from driving on the left to driving on the right.

Full speed ahead: As visitors to or from the UK will know, making that switch is harder than it sounds. Your view of the road, how the vehicle turns: it’s all different. The move to the US will be a test of Wayve’s technology, which the company claims is more general-purpose than what many of its rivals are offering. Across the Atlantic, the company will now go head to head with the heavyweights of the growing autonomous-car industry, including Cruise, Waymo, and Tesla. Join Will Douglas Heaven on a ride in one of its cars to find out more.

Bits and Bytes

Kids are learning how to make their own little language models
Little Language Models is a new application from two PhD researchers at MIT’s Media Lab that helps children understand how AI models work by letting them build small-scale versions themselves. (MIT Technology Review)

Google DeepMind is making its AI text watermark open source
Google DeepMind has developed a tool for identifying AI-generated text called SynthID, which is part of a larger family of watermarking tools for generative AI outputs. The company is applying the watermark to text generated by its Gemini models and making it available for others to use too. (MIT Technology Review)

Anthropic debuts an AI model that can “use” a computer
The tool enables the company’s Claude AI model to interact with computer interfaces and take actions such as moving a cursor, clicking on things, and typing text. It’s a very cumbersome and error-prone version of what some have said AI agents will be able to do one day. (Anthropic)

Can an AI chatbot be blamed for a teen’s suicide?
A 14-year-old boy died by suicide, and his mother says it was because he was obsessed with an AI chatbot created by Character.AI. She is suing the company. Chatbots have been touted as cures for loneliness, but critics say they actually worsen isolation. (The New York Times)

Google, Microsoft, and Perplexity are promoting scientific racism in search results
The internet’s biggest AI-powered search engines are featuring the widely debunked idea that white people are genetically superior to other races. (Wired)