How to run an LLM on your laptop

MIT Technology Review’s How To series helps you get things done. 

Simon Willison has a plan for the end of the world. It’s a USB stick, onto which he has loaded a couple of his favorite open-weight LLMs—models that have been shared publicly by their creators and that can, in principle, be downloaded and run with local hardware. If human civilization should ever collapse, Willison plans to use all the knowledge encoded in their billions of parameters for help. “It’s like having a weird, condensed, faulty version of Wikipedia, so I can help reboot society with the help of my little USB stick,” he says.

But you don’t need to be planning for the end of the world to want to run an LLM on your own device. Willison, who writes a popular blog about local LLMs and software development, has plenty of compatriots: r/LocalLLaMA, a subreddit devoted to running LLMs on your own hardware, has half a million members.

For people who are concerned about privacy, want to break free from the control of the big LLM companies, or just enjoy tinkering, local models offer a compelling alternative to ChatGPT and its web-based peers.

The local LLM world used to have a high barrier to entry: In the early days, it was impossible to run anything useful without investing in pricey GPUs. But researchers have had so much success in shrinking down and speeding up models that anyone with a laptop, or even a smartphone, can now get in on the action. “A couple of years ago, I’d have said personal computers are not powerful enough to run the good models. You need a $50,000 server rack to run them,” Willison says. “And I kept on being proved wrong time and time again.”

Why you might want to download your own LLM

Getting into local models takes a bit more effort than, say, navigating to ChatGPT’s online interface. But the very accessibility of a tool like ChatGPT comes with a cost. “It’s the classic adage: If something’s free, you’re the product,” says Elizabeth Seger, the director of digital policy at Demos, a London-based think tank. 

OpenAI, which offers both paid and free tiers, trains its models on users’ chats by default. It’s not too difficult to opt out of this training, and it also used to be possible to remove your chat data from OpenAI’s systems entirely, until a recent legal decision in the New York Times’ ongoing lawsuit against OpenAI required the company to maintain all user conversations with ChatGPT.

Google, which has access to a wealth of data about its users, also trains its models on both free and paid users’ interactions with Gemini, and the only way to opt out of that training is to set your chat history to delete automatically—which means that you also lose access to your previous conversations. In general, Anthropic does not train its models using user conversations, but it will train on conversations that have been “flagged for Trust & Safety review.” 

Training may present particular privacy risks because of the ways that models internalize, and often recapitulate, their training data. Many people trust LLMs with deeply personal conversations—but if models are trained on that data, those conversations might not be nearly as private as users think, according to some experts.

“Some of your personal stories may be cooked into some of the models, and eventually be spit out in bits and bytes somewhere to other people,” says Giada Pistilli, principal ethicist at the company Hugging Face, which runs a huge library of freely downloadable LLMs and other AI resources.

For Pistilli, opting for local models as opposed to online chatbots has implications beyond privacy. “Technology means power,” she says. “And so who[ever] owns the technology also owns the power.” States, organizations, and even individuals might be motivated to disrupt the concentration of AI power in the hands of just a few companies by running their own local models.

Breaking away from the big AI companies also means having more control over your LLM experience. Online LLMs are constantly shifting under users’ feet: Back in April, ChatGPT suddenly started sucking up to users far more than it had previously, and just last week Grok started calling itself MechaHitler on X.

Providers tweak their models with little warning, and while those tweaks might sometimes improve model performance, they can also cause undesirable behaviors. Local LLMs may have their quirks, but at least they are consistent. The only person who can change your local model is you.

Of course, any model that can fit on a personal computer is going to be less powerful than the premier online offerings from the major AI companies. But there’s a benefit to working with weaker models—they can inoculate you against the more pernicious limitations of their larger peers. Small models may, for example, hallucinate more frequently and more obviously than Claude, GPT, and Gemini, and seeing those hallucinations can help you build up an awareness of how and when the larger models might also lie.

“Running local models is actually a really good exercise for developing that broader intuition for what these things can do,” Willison says.

How to get started

Local LLMs aren’t just for proficient coders. If you’re comfortable using your computer’s command-line interface, which allows you to browse files and run apps using typed commands, Ollama is a great option. Once you’ve installed the software, you can download and run any of the hundreds of models it offers with a single command.
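
For example, typing “ollama run llama3.2” in a terminal downloads a small Llama model the first time and then drops you into a chat session with it. If you would rather script your conversations, Ollama also runs a local server you can call from code. The snippet below is a minimal sketch using the optional ollama Python package; it assumes Ollama is already installed and running and that the “llama3.2” model has been pulled.

    import ollama  # pip install ollama; talks to the local Ollama server

    # Ask a locally running model a question. Nothing leaves your machine.
    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": "Explain what an open-weight model is."}],
    )
    # Depending on the library version, response.message.content also works.
    print(response["message"]["content"])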

If you don’t want to touch anything that even looks like code, you might opt for LM Studio, a user-friendly app that takes a lot of the guesswork out of running local LLMs. You can browse models from Hugging Face from right within the app, which provides plenty of information to help you make the right choice. Some popular and widely used models are tagged as “Staff Picks,” and every model is labeled according to whether it can be run entirely on your machine’s speedy GPU, needs to be shared between your GPU and slower CPU, or is too big to fit onto your device at all. Once you’ve chosen a model, you can download it, load it up, and start interacting with it using the app’s chat interface.

As you experiment with different models, you’ll start to get a feel for what your machine can handle. According to Willison, every billion model parameters require about one GB of RAM to run, and I found that approximation to be accurate: My own 16 GB laptop managed to run Alibaba’s Qwen3 14B as long as I quit almost every other app. If you run into issues with speed or usability, you can always go smaller—I got reasonable responses from Qwen3 8B as well.
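
That rule of thumb is easy to turn into a quick sanity check before you download anything. The sketch below is only a ballpark: the one-gigabyte-per-billion-parameters figure is Willison’s approximation rather than an exact formula, and real memory use varies with quantization and context length.

    def estimated_ram_gb(params_billions):
        # Willison's approximation: ~1 GB of RAM per billion parameters.
        return params_billions * 1.0

    laptop_ram_gb = 16
    for name, size in [("Qwen3 14B", 14), ("Qwen3 8B", 8), ("Llama 3.2 1B", 1)]:
        needed = estimated_ram_gb(size)
        if needed <= laptop_ram_gb * 0.75:
            verdict = "should fit"
        elif needed <= laptop_ram_gb:
            verdict = "tight; quit other apps"
        else:
            verdict = "probably too big"
        print(f"{name}: ~{needed:.0f} GB needed on a {laptop_ram_gb} GB machine ({verdict})")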

And if you go really small, you can even run models on your cell phone. My beat-up iPhone 12 was able to run Meta’s Llama 3.2 1B using an app called LLM Farm. It’s not a particularly good model—it very quickly goes off into bizarre tangents and hallucinates constantly—but trying to coax something so chaotic toward usability can be entertaining. If I’m ever on a plane sans Wi-Fi and desperate for a probably false answer to a trivia question, I now know where to look.

Some of the models that I was able to run on my laptop were effective enough that I can imagine using them in my journalistic work. And while I don’t think I’ll depend on phone-based models for anything anytime soon, I really did enjoy playing around with them. “I think most people probably don’t need to do this, and that’s fine,” Willison says. “But for the people who want to do this, it’s so much fun.”

These four charts show where AI companies could go next in the US

No one knows exactly how AI will transform our communities, workplaces, and society as a whole. Because it’s hard to predict the impact AI will have on jobs, many workers and local governments are left trying to read the tea leaves to understand how to prepare and adapt.

A new interactive report released today by the Brookings Institution attempts to map how embedded AI companies and jobs are in different regions of the United States in order to prescribe policy treatments to those struggling to keep up. 

While the impact of AI on tech hubs like San Francisco and Boston is already being felt, AI proponents believe it will transform work everywhere, and in every industry. The report uses various proxies for what the researchers call “AI readiness” to document how unevenly this supposed transformation is taking place. 

Here are four charts to help understand where that could matter. 

1. AI development is still highly focused in tech hubs.

Brookings divides US cities into five categories based on how ready they are to adopt AI-related industries and job offerings. To do so, it looked at local talent pool development, innovations in local institutions, and adoption potential among local companies. 

The “AI Superstars” at the top of the ranking represent, unsurprisingly, parts of the San Francisco Bay Area, which are such outliers that they are given a category of their own. The “Star AI Hubs,” on the other hand, include large metropolitan areas known for tech work, such as Boston, Seattle, and Miami.

2. AI workers and startups are highly concentrated, too.

The data shows that the vast majority of people working with AI and startups focused on AI are clustered in the tech hubs above. The report found that almost two-thirds of workers advertising their AI skills work there, and well over 75% of AI startups were founded there. The so-called “Star AI Hubs,” from the likes of New York City and Seattle down to Columbus, Ohio, and Boulder, Colorado, take up another significant portion of the pie. 

It’s clear that most of the developments in AI are concentrated in certain large cities, and this pattern can end up perpetuating itself. According to the report, though, “AI activity has spread into most regional economies across the country,” highlighting the need for policy that encourages growth through AI without sacrificing other areas of the country.

3. Emerging centers of AI show promise but are lacking in one way or another.

Beyond the big, obvious tech-hub cities, Brookings claims, there are 14 regions that show promise in AI development and worker engagement with AI. Among these are cities surrounding academic institutions like the University of Wisconsin in Madison or Texas A&M University in College Station, and regional cultural centers like Pittsburgh, Detroit, and Nashville. 

However, according to Brookings, these places are lacking in some respect or another that limits their development. Take Columbia, South Carolina, for example. Despite a sizable regional population of about 860,000 people and the University of South Carolina right there, the report says the area has struggled with talent development; relatively few students graduate with science and engineering degrees, and few showcase AI skills in their job profiles. 

On the other hand, the Tampa, Florida, metropolitan area struggles with innovation, owing in large part to lagging productivity of local universities. The majority of the regions Brookings examined struggle with adoption, which in the report is measured largely by company engagement with AI-related tools like enterprise data and cloud services.

4. Emerging centers are generally leaning toward industry or government contracts, not both.

Still, these emerging centers show plenty of promise, and funders are taking note. To measure innovation and adoption of AI, the report tallies federal contracts for AI research and development as well as venture capital funding deals. 

If you examine how these emerging centers are collecting each, it appears that many of them are specializing as centers for federal research, like Huntsville, Alabama, or places for VC firms to scout, like the Sacramento area in California. 

While VC interest can beget VC interest, and likewise for government, this may give some indication of where these places have room to grow. “University presence is a tremendous influence on success here,” says Mark Muro, one of the authors of the report. Fostering the relationship between academia and industry could be key to improving the local AI ecosystem. 

Researchers announce babies born from a trial of three-person IVF

Eight babies have been born in the UK thanks to a technology that uses DNA from three people: the two biological parents plus a third person who supplies healthy mitochondrial DNA. The babies were born to mothers who carry genes for mitochondrial diseases and risked passing on severe disorders. The eight babies are healthy, say the researchers behind the trial.

“Mitochondrial disease can have a devastating impact on families,” Doug Turnbull of Newcastle University, one of the researchers behind the study, said in a statement. “Today’s news offers fresh hope to many more women at risk of passing on this condition, who now have the chance to have children growing up without this terrible disease.”

The study, which makes use of a technology called mitochondrial donation, has been described as a “tour de force” and “a remarkable accomplishment” by others in the field. In the team’s approach, patients’ eggs are fertilized with sperm, and the DNA-containing nuclei of those cells are transferred into donated fertilized eggs that have had their own nuclei removed. The new embryos contain the DNA of the intended parents along with a tiny fraction of mitochondrial DNA from the donor, floating in the embryos’ cytoplasm. 

“The concept of [mitochondrial donation] has attracted much commentary and occasionally concern and anxiety,” Stuart Lavery, a consultant in reproductive medicine at University College Hospitals NHS Foundation Trust, said in a statement. “The Newcastle team have demonstrated that it can be used in a clinically effective and ethically acceptable way to prevent disease and suffering.”

Not everyone sees the trial as a resounding success. While five of the children were born “with no health problems,” one developed a fever and a urinary tract infection, and another had muscle jerks. A third was treated for an abnormal heart rhythm. Three of the babies were born with a low level of the very mitochondrial-DNA mutations the treatment was designed to prevent.

Heidi Mertes, a medical ethicist at Ghent University, says she is “moderately optimistic.” “I’m happy that it worked,” she says. “But at the same time, it’s concerning … it’s a call for caution and treading carefully.”

Pavlo Mazur, a former embryologist who has used a similar approach in the conception of 15 babies in Ukraine, believes that trials like this one should be paused until researchers figure out what’s going on. Others believe that researchers should study the technique in people who don’t have mitochondrial mutations, to lower the risk of passing any disease-causing mutations to children.

Long time coming

The news of the births has been long awaited by researchers in the field. Mitochondrial donation was first made legal in the UK in 2015. Two years later, the Human Fertilisation and Embryology Authority (HFEA), which regulates fertility treatment and research in the UK, granted a fertility clinic in Newcastle the sole license to perform the procedure. Newcastle Fertility Centre at Life launched a trial of mitochondrial donation in 2017 with the aim of treating 25 women a year.

That was eight years ago. Since then, the Newcastle team have been extremely tight-lipped about the trial. That’s despite the fact that other teams elsewhere have used mitochondrial donation to help people achieve pregnancy. A New York–based doctor used a type of mitochondrial donation to help a Jordanian couple conceive in Mexico in 2016. Mitochondrial donation has also been trialed by teams in Ukraine and Greece.

But as the only trial overseen by the HFEA, the Newcastle team’s study was viewed by many as the most “official.” Researchers have been itching to hear how the work has been going, given the potential implications for researchers elsewhere (mitochondrial donation was officially made legal in Australia in 2022). “I’m very glad to see [the results] come out at last,” says Dagan Wells, a reproductive biologist at the University of Oxford who worked on the Greece trial. “It would have been nice to have some information out along the way.”

At the Newcastle clinic, each patient must receive approval from the HFEA to be eligible for mitochondrial donation. Since the trial launched in 2017, 39 patients have won this approval. Twenty-five of them underwent hormonal stimulation to release multiple eggs that could be frozen in storage.

Nineteen of those women went on to have mitochondrial donation. So far, seven of the women have given birth (one had twins), and an eighth is still pregnant. The oldest baby is two years old. The results were published today in the New England Journal of Medicine.

“As parents, all we ever wanted was to give our child a healthy start in life,” one of the mothers, who is remaining anonymous, said in a statement. “Mitochondrial donation IVF made that possible. After years of uncertainty this treatment gave us hope—and then it gave us our baby … Science gave us a chance.”

When each baby was born, the team collected a blood and urine sample to look at the child’s mitochondrial DNA. They found that the levels of mutated DNA were far lower than they would have expected without mitochondrial donation. Three of the mothers were “homoplasmic”—100% of their mitochondrial DNA carried the mutation. But blood tests showed that in the women’s four babies (including the twins), 5% or less of the mitochondrial DNA had the mutation, suggesting they won’t develop disease.

A mixed result

The researchers see this as a positive result. “Children who would otherwise have inherited very high levels are now inheriting levels that are reduced by 77% to 100%,” coauthor Mary Herbert, a professor of reproductive biology at Newcastle University and Monash University, told me during a press briefing.

But three of the eight babies had health symptoms. At seven months, one was diagnosed with a rare form of epilepsy, which seemed to resolve within the following three months. Another baby developed a urinary tract infection.

A third baby developed “prolonged” jaundice, high levels of fat in the blood, and a disturbed heart rhythm that required treatment. The baby seemed to have recovered by 18 months, and doctors believe that the symptoms were not related to the mitochondrial mutations, but the team members admit that they can’t be sure. Given the small sample size, it’s hard to make comparisons with babies conceived in other ways. 

And they acknowledge that a phenomenon called “reversal” is happening in some of the babies. In theory, the children shouldn’t inherit any “bad” mitochondrial DNA from their mothers. But three of them did. The levels of “bad” mitochondrial DNA in the babies’ blood ranged between 5% and 16%. And they were higher in the babies’ urine—the highest figure being 20%.

The researchers don’t know why this is happening. When an embryologist pulls out the nucleus of a fertilized egg, a bit of mitochondria-containing cytoplasm will inevitably be dragged along with it. But the team didn’t see any link between the amount of carried-over cytoplasm and the level of “bad” mitochondria. “We continue to investigate this issue,” Herbert said.

“As long as they don’t understand what’s happening, I would still be worried,” says Mertes.

Such low levels aren’t likely to cause mitochondrial diseases, according to experts contacted by MIT Technology Review. But some are concerned that the percentage of mutated DNA could be higher in different tissues, such as the brain or muscle, or that the levels might change with age. “You never know which tissues [reversal] will show up in,” says Mazur, who has seen the phenomenon in babies born through mitochondrial donation to parents who didn’t have mitochondrial mutations. “It’s chaotic.”

The Newcastle team says it hasn’t looked at other tissues, because it designed the study to be noninvasive.

There has been at least one case in which similar levels of “bad” mitochondria have caused symptoms, says Joanna Poulton, a mitochondrial geneticist at the University of Oxford. She thinks it’s unlikely that the children in the trial will develop any symptoms but adds that “it’s a bit of a worry.”

The age of reversal

No one knows exactly when this reversal happens. But Wells and his colleagues have some idea. In their study in Greece, they looked at the mitochondrial DNA of embryos and checked them again during pregnancy and after birth. The trial was designed to study the impact of mitochondrial donation for infertility—none of the parents involved had genes for mitochondrial disease.

The team has seen mitochondrial reversal in two of the seven babies born in the trial, says Wells. If you put the two sets of results together, mitochondrial donation “seems to have this possibility of reversal occurring in maybe about a third of children,” he says.

In his study, the reversal seemed to occur early on in the embryos’ development, Wells says. Five-day-old embryos “look perfect,” but mitochondrial mutations start showing up in tests taken at around 15 weeks of pregnancy, he says. After that point, the levels appear to be relatively stable. The Newcastle researchers say they will monitor the children until they are five years old.

People enrolling in future trials might opt for amniocentesis, which involves sampling fluid from the amniotic sac surrounding the fetus at around 15 to 18 weeks, suggests Mertes. That test might reveal the likely level of mitochondrial mutations in the resulting child. “Then the parents could decide what to do,” says Mertes. “If you could see there was a 90% mutation load [for a] very serious mitochondrial disease, they would still have an option to cancel the pregnancy,” she says.

Wells thinks the Newcastle team’s results are “generally reassuring.” He doesn’t think the trials should be paused. But he wants people to understand that mitochondrial donation is not without risk. “This can only be viewed as a risk reduction strategy, and not a guarantee of having an unaffected child,” he says.

And, as Mertes points out, there’s another option for women who carry mitochondrial DNA mutations: egg donation. Donor eggs fertilized with a partner’s sperm and transferred to a woman’s uterus won’t have her disease-causing mitochondria. 

That option won’t appeal to people who feel strongly about having a genetic link to their children. But Poulton asks: “If you know whose uterus you came out of, does it matter that the [egg] came from somewhere else?”

AI’s giants want to take over the classroom

School’s out and it’s high summer, but a bunch of teachers are plotting how they’re going to use AI this upcoming school year. God help them. 

On July 8, OpenAI, Microsoft, and Anthropic announced a $23 million partnership with one of the largest teachers’ unions in the United States to bring more AI into K–12 classrooms. Called the National Academy for AI Instruction, the initiative will train teachers at a New York City headquarters on how to use AI both for teaching and for tasks like planning lessons and writing reports, starting this fall.

The companies could face an uphill battle. Right now, most of the public perceives AI’s use in the classroom as nothing short of ruinous—a surefire way to dampen critical thinking and hasten the decline of our collective attention span (a viral story from New York magazine, for example, described how easy it now is to coast through college thanks to constant access to ChatGPT). 

Amid that onslaught, AI companies insist that AI promises more individualized learning, faster and more creative lesson planning, and quicker grading. The companies sponsoring this initiative are, of course, not doing it out of the goodness of their hearts.

No—as they hunt for profits, their goal is to make users out of teachers and students. Anthropic is pitching its AI models to universities, and OpenAI offers free courses for teachers. In an initial training session held by the new National Academy for AI Instruction, representatives from Microsoft showed teachers how to use the company’s AI tools for lesson planning and emails, according to the New York Times.

It’s early days, but what does the evidence actually say about whether AI is helping or hurting students? There’s at least some data to support the case made by tech companies: A recent survey of 1,500 teens conducted by Harvard’s Graduate School of Education showed that kids are using AI to brainstorm and answer questions they’re afraid to ask in the classroom. Studies examining settings ranging from math classes in Nigeria to college physics courses at Harvard have suggested that AI tutors can lead students to become more engaged.

And yet there’s more to the story. The same Harvard survey revealed that kids are also frequently using AI for cheating and shortcuts. And an oft-cited paper from Microsoft found that relying on AI can reduce critical thinking. Not to mention the fact that “hallucinations” of incorrect information are an inevitable part of how large language models work.

There’s a lack of clear evidence that AI can be a net benefit for students, and it’s hard to trust that the AI companies funding this initiative will give honest advice on when not to use AI in the classroom.

Despite the fanfare around the academy’s launch, and the fact that the first teacher training is scheduled to take place in just a few months, OpenAI and Anthropic told me they couldn’t share any specifics.

It’s not as if teachers themselves aren’t already grappling with how to approach AI. One such teacher, Christopher Harris, who leads a library system covering 22 rural school districts in New York, has created a curriculum aimed at AI literacy. Topics range from privacy when using smart speakers (a lesson for second graders) to misinformation and deepfakes (instruction for high schoolers). I asked him what he’d like to see in the curriculum used by the new National Academy for AI Instruction.

“The real outcome should be teachers that are confident enough in their understanding of how AI works and how it can be used as a tool that they can teach students about the technology as well,” he says. The thing to avoid would be overfocusing on tools and pre-built prompts that teachers are instructed to use without knowing how they work. 

But all this will be for naught without an adjustment to how schools evaluate students in the age of AI, Harris says: “The bigger issue will be shifting the fundamental approaches to how we assign and assess student work in the face of AI cheating.”

The new initiative is led by the American Federation of Teachers, which represents 1.8 million members, as well as the United Federation of Teachers, which represents 200,000 members in New York. If they win over these groups, the tech companies will have significant influence over how millions of teachers learn about AI. But some educators are resisting the use of AI entirely, including several hundred who signed an open letter last week.

Helen Choi is one of them. “I think it is incumbent upon educators to scrutinize the tools that they use in the classroom to look past hype,” says Choi, an associate professor at the University of Southern California, where she teaches writing. “Until we know that something is useful, safe, and ethical, we have a duty to resist mass adoption of tools like large language models that are not designed by educators with education in mind.”

This story originally appeared in The Algorithm, our weekly newsletter on AI.

AI text-to-speech programs could “unlearn” how to imitate certain people

A technique known as “machine unlearning” could teach AI models to forget specific voices—an important step in stopping the rise of audio deepfakes, where someone’s voice is copied to carry out fraud or scams.

Recent advances in artificial intelligence have revolutionized text-to-speech technology, so that a piece of text can now be convincingly read aloud in any voice, complete with natural speaking patterns and intonations, instead of having to settle for a robotic voice reading it out word by word. “Anyone’s voice can be reproduced or copied with just a few seconds of their voice,” says Jong Hwan Ko, a professor at Sungkyunkwan University in Korea and the coauthor of a new paper that demonstrates one of the first applications of machine unlearning to speech generation.

Copied voices have been used in scams, disinformation, and harassment. Ko, who researches audio processing, and his collaborators wanted to prevent this kind of identity fraud. “People are starting to demand ways to opt out of the unknown generation of their voices without consent,” he says. 

AI companies generally keep a tight grip on their models to discourage misuse. For example, if you ask ChatGPT to give you someone’s phone number or instructions for doing something illegal, it will likely just tell you it cannot help. However, as many examples over time have shown, clever prompt engineering or model fine-tuning can sometimes get these models to say things they otherwise wouldn’t. The unwanted information may still be hiding somewhere inside the model so that it can be accessed with the right techniques. 

At present, companies tend to deal with this issue by applying guardrails; the idea is to check whether the prompts or the AI’s responses contain disallowed material. Machine unlearning instead asks whether an AI can be made to forget a piece of information that the company doesn’t want it to know. The technique takes a leaky model and the specific training data to be redacted and uses them to create a new model—essentially, a version of the original that never learned that piece of data. While machine unlearning has ties to older techniques in AI research, it’s only in the past couple of years that it’s been applied to large language models.

Jinju Kim, a master’s student at Sungkyunkwan University who worked on the paper with Ko and others, sees guardrails as fences around the bad data put in place to keep people away from it. “You can’t get through the fence, but some people will still try to go under the fence or over the fence,” says Kim. But unlearning, she says, attempts to remove the bad data altogether, so there is nothing behind the fence at all. 

The way current text-to-speech systems are designed complicates this a little more, though. These so-called “zero-shot” models use examples of people’s speech to learn to re-create any voice, including those not in the training set—with enough training data, such a model can become a good mimic when supplied with even a small sample of someone’s voice. So “unlearning” means a model not only needs to “forget” voices it was trained on but also has to learn not to mimic specific voices it wasn’t trained on. All the while, it still needs to perform well for other voices.

To demonstrate how to get those results, Kim trained a re-creation of VoiceBox, a speech generation model from Meta, so that when it was prompted to produce a text sample in one of the voices to be redacted, it would respond with a random voice instead. To make these voices realistic, the model “teaches” itself using random voices of its own creation.
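
As a rough illustration of that idea, a single fine-tuning step might look like the sketch below. This is not the team’s code: the toy model, feature shapes, and loss are stand-ins, and a full recipe would also need a “retain” term to preserve performance on permitted voices, which is omitted here for brevity.

    import copy
    import torch
    import torch.nn as nn

    class ToyZeroShotTTS(nn.Module):
        # Stand-in for a zero-shot TTS model: (text features, voice-sample features) -> audio features.
        def __init__(self, dim=32):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim * 2, 64), nn.ReLU(), nn.Linear(64, dim))

        def forward(self, text_feats, voice_feats):
            return self.net(torch.cat([text_feats, voice_feats], dim=-1))

    model = ToyZeroShotTTS()
    teacher = copy.deepcopy(model).eval()  # frozen copy supplies the "random voice" targets
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    def unlearning_step(text_feats, redacted_voice_feats):
        # A randomly drawn voice stands in for the speaker to be forgotten.
        random_voice_feats = torch.randn_like(redacted_voice_feats)
        with torch.no_grad():
            target = teacher(text_feats, random_voice_feats)
        # When asked for the redacted voice, push the output toward the random-voice rendition.
        prediction = model(text_feats, redacted_voice_feats)
        loss = nn.functional.mse_loss(prediction, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example call with dummy features (batch of 4, feature dimension 32).
    print(unlearning_step(torch.randn(4, 32), torch.randn(4, 32)))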

According to the team’s results, which are to be presented this week at the International Conference on Machine Learning, prompting the model to imitate a voice it has “unlearned” gives back a result that—according to state-of-the-art tools that measure voice similarity—mimics the forgotten voice more than 75% less effectively than the model did before. In practice, this makes the new voice unmistakably different. But the forgetfulness comes at a cost: The model is about 2.8% worse at mimicking permitted voices. While these percentages are a bit hard to interpret, the demo the researchers released online offers very convincing results, both for how well redacted speakers are forgotten and how well the rest are remembered. A sample from the demo is given below. 

The demo includes a voice sample from a speaker to be forgotten, the original model’s generated speech using that sample as a prompt, and the output of the unlearned model for the same prompt.

Ko says the unlearning process can take “several days,” depending on how many speakers the researchers want the model to forget. Their method also requires an audio clip about five minutes long for each speaker whose voice is to be forgotten.

In machine unlearning, pieces of data are often replaced with randomness so that they can’t be reverse-engineered back to the original. In this paper, the randomness for the forgotten speakers is very high—a sign, the authors claim, that they are truly forgotten by the model. 

 “I have seen people optimizing for randomness in other contexts,” says Vaidehi Patil, a PhD student at the University of North Carolina at Chapel Hill who researches machine unlearning. “This is one of the first works I’ve seen for speech.” Patil is organizing a machine unlearning workshop affiliated with the conference, and the voice unlearning research will also be presented there. 

She points out that unlearning itself involves inherent trade-offs between efficiency and forgetfulness because the process can take time, and can degrade the usability of the final model. “There’s no free lunch. You have to compromise something,” she says.

Machine unlearning may still be at too early a stage for, say, Meta to introduce Ko and Kim’s methods into VoiceBox. But there is likely to be industry interest. Patil is researching unlearning for Google DeepMind this summer, and while Meta did not respond with a comment, it has hesitated for a long time to release VoiceBox to the wider public because it is so vulnerable to misuse. 

The voice unlearning team seems optimistic that its work could someday get good enough for real-life deployment. “In real applications, we would need faster and more scalable solutions,” says Ko. “We are trying to find those.”

Google’s generative video model Veo 3 has a subtitles problem

As soon as Google launched its latest video-generating AI model at the end of May, creatives rushed to put it through its paces. Released just months after its predecessor, Veo 3 allows users to generate sounds and dialogue for the first time, sparking a flurry of hyperrealistic eight-second clips stitched together into ads, ASMR videos, imagined film trailers, and humorous street interviews. Academy Award–nominated director Darren Aronofsky used the tool to create a short film called Ancestra. During a press briefing, Demis Hassabis, Google DeepMind’s CEO, likened the leap forward to “emerging from the silent era of video generation.” 

But others quickly found that in some ways the tool wasn’t behaving as expected. When it generates clips that include dialogue, Veo 3 often adds nonsensical, garbled subtitles, even when the prompts it’s been given explicitly ask for no captions or subtitles to be added. 

Getting rid of them isn’t straightforward—or cheap. Users have been forced to resort to regenerating clips (which costs them more money), using external subtitle-removing tools, or cropping their videos to get rid of the subtitles altogether.

Josh Woodward, vice president of Google Labs and Gemini, posted on X on June 9 that Google had developed fixes to reduce the gibberish text. But over a month later, users are still logging issues with it in Google Labs’ Discord channel, demonstrating how difficult it can be to correct issues in major AI models.

Like its predecessors, Veo 3 is available to paying members of Google’s subscription tiers, which start at $249.99 a month. To generate an eight-second clip, users enter a text prompt describing the scene they’d like to create into Google’s AI filmmaking tool Flow, Gemini, or other Google platforms. Each Veo 3 generation costs a minimum of 20 AI credits, and the account can be topped up at a cost of $25 per 2,500 credits.

Mona Weiss, an advertising creative director, says that regenerating her scenes in a bid to get rid of the random captions is becoming expensive. “If you’re creating a scene with dialogue, up to 40% of its output has gibberish subtitles that make it unusable,” she says. “You’re burning through money trying to get a scene you like, but then you can’t even use it.”
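
The economics are easy to sketch from the figures above. The calculation below is a rough, illustrative estimate rather than Google’s own pricing math, and it ignores clips a creator rejects for reasons other than subtitles.

    # Back-of-envelope cost of getting one usable clip, using the article's figures.
    credits_per_clip = 20            # minimum credits per eight-second generation
    dollars_per_credit = 25 / 2500   # $25 buys 2,500 credits
    unusable_rate = 0.40             # Weiss's estimate of dialogue clips ruined by gibberish subtitles

    cost_per_clip = credits_per_clip * dollars_per_credit
    expected_cost_per_usable_clip = cost_per_clip / (1 - unusable_rate)
    print(f"${cost_per_clip:.2f} per generation, about ${expected_cost_per_usable_clip:.2f} per usable clip")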

When Weiss reported the problem to Google Labs through its Discord channel in the hopes of getting a refund for her wasted credits, its team pointed her to the company’s official support team. They offered her a refund for the cost of Veo 3, but not for the credits. Weiss declined, as accepting would have meant losing access to the model altogether. The Google Labs Discord support team has been telling users that subtitles can be triggered by speech in a clip, and that it is aware of the problem and working to fix it.

So why does Veo 3 insist on adding these subtitles, and why does it appear to be so difficult to solve the problem? It probably comes down to what the model has been trained on.  

Although Google hasn’t made this information public, that training data is likely to include YouTube videos, clips from vlogs and gaming channels, and TikTok edits, many of which come with subtitles. These embedded subtitles are part of the video frames rather than separate text tracks layered on top, meaning it’s difficult to remove them before they’re used for training, says Shuo Niu, an assistant professor at Clark University in Massachusetts who studies video sharing platforms and AI.

“The text-to-video model is trained using reinforcement learning to produce content that mimics human-created videos, and if such videos include subtitles, the model may ‘learn’ that incorporating subtitles enhances similarity with human-generated content,” he says.

“We’re continuously working to improve video creation, especially with text, speech that sounds natural, and audio that syncs perfectly,” a Google spokesperson says. “We encourage users to try their prompt again if they notice an inconsistency and give us feedback using the thumbs up/down option.”

As for why the model ignores instructions such as “No subtitles,” negative prompts (telling a generative AI model not to do something) are usually less effective than positive ones, says Tuhin Chakrabarty, an assistant professor at Stony Brook University who studies AI systems. 

To fix the problem, Google would have to check every frame of each video Veo 3 has been trained on, and either get rid of or relabel those with captions before retraining the model—an endeavor that would take weeks, he says. 

Katerina Cizek, a documentary maker and artistic director at the MIT Open Documentary Lab, believes the problem exemplifies Google’s willingness to launch products before they’re fully ready. 

“Google needed a win,” she says. “They needed to be the first to pump out a tool that generates lip-synched audio. And so that was more important than fixing their subtitle issue.”  

California is set to become the first US state to manage power outages with AI

California’s statewide power grid operator is poised to become the first in North America to deploy artificial intelligence to manage outages, MIT Technology Review has learned. 

“We wanted to modernize our grid operations. This fits in perfectly with that,” says Gopakumar Gopinathan, a senior advisor on power system technologies at the California Independent System Operator—known as the CAISO and pronounced KAI-so. “AI is already transforming different industries. But we haven’t seen many examples of it being used in our industry.” 

At the DTECH Midwest utility industry summit in Minneapolis on July 15, CAISO is set to announce a deal to run a pilot program using new AI software called Genie, from the energy-services giant OATI. The software uses generative AI to carry out real-time analyses for grid operators and comes with the potential to autonomously make decisions about key functions on the grid, a switch that might resemble going from uniformed traffic officers to sensor-equipped stoplights.

But while CAISO may deliver electrons to cutting-edge Silicon Valley companies and laboratories, the actual task of managing the state’s electrical system is surprisingly analog. 

Today, CAISO engineers scan outage reports for keywords about maintenance that’s planned or in the works, read through the notes, and then load each item into the grid software system to run calculations on how a downed line or transformer might affect power supply.

“Even if it takes you less than a minute to scan one on average, when you amplify that over 200 or 300 outages, it adds up,” says Abhimanyu Thakur, OATI’s vice president of platforms, visualization, and analytics. “Then different departments are doing it for their own respective keywords. Now we consolidate all of that into a single dictionary of keywords and AI can do this scan and generate a report proactively.” 
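
The consolidation Thakur describes is conceptually simple. The sketch below is only an illustration of that single-pass idea, not OATI’s software: the departments, keywords, and report text are invented for the example.

    # Illustrative only: one consolidated keyword dictionary, one pass over the outage notes.
    OUTAGE_KEYWORDS = {
        "transmission": ["downed line", "conductor", "tower"],
        "generation": ["transformer", "turbine", "derate"],
        "maintenance": ["planned outage", "inspection", "relay test"],
    }

    def scan_outage_notes(notes):
        # Return, for each department, the notes that mention one of its keywords.
        hits = {dept: [] for dept in OUTAGE_KEYWORDS}
        for note in notes:
            lowered = note.lower()
            for dept, keywords in OUTAGE_KEYWORDS.items():
                if any(kw in lowered for kw in keywords):
                    hits[dept].append(note)
        return hits

    print(scan_outage_notes([
        "Planned outage: relay test at substation 12, 0900-1100",
        "Downed line reported near feeder 4 after storm",
    ]))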

If CAISO finds that Genie produces reliable, more efficient data analyses for managing outages, Gopinathan says, the operator may consider automating more functions on the grid. “After a few rounds of testing, I think we’ll have an idea about what is the right time to call it successful or not,” he says. 

Regardless of the outcome, the experiment marks a significant shift. Most grid operators are using the same systems that utilities have used “for decades,” says Richard Doying, who spent more than 20 years as a top executive at the Midcontinent Independent System Operator, the grid operator for an area encompassing 15 states from the upper Midwest down to Louisiana. 

“These organizations are carved up for people working on very specific, specialized tasks and using their own proprietary tools that they’ve developed over time,” says Doying, now a vice president at the consultancy Grid Strategies. “To the extent that some of these new AI tools are able to draw from data across different areas of an organization and conduct more sophisticated analysis, that’s only helpful for grid operators.”

Last year, a Department of Energy report found that AI had potential to speed up studies on grid capacity and transmission, improve weather forecasting to help predict how much energy wind and solar plants would produce at a given time, and optimize planning for electric-vehicle charging networks. Another report by the energy department’s Loan Programs Office concluded that adding more “advanced” technology such as sensors to various pieces of equipment will generate data that can enable AI to do much more over time. 

In April, the PJM Interconnection—the nation’s largest grid system, spanning 13 states along the densely populated mid-Atlantic and Eastern Seaboard—took a big step toward embracing AI by inking a deal with Google to use its Tapestry software to improve regional planning and speed up grid connections for new power generators. 

ERCOT, the Texas grid system, is considering adopting technology similar to what CAISO is now set to use, according to a source with knowledge of the plans who requested anonymity because they were not authorized to speak publicly. ERCOT did not respond to a request for comment. 

Australia offers an example of what the future may look like. In New South Wales, where grid sensors and smart technology are more widely deployed, AI software rolled out in February is now predicting the production and flow of electricity from rooftop solar units across the state and automatically adjusting how much power from those panels can enter the grid. 

Until now, much of the discussion around AI and energy has focused on the electricity demands of AI data centers (check out MIT Technology Review’s Power Hungry series for more on this).

“We’ve been talking a lot about what the grid can do for AI and not nearly as much about what AI can do for the grid,” says Charles Hua, a coauthor of one of last year’s Energy Department reports who now serves as executive director of PowerLines, a nonprofit that advocates for improving the affordability and reliability of US grids. “In general, there’s a huge opportunity for grid operators, regulators, and other stakeholders in the utility regulatory system to use AI effectively and harness it for a more resilient, modernized, and strengthened grid.”

For now, Gopinathan says, he’s remaining cautiously optimistic. 

“I don’t want to overhype it,” he says. 

Still, he adds, “it’s a first step for bigger automation.”

“Right now, this is more limited to our outage management system. Genie isn’t talking to our other parts yet,” he says. “But I see a world where AI agents are able to do a lot more.”

Cybersecurity’s global alarm system is breaking down

Every day, billions of people trust digital systems to run everything from communication to commerce to critical infrastructure. But the global early warning system that alerts security teams to dangerous software flaws is showing critical gaps in coverage—and most users have no idea their digital lives are likely becoming more vulnerable.

Over the past 18 months, two pillars of global cybersecurity have flirted with apparent collapse. In February 2024, the US-backed National Vulnerability Database (NVD)—relied on globally for its free analysis of security threats—abruptly stopped publishing new entries, citing a cryptic “change in interagency support.” Then, in April of this year, the Common Vulnerabilities and Exposures (CVE) program, the fundamental numbering system for tracking software flaws, seemed at similar risk: A leaked letter warned of an imminent contract expiration.

Cybersecurity practitioners have since flooded Discord channels and LinkedIn feeds with emergency posts and memes of “NVD” and “CVE” engraved on tombstones. Unpatched vulnerabilities are the second most common way cyberattackers break in, and they have led to fatal hospital outages and critical infrastructure failures. In a social media post, Jen Easterly, a US cybersecurity expert, said: “Losing [CVE] would be like tearing out the card catalog from every library at once—leaving defenders to sort through chaos while attackers take full advantage.” If CVEs identify each vulnerability like a book in a card catalogue, NVD entries provide the detailed review with context around severity, scope, and exploitability. 

In the end, the Cybersecurity and Infrastructure Security Agency (CISA) extended funding for CVE another year, attributing the incident to a “contract administration issue.” But the NVD’s story has proved more complicated. Its parent organization, the National Institute of Standards and Technology (NIST), reportedly saw its budget cut roughly 12% in 2024, right around the time that CISA pulled its $3.7 million in annual funding for the NVD. Shortly after, as the backlog grew, CISA launched its own “Vulnrichment” program to help address the analysis gap, while promoting a more distributed approach that allows multiple authorized partners to publish enriched data. 

“CISA continuously assesses how to most effectively allocate limited resources to help organizations reduce the risk of newly disclosed vulnerabilities,” says Sandy Radesky, the agency’s associate director for vulnerability management. Rather than just filling the gap, she emphasizes, Vulnrichment was established to provide unique additional information, like recommended actions for specific stakeholders, and to “reduce dependency of the federal government’s role to be the sole provider of vulnerability enrichment.”

Meanwhile, NIST has scrambled to hire contractors to help clear the backlog. Despite a return to pre-crisis processing levels, a boom in vulnerabilities newly disclosed to the NVD has outpaced these efforts. Currently, over 25,000 vulnerabilities await processing—nearly 10 times the previous high in 2017, according to data from the software company Anchore. Before that, the NVD largely kept pace with CVE publications, maintaining a minimal backlog.

“Things have been disruptive, and we’ve been going through times of change across the board,” Matthew Scholl, then chief of the computer security division in NIST’s Information Technology Laboratory, said at an industry event in April. “Leadership has assured me and everyone that NVD is and will continue to be a mission priority for NIST, both in resourcing and capabilities.” Scholl left NIST in May after 20 years at the agency, and NIST declined to comment on the backlog. 

The situation has now prompted multiple government actions, with the Department of Commerce launching an audit of the NVD in May and House Democrats calling for a broader probe of both programs in June. But the damage to trust is already transforming geopolitics and supply chains as security teams prepare for a new era of cyber risk. “It’s left a bad taste, and people are realizing they can’t rely on this,” says Rose Gupta, who builds and runs enterprise vulnerability management programs. “Even if they get everything together tomorrow with a bigger budget, I don’t know that this won’t happen again. So I have to make sure I have other controls in place.”

As these public resources falter, organizations and governments are confronting a critical weakness in our digital infrastructure: Essential global cybersecurity services depend on a complex web of US agency interests and government funding that can be cut or redirected at any time.

Security haves and have-nots

What began as a trickle of software vulnerabilities in the early Internet era has become an unstoppable avalanche, and the free databases that have tracked them for decades have struggled to keep up. In early July, the CVE database crossed over 300,000 catalogued vulnerabilities. Numbers jump unpredictably each year, sometimes by 10% or much more. Even before its latest crisis, the NVD was notorious for delayed publication of new vulnerability analyses, often trailing private security software and vendor advisories by weeks or months.

Gupta has watched organizations increasingly adopt commercial vulnerability management (VM) software that includes its own threat intelligence services. “We’ve definitely become over-reliant on our VM tools,” she says, describing security teams’ growing dependence on vendors like Qualys, Rapid7, and Tenable to supplement or replace unreliable public databases. These platforms combine their own research with various data sources to create proprietary risk scores that help teams prioritize fixes. But not all organizations can afford to fill the NVD’s gap with premium security tools. “Smaller companies and startups, already at a disadvantage, are going to be more at risk,” she explains. 

Komal Rawat, a security engineer in New Delhi whose mid-stage cloud startup has a limited budget, describes the impact in stark terms: “If NVD goes, there will be a crisis in the market. Other databases are not that popular, and to the extent they are adopted, they are not free. If you don’t have recent data, you’re exposed to attackers who do.”

The growing backlog means new devices could be more likely to have vulnerability blind spots—whether that’s a Ring doorbell at home or an office building’s “smart” access control system. The biggest risk may be “one-off” security flaws that fly under the radar. “There are thousands of vulnerabilities that will not affect the majority of enterprises,” says Gupta. “Those are the ones that we’re not getting analysis on, which would leave us at risk.”

NIST acknowledges it has limited visibility into which organizations are most affected by the backlog. “We don’t track which industries use which products and therefore cannot measure impact to specific industries,” a spokesperson says. Instead, the team prioritizes vulnerabilities on the basis of CISA’s known exploits list and those included in vendor advisories like Microsoft Patch Tuesday.

The biggest vulnerability

Brian Martin has watched this system evolve—and deteriorate—from the inside. A former CVE board member and an original project leader behind the Open Source Vulnerability Database, he has built a combative reputation over the decades as a leading historian and practitioner. Martin says his current project, VulnDB (part of Flashpoint Security), outperforms the official databases he once helped oversee. “Our team processes more vulnerabilities, at a much faster turnaround, and we do it for a fraction of the cost,” he says, referring to the tens of millions in government contracts that support the current system. 

When we spoke in May, Martin said his database contains more than 112,000 vulnerabilities with no CVE identifiers—security flaws that exist in the wild but remain invisible to organizations that rely solely on public channels. “If you gave me the money to triple my team, that non-CVE number would be in the 500,000 range,” he said.

In the US, official vulnerability management duties are split between a web of contractors, agencies, and nonprofit centers like the Mitre Corporation. Critics like Martin say that creates potential for redundancy, confusion, and inefficiency, with layers of middle management and relatively few actual vulnerability experts. Others defend the value of this fragmentation. “These programs build on or complement each other to create a more comprehensive, supportive, and diverse community,” CISA said in a statement. “That increases the resilience and usefulness of the entire ecosystem.”

As American leadership wavers, other nations are stepping up. China now operates multiple vulnerability databases, some surprisingly robust but tainted by the possibility that they are subject to state control. In May, the European Union accelerated the launch of its own database, as well as a decentralized “Global CVE” architecture. Following social media and cloud services, vulnerability intelligence has become another front in the contest for technological independence. 

That leaves security professionals to navigate multiple potentially conflicting sources of data. “It’s going to be a mess, but I would rather have too much information than none at all,” says Gupta, describing how her team monitors multiple databases despite the added complexity. 

Resetting software liability

As defenders adapt to the fragmenting landscape, the tech industry faces another reckoning: Why don’t software vendors carry more responsibility for protecting their customers from security issues? Major vendors routinely disclose—but don’t necessarily patch—thousands of new vulnerabilities each year. A single exposure could crash critical systems or increase the risks of fraud and data misuse. 

For decades, the industry has hidden behind legal shields. “Shrink-wrap licenses” once forced consumers to broadly waive their right to hold software vendors liable for defects. Today’s end-user license agreements (EULAs), often delivered in pop-up browser windows, have evolved into incomprehensibly long documents. Last November, a lab project called “EULAS of Despair” used the length of War and Peace (587,287 words) to measure these sprawling contracts. The worst offender? Twitter, at 15.83 novels’ worth of fine print.

“This is a legal fiction that we’ve created around this whole ecosystem, and it’s just not sustainable,” says Andrea Matwyshyn, a US special advisor and technology law professor at Penn State University, where she directs the Policy Innovation Lab of Tomorrow. “Some people point to the fact that software can contain a mix of products and services, creating more complex facts. But just like in engineering or financial litigation, even the most messy scenarios can be resolved with the assistance of experts.”

This liability shield is finally beginning to crack. In July 2024, a faulty security update in CrowdStrike’s popular endpoint detection software crashed millions of Windows computers worldwide and caused outages at everything from airlines to hospitals to 911 systems. The incident led to billions in estimated damages, and the city of Portland, Oregon, even declared a “state of emergency.” Now, affected companies like Delta Air Lines have hired high-priced attorneys to pursue major damages—signaling an opening of the floodgates to litigation.

Despite the soaring number of vulnerabilities, many fall into long-established categories, such as SQL injections that interfere with database queries and buffer overflows that enable code to be executed remotely. Matwyshyn advocates for a mandatory “software bill of materials,” or S-BOM—an ingredients list that would let organizations understand what components and potential vulnerabilities exist throughout their software supply chains. One recent report found 30% of data breaches stemmed from the vulnerabilities of third-party software vendors or cloud service providers.
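
To make the idea concrete, the entry below is a minimal, made-up sketch of the kind of “ingredient” an S-BOM records for one component; real S-BOMs use standardized formats such as SPDX or CycloneDX and cover every package in a product.

    # Illustrative only: one hypothetical S-BOM "ingredient" for a shipped product.
    sbom_entry = {
        "component": "openssl",        # third-party library bundled with the product
        "version": "3.0.13",
        "supplier": "OpenSSL Project",
        "license": "Apache-2.0",
        "dependencies": ["zlib"],      # what this component itself pulls in
    }
    print(sbom_entry)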

She adds: “When you can’t tell the difference between the companies that are cutting corners and a company that has really invested in doing right by their customers, that results in a market where everyone loses.”

CISA leadership shares this sentiment, with a spokesperson emphasizing its “secure-by-design principles,” such as “making essential security features available without additional cost, eliminating classes of vulnerabilities, and building products in a way that reduces the cybersecurity burden on customers.”
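
Matwyshyn’s proposed ingredients list is easiest to picture with a small example. The sketch below shows the kind of check an S-BOM enables: cross-referencing each listed component and version against an advisory feed. Every component name, version, and vulnerability ID here is invented for illustration.

```python
# Hypothetical sketch: checking an S-BOM-style component list against
# known-vulnerable versions. All names, versions, and IDs are invented.

# A minimal "bill of materials": each entry names a component and its version.
sbom = [
    {"name": "example-web-framework", "version": "2.4.1"},
    {"name": "example-crypto-lib", "version": "1.0.9"},
    {"name": "example-logging-lib", "version": "3.2.0"},
]

# A toy advisory feed mapping (component, version) pairs to vulnerability IDs.
known_vulnerable = {
    ("example-crypto-lib", "1.0.9"): ["CVE-0000-00001"],
    ("example-logging-lib", "3.2.0"): ["CVE-0000-00002", "CVE-0000-00003"],
}

def audit(components, advisories):
    """Return (name, version, vulnerability IDs) for every affected component."""
    findings = []
    for component in components:
        key = (component["name"], component["version"])
        if key in advisories:
            findings.append((key[0], key[1], advisories[key]))
    return findings

for name, version, cves in audit(sbom, known_vulnerable):
    print(f"{name} {version} is affected by: {', '.join(cves)}")
```

Real S-BOMs follow standardized formats and are matched against continuously updated advisory feeds rather than a hand-written dictionary, but the principle is the same: you cannot patch a component you do not know you are running.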

Avoiding a digital ‘dark age’

It will likely come as no surprise that practitioners are looking to AI to help fill the gap, while at the same time preparing for a coming swarm of cyberattacks by AI agents. Security researchers have used an OpenAI model to discover new “zero-day” vulnerabilities. And both the NVD and CVE teams are developing “AI-powered tools” to help streamline data collection, identification, and processing. NIST says that “up to 65% of our analysis time has been spent generating CPEs”—product information codes that pinpoint affected software. If AI can automate even part of this tedious process, it could dramatically speed up the analysis pipeline.
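
To give a sense of what that bottleneck looks like: a CPE is a structured identifier naming the vendor, product, and version of affected software, which is exactly the kind of record those AI-powered tools would need to generate reliably. The sketch below splits a CPE 2.3-style string into its named fields; the example identifier is invented, and the parsing is deliberately naive (real CPE strings can contain escaped characters).

```python
# Illustrative sketch: splitting a CPE 2.3-style identifier into its fields.
# The example string is made up for demonstration; parsing is simplified.

FIELDS = [
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
]

def parse_cpe(cpe: str) -> dict:
    """Split a 'cpe:2.3:...' string into a dict of its named components."""
    prefix, spec_version, *values = cpe.split(":")
    if prefix != "cpe" or spec_version != "2.3" or len(values) != len(FIELDS):
        raise ValueError(f"not a CPE 2.3 string: {cpe!r}")
    return dict(zip(FIELDS, values))

example = "cpe:2.3:a:examplevendor:exampleproduct:1.2.3:*:*:*:*:*:*:*"
print(parse_cpe(example)["vendor"])   # -> examplevendor
print(parse_cpe(example)["version"])  # -> 1.2.3
```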

But Martin cautions against optimism around AI, noting that the technology remains unproven and often riddled with inaccuracies—which, in security, can be fatal. “Rather than AI or ML [machine learning], there are ways to strategically automate bits of the processing of that vulnerability data while ensuring 99.5% accuracy,” he says. 

AI also fails to address more fundamental challenges in governance. The CVE Foundation, launched in April 2025 by breakaway board members, proposes a globally funded nonprofit model similar to that of the internet’s addressing system, which transitioned from US government control to international governance. Other security leaders are pushing to revitalize open-source alternatives like Google’s OSV Project or the NVD++ (maintained by VulnCheck), which are accessible to the public but currently have limited resources.

As these various reform efforts gain momentum, the world is waking up to the fact that vulnerability intelligence—like disease surveillance or aviation safety—requires sustained cooperation and public investment. Without it, a patchwork of paid databases will be all that remains, threatening to leave all but the richest organizations and nations permanently exposed.

Matthew King is a technology and environmental journalist based in New York. He previously worked for cybersecurity firm Tenable.

The first babies have been born following “simplified” IVF in a mobile lab

This week I’m sending congratulations to two sets of parents in South Africa. Babies Milayah and Rossouw arrived a few weeks ago. All babies are special, but these two set a new precedent. They’re the first to be born following “simplified” IVF performed in a mobile lab.

This new mobile lab is essentially a trailer crammed with everything an embryologist needs to perform IVF on a shoestring. It was designed to deliver reproductive treatments to people who live in rural parts of low-income countries, where IVF can be prohibitively expensive or even nonexistent. And it seems to work!

While IVF is increasingly commonplace in wealthy countries—around 12% of all births in Spain result from such procedures—it remains expensive and isn’t always covered by insurance or national health providers. And it’s even less accessible in low-income countries—especially for people who live in rural areas.

People often assume that countries with high birth rates don’t need access to fertility treatments, says Gerhard Boshoff, an embryologist at the University of Pretoria in South Africa. Sub-Saharan African countries like Niger, Angola, and Benin all have birth rates above 40 per 1,000 people, which is over four times the rates in Italy and Japan, for example.

But that doesn’t mean people in Sub-Saharan Africa don’t need IVF. Globally, around one in six adults experience infertility at some point in their lives, according to the World Health Organization. Research by the organization suggests that infertility rates are similar in high-income and low-income countries. As the WHO’s director general Tedros Adhanom Ghebreyesus puts it: “Infertility does not discriminate.”

For many people in rural areas of low-income countries, IVF clinics simply don’t exist. South Africa is considered a “reproductive hub” of the African continent, but even in that country there are fewer than 30 clinics for a population of over 60 million. A recent study found there were no such clinics in Angola or Malawi.  

Willem Ombelet, a retired gynecologist, first noticed these disparities back in the 1980s, while he was working at an IVF lab in Pretoria. “I witnessed that infertility was [more prevalent] in the black population than the white population—but they couldn’t access IVF because of apartheid,” he says. The experience spurred him to find ways to make IVF accessible for everyone. In the 1990s, he launched The Walking Egg—a science and art project with that goal.

In 2008, Ombelet met Jonathan Van Blerkom, a reproductive biologist and embryologist who had already been experimenting with a simplified version of IVF. Typically, embryos are cultured in an incubator that provides a sterile mix of gases. Van Blerkom’s approach was to preload tubes with the required gases and seal them with a rubber stopper. “We don’t need a fancy lab,” says Ombelet.

Milayah was born on June 18.
COURTESY OF THE WALKING EGG

Eggs and sperm can be injected into the tubes through the stoppers, and the resulting embryos can be grown inside. All you really need is a good microscope and a way to keep the tube warm, says Ombelet. Once the embryos are around five days old, they can be transferred to a person’s uterus or frozen. “The cost is one tenth or one twentieth of a normal lab,” says Ombelet.

Ombelet, Van Blerkom, and their colleagues found that this approach appeared to work as well as regular IVF. The team ran their first pilot trial at a clinic in Belgium in 2012. The first babies conceived with the simplified IVF process were born later that year.

More recently, Boshoff wondered if the team could take the show on the road. Making IVF simpler and cheaper is one thing, but getting it to people who don’t have access to IVF care is another. What if the team could pack the simplified IVF lab into a trailer and drive it around rural South Africa?

“We just needed to figure out how to have everything in a very confined space,” says Boshoff. As part of the Walking Egg project, he and his colleagues found a way to organize the lab equipment and squeeze in air filters. He then designed a “fold-out system” that allowed the team to create a second room when the trailer was parked. This provides some privacy for people who are having embryos transferred, he says.

People who want to use the mobile IVF lab will first have to undergo treatment at a local medical facility, where they will take drugs that stimulate their ovaries to release eggs, and then have those eggs collected. The rest of the process can be done in the mobile lab, says Boshoff, who presented his work at the European Society of Human Reproduction and Embryology’s annual meeting in Paris earlier this month.

The first trial started last year. The team partnered with one of the few existing fertility clinics in rural South Africa, which put them in touch with 10 willing volunteers. Five of the 10 women got pregnant following their simplified IVF in the mobile lab. One miscarried, but four pregnancies continued. On June 18, baby Milayah arrived. Two days later, another mother welcomed baby Rossouw. The other babies could come any day now.

“We’ve proven that a very cheap and easy [IVF] method can be used even in a mobile unit and have comparable results to regular IVF,” says Ombelet, who says his team is planning similar trials in Egypt and Indonesia. “The next step is to roll it out all over the world.”

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

This tool strips away anti-AI protections from digital art

A new technique called LightShed will make it harder for artists to use existing protective tools to stop their work from being ingested for AI training. It’s the next step in a cat-and-mouse game—across technology, law, and culture—that has been going on between artists and AI proponents for years. 

Generative AI models that create images need to be trained on a wide variety of visual material, and data sets that are used for this training allegedly include copyrighted art without permission. This has worried artists, who are concerned that the models will learn their style, mimic their work, and put them out of a job.

These artists got some potential defenses in 2023, when researchers created tools like Glaze and Nightshade to protect artwork by “poisoning” it against AI training (Shawn Shan was even named MIT Technology Review’s Innovator of the Year last year for his work on these). LightShed, however, claims to be able to subvert these tools and others like them, making it easy for the artwork to be used for training once again.

To be clear, the researchers behind LightShed aren’t trying to steal artists’ work. They just don’t want people to get a false sense of security. “You will not be sure if companies have methods to delete these poisons but will never tell you,” says Hanna Foerster, a PhD student at the University of Cambridge and the lead author of a paper on the work. And if they do, it may be too late to fix the problem.

AI models work, in part, by implicitly creating boundaries between what they perceive as different categories of images. Glaze and Nightshade change enough pixels to push a given piece of art over this boundary without affecting the image’s quality, causing the model to see it as something it’s not. These almost imperceptible changes are called perturbations, and they mess up the AI model’s ability to understand the artwork.

Glaze makes models misunderstand style (e.g., interpreting a photorealistic painting as a cartoon). Nightshade instead makes the model see the subject incorrectly (e.g., interpreting a cat in a drawing as a dog). Glaze is used to defend an artist’s individual style, whereas Nightshade is used to attack AI models that crawl the internet for art.
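
As a rough illustration of that boundary-pushing idea, and not the actual Glaze or Nightshade algorithms, which are far more sophisticated, the sketch below applies a generic FGSM-style perturbation: it nudges each pixel slightly in the direction that makes a classifier favor the wrong label, while capping the change so the image looks unchanged to a person. The `model`, `image_tensor`, and `wrong_label` inputs are assumed placeholders for a differentiable PyTorch image classifier and its data.

```python
# Conceptual sketch of a pixel-level adversarial perturbation, in the spirit
# of the boundary-pushing idea described above. This is NOT Glaze or
# Nightshade; it is a generic FGSM-style example. `model` is assumed to be
# a pretrained, differentiable PyTorch image classifier.
import torch
import torch.nn.functional as F

def perturb(image: torch.Tensor, model, wrong_label: torch.Tensor,
            epsilon: float = 0.02) -> torch.Tensor:
    """Nudge each pixel by at most `epsilon` so that `model` leans toward
    `wrong_label`, while the image stays visually unchanged."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), wrong_label)
    loss.backward()
    # Step against the gradient so the loss for the wrong label goes down,
    # i.e. the model becomes more confident in the wrong category.
    perturbed = image - epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()

# Hypothetical usage: a batch of one RGB image and a target class index.
# perturbed = perturb(image_tensor, model, torch.tensor([42]))
```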

Foerster worked with a team of researchers from the Technical University of Darmstadt and the University of Texas at San Antonio to develop LightShed, which learns how to see where tools like Glaze and Nightshade splash this sort of digital poison onto art so that it can effectively clean it off. The group will present its findings at the Usenix Security Symposium, a leading global cybersecurity conference, in August. 

The researchers trained LightShed by feeding it pieces of art with and without Nightshade, Glaze, and other similar programs applied. Foerster describes the process as teaching LightShed to reconstruct “just the poison on poisoned images.” Identifying a cutoff for how much poison will actually confuse an AI makes it easier to “wash” just the poison off. 

LightShed is incredibly effective at this. While other researchers have found simple ways to subvert poisoning, LightShed appears to be more adaptable. It can even apply what it’s learned from one anti-AI tool—say, Nightshade—to others like Mist or MetaCloak without ever seeing them ahead of time. While it has some trouble with small doses of poison, those light touches are also less likely to disrupt an AI model’s ability to understand the underlying art, making it a win-win for the AI—or a lose-lose for the artists using these tools.
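
To make the “wash the poison off” framing concrete: given matched pairs of protected and unprotected images, the perturbation is simply their difference, and a model can be trained to predict that difference from the protected image alone. The toy sketch below uses invented arrays and a placeholder in place of a trained network; it illustrates the framing only and makes no claim about LightShed’s actual architecture.

```python
# Toy illustration of the "reconstruct just the poison" framing described
# above. All arrays are invented stand-ins; this is not LightShed's method.
import numpy as np

rng = np.random.default_rng(0)

# 100 tiny grayscale "images" and a small hidden perturbation added to each.
clean = rng.random((100, 32, 32))
poison = 0.05 * rng.standard_normal((100, 32, 32))
poisoned = np.clip(clean + poison, 0.0, 1.0)

# Training pairs: input = poisoned image, target = the perturbation itself.
targets = poisoned - clean

# A real system would fit a neural network to predict `targets` from
# `poisoned`. Once such an estimator exists, "washing" is just subtraction:
estimated_poison = targets                 # placeholder for a trained model's output
washed = np.clip(poisoned - estimated_poison, 0.0, 1.0)

print(float(np.abs(washed - clean).max()))  # ~0.0 when the estimate is accurate
```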

Around 7.5 million people, many of them artists with small and medium-size followings and fewer resources, have downloaded Glaze to protect their art. Those using tools like Glaze see it as an important technical line of defense, especially when the state of regulation around AI training and copyright is still up in the air. The LightShed authors see their work as a warning that tools like Glaze are not permanent solutions. “It might need a few more rounds of trying to come up with better ideas for protection,” says Foerster.

The creators of Glaze and Nightshade seem to agree with that sentiment: The website for Nightshade warned the tool wasn’t future-proof before work on LightShed ever began. And Shan, who led research on both tools, still believes defenses like his have meaning even if there are ways around them. 

“It’s a deterrent,” says Shan—a way to warn AI companies that artists are serious about their concerns. The goal, as he puts it, is to put up as many roadblocks as possible so that AI companies find it easier to just work with artists. He believes that “most artists kind of understand this is a temporary solution,” but that creating those obstacles against the unwanted use of their work is still valuable.

Foerster hopes to use what she learned through LightShed to build new defenses for artists, including clever watermarks that somehow persist with the artwork even after it’s gone through an AI model. While she doesn’t believe this will protect a work against AI forever, she thinks this could help tip the scales back in the artist’s favor once again.