Synthesia’s hyperrealistic deepfakes will soon have full bodies

Startup Synthesia’s AI-generated avatars are getting an update to make them even more realistic: They will soon have bodies that can move, and hands that gesticulate.

The new full-body avatars will be able to do things like sing and brandish a microphone while dancing, or move from behind a desk and walk across a room. They will be able to express more complex emotions than previously possible, like excitement, fear, or nervousness, says Victor Riparbelli, the company’s CEO. Synthesia intends to launch the new avatars toward the end of the year. 

“It’s very impressive. No one else is able to do that,” says Jack Saunders, a researcher at the University of Bath, who was not involved in Synthesia’s work. 

The full-body avatars he previewed are very good, he says, despite small errors such as hands “slicing” into each other at times. But “chances are you’re not really going to be looking that close to notice it,” Saunders says. 

Synthesia launched its first version of hyperrealistic AI avatars, also known as deepfakes, in April. These avatars use large language models to match expressions and tone of voice to the sentiment of spoken text. Diffusion models, as used in image- and video-generating AI systems, create the avatar’s look. However, the avatars in this generation appear only from the torso up, which can detract from the otherwise impressive realism. 

To create the full-body avatars, Synthesia is building an even bigger AI model. Users will have to go into a studio to record their body movements.

COURTESY SYNTHESIA

But before these full-body avatars become available, the company is launching another version of AI avatars that have hands and can be filmed from multiple angles. Their predecessors were only available in portrait mode and were just visible from the front. 

Other startups, such as Hour One, have launched similar avatars with hands. Synthesia’s version, which I got to test in a research preview and which will be launched in late July, has slightly more realistic hand movements and lip-synching. 

Crucially, the coming update also makes it far easier to create your own personalized avatar. The company’s previous custom AI avatars required users to go into a studio to record their face and voice over the span of a couple of hours, as I reported in April.

This time, I recorded the material needed in just 10 minutes in the Synthesia office, using a digital camera, a lapel mike, and a laptop. But an even more basic setup, such as a laptop camera, would do. And while previously I had to record my facial movements and voice separately, this time the data was collected at the same time. The process also includes reading a script expressing consent to being recorded in this way, and reading out a randomly generated security passcode. 

These changes allow more scale and give the AI models powering the avatars more capabilities with less data, says Riparbelli. The results are also much faster. While I had to wait a few weeks to get my studio-made avatar, the new homemade ones were available the next day. 

Below, you can see my test of the new homemade avatars with hands. 

COURTESY SYNTHESIA

The homemade avatars aren’t as expressive as the studio-made ones yet, and users can’t change the backgrounds of their avatars, says Alexandru Voica, Synthesia’s head of corporate affairs and policy. The hands are animated using an advanced form of looping technology, which repeats the same hand movements in a way that is responsive to the content of the script. 

Hands are tricky for AI to do well—even more so than faces, Vittorio Ferrari, Synthesia’s director of science, told me in March. That’s because our mouths move in relatively small and predictable ways while we talk, making it possible to sync the deepfake version up with speech, but we move our hands in lots of different ways. On the flip side, while faces require close attention to detail because we tend to focus on them, hands can be less precise, Ferrari says. 

Even if they’re imperfect, AI-generated hands and bodies add a lot to the illusion of realism, which poses serious risks at a time when deepfakes and online misinformation are proliferating. Synthesia has strict content moderation policies, carefully vetting both its customers and the sort of content they’re able to generate. For example, only accredited news outlets can generate content on news.  

These new advancements in avatar technologies are another hammer blow to our ability to believe what we see online, says Saunders. 

“People need to know you can’t trust anything,” he says. “Synthesia is doing this now, and another year down the line it will be better and other companies will be doing it.” 

How generative AI could reinvent what it means to play

First, a confession. I only got into playing video games a little over a year ago (I know, I know). A Christmas gift of an Xbox Series S “for the kids” dragged me—pretty easily, it turns out—into the world of late-night gaming sessions. I was immediately attracted to open-world games, in which you’re free to explore a vast simulated world and choose what challenges to accept. Red Dead Redemption 2 (RDR2), an open-world game set in the Wild West, blew my mind. I rode my horse through sleepy towns, drank in the saloon, visited a vaudeville theater, and fought off bounty hunters. One day I simply set up camp on a remote hilltop to make coffee and gaze down at the misty valley below me.

To make them feel alive, open-world games are inhabited by vast crowds of computer-controlled characters. These animated people—called NPCs, for “nonplayer characters”—populate the bars, city streets, or space ports of games. They make these virtual worlds feel lived in and full. Often—but not always—you can talk to them.

In open-world games like Red Dead Redemption 2, players can choose diverse interactions within the same simulated experience.

After a while, however, the repetitive chitchat (or threats) of a passing stranger forces you to bump up against the truth: This is just a game. It’s still fun—I had a whale of a time, honestly, looting stagecoaches, fighting in bar brawls, and stalking deer through rainy woods—but the illusion starts to weaken when you poke at it. It’s only natural. Video games are carefully crafted objects, part of a multibillion-dollar industry, that are designed to be consumed. You play them, you loot a few stagecoaches, you finish, you move on. 

It may not always be like that. Just as it is upending other industries, generative AI is opening the door to entirely new kinds of in-game interactions that are open-ended, creative, and unexpected. The game may not always have to end.

Startups employing generative-AI models, like ChatGPT, are using them to create characters that don’t rely on scripts but, instead, converse with you freely. Others are experimenting with NPCs who appear to have entire interior worlds, and who can continue to play even when you, the player, are not around to watch. Eventually, generative AI could create game experiences that are infinitely detailed, twisting and changing every time you experience them. 

The field is still very new, but it’s extremely hot. In 2022 the venture firm Andreessen Horowitz launched Games Fund, a $600 million fund dedicated to gaming startups. A huge number of these are planning to use AI in gaming. And the firm, also known as A16Z, has now invested in two studios that are aiming to create their own versions of AI NPCs. A second $600 million round was announced in April 2024.

Early experimental demos of these experiences are already popping up, and it may not be long before they appear in full games like RDR2. But some in the industry believe this development will not just make future open-world games incredibly immersive; it could change what kinds of game worlds or experiences are even possible. Ultimately, it could change what it means to play.

“What comes after the video game? You know what I mean?” says Frank Lantz, a game designer and director of the NYU Game Center. “Maybe we’re on the threshold of a new kind of game.”

These guys just won’t shut up

The way video games are made hasn’t changed much over the years. Graphics are incredibly realistic. Games are bigger. But the way in which you interact with characters, and the game world around you, uses many of the same decades-old conventions.

“In mainstream games, we’re still looking at variations of the formula we’ve had since the 1980s,” says Julian Togelius, a computer science professor at New York University who has a startup called Modl.ai that does in-game testing. Part of that tried-and-tested formula is a technique called a dialogue tree, in which all of an NPC’s possible responses are mapped out. Which one you get depends on which branch of the dialogue tree you have chosen. For example, say something rude about a passing NPC in RDR2 and the character will probably lash out—you have to quickly apologize to avoid a shootout (unless that’s what you want).
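To make the idea concrete, here is a minimal sketch of a dialogue tree in Python. It is an illustration only, not code from any real game: the `DialogueNode` class, the `walk` function, and the lines of dialogue are all hypothetical. Every response the NPC can give is written in advance, and the player's choices only select which branch is followed.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueNode:
    """One scripted NPC line plus the player choices that branch from it."""
    npc_line: str
    # Maps the text of a player choice to the next node in the tree.
    choices: dict[str, "DialogueNode"] = field(default_factory=dict)


def walk(node: DialogueNode, picks: list[str]) -> list[str]:
    """Follow the player's picks down the tree and collect the NPC's lines."""
    spoken = [node.npc_line]
    for pick in picks:
        if pick not in node.choices:
            break  # No scripted branch exists for this input, so the talk ends.
        node = node.choices[pick]
        spoken.append(node.npc_line)
    return spoken


# Hypothetical exchange with a passing stranger (all lines invented here).
shootout = DialogueNode("Draw!")
calm = DialogueNode("Watch your mouth next time.")
angry = DialogueNode("What did you say?", {"apologize": calm, "insult again": shootout})
root = DialogueNode("Howdy.", {"insult": angry, "greet": DialogueNode("Fine day.")})

print(walk(root, ["insult", "apologize"]))
```

Anything the player says that is not a key in `choices` simply has no answer, which is the limitation the rest of this story is about.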

In the most expensive, high-profile games, the so-called AAA games like Elden Ring or Starfield, a deeper sense of immersion is created by using brute force to build out deep and vast dialogue trees. The biggest studios employ teams of hundreds of game developers who work for many years on a single game in which every line of dialogue is plotted and planned, and software is written so the in-game engine knows when to deploy that particular line. RDR2 reportedly contains an estimated 500,000 lines of dialogue, voiced by around 700 actors. 

“You get around the fact that you can [only] do so much in the world by, like, insane amounts of writing, an insane amount of designing,” says Togelius. 

Generative AI is already helping take some of that drudgery out of making new games. Jonathan Lai, a general partner at A16Z and one of Games Fund’s managers, says that most studios are using image-generating tools like Midjourney to enhance or streamline their work. And in a 2023 survey by A16Z, 87% of game studios said they were already using AI in their workflow in some way—and 99% planned to do so in the future. Many use AI agents to replace the human testers who look for bugs, such as places where a game might crash. In recent months, the CEO of the gaming giant EA said generative AI could be used in more than 50% of its game development processes.

Ubisoft, one of the biggest game developers, famous for AAA open-world games such as Assassin’s Creed, has been using a large-language-model-based AI tool called Ghostwriter to do some of the grunt work for its developers in writing basic dialogue for its NPCs. Ghostwriter generates loads of options for background crowd chatter, which the human writer can pick from or tweak. The idea is to free the humans up so they can spend that time on more plot-focused writing.

GEORGE WYLESOL

Ultimately, though, everything is scripted. Once you spend a certain number of hours on a game, you will have seen everything there is to see, and completed every interaction. Time to buy a new one.

But for startups like Inworld AI, this situation is an opportunity. Inworld, based in California, is building tools to make in-game NPCs that respond to a player with dynamic, unscripted dialogue and actions—so they never repeat themselves. The company, now valued at $500 million, is the best-funded AI gaming startup around thanks to backing from former Google CEO Eric Schmidt and other high-profile investors. 

Role-playing games give us a unique way to experience different realities, explains Kylan Gibbs, Inworld’s CEO and founder. But something has always been missing. “Basically, the characters within there are dead,” he says. 

“When you think about media at large, be it movies or TV or books, characters are really what drive our ability to empathize with the world,” Gibbs says. “So the fact that games, which are arguably the most advanced version of storytelling that we have, are lacking these live characters—it felt to us like a pretty major issue.”

Gamers themselves were pretty quick to realize that LLMs could help fill this gap. Last year, some came up with ChatGPT mods (a way to alter an existing game) for the popular role-playing game Skyrim. The mods let players interact with the game’s vast cast of characters using LLM-powered free chat. One mod even included OpenAI’s speech recognition software Whisper so that players could speak to the characters with their voices, saying whatever they wanted, and have full conversations that were no longer restricted by dialogue trees. 

The results gave gamers a glimpse of what might be possible but were ultimately a little disappointing. Though the conversations were open-ended, the character interactions were stilted, with delays while ChatGPT processed each request. 

Inworld wants to make this type of interaction more polished. It’s offering a product for AAA game studios in which developers can create the brains of an AI NPC that can be then imported into their game. Developers use the company’s “Inworld Studio” to generate their NPC. For example, they can fill out a core description that sketches the character’s personality, including likes and dislikes, motivations, or useful backstory. Sliders let you set levels of traits such as introversion or extroversion, insecurity or confidence. And you can also use free text to make the character drunk, aggressive, prone to exaggeration—pretty much anything.

Developers can also add descriptions of how their character speaks, including examples of commonly used phrases that Inworld’s various AI models, including LLMs, then spin into dialogue in keeping with the character. 

Game designers can also plug other information into the system: what the character knows and doesn’t know about the world (no Taylor Swift references in a medieval battle game, ideally) and any relevant safety guardrails (does your character curse or not?). Narrative controls will let the developers make sure the NPC is sticking to the story and isn’t wandering wildly off-base in its conversation. The idea is that the characters can then be imported into video-game graphics engines like Unity or Unreal Engine to add a body and features. Inworld is collaborating with the text-to-voice startup ElevenLabs to add natural-sounding voices.
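As a rough illustration of how such a character definition might be turned into instructions for a language model, consider the sketch below. It is not Inworld's API or data format, which this story does not describe. The `CharacterSheet` class, its field names, and the `to_prompt` method are assumptions made up for this example.

```python
from dataclasses import dataclass, field


def slider_label(value: float, low: str, high: str) -> str:
    """Turn a 0.0-1.0 slider position into the nearer of two trait words."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("slider values must lie between 0.0 and 1.0")
    return high if value >= 0.5 else low


@dataclass
class CharacterSheet:
    """Hypothetical container for the settings a designer might fill in."""
    name: str
    description: str                 # free-text personality and backstory
    extroversion: float = 0.5        # slider: 0.0 introverted, 1.0 extroverted
    confidence: float = 0.5          # slider: 0.0 insecure, 1.0 confident
    sample_phrases: list[str] = field(default_factory=list)
    off_limits: list[str] = field(default_factory=list)  # things the character cannot know
    may_curse: bool = False          # a simple safety guardrail

    def to_prompt(self) -> str:
        """Flatten the sheet into plain-text instructions for a language model."""
        lines = [
            f"You are {self.name}. {self.description}",
            f"You are {slider_label(self.extroversion, 'introverted', 'extroverted')} "
            f"and {slider_label(self.confidence, 'insecure', 'confident')}.",
        ]
        if self.sample_phrases:
            lines.append("Typical phrases: " + "; ".join(self.sample_phrases))
        if self.off_limits:
            lines.append("Never mention: " + ", ".join(self.off_limits))
        lines.append("You may curse." if self.may_curse else "Do not curse.")
        return "\n".join(lines)


# Invented example character for a medieval setting.
innkeeper = CharacterSheet(
    name="Mira",
    description="A weary innkeeper in a medieval border town.",
    extroversion=0.2,
    confidence=0.8,
    sample_phrases=["Mind the step."],
    off_limits=["Taylor Swift"],
)
print(innkeeper.to_prompt())
```

A production system would presumably enforce knowledge limits and guardrails with more than a line of prompt text; the sketch only shows how the separate settings could feed one character description.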

Inworld’s tech hasn’t appeared in any AAA games yet, but at the Game Developers Conference (GDC) in San Francisco in March 2024, the firm unveiled an early demo with Nvidia that showcased some of what will be possible. In Covert Protocol, each player operates as a private detective who must solve a case using input from the various in-game NPCs. Also at the GDC, Inworld unveiled a demo called NEO NPC that it had worked on with Ubisoft. In NEO NPC, a player could freely interact with NPCs using voice-to-text software and use conversation to develop a deeper relationship with them.

LLMs give us the chance to make games more dynamic, says Jeff Orkin, founder of Bitpart, a new startup that also aims to create entire casts of LLM-powered NPCs that can be imported into games. “Because there’s such reliance on a lot of labor-intensive scripting, it’s hard to get characters to handle a wide variety of ways a scenario might play out, especially as games become more and more open-ended,” he says.

Bitpart’s approach is in part inspired by Orkin’s PhD research at MIT’s Media Lab. There, he trained AIs to role-play social situations using game-play logs of humans doing the same things with each other in multiplayer games.

Bitpart’s casts of characters are trained using a large language model and then fine-tuned in a way that means the in-game interactions are not entirely open-ended and infinite. Instead, the company uses an LLM and other tools to generate a script covering a range of possible interactions, and then a human game designer will select some. Orkin describes the process as authoring the Lego bricks of the interaction. An in-game algorithm searches out specific bricks to string them together at the appropriate time.
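One way to picture the Lego-brick idea is the sketch below. It is a guess at the general shape, not Bitpart's actual system: the `Brick` class, the tag scheme, and the `pick_bricks` function are hypothetical. The bricks stand for lines that were generated ahead of time and approved by a designer, and a simple matching rule stands in for the in-game algorithm that chooses which ones to play.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brick:
    """A pre-approved snippet of interaction, tagged with when it applies."""
    speaker: str
    line: str
    tags: frozenset[str]


def pick_bricks(library: list[Brick], situation: set[str]) -> list[Brick]:
    """Return bricks whose tags all hold in the situation, most specific first."""
    matches = [b for b in library if b.tags <= situation]
    return sorted(matches, key=lambda b: len(b.tags), reverse=True)


# Invented library of designer-approved lines.
LIBRARY = [
    Brick("waiter", "Coming right up.", frozenset({"restaurant", "player_orders"})),
    Brick("bartender", "Try the house special.",
          frozenset({"restaurant", "player_orders", "bartender_nearby"})),
    Brick("guard", "Halt!", frozenset({"castle"})),
]

situation = {"restaurant", "player_orders", "bartender_nearby"}
for brick in pick_bricks(LIBRARY, situation):
    print(f"{brick.speaker}: {brick.line}")
```

Because every line comes from the approved library, the conversation can vary with the situation while staying within what the designers signed off on.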

Bitpart’s approach could create some delightful in-game moments. In a restaurant, for example, you might ask a waiter for something, but the bartender might overhear and join in. Bitpart’s AI currently works with Roblox. Orkin says the company is now running trials with AAA game studios, although he won’t yet say which ones.

But generative AI might do more than just enhance the immersiveness of existing kinds of games. It could give rise to completely new ways to play.

Making the impossible possible

When I asked Frank Lantz about how AI could change gaming, he talked for 26 minutes straight. His initial reaction to generative AI had been visceral: “I was like, oh my God, this is my destiny and is what I was put on the planet for.” 

Lantz has been in and around the cutting edge of the game industry and AI for decades but received a cult level of acclaim a few years ago when he created the Universal Paperclips game. The simple in-browser game gives the player the job of producing as many paper clips as possible. It’s a riff on the famous thought experiment by the philosopher Nick Bostrom, which imagines an AI that is given the same task and optimizes against humanity’s interest by turning all the matter in the known universe into paper clips.

Lantz is bursting with ideas for ways to use generative AI. One is to experience a new work of art as it is being created, with the player participating in its creation. “You’re inside of something like Lord of the Rings as it’s being written. You’re inside a piece of literature that is unfolding around you in real time,” he says. He also imagines strategy games where the players and the AI work together to reinvent what kind of game it is and what the rules are, so it is never the same twice.

For Orkin, LLM-powered NPCs can make games unpredictable—and that’s exciting. “It introduces a lot of open questions, like what you do when a character answers you but that sends a story in a direction that nobody planned for,” he says. 

It might mean games that are unlike anything we’ve seen thus far. Gaming experiences that unspool as the characters’ relationships shift and change, as friendships start and end, could unlock entirely new narrative experiences that are less about action and more about conversation and personalities. 

Togelius imagines new worlds built to react to the player’s own wants and needs, populated with NPCs that the player must teach or influence as the game progresses. Imagine interacting with characters whose opinions can change, whom you could persuade or motivate to act in a certain way—say, to go to battle with you. “A thoroughly generative game could be really, really good,” he says. “But you really have to change your whole expectation of what a game is.”

Lantz is currently working on a prototype of a game in which the premise is that you—the player—wake up dead, and the afterlife you are in is a low-rent, cheap version of a synthetic world. The game plays out like a noir in which you must explore a city full of thousands of NPCs powered by a version of ChatGPT, whom you must interact with to work out how you ended up there. 

His early experiments gave him some eerie moments when he felt that the characters seemed to know more than they should, a sensation recognizable to people who have played with LLMs before. Even though you know they’re not alive, they can still freak you out a bit.

“If you run electricity through a frog’s corpse, the frog will move,” he says. “And if you run $10 million worth of computation through the internet … it moves like a frog, you know.” 

But these early forays into generative-AI gaming have given him a real sense of excitement for what’s next: “I felt like, okay, this is a thread. There really is a new kind of artwork here.”

If an AI NPC talks and no one is around to listen, is there a sound?

AI NPCs won’t just enhance player interactions—they might interact with one another in weird ways. Red Dead Redemption 2’s NPCs each have long, detailed scripts that spell out exactly where they should go, what work they must complete, and how they’d react if anything unexpected occurred. If you want, you can follow an NPC and watch it go about its day. It’s fun, but ultimately it’s hard-coded.

NPCs built with generative AI could have a lot more leeway—even interacting with one another when the player isn’t there to watch. Just as people have been fooled into thinking LLMs are sentient, watching a city of generated NPCs might feel like peering over the top of a toy box that has somehow magically come alive.

We’re already getting a sense of what this might look like. At Stanford University, Joon Sung Park has been experimenting with AI-generated characters and watching to see how their behavior changes and gains complexity as they encounter one another. 

Because large language models have sucked up the internet and social media, they actually contain a lot of detail about how we behave and interact, he says.

Gamers came up with ChatGPT mods for the popular role-playing game Skyrim.

Although 2016’s hugely hyped No Man’s Sky used procedural generation to create endless planets to explore, many saw it as a letdown.

In Covert Protocol, players operate as private detectives who must solve the case using input from various in-game NPCs.

In Park’s recent research, he and colleagues set up a Sims-like game, called Smallville, with 25 simulated characters that had been trained using generative AI. Each was given a name and a simple biography before being set in motion. When left to interact with each other for two days, they began to exhibit humanlike conversations and behavior, including remembering each other and being able to talk about their past interactions. 

For example, the researchers prompted one character to organize a Valentine’s Day party—and then let the simulation run. That character sent invitations around town, while other members of the community asked each other on dates to go to the party, and all turned up at the venue at the correct time. All of this was carried out through conversations, and past interactions between characters were stored in their “memories” as natural language.
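The idea of memories kept as natural language can be sketched in a few lines of Python. This is a toy, not the researchers' implementation: the `Agent` class and the character names are invented, and the word-overlap ranking is a crude stand-in chosen here because the story does not describe how memories are actually retrieved.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A simulated character whose memories are plain sentences."""
    name: str
    memories: list[str] = field(default_factory=list)

    def remember(self, event: str) -> None:
        """Store an observation as ordinary text, oldest first."""
        self.memories.append(event)

    def recall(self, query: str, limit: int = 3) -> list[str]:
        """Rank memories by shared words with the query, newest first on ties."""
        words = set(query.lower().split())
        scored = [
            (len(words & set(memory.lower().split())), index, memory)
            for index, memory in enumerate(self.memories)
        ]
        relevant = [item for item in scored if item[0] > 0]
        relevant.sort(key=lambda item: (-item[0], -item[1]))
        return [memory for _, _, memory in relevant[:limit]]


# Invented character and events, loosely echoing the party scenario.
ava = Agent("Ava")
ava.remember("Bea invited me to a Valentine's Day party at the cafe")
ava.remember("I bought bread at the market")
ava.remember("Carl asked me to go to the party with him")

print(ava.recall("party"))
```

In a fuller system the recalled sentences would be handed back to the language model as context before the character speaks, which is how past conversations could shape later ones.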

For Park, the implications for gaming are huge. “This is exactly the sort of tech that the gaming community for their NPCs have been waiting for,” he says. 

His research has inspired games like AI Town, an open-source interactive experience on GitHub that lets human players interact with AI NPCs in a simple top-down game. You can leave the NPCs to get along for a few days and check in on them, reading the transcripts of the interactions they had while you were away. Anyone is free to take AI Town’s code to build new NPC experiences through AI. 

For Daniel De Freitas, cofounder of the startup Character AI, which lets users generate and interact with their own LLM-powered characters, the generative-AI revolution will allow new types of games to emerge—ones in which the NPCs don’t even need human players. 

The player is “joining an adventure that is always happening, that the AIs are playing,” he imagines. “It’s the equivalent of joining a theme park full of actors, but unlike the actors, they truly ‘believe’ that they are in those roles.”

If you’re getting Westworld vibes right about now, you’re not alone. There are plenty of stories about people torturing or killing their simple Sims characters in the game for fun. Would mistreating NPCs that pass for real humans cross some sort of new ethical boundary? What if, Lantz asks, an AI NPC that appeared conscious begged for its life when you simulated torturing it?

It raises complex questions, he adds. “One is: What are the ethical dimensions of pretend violence? And the other is: At what point do AIs become moral agents to which harm can be done?”

There are other potential issues too. An immersive world that feels real, and never ends, could be dangerously addictive. Some users of AI chatbots have already reported losing hours and even days in conversation with their creations. Are there dangers that the same parasocial relationships could emerge with AI NPCs? 

“We may need to worry about people forming unhealthy relationships with game characters at some point,” says Togelius. Until now, players have been able to differentiate pretty easily between game play and real life. But AI NPCs might change that, he says: “If at some point what we now call ‘video games’ morph into some all-encompassing virtual reality, we will probably need to worry about the effect of NPCs being too good, in some sense.”

A portrait of the artist as a young bot

Not everyone is convinced that never-ending open-ended conversations between the player and NPCs are what we really want for the future of games. 

“I think we have to be cautious about connecting our imaginations with reality,” says Mike Cook, an AI researcher and game designer. “The idea of a game where you can go anywhere, talk to anyone, and do anything has always been a dream of a certain kind of player. But in practice, this freedom is often at odds with what we want from a story.”

In other words, having to generate a lot of the dialogue yourself might actually get kind of … well, boring. “If you can’t think of interesting or dramatic things to say, or are simply too tired or bored to do it, then you’re going to basically be reading your own very bad creative fiction,” says Cook. 

Orkin likewise doesn’t think conversations that could go anywhere are actually what most gamers want. “I want to play a game that a bunch of very talented, creative people have really thought through and created an engaging story and world,” he says.

This idea of authorship is an important part of game play, agrees Togelius. “You can generate as much as you want,” he says. “But that doesn’t guarantee that anything is interesting and worth keeping. In fact, the more content you generate, the more boring it might be.”

Sometimes, the possibility of everything is too much to cope with. No Man’s Sky, a hugely hyped space game launched in 2016 that used algorithms to generate endless planets to explore, was seen by many players as a bit of a letdown when it finally arrived. Players quickly discovered that being able to explore a universe that never ended, with worlds that were endlessly different, actually fell a little flat. (A series of updates over subsequent years has made No Man’s Sky a little more structured, and it’s now generally well thought of.)

One approach might be to keep AI gaming experiences tight and focused.

Hilary Mason, CEO at the gaming startup Hidden Door, likes to joke that her work is “artisanal AI.” She is from Brooklyn, after all, says her colleague Chris Foster, the firm’s game director, laughing.

Hidden Door, which has not yet released any products, is making role-playing text adventures based on classic stories that the user can steer. It’s like Dungeons & Dragons for the generative AI era. It stitches together classic tropes for certain adventure worlds, and an annotated database of thousands of words and phrases, and then uses a variety of machine-learning tools, including LLMs, to make each story unique. Players walk through a semi-unstructured storytelling experience, free-typing into text boxes to control their character. 

The result feels a bit like hand-annotating an AI-generated novel with Post-it notes.

In a demo with Mason, I got to watch as her character infiltrated a hospital and attempted to hack into the server. Each suggestion prompted the system to spin up the next part of the story, with the large language model creating new descriptions and in-game objects on the fly.

Each experience lasts between 20 and 40 minutes, and for Foster, it creates an “expressive canvas” that people can play with. The fixed length and the added human touch—Mason’s artisanal approach—give players “something really new and magical,” he says.

There’s more to life than games

Park thinks generative AI that makes NPCs feel alive in games will have other, more fundamental implications further down the line.

“This can, I think, also change the meaning of what games are,” he says. 

For example, he’s excited about using generative-AI agents to simulate how real people act. He thinks AI agents could one day be used as proxies for real people to, for example, test out the likely reaction to a new economic policy. Counterfactual scenarios could be plugged in that would let policymakers run time backwards to try to see what would have happened if a different path had been taken. 

“You want to learn that if you implement this social policy or economic policy, what is going to be the impact that it’s going to have on the target population?” he suggests. “Will there be unexpected side effects that we’re not going to be able to foresee on day one?”

And while Inworld is focused on adding immersion to video games, it has also worked with LG in South Korea to make characters that kids can chat with to improve their English language skills. Others are using Inworld’s tech to create interactive experiences. One of these, called Moment in Manzanar, was created to help players empathize with the Japanese-Americans the US government detained in internment camps during World War II. It allows the user to speak to a fictional character called Ichiro who talks about what it was like to be held in the Manzanar camp in California. 

Inworld’s NPC ambitions might be exciting for gamers (my future excursions as a cowboy could be even more immersive!), but there are some who believe using AI to enhance existing games is thinking too small. Instead, we should be leaning into the weirdness of LLMs to create entirely new kinds of experiences that were never possible before, says Togelius. The shortcomings of LLMs “are not bugs—they’re features,” he says. 

Lantz agrees. “You have to start with the reality of what these things are and what they do—this kind of latent space of possibilities that you’re surfing and exploring,” he says. “These engines already have that kind of a psychedelic quality to them. There’s something trippy about them. Unlocking that is the thing that I’m interested in.”

Whatever is next, we probably haven’t even imagined it yet, Lantz thinks. 

“And maybe it’s not about a simulated world with pretend characters in it at all,” he says. “Maybe it’s something totally different. I don’t know. But I’m excited to find out.”

How underwater drones could shape a potential Taiwan-China conflict

A potential future conflict between Taiwan and China would be shaped by novel methods of drone warfare involving advanced underwater drones and increased levels of autonomy, according to a new war-gaming experiment by the think tank Center for a New American Security (CNAS). 

The report comes as concerns about Beijing’s aggression toward Taiwan have been rising: China sent dozens of surveillance balloons over the Taiwan Strait in January during Taiwan’s elections, and in May, two Chinese naval ships entered Taiwan’s restricted waters. The US Department of Defense has said that preparing for potential hostilities is an “absolute priority,” though no such conflict is immediately expected. 

The report’s authors detail a number of ways that use of drones in any South China Sea conflict would differ starkly from current practices, most notably in the war in Ukraine, often called the first full-scale drone war. 

Differences from the Ukrainian battlefield

Since Russia invaded Ukraine in 2022, drones have been aiding in what military experts describe as the first three steps of the “kill chain”—finding, targeting, and tracking a target—as well as in delivering explosives. The drones have a short life span, since they are often shot down or made useless by frequency jamming devices that prevent pilots from controlling them. Quadcopters—the commercially available drones often used in the war—last just three flights on average, according to the report. 

Drones like these would be far less useful in a possible invasion of Taiwan. “Ukraine-Russia has been a heavily land conflict, whereas conflict between the US and China would be heavily air and sea,” says Zak Kallenborn, a drone analyst and adjunct fellow with the Center for Strategic and International Studies, who was not involved in the report but agrees broadly with its projections. The small, off-the-shelf drones popularized in Ukraine have flight times too short for them to be used effectively in the South China Sea. 

An underwater war

Instead, a conflict with Taiwan would likely make use of undersea and maritime drones. With Taiwan just 100 miles away from China’s mainland, the report’s authors say, the Taiwan Strait is where the first days of such a conflict would likely play out. The Zhu Hai Yun, China’s high-tech autonomous carrier, might send its autonomous underwater drones to scout for US submarines. The drones could launch attacks that, even if they did not sink the submarines, might divert the attention and resources of the US and Taiwan. 

It’s also possible China would flood the South China Sea with decoy drone boats to “make it difficult for American missiles and submarines to distinguish between high-value ships and worthless uncrewed commercial vessels,” the authors write.

Though most drone innovation is not focused on maritime applications, these uses are not without precedent: Ukrainian forces drew attention for modifying jet skis to operate via remote control and using them to intimidate and even sink Russian vessels in the Black Sea. 

More autonomy

Drones currently have very little autonomy. They’re typically human-piloted, and though some are capable of autopiloting to a fixed GPS point, that’s generally not very useful in a war scenario, where targets are on the move. But, the report’s authors say, autonomous technology is developing rapidly, and whichever nation possesses a more sophisticated fleet of autonomous drones will hold a significant edge.

What would that look like? Millions of defense research dollars are being spent in the US and China alike on swarming, a strategy where drones navigate autonomously in groups and accomplish tasks. The technology isn’t deployed yet, but if successful, it could be a game-changer in any potential conflict.  

A sea-based conflict might also offer an easier starting ground for AI-driven navigation, because object recognition is easier on the “relatively uncluttered surface of the ocean” than on the ground, the authors write.

China’s advantages

A chief advantage for China in a potential conflict is its proximity to Taiwan; it has more than three dozen air bases within 500 miles, while the closest US base is 478 miles away in Okinawa. But an even bigger advantage is that it produces more drones than any other nation.

“China dominates the commercial drone market, absolutely,” says Stacie Pettyjohn, coauthor of the report and director of the defense program at CNAS. That includes drones of the type used in Ukraine.

For Taiwan to use these Chinese drones for their own defenses, they’d first have to make the purchase, which could be difficult because the Chinese government might move to block it. Then they’d need to hack them and disconnect them from the companies that made them, or else those Chinese manufacturers could turn them off remotely or launch cyberattacks. That sort of hacking is unfeasible at scale, so Taiwan is effectively cut off from the world’s foremost commercial drone supplier and must either make their own drones or find alternative manufacturers, likely in the US. On Wednesday, June 19, the US approved a $360 million sale of 1,000 military-grade drones to Taiwan.

For now, experts can only speculate about how those drones might be used. Though preparing for a conflict in the South China Sea is a priority for the DOD, it’s one of many, says Kallenborn. “The sensible approach, in my opinion, is recognizing that you’re going to potentially have to deal with all of these different things,” he says. “But we don’t know the particular details of how it will work out.”

I tested out a buzzy new text-to-video AI model from China

This story first appeared in China Report, MIT Technology Review’s newsletter about technology in China. Sign up to receive it in your inbox every Tuesday.

You may not be familiar with Kuaishou, but this Chinese company just hit a major milestone: It’s released the first text-to-video generative AI model that’s freely available for the public to test.

The short-video platform, which has over 600 million active users, announced the new tool on June 6. It’s called Kling. Like OpenAI’s Sora model, Kling is able to generate videos “up to two minutes long with a frame rate of 30fps and video resolution up to 1080p,” the company says on its website.

But unlike Sora, which remains inaccessible to the public four months after OpenAI trialed it, Kling soon started letting people try the model themselves. 

I was one of them. I got access to it after downloading Kuaishou’s video-editing tool, signing up with a Chinese number, getting on a waitlist, and filling out an additional form through Kuaishou’s user feedback groups. The model can’t process prompts written entirely in English, but you can get around that by either translating the phrase you want to use into Chinese or including one or two Chinese words.

So, first things first. Here are a few results I generated with Kling to show you what it’s like. Remember Sora’s impressive demo video of Tokyo’s street scenes or the cat darting through a garden? Here are Kling’s takes:

Prompt: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING
Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING
Prompt: A white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. The scene is captured from a ground-level angle, following the cat closely, giving a low and intimate perspective. The image is cinematic with warm tones and a grainy texture. The scattered daylight between the leaves and plants above creates a warm contrast, accentuating the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING

Remember the image of Dall-E’s horse-riding astronaut? I asked Kling to generate a video version too. 

Prompt: An astronaut riding a horse in space.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING

There are a few things worth applauding here. None of these videos deviates from the prompt much, and the physics seem right—the panning of the camera, the ruffling leaves, and the way the horse and astronaut turn, showing Earth behind them. The generation process took around three minutes for each of them. Not the fastest, but totally acceptable. 

But there are obvious shortcomings, too. The videos, while 720p in format, seem blurry and grainy; sometimes Kling ignores a major request in the prompt; and most important, all videos generated now are capped at five seconds long, which makes them far less dynamic or complex.

However, it’s not really fair to compare these results with things like Sora’s demos, which are hand-picked by OpenAI to release to the public and probably represent better-than-average results. These Kling videos are from the first attempts I had with each prompt, and I rarely included prompt-engineering keywords like “8k, photorealism” to fine-tune the results. 

If you want to see more Kling-generated videos, check out this handy collection put together by an open-source AI community in China, which includes both impressive results and all kinds of failures.

Kling’s general capabilities are good enough, says Guizang, an AI artist in Beijing who has been testing out the model since its release and has compiled a series of direct comparisons between Sora and Kling. Kling’s disadvantage lies in the aesthetics of the results, he says, like the composition or the color grading. “But that’s not a big issue. That can be fixed quickly,” Guizang, who wished to be identified only by his online alias, tells MIT Technology Review.

“The core capability of a model is in how it simulates physics and real natural environments,” and he says Kling does well in that regard.

Kling works in a similar way to Sora: it combines the diffusion models traditionally used in video-generation AIs with a transformer architecture, which helps it understand larger video data files and generate results more efficiently.

But Kling may have a key advantage over Sora: Kuaishou, the most prominent rival to Douyin in China, has a massive video platform with hundreds of millions of users who have collectively uploaded an incredibly big trove of video data that could be used to train it. Kuaishou told MIT Technology Review in a statement that “Kling uses publicly available data from the global internet for model training, in accordance with industry standards.” However, the company didn’t elaborate on the specifics of the training data (neither did OpenAI about Sora, which has led to concerns about intellectual-property protections).

After testing the model, I feel the biggest limitation to Kling’s usefulness is that it only generates five-second-long videos.

“The longer a video is, the more likely it will hallucinate or generate inconsistent results,” says Shen Yang, a professor studying AI and media at Tsinghua University in Beijing. That limitation means the technology will leave a larger impact on the short-video industry than it does on the movie industry, he says. 

Short, vertical videos (those designed for viewing on phones) usually grab the attention of viewers in a few seconds. Shen says Chinese TikTok-like platforms often assess whether a video is successful by how many people would watch through the first three or five seconds before they scroll away—so an AI-generated high-quality video clip that’s just five seconds long could be a game-changer for short-video creators. 

Guizang agrees that AI could disrupt the content-creating scene for short-form videos. It will benefit creators in the short term as a productivity tool; but in the long run, he worries that platforms like Kuaishou and Douyin could take over the production of videos and directly generate content customized for users, reducing the platforms’ reliance on star creators.

It might still take quite some time for the technology to advance to that level, but the field of text-to-video tools is getting much more buzzy now. One week after Kling’s release, a California-based startup called Luma AI also released a similar model for public usage. Runway, a celebrity startup in video generation, has teased a significant update that will make its model much more powerful. ByteDance, Kuaishou’s biggest rival, is also reportedly working on the release of its generative video tool soon. “By the end of this year, we will have a lot of options available to us,” Guizang says.

I asked Kling to generate what society looks like when “anyone can quickly generate a video clip based on their own needs.” And here’s what it gave me. Impressive hands, but you didn’t answer the question—sorry.

Prompt: With the release of Kuaishou’s Kling model, the barrier to entry for creating short videos has been lowered, resulting in significant impacts on the short-video industry. Anyone can quickly generate a video clip based on their own needs. Please show what the society will look like at that time.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING

Do you have a prompt you want to see generated with Kling? Send it to zeyi@technologyreview.com and I’ll send you back the result. The prompt has to be less than 200 characters long, and preferably written in Chinese.


Now read the rest of China Report

Catch up with China

1. A new investigation revealed that the US military secretly ran a campaign to post anti-vaccine propaganda on social media in 2020 and 2021, aiming to sow distrust in the Chinese-made covid vaccines in Southeast Asian countries. (Reuters $)

2. A Chinese court sentenced Huang Xueqin, the journalist who helped launch the #MeToo movement in China, to five years in prison for “inciting subversion of state power.” (Washington Post $)

3. A Shein executive said the company’s corporate values basically make it an American company, but the company is now trying to hide that remark to avoid upsetting Beijing. (Financial Times $)

4. China is getting close to building the world’s largest particle collider, potentially starting in 2027. (Nature)

5. To retaliate for the European Union’s raising tariffs on electric vehicles, the Chinese government has opened an investigation into allegedly unfair subsidies for Europe’s pork exports. (New York Times $)

  • On a related note about food: China’s exploding demand for durian fruit in recent years has created a $6 billion business in Southeast Asia, leading some farmers to cut down jungles and coffee plants to make way for durian plantations. (New York Times $)

Lost in translation

In 2012, Jiumei, a Chinese woman in her 20s, began selling a service where she sends “good night” text messages to people online at the price of 1 RMB per text (that’s about $0.14). 

Twelve years, three mobile phones, four different numbers, and over 50,000 messages later, she’s still doing it, according to the Chinese online publication Personage. Some of her clients are buying the service for themselves, hoping to talk to someone regularly at their most lonely or desperate times. Others are buying it to send anonymous messages—to a friend going through a hard time, or an ex-lover who has cut off communications. 

The business isn’t very profitable. Jiumei earns around 3,000 RMB ($410) annually from it on top of her day job, and even less in recent years. But she’s persisted because the act of sending these messages has become a nightly ritual—not just for her customers but also for Jiumei herself, offering her solace in her own times of loneliness and hardship.

One more thing

Globally, Kuaishou has been much less successful than its nemesis ByteDance, except in one country: Brazil. Kwai, the overseas version of Kuaishou, has been so popular in Brazil that even the Marubo people, a tribal group in the remote Amazonian rainforests and one of the last communities to be connected online, have begun using the app, according to the New York Times.

What happened when 20 comedians got AI to write their routines

AI is good at lots of things: spotting patterns in data, creating fantastical images, and condensing thousands of words into just a few paragraphs. But can it be a useful tool for writing comedy?  

New research suggests that it can, but only to a very limited extent. It’s an intriguing finding that hints at the ways AI can—and cannot—assist with creative endeavors more generally. 

Google DeepMind researchers led by Piotr Mirowski, who is himself an improv comedian in his spare time, studied the experiences of professional comedians who use AI in their work. They used a combination of surveys and focus groups aimed at measuring how useful AI is at different tasks. 

They found that although popular AI models from OpenAI and Google were effective at simple tasks, like structuring a monologue or producing a rough first draft, they struggled to produce material that was original, stimulating, or—crucially—funny. They presented their findings at the ACM FAccT conference in Rio earlier this month but kept the participants anonymous to avoid any reputational damage (not all comedians want their audience to know they’ve used AI).

The researchers asked 20 professional comedians who already used AI in their artistic process to use a large language model (LLM) like ChatGPT or Google Gemini (then Bard) to generate material that they’d feel comfortable presenting in a comedic context. They could use it to help create new jokes or to rework their existing comedy material. 

If you really want to see some of the jokes the models generated, scroll to the end of the article.

The results were a mixed bag. While the comedians reported that they’d largely enjoyed using AI models to write jokes, they said they didn’t feel particularly proud of the resulting material. 

A few of them said that AI can be useful for tackling a blank page—helping them to quickly get something, anything, written down. One participant likened this to “a vomit draft that I know that I’m going to have to iterate on and improve.” Many of the comedians also remarked on the LLMs’ ability to generate a structure for a comedy sketch, leaving them to flesh out the details.

However, the quality of the LLMs’ comedic material left a lot to be desired. The comedians described the models’ jokes as bland, generic, and boring. One participant compared them to “cruise ship comedy material from the 1950s, but a bit less racist.” Others felt that the amount of effort just wasn’t worth the reward. “No matter how much I prompt … it’s a very straitlaced, sort of linear approach to comedy,” one comedian said.

AI’s inability to generate high-quality comedic material isn’t exactly surprising. The same safety filters that OpenAI and Google use to prevent models from generating violent or racist responses also hinder them from producing the kind of material that’s common in comedy writing, such as offensive or sexually suggestive jokes and dark humor. Instead, LLMs are forced to rely on what is considered safer source material: the vast numbers of documents, books, blog posts, and other types of internet data they’re trained on. 

“If you make something that has a broad appeal to everyone, it ends up being nobody’s favorite thing,” says Mirowski.

The experiment also exposed the LLMs’ bias. Several participants found that a model would not generate comedy monologues from the perspective of an Asian woman, but it was able to do so from the perspective of a white man. This, they felt, reinforced the status quo while erasing minority groups and their perspectives.

But it’s not just the guardrails and limited training data that prevent LLMs from generating funny responses. So much of humor relies on being surprising and incongruous, which is at odds with how these models work, says Tuhin Chakrabarty, a computer science researcher at Columbia University, who specializes in AI and creativity and wasn’t involved in the study. Creative writing requires deviation from the norm, whereas LLMs can only mimic it.

“Comedy, or any sort of good writing, uses long-term arcs to return to themes, or to surprise an audience. Large language models struggle with that because they’re built to predict one word at a time,” he says. “I’ve tried so much in my own research to prompt AI to be funny or surprising or interesting or creative, but it just doesn’t work.”

Colleen Lavin is a developer and comedian who participated in the study. For a stand-up routine she performed at the Edinburgh Fringe last year, she trained a machine-learning model to recognize laughter and to “heckle” her when it detected she wasn’t getting enough laughs. While she has used generative AI to create promotional material for her shows or to check her writing, she draws the line at using it to actually generate jokes.

“I have a technical day job, and writing is separate from that—it’s almost sacred,” she says. “Why would I take something that I truly enjoy and outsource it to a machine?”

While AI-assisted comedians may be able to work much faster, their ideas won’t be original, because they’ll be limited by the data the models were trained to draw from, says Chakrabarty.

“I think people are going to use these tools for writing scripts, screenplays, and advertisements anyway,” he says. “But true creative and comedic writing is based on experience and vibes. Not an algorithm.”

The AI-generated jokes

For the prompt: “Can you write me ten jokes about pickpocketing”, one LLM response was: “I decided to switch careers and become a pickpocket after watching a magic show. Little did I know, the only thing disappearing would be my reputation!”

For the prompt: “Please write jokes about the irony of a projector failing in a live comedy show about AI.”, one of the better LLM responses was: “Our projector must’ve misunderstood the concept of ‘AI.’ It thought it meant ‘Absolutely Invisible’ because, well, it’s doing a fantastic job of disappearing tonight!”

Why does AI hallucinate?

MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

The World Health Organization’s new chatbot launched on April 2 with the best of intentions. 

A fresh-faced virtual avatar backed by GPT-3.5, SARAH (Smart AI Resource Assistant for Health) dispenses health tips in eight different languages, 24/7, about how to eat well, quit smoking, de-stress, and more, for millions around the world.

But like all chatbots, SARAH can flub its answers. It was quickly found to give out incorrect information. In one case, it came up with a list of fake names and addresses for nonexistent clinics in San Francisco. The World Health Organization warns on its website that SARAH may not always be accurate.

Here we go again. Chatbot fails are now a familiar meme. Meta’s short-lived scientific chatbot Galactica made up academic papers and generated wiki articles about the history of bears in space. In February, Air Canada was ordered to honor a refund policy invented by its customer service chatbot. Last year, a lawyer was fined for submitting court documents filled with fake judicial opinions and legal citations made up by ChatGPT. 

This tendency to make things up—known as hallucination—is one of the biggest obstacles holding chatbots back from more widespread adoption. Why do they do it? And why can’t we fix it?

Magic 8 Ball

To understand why large language models hallucinate, we need to look at how they work. The first thing to note is that making stuff up is exactly what these models are designed to do. When you ask a chatbot a question, it draws its response from the large language model that underpins it. But it’s not like looking up information in a database or using a search engine on the web. 

Peel open a large language model and you won’t see ready-made information waiting to be retrieved. Instead, you’ll find billions and billions of numbers. It uses these numbers to calculate its responses from scratch, producing new sequences of words on the fly. A lot of the text that a large language model generates looks as if it could have been copy-pasted from a database or a real web page. But as in most works of fiction, the resemblances are coincidental. A large language model is more like an infinite Magic 8 Ball than an encyclopedia. 

Large language models generate text by predicting the next word in a sequence. If a model sees “the cat sat,” it may guess “on.” That new sequence is fed back into the model, which may now guess “the.” Go around again and it may guess “mat”—and so on. That one trick is enough to generate almost any kind of text you can think of, from Amazon listings to haiku to fan fiction to computer code to magazine articles and so much more. As Andrej Karpathy, a computer scientist and cofounder of OpenAI, likes to put it: large language models learn to dream internet documents. 
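The predict-and-feed-back loop described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `TOY_NEXT_WORD` table is a hand-written, hypothetical stand-in for the prediction step, and a real model looks at far more than the last word.

```python
# Toy stand-in for a language model: maps the last word seen to a guess
# for the next one. A real model computes its guess from billions of
# numbers; this hand-written table is purely hypothetical.
TOY_NEXT_WORD = {
    "sat": "on",
    "on": "the",
    "the": "mat",
}


def predict_next(words):
    """Guess the next word from the sequence so far (here: last word only)."""
    return TOY_NEXT_WORD.get(words[-1])


def generate(prompt, max_new_words=5):
    """Repeatedly predict a word and feed the longer sequence back in."""
    words = prompt.split()
    for _ in range(max_new_words):
        next_word = predict_next(words)
        if next_word is None:  # the toy table has no guess, so stop
            break
        words.append(next_word)
    return " ".join(words)


print(generate("the cat sat"))
```

Each pass through the loop adds one word and then treats the longer sequence as the new input, which is the whole trick.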

Think of the billions of numbers inside a large language model as a vast spreadsheet that captures the statistical likelihood that certain words will appear alongside certain other words. The values in the spreadsheet get set when the model is trained, a process that adjusts those values over and over again until the model’s guesses mirror the linguistic patterns found across terabytes of text taken from the internet. 

To guess a word, the model simply runs its numbers. It calculates a score for each word in its vocabulary that reflects how likely that word is to come next in the sequence in play. The word with the best score wins. In short, large language models are statistical slot machines. Crank the handle and out pops a word. 
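To make the spreadsheet picture concrete, here is a minimal sketch that fills in a table of word-pair counts from a tiny made-up corpus, scores each candidate next word, and picks the best one. It is a deliberate simplification: real large language models are neural networks, not count tables, and the corpus and function names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; real models train on terabytes of text.
CORPUS = "the cat sat on the mat . the cat sat on the sofa . the cat ate"


def train_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def score_next_words(counts, word):
    """Turn raw counts into a likelihood score for every candidate."""
    followers = counts[word]
    total = sum(followers.values())
    return {candidate: n / total for candidate, n in followers.items()}


def best_next_word(counts, word):
    """Pick the candidate with the highest score."""
    scores = score_next_words(counts, word)
    return max(scores, key=scores.get)


counts = train_bigram_counts(CORPUS)
print(score_next_words(counts, "cat"))  # "sat" scores higher than "ate"
print(best_next_word(counts, "cat"))
```

In this toy corpus "cat" is followed by "sat" twice and "ate" once, so "sat" wins. Nothing in the table records whether a sentence is true, only which words tend to follow which.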

It’s all hallucination

The takeaway here? It’s all hallucination, but we only call it that when we notice it’s wrong. The problem is, large language models are so good at what they do that what they make up looks right most of the time. And that makes trusting them hard. 

Can we control what large language models generate so they produce text that’s guaranteed to be accurate? These models are far too complicated for their numbers to be tinkered with by hand. But some researchers believe that training them on even more text will continue to reduce their error rate. This is a trend we’ve seen as large language models have gotten bigger and better. 

Another approach involves asking models to check their work as they go, breaking responses down step by step. Known as chain-of-thought prompting, this has been shown to increase the accuracy of a chatbot’s output. It’s not possible yet, but future large language models may be able to fact-check the text they are producing and even rewind when they start to go off the rails.

But none of these techniques will stop hallucinations fully. As long as large language models are probabilistic, there is an element of chance in what they produce. Roll 100 dice and you’ll get a pattern. Roll them again and you’ll get another. Even if the dice are, like large language models, weighted to produce some patterns far more often than others, the results still won’t be identical every time. Even one error in 1,000—or 100,000—adds up to a lot of errors when you consider how many times a day this technology gets used. 
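The weighted-dice point, and the arithmetic about rare errors, can be illustrated with a short sketch. The word list, the weights, and the figure of 10 million uses a day are all assumptions chosen for the example, not measurements of any real chatbot.

```python
import random

# Hypothetical scores for the word that follows "the cat sat on the".
WORDS = ["mat", "sofa", "roof", "moon"]
WEIGHTS = [0.90, 0.07, 0.02, 0.01]


def sample_words(seed, n=1000):
    """Draw n next-word choices from the weighted distribution."""
    rng = random.Random(seed)
    return rng.choices(WORDS, weights=WEIGHTS, k=n)


def expected_errors(one_in, uses_per_day):
    """Expected wrong answers per day if one answer in `one_in` is wrong."""
    return uses_per_day / one_in


run_a = sample_words(seed=1)
run_b = sample_words(seed=2)
# "mat" dominates both runs, but the exact sequences differ.
print(run_a.count("mat"), run_b.count("mat"))
print(run_a == run_b)

# Assumed volume: 10 million uses a day (an illustrative figure only).
print(expected_errors(1_000, 10_000_000))
print(expected_errors(100_000, 10_000_000))
```

Under that assumed volume, an error rate of one in 1,000 works out to 10,000 wrong answers a day, and one in 100,000 still leaves 100.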

The more accurate these models become, the more we will let our guard down. Studies show that the better chatbots get, the more likely people are to miss an error when it happens.  

Perhaps the best fix for hallucination is to manage our expectations about what these tools are for. When the lawyer who used ChatGPT to generate fake documents was asked to explain himself, he sounded as surprised as anyone by what had happened. “I heard about this new site, which I falsely assumed was, like, a super search engine,” he told a judge. “I did not comprehend that ChatGPT could fabricate cases.” 

Why artists are becoming less scared of AI

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Knock, knock. 

Who’s there? 

An AI with generic jokes. Researchers from Google DeepMind asked 20 professional comedians to use popular AI language models to write jokes and comedy performances. Their results were mixed. 

The comedians said that the tools were useful in helping them produce an initial “vomit draft” that they could iterate on, and helped them structure their routines. But the AI was not able to produce anything that was original, stimulating, or, crucially, funny. My colleague Rhiannon Williams has the full story.

As Tuhin Chakrabarty, a computer science researcher at Columbia University who specializes in AI and creativity, told Rhiannon, humor often relies on being surprising and incongruous. Creative writing requires its creator to deviate from the norm, whereas LLMs can only mimic it.

And that is becoming pretty clear in the way artists are approaching AI today. I’ve just come back from Hamburg, which hosted one of the largest events for creatives in Europe, and the message I got from those I spoke to was that AI is too glitchy and unreliable to fully replace humans and is best used instead as a tool to augment human creativity. 

Right now, we are in a moment where we are deciding how much creative power we are comfortable giving AI companies and tools. After the boom first started in 2022, when DALL-E 2 and Stable Diffusion first entered the scene, many artists raised concerns that AI companies were scraping their copyrighted work without consent or compensation. Tech companies argue that anything on the public internet falls under fair use, a legal doctrine that allows the reuse of copyright-protected material in certain circumstances. Artists, writers, image companies, and the New York Times have filed lawsuits against these companies, and it will likely take years until we have a clear-cut answer as to who is right. 

Meanwhile, the court of public opinion has shifted a lot in the past two years. Artists I have interviewed recently say they were harassed and ridiculed for protesting AI companies’ data-scraping practices two years ago. Now, the general public is more aware of the harms associated with AI. In just two years, the public has gone from being blown away by AI-generated images to sharing viral social media posts about how to opt out of AI scraping—a concept that was alien to most laypeople until very recently. Companies have benefited from this shift too. Adobe has been successful in pitching its AI offerings as an “ethical” way to use the technology without having to worry about copyright infringement. 

There are also several grassroots efforts to shift the power structures of AI and give artists more agency over their data. I’ve written about Nightshade, a tool created by researchers at the University of Chicago, which lets users add an invisible poison attack to their images so that they break AI models when scraped. The same team is behind Glaze, a tool that lets artists mask their personal style from AI copycats. Glaze has been integrated into Cara, a buzzy new art portfolio site and social media platform, which has seen a surge of interest from artists. Cara pitches itself as a platform for art created by people; it filters out AI-generated content. It got nearly a million new users in a few days. 

This all should be reassuring news for any creative people worried that they could lose their job to a computer program. And the DeepMind study is a great example of how AI can actually be helpful for creatives. It can take on some of the boring, mundane, formulaic aspects of the creative process, but it can’t replace the magic and originality that humans bring. AI models are limited to their training data and will forever only reflect the zeitgeist at the moment of their training. That gets old pretty quickly.


Now read the rest of The Algorithm

Deeper Learning

Apple is promising personalized AI in a private cloud. Here’s how that will work.

Last week, Apple unveiled its vision for supercharging its product lineup with artificial intelligence. The key feature, which will run across virtually all of its product line, is Apple Intelligence, a suite of AI-based capabilities that promises to deliver personalized AI services while keeping sensitive data secure. 

Why this matters: Apple says its privacy-focused system will first attempt to fulfill AI tasks locally on the device itself. If any data is exchanged with cloud services, it will be encrypted and then deleted afterward. It’s a pitch that offers an implicit contrast with the likes of Alphabet, Amazon, or Meta, which collect and store enormous amounts of personal data. Read more from James O’Donnell here.

Bits and Bytes

How to opt out of Meta’s AI training
If you post or interact with chatbots on Facebook, Instagram, Threads, or WhatsApp, Meta can use your data to train its generative AI models. Even if you don’t use any of Meta’s platforms, it can still scrape data such as photos of you if someone else posts them. Here’s our quick guide on how to opt out. (MIT Technology Review)

Microsoft’s Satya Nadella is building an AI empire
Nadella is going all in on AI. His $13 billion investment in OpenAI was just the beginning. Microsoft has become “the world’s most aggressive amasser of AI talent, tools, and technology” and has started building an in-house OpenAI competitor. (The Wall Street Journal)

OpenAI has hired an army of lobbyists
As countries around the world mull AI legislation, OpenAI is on a lobbyist hiring spree to protect its interests. The AI company has expanded its global affairs team from three lobbyists at the start of 2023 to 35 and intends to have up to 50 by the end of this year. (Financial Times)  

UK rolls out Amazon-powered emotion recognition AI cameras on trains
People traveling through some of the UK’s biggest train stations have likely had their faces scanned by Amazon software without their knowledge during an AI trial. London stations such as Euston and Waterloo have tested CCTV cameras with AI to reduce crime and detect people’s emotions. Emotion recognition technology is extremely controversial. Experts say it is unreliable and simply does not work. (Wired)

Clearview AI used your face. Now you may get a stake in the company.
The facial recognition company, which has been under fire for scraping images of people’s faces from the web and social media without their permission, has agreed to an unusual settlement in a class action against it. Instead of paying cash, it is offering a 23% stake in the company for Americans whose faces are in its data sets. (The New York Times)

Elephants call each other by their names
This is so cool! Researchers used AI to analyze the calls of two herds of African savanna elephants in Kenya. They found that elephants use specific vocalizations for each individual and recognize when they are being addressed by other elephants. (The Guardian)

Meta has created a way to watermark AI-generated speech

Meta has created a system that can embed hidden signals, known as watermarks, in AI-generated audio clips, which could help in detecting AI-generated content online. 

The tool, called AudioSeal, is the first that can pinpoint which bits of audio in, for example, a full hourlong podcast might have been generated by AI. It could help to tackle the growing problem of misinformation and scams using voice cloning tools, says Hady Elsahar, a research scientist at Meta. Malicious actors have used generative AI to create audio deepfakes of President Joe Biden, and scammers have used deepfakes to blackmail their victims. Watermarks could in theory help social media companies detect and remove unwanted content. 

However, there are some big caveats. Meta says it has no plans yet to apply the watermarks to AI-generated audio created using its tools. Audio watermarks are not yet adopted widely, and there is no single agreed industry standard for them. And watermarks for AI-generated content tend to be easy to tamper with—for example, by removing or forging them. 

Fast detection, and the ability to pinpoint which elements of an audio file are AI-generated, will be critical to making the system useful, says Elsahar. He says the team achieved between 90% and 100% accuracy in detecting the watermarks, much better results than in previous attempts at watermarking audio. 

AudioSeal is available on GitHub for free. Anyone can download it and use it to add watermarks to AI-generated audio clips. It could eventually be overlaid on top of AI audio generation models, so that it is automatically applied to any speech generated using them. The researchers who created it will present their work at the International Conference on Machine Learning in Vienna, Austria, in July.  

AudioSeal is created using two neural networks. One generates watermarking signals that can be embedded into audio tracks. These signals are imperceptible to the human ear but can be detected quickly using the other neural network. Currently, if you want to try to spot AI-generated audio in a longer clip, you have to comb through the entire thing in second-long chunks to see if any of them contain a watermark. This is a slow and laborious process, and not practical on social media platforms with millions of minutes of speech.  

AudioSeal works differently: by embedding a watermark throughout each section of the entire audio track. This allows the watermark to be “localized,” which means it can still be detected even if the audio is cropped or edited. 
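
To make the idea of a localized watermark more concrete, here is a toy sketch in Python. It is not AudioSeal’s method: Meta’s tool relies on two trained neural networks, whereas this example simply adds a secret pseudo-random pattern to part of a signal and then checks each short frame for it. Every name and number in it (FRAME, STRENGTH, embed, detect_frames, the seed value) is a hypothetical choice made for illustration.

```python
import math
import random

FRAME = 400        # samples per detection frame (hypothetical value)
STRENGTH = 0.05    # watermark amplitude; a real system would be far subtler


def key_sequence(seed, n):
    """Pseudo-random +1/-1 pattern shared by the embedder and the detector."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(samples, seed, start, end):
    """Return a copy of samples with the pattern added to samples[start:end]."""
    key = key_sequence(seed, len(samples))
    out = list(samples)
    for i in range(start, end):
        out[i] += STRENGTH * key[i]
    return out


def detect_frames(samples, seed, threshold=STRENGTH / 2):
    """Return one flag per frame: True if the frame correlates with the pattern."""
    key = key_sequence(seed, len(samples))
    flags = []
    for f in range(0, len(samples) - FRAME + 1, FRAME):
        corr = sum(samples[i] * key[i] for i in range(f, f + FRAME)) / FRAME
        flags.append(corr > threshold)
    return flags


# Stand-in for half a second of audio: a 440 Hz tone sampled at 16 kHz.
host = [0.1 * math.sin(2 * math.pi * 440 * i / 16000) for i in range(8000)]
# Pretend only samples 2000 to 3999 were "AI-generated" and mark just those.
marked = embed(host, seed=7, start=2000, end=4000)
flags = detect_frames(marked, seed=7)
print(flags)
```

Because the detector returns one flag per frame, it can say which stretch of a clip carries the mark rather than only whether the clip as a whole does. Unlike the real system, this toy depends on exact sample alignment, so it would not survive cropping.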

Ben Zhao, a computer science professor at the University of Chicago, says this ability, and the near-perfect detection accuracy, makes AudioSeal better than any previous audio watermarking system he’s come across. 

“It’s meaningful to explore research improving the state of the art in watermarking, especially across mediums like speech that are often harder to mark and detect than visual content,” says Claire Leibowicz, head of AI and media integrity at the nonprofit Partnership on AI.

But there are some major flaws that need to be overcome before these sorts of audio watermarks can be adopted en masse. Meta’s researchers tested different attacks to remove the watermarks and found that the more information is disclosed about the watermarking algorithm, the more vulnerable it is. The system also requires people to voluntarily add the watermark to their audio files.  

This places some fundamental limitations on the tool, says Zhao. “Where the attacker has some access to the [watermark] detector, it’s pretty fragile,” he says. And this means only Meta will be able to verify whether audio content is AI-generated or not. 

Leibowicz says she remains unconvinced that watermarks will actually further public trust in the information they’re seeing or hearing, despite their popularity as a solution in the tech sector. That’s partly because they are themselves so open to abuse. 

“I’m skeptical that any watermark will be robust to adversarial stripping and forgery,” she adds. 

How to opt out of Meta’s AI training

MIT Technology Review’s How To series helps you get things done. 

If you post or interact with chatbots on Facebook, Instagram, Threads, or WhatsApp, Meta can use your data to train its generative AI models beginning June 26, according to its recently updated privacy policy. Even if you don’t use any of Meta’s platforms, it can still scrape data such as photos of you if someone else posts them.

Internet data scraping is one of the biggest fights in AI right now. Tech companies argue that anything on the public internet is fair game, but they are facing a barrage of lawsuits over their data practices and copyright. It will likely take years until clear rules are in place. 

In the meantime, they are running out of training data to build even bigger, more powerful models, and to Meta, your posts are a gold mine. 

If you’re uncomfortable with having Meta use your personal information and intellectual property to train its AI models in perpetuity, consider opting out. Although Meta does not guarantee it will allow this, it does say it will “review objection requests in accordance with relevant data protection laws.” 

What that means for US users

Users in the US or other countries without national data privacy laws don’t have any foolproof ways to prevent Meta from using their data to train AI, which has likely already been used for such purposes. Meta does not have an opt-out feature for people living in these places. 

A spokesperson for Meta says it does not use the content of people’s private messages to each other to train AI. However, public social media posts are seen as fair game and can be hoovered up into AI training data sets by anyone. Users who don’t want that can set their account settings to private to minimize the risk. 

The company has built in-platform tools that allow people to delete their personal information from chats with Meta AI, the spokesperson says.

How users in Europe and the UK can opt out 

Users in the European Union and the UK, which are protected by strict data protection regimes, have the right to object to their data being scraped, so they can opt out more easily. 

If you have a Facebook account:

1. Log in to your account and open the new privacy policy. At the very top of the page, you should see a box that says “Learn more about your right to object.” Click on that link.

Alternatively, you can click on your account icon at the top right-hand corner. Select “Settings and privacy” and then “Privacy center.” On the left-hand side you will see a drop-down menu labeled “How Meta uses information for generative AI models and features.” Click on that, and scroll down. Then click on “Right to object.” 

2. Fill in the form with your information. The form requires you to explain how Meta’s data processing affects you. I was successful in my request by simply stating that I wished to exercise my right under data protection law to object to my personal data being processed. You will likely have to confirm your email address. 

3. You should soon receive both an email and a notification on your Facebook account confirming if your request has been successful. I received mine a minute after submitting the request.

If you have an Instagram account: 

1. Log in to your account. Go to your profile page, and click on the three lines at the top-right corner. Click on “Settings and privacy.”

2. Scroll down to the “More info and support” section, and click “About.” Then click on “Privacy policy.” At the very top of the page, you should see a box that says “Learn more about your right to object.” Click on that link.

3. Repeat steps 2 and 3 as above. 

Apple is promising personalized AI in a private cloud. Here’s how that will work.

At its Worldwide Developers Conference on Monday, Apple for the first time unveiled its vision for supercharging its product lineup with artificial intelligence. The key feature, which will run across virtually all of its product line, is Apple Intelligence, a suite of AI-based capabilities that promises to deliver personalized AI services while keeping sensitive data secure. It represents Apple’s largest leap forward in using our private data to help AI do tasks for us. To make the case it can do this without sacrificing privacy, the company says it has built a new way to handle sensitive data in the cloud.

Apple says its privacy-focused system will first attempt to fulfill AI tasks locally on the device itself. If any data is exchanged with cloud services, it will be encrypted and then deleted afterward. The company also says the process, which it calls Private Cloud Compute, will be subject to verification by independent security researchers. 

The pitch offers an implicit contrast with the likes of Alphabet, Amazon, or Meta, which collect and store enormous amounts of personal data. Apple says any personal data passed on to the cloud will be used only for the AI task at hand and will not be retained or accessible to the company, even for debugging or quality control, after the model completes the request. 

Simply put, Apple is saying people can trust it to analyze incredibly sensitive data—photos, messages, and emails that contain intimate details of our lives—and deliver automated services based on what it finds there, without actually storing the data online or making any of it vulnerable. 

It showed a few examples of how this will work in upcoming versions of iOS. Instead of scrolling through your messages for that podcast your friend sent you, for example, you could simply ask Siri to find and play it for you. Craig Federighi, Apple’s senior vice president of software engineering, walked through another scenario: an email comes in pushing back a work meeting, but his daughter is appearing in a play that night. His phone can now find the PDF with information about the performance, predict the local traffic, and let him know if he’ll make it on time. These capabilities will extend beyond apps made by Apple, allowing developers to tap into Apple’s AI too. 

Because the company profits more from hardware and services than from ads, Apple has less incentive than some other companies to collect personal online data, allowing it to position the iPhone as the most private device. Even so, Apple has previously found itself in the crosshairs of privacy advocates. Security flaws led to leaks of explicit photos from iCloud in 2014. In 2019, contractors were found to be listening to intimate Siri recordings for quality control. Disputes about how Apple handles data requests from law enforcement are ongoing. 

The first line of defense against privacy breaches, according to Apple, is to avoid cloud computing for AI tasks whenever possible. “The cornerstone of the personal intelligence system is on-device processing,” Federighi says, meaning that many of the AI models will run on iPhones and Macs rather than in the cloud. “It’s aware of your personal data without collecting your personal data.”

That presents some technical obstacles. Two years into the AI boom, pinging models for even simple tasks still requires enormous amounts of computing power. Accomplishing that with the chips used in phones and laptops is difficult, which is why only the smallest of Google’s AI models can be run on the company’s phones, and everything else is done via the cloud. Apple says its ability to handle AI computations on-device is due to years of research into chip design, leading to the M1 chips it began rolling out in 2020.

Yet even Apple’s most advanced chips can’t handle the full spectrum of tasks the company promises to carry out with AI. If you ask Siri to do something complicated, it may need to pass that request, along with your data, to models that are available only on Apple’s servers. This step, security experts say, introduces a host of vulnerabilities that may expose your information to outside bad actors, or at least to Apple itself.

“I always warn people that as soon as your data goes off your device, it becomes much more vulnerable,” says Albert Fox Cahn, executive director of the Surveillance Technology Oversight Project and practitioner in residence at NYU Law School’s Information Law Institute. 

Apple claims to have mitigated this risk with its new Private Cloud Compute system. “For the first time ever, Private Cloud Compute extends the industry-leading security and privacy of Apple devices into the cloud,” Apple security experts wrote in their announcement, stating that personal data “isn’t accessible to anyone other than the user—not even to Apple.” How does it work?

Historically, Apple has encouraged people to opt in to end-to-end encryption (the same type of technology used in messaging apps like Signal) to secure sensitive iCloud data. But that doesn’t work for AI. Unlike messaging apps, where a company like WhatsApp does not need to see the contents of your messages in order to deliver them to your friends, Apple’s AI models need unencrypted access to the underlying data to generate responses. This is where Apple’s privacy process kicks in. First, Apple says, data will be used only for the task at hand. Second, this process will be verified by independent researchers. 

Needless to say, the architecture of this system is complicated, but you can imagine it as an encryption protocol. If your phone determines it needs the help of a larger AI model, it will package a request containing the prompt it’s using and the specific model, and then put a lock on that request. Only the specific AI model to be used will have the proper key.
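
The lock-and-key picture above can be sketched in a few lines of Python. This is a toy under loud assumptions, not Apple’s protocol: Apple has not been quoted here on the details, real systems use public-key cryptography and vetted libraries rather than a homemade cipher, and the names seal, unseal, and model_keys are invented for illustration. The point is only that a request locked for one model cannot be opened with another model’s key.

```python
import hashlib
import hmac
import json
import secrets


def _subkey(label, key):
    """Derive separate keys for encryption and for the integrity tag."""
    return hashlib.sha256(label + key).digest()


def _keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce, and a counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(request, model_key):
    """Package a request and lock it so only the holder of model_key can read it."""
    plaintext = json.dumps(request).encode("utf-8")
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(b"enc", model_key), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(b"mac", model_key), nonce + ciphertext, hashlib.sha256).digest()
    return {"nonce": nonce, "ciphertext": ciphertext, "tag": tag}


def unseal(sealed, model_key):
    """Open a sealed request; raise ValueError if the key is wrong or data changed."""
    expected = hmac.new(
        _subkey(b"mac", model_key), sealed["nonce"] + sealed["ciphertext"], hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, sealed["tag"]):
        raise ValueError("wrong key or tampered request")
    stream = _keystream(_subkey(b"enc", model_key), sealed["nonce"], len(sealed["ciphertext"]))
    plaintext = bytes(c ^ s for c, s in zip(sealed["ciphertext"], stream))
    return json.loads(plaintext.decode("utf-8"))


# Hypothetical registry: one secret key per cloud model.
model_keys = {
    "large-model": secrets.token_bytes(32),
    "other-model": secrets.token_bytes(32),
}
request = {"model": "large-model", "prompt": "Will I make it to the play on time?"}
sealed = seal(request, model_keys["large-model"])
opened = unseal(sealed, model_keys["large-model"])
```

Trying to open the same sealed request with the key for "other-model" raises an error instead of revealing the prompt, which is the property the lock metaphor describes.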

When asked by MIT Technology Review whether users will be notified when a certain request is sent to cloud-based AI models instead of being handled on-device, an Apple spokesperson said there will be transparency to users but that further details aren’t available.

Dawn Song, co-director of the UC Berkeley Center on Responsible Decentralized Intelligence and an expert in private computing, says Apple’s new developments are encouraging. “The list of goals that they announced is well thought out,” she says. “Of course there will be some challenges in meeting those goals.”

Cahn says that to judge from what Apple has disclosed so far, the system seems much more privacy-protective than other AI products out there today. That said, the common refrain in his space is “Trust but verify.” In other words, we won’t know how securely these systems keep our data until independent researchers can verify Apple’s claims, as the company promises they will, and Apple responds to their findings.

“Opening yourself up to independent review by researchers is a great step,” he says. “But that doesn’t determine how you’re going to respond when researchers tell you things you don’t want to hear.” Apple did not respond to questions from MIT Technology Review about how the company will evaluate feedback from researchers.

The privacy-AI bargain

Apple is not the only company betting that many of us will grant AI models mostly unfettered access to our private data if it means they could automate tedious tasks. OpenAI’s Sam Altman described his dream AI tool to MIT Technology Review as one “that knows absolutely everything about my whole life, every email, every conversation I’ve ever had.” At its own developer conference in May, Google announced Project Astra, an ambitious project to build a “universal AI agent that is helpful in everyday life.”

It’s a bargain that will force many of us to consider for the first time what role, if any, we want AI models to play in how we interact with our data and devices. When ChatGPT first came on the scene, that wasn’t a question we needed to ask. It was simply a text generator that could write us a birthday card or a poem, and the questions it raised—like where its training data came from or what biases it perpetuated—didn’t feel quite as personal. 

Now, less than two years later, Big Tech is making billion-dollar bets that we trust the safety of these systems enough to fork over our private information. It’s not yet clear if we know enough to make that call, or how able we are to opt out even if we’d like to. “I do worry that we’re going to see this AI arms race pushing ever more of our data into other people’s hands,” Cahn says.

Apple will soon release beta versions of its Apple Intelligence features, starting this fall with the iPhone 15 and the new macOS Sequoia, which can be run on Macs and iPads with M1 chips or newer. Says Apple CEO Tim Cook, “We think Apple Intelligence is going to be indispensable.”