AI lie detectors are better than humans at spotting lies

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here. 

Can you spot a liar? It’s a question I imagine has been on a lot of minds lately, in the wake of various televised political debates. Research has shown that we’re generally pretty bad at telling a truth from a lie.
 
Some believe that AI could help improve our odds, and do better than dodgy old fashioned techniques like polygraph tests. AI-based lie detection systems could one day be used to help us sift fact from fake news, evaluate claims, and potentially even spot fibs and exaggerations in job applications. The question is whether we will trust them. And if we should.

In a recent study, Alicia von Schenk and her colleagues developed a tool that was significantly better than people at spotting lies. Von Schenk, an economist at the University of Würzburg in Germany, and her team then ran some experiments to find out how people used it. In some ways, the tool was helpful—the people who made use of it were better at spotting lies. But they also led people to make a lot more accusations.

In their study published in the journal iScience, von Schenk and her colleagues asked volunteers to write statements about their weekend plans. Half the time, people were incentivized to lie; a believable yet untrue statement was rewarded with a small financial payout. In total, the team collected 1,536 statements from 768 people.
 
They then used 80% of these statements to train an algorithm on lies and truths, using Google’s AI language model BERT. When they tested the resulting tool on the final 20% of statements, they found it could successfully tell whether a statement was true or false 67% of the time. That’s significantly better than a typical human; we usually only get it right around half the time.
 
To find out how people might make use of AI to help them spot lies, von Schenk and her colleagues split 2,040 other volunteers into smaller groups and ran a series of tests.
 
One test revealed that when people are given the option to pay a small fee to use an AI tool that can help them detect lies—and earn financial rewards—they still aren’t all that keen on using it. Only a third of the volunteers given that option decided to use the AI tool, possibly because they’re skeptical of the technology, says von Schenk. (They might also be overly optimistic about their own lie-detection skills, she adds.)
 
But that one-third of people really put their trust in the technology. “When you make the active choice to rely on the technology, we see that people almost always follow the prediction of the AI… they rely very much on its predictions,” says von Schenk.

This reliance can shape our behavior. Normally, people tend to assume others are telling the truth. That was borne out in this study—even though the volunteers knew half of the statements were lies, they only marked out 19% of them as such. But that changed when people chose to make use of the AI tool: the accusation rate rose to 58%.
 
In some ways, this is a good thing—these tools can help us spot more of the lies we come across in our lives, like the misinformation we might come across on social media.
 
But it’s not all good. It could also undermine trust, a fundamental aspect of human behavior that helps us form relationships. If the price of accurate judgements is the deterioration of social bonds, is it worth it?
 
And then there’s the question of accuracy. In their study, von Schenk and her colleagues were only interested in creating a tool that was better than humans at lie detection. That isn’t too difficult, given how terrible we are at it. But she also imagines a tool like hers being used to routinely assess the truthfulness of social media posts, or hunt for fake details in a job hunter’s resume or interview responses. In cases like these, it’s not enough for a technology to just be “better than human” if it’s going to be making more accusations. 
 
Would we be willing to accept an accuracy rate of 80%, where only four out of every five assessed statements would be correctly interpreted as true or false? Would even 99% accuracy suffice? I’m not sure.
 
It’s worth remembering the fallibility of historical lie detection techniques. The polygraph was designed to measure heart rate and other signs of “arousal” because it was thought some signs of stress were unique to liars. They’re not. And we’ve known that for a long time. That’s why lie detector results are generally not admissible in US court cases. Despite that, polygraph lie detector tests have endured in some settings, and have caused plenty of harm when they’ve been used to hurl accusations at people who fail them on reality TV shows.
 
Imperfect AI tools stand to have an even greater impact because they are so easy to scale, says von Schenk. You can only polygraph so many people in a day. The scope for AI lie detection is almost limitless by comparison.
 
“Given that we have so much fake news and disinformation spreading, there is a benefit to these technologies,” says von Schenk. “However, you really need to test them—you need to make sure they are substantially better than humans.” If an AI lie detector is generating a lot of accusations, we might be better off not using it at all, she says.


Now read the rest of The Checkup

Read more from MIT Technology Review’s archive

AI lie detectors have also been developed to look for facial patterns of movement and “microgestures” associated with deception. As Jake Bittle puts it: “the dream of a perfect lie detector just won’t die, especially when glossed over with the sheen of AI.”
 
On the other hand, AI is also being used to generate plenty of disinformation. As of October last year, generative AI was already being used in at least 16 countries to “sow doubt, smear opponents, or influence public debate,” as Tate Ryan-Mosley reported.
 
The way AI language models are developed can heavily influence the way that they work. As a result, these models have picked up different political biases, as my colleague Melissa Heikkilä covered last year.
 
AI, like social media, has the potential for good or ill. In both cases, the regulatory limits we place on these technologies will determine which way the sword falls, argue Nathan E. Sanders and Bruce Schneier.
 
Chatbot answers are all made up.
But there’s a tool that can give a reliability score to large language model outputs, helping users work out how trustworthy they are. Or, as Will Douglas Heaven put it in an article published a few months ago, a BS-o-meter for chatbots.

From around the web

Scientists, ethicists and legal experts in the UK have published a new set of guidelines for research on synthetic embryos, or, as they call them, “stem cell-based embryo models (SCBEMs).” There should be limits on how long they are grown in labs, and they should not be transferred into the uterus of a human or animal, the guideline states. They also note that, if, in future, these structures look like they might have the potential to develop into a fetus, we should stop calling them “models” and instead refer to them as “embryos.”

Antimicrobial resistance is already responsible for 700,000 deaths every year, and could claim 10 million lives per year by 2050. Overuse of broad spectrum antibiotics is partly to blame. Is it time to tax these drugs to limit demand? (International Journal of Industrial Organization)

Spaceflight can alter the human brain, reorganizing gray and white matter and causing the brain to shift upwards in the skull. We need to better understand these effects, and the impact of cosmic radiation on our brains, before we send people to Mars. (The Lancet Neurology)

The vagus nerve has become an unlikely star of social media, thanks to influencers who drum up the benefits of stimulating it. Unfortunately, the science doesn’t stack up. (New Scientist)

A hospital in Texas is set to become the first in the country to enable doctors to see their patients via hologram. Crescent Regional Hospital in Lancaster has installed Holobox—a system that projects a life-sized hologram of a doctor for patient consultations. (ABC News)

What are AI agents? 

MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

When ChatGPT was first released, everyone in AI was talking about the new generation of AI assistants. But over the past year, that excitement has turned to a new target: AI agents. 

Agents featured prominently in Google’s annual I/O conference in May, when the company unveiled its new AI agent called Astra, which allows users to interact with it using audio and video. OpenAI’s new GPT-4o model has also been called an AI agent.  

And it’s not just hype, although there is definitely some of that too. Tech companies are plowing vast sums into creating AI agents, and their research efforts could usher in the kind of useful AI we have been dreaming about for decades. Many experts, including Sam Altman, say they are the next big thing.   

But what are they? And how can we use them? 

How are they defined? 

It is still early days for research into AI agents, and the field does not have a definitive definition for them. But simply, they are AI models and algorithms that can autonomously make decisions in a dynamic world, says Jim Fan, a senior research scientist at Nvidia who leads the company’s AI agents initiative. 

The grand vision for AI agents is a system that can execute a vast range of tasks, much like a human assistant. In the future, it could help you book your vacation, but it will also remember if you prefer swanky hotels, so it will only suggest hotels that have four stars or more and then go ahead and book the one you pick from the range of options it offers you. It will then also suggest flights that work best with your calendar, and plan the itinerary for your trip according to your preferences. It could make a list of things to pack based on that plan and the weather forecast. It might even send your itinerary to any friends it knows live in your destination and invite them along. In the workplace, it  could analyze your to-do list and execute tasks from it, such as sending calendar invites, memos, or emails. 

One vision for agents is that they are multimodal, meaning they can process language, audio, and video. For example, in Google’s Astra demo, users could point a smartphone camera at things and ask the agent questions. The agent could respond to text, audio, and video inputs. 

These agents could also make processes smoother for businesses and public organizations, says David Barber, the director of the University College London Centre for Artificial Intelligence. For example, an AI agent might be able to function as a more sophisticated customer service bot. The current generation of language-model-based assistants can only generate the next likely word in a sentence. But an AI agent would have the ability to act on natural-language commands autonomously and process customer service tasks without supervision. For example, the agent would be able to analyze customer complaint emails and then know to check the customer’s reference number, access databases such as customer relationship management and delivery systems to see whether the complaint is legitimate, and process it according to the company’s policies, Barber says. 

Broadly speaking, there are two different categories of agents, says Fan: software agents and embodied agents. 

Software agents run on computers or mobile phones and use apps, much as in the travel agent example above. “Those agents are very useful for office work or sending emails or having this chain of events going on,” he says. 

Embodied agents are agents that are situated in a 3D world such as a video game, or in a robot. These kinds of agents might make video games more engaging by letting people play with nonplayer characters controlled by AI. These sorts of agents could also help build more useful robots that could help us with everyday tasks at home, such as folding laundry and cooking meals. 

Fan was part of a team that built an embodied AI agent called MineDojo in the popular computer game Minecraft. Using a vast trove of data collected from the internet, Fan’s AI agent was able to learn new skills and tasks that allowed it to freely explore the virtual 3D world and complete complex tasks such as encircling llamas with fences or scooping lava into a bucket. Video games are good proxies for the real world, because they require agents to understand physics, reasoning, and common sense. 

In a new paper, which has not yet been peer-reviewed, researchers at Princeton say that AI agents tend to have three different characteristics. AI systems are considered “agentic” if they can pursue difficult goals without being instructed in complex environments. They also qualify if they can be instructed in natural language and act autonomously without supervision. And finally, the term “agent” can also apply to systems that are able to use tools, such as web search or programming, or are capable of planning. 

Are they a new thing?

The term “AI agents” has been around for years and has meant different things at different times, says Chirag Shah, a computer science professor at the University of Washington. 

There have been two waves of agents, says Fan. The current wave is thanks to the language model boom and the rise of systems such as ChatGPT. 

The previous wave was in 2016, when Google DeepMind introduced AlphaGo, its AI system that can play—and win—the game Go. AlphaGo was able to make decisions and plan strategies. This relied on reinforcement learning, a technique that rewards AI algorithms for desirable behaviors. 

“But these agents were not general,” says Oriol Vinyals, vice president of research at Google DeepMind. They were created for very specific tasks—in this case, playing Go. The new generation of foundation-model-based AI makes agents more universal, as they can learn from the world humans interact with. 

“You feel much more that the model is interacting with the world and then giving back to you better answers or better assisted assistance or whatnot,” says Vinyals. 

What are the limitations? 

There are still many open questions that need to be answered. Kanjun Qiu, CEO and founder of the AI startup Imbue, which is working on agents that can reason and code, likens the state of agents to where self-driving cars were just over a decade ago. They can do stuff, but they’re unreliable and still not really autonomous. For example, a coding agent can generate code, but it sometimes gets it wrong, and it doesn’t know how to test the code it’s creating, says Qiu. So humans still need to be actively involved in the process. AI systems still can’t fully reason, which is a critical step in operating in a complex and  ambiguous human world. 

“We’re nowhere close to having an agent that can just automate all of these chores for us,” says Fan. Current systems “hallucinate and they also don’t always follow instructions closely,” Fan says. “And that becomes annoying.”  

Another limitation is that after a while, AI agents lose track of what they are working on. AI systems are limited by their context windows, meaning the amount of data they can take into account at any given time. 

“ChatGPT can do coding, but it’s not able to do long-form content well. But for human developers, we look at an entire GitHub repository that has tens if not hundreds of lines of code, and we have no trouble navigating it,” says Fan. 

To tackle this problem, Google has increased its models’ capacity to process data, which allows users to have longer interactions with them in which they remember more about past interactions. The company said it is working on making its context windows infinite in the future.

For embodied agents such as robots, there are even more limitations. There is not enough training data to teach them, and researchers are only just starting to harness the power of foundation models in robotics. 

So amid all the hype and excitement, it’s worth bearing in mind that research into AI agents is still in its very early stages, and it will likely take years until we can experience their full potential. 

That sounds cool. Can I try an AI agent now? 

Sort of. You’ve most likely tried their early prototypes, such as OpenAI’s ChatGPT and GPT-4. “If you’re interacting with software that feels smart, that is kind of an agent,” says Qiu. 

Right now the best agents we have are systems with very narrow and specific use cases, such as coding assistants, customer service bots, or workflow automation software like Zapier, she says. But these are a far cry from a universal AI agent that can do complex tasks. 

“Today we have these computers and they’re really powerful, but we have to micromanage them,” says Qiu. 

OpenAI’s ChatGPT plug-ins, which allow people to create AI-powered assistants for web browsers, were an attempt at agents, says Qiu. But these systems are still clumsy, unreliable, and not capable of reasoning, she says. 

Despite that, these systems will one day change the way we interact with technology, Qiu believes, and it is a trend people need to pay attention to. 

“It’s not like, ‘Oh my God, all of a sudden we have AGI’ … but more like ‘Oh my God, my computer can do way more than it did five years ago,’” she says.

AI companies are finally being forced to cough up for training data

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The generative AI boom is built on scale. The more training data, the more powerful the model. 

But there’s a problem. AI companies have pillaged the internet for training data, and many websites and data set owners have started restricting the ability to scrape their websites. We’ve also seen a backlash against the AI sector’s practice of indiscriminately scraping online data, in the form of users opting out of making their data available for training and lawsuits from artists, writers, and the New York Times, claiming that AI companies have taken their intellectual property without consent or compensation. 

Last week three major record labels—Sony Music, Warner Music Group, and Universal Music Group—announced they were suing the AI music companies Suno and Udio over alleged copyright infringement. The music labels claim the companies made use of copyrighted music in their training data “at an almost unimaginable scale,” allowing the AI models to generate songs that “imitate the qualities of genuine human sound recordings.” My colleague James O’Donnell dissects the lawsuits in his story and points out that these lawsuits could determine the future of AI music. Read it here

But this moment also sets an interesting precedent for all of generative AI development. Thanks to the scarcity of high-quality data and the immense pressure and demand to build even bigger and better models, we’re in a rare moment where data owners actually have some leverage. The music industry’s lawsuit sends the loudest message yet: High-quality training data is not free. 

It will likely take a few years at least before we have legal clarity around copyright law, fair use, and AI training data. But the cases are already ushering in changes. OpenAI has been striking deals with news publishers such as Politico, the AtlanticTime, the Financial Times, and others, and exchanging publishers’ news archives for money and citations. And YouTube announced in late June that it will offer licensing deals to top record labels in exchange for music for training. 

These changes are a mixed bag. On one hand, I’m concerned that news publishers are making a Faustian bargain with AI. For example, most of the media houses that have made deals with OpenAI say the deal stipulates that OpenAI cite its sources. But language models are fundamentally incapable of being factual and are best at making things up. Reports have shown that ChatGPT and the AI-powered search engine Perplexity frequently hallucinate citations, which makes it hard for OpenAI to honor its promises.   

It’s tricky for AI companies too. This shift could lead to them build smaller, more efficient models, which are far less polluting. Or they may fork out a fortune to access data at the scale they need to build the next big one. Only the companies most flush with cash, and/or with large existing data sets of their own (such as Meta, with its two decades of social media data), can afford to do that. So the latest developments risk concentrating power even further into the hands of the biggest players. 

On the other hand, the idea of introducing consent into this process is a good one—not just for rights holders, who can benefit from the AI boom, but for all of us. We should all have the agency to decide how our data is used, and a fairer data economy would mean we could all benefit. 


Now read the rest of The Algorithm

Deeper Learning

How AI video games can help reveal the mysteries of the human mind

Neuroscientists and psychologists have long been using games as research tools to learn about the human mind. Video games have been either co-opted or specially designed to study how people learn, navigate, and cooperate with others, for example. AI video games—where characters don’t need scripts and appear to play when you’re not watching—could allow us to probe more deeply and unravel enduring mysteries about our brains and behavior, suggests my colleague Jessica Hamzelou in our weekly biotech newsletter, The Checkup.

Ready, set, go: Scientists who have done this type of study were able to observe and study how players behaved in these games: how they explored their virtual environment, how they sought rewards, how they made decisions. And research volunteers didn’t need to travel to a lab—their gaming behavior could be observed from wherever they happened to be playing, whether that was at home, at a library, or even inside an MRI scanner. Read more from Jessica.

Bits and Bytes

AI is already wreaking havoc on global power systems
A really well-done data visualization of the insane amount of electricity AI requires and how it is transforming our energy grid. A startling statistic: Data centers use more electricity than most countries. (Bloomberg

The AI boom has an unlikely early winner: wonky consultants
It seems every company out there is thinking about how to use AI. But the problem is that nobody is sure exactly how to do that. And so in come consultants, who are profiting from AI FOMO. Work related to generative AI will make up about 40% of McKinsey’s business this year. (The New York Times)

Deepfake creators are revictimizing sex trafficking survivors
A new low: For the past few months, the largest deepfake sexual abuse website has posted deepfake videos based on footage from GirlsDoPorn, a now-defunct sex trafficking operation. (Wired)

I paid $365.63 to replace 404 Media with AI
A journalist paid gig workers to use ChatGPT to plagiarize news. The result: grammatically correct nonsense. (404 Media)

A way to let robots learn by listening will make them more useful

Most AI-powered robots today use cameras to understand their surroundings and learn new tasks, but it’s becoming easier to train robots with sound too, helping them adapt to tasks and environments where visibility is limited. 

Though sight is important, there are daily tasks where sound is actually more helpful, like listening to onions sizzling on the stove to see if the pan is at the right temperature. Training robots with audio has only been done in highly controlled lab settings, however, and the techniques have lagged behind other fast robot-teaching methods.

Researchers at the Robotics and Embodied AI Lab at Stanford University set out to change that. They first built a system for collecting audio data, consisting of a GoPro camera and a gripper with a microphone designed to filter out background noise. Human demonstrators used the gripper for a variety of household tasks and then used this data to teach robotic arms how to execute the task on their own. The team’s new training algorithms help robots gather clues from audio signals to perform more effectively. 

“Thus far, robots have been training on videos that are muted,” says Zeyi Liu, a PhD student at Stanford and lead author of the study. “But there is so much helpful data in audio.”

To test how much more successful a robot can be if it’s capable of “listening,” the researchers chose four tasks: flipping a bagel in a pan, erasing a whiteboard, putting two Velcro strips together, and pouring dice out of a cup. In each task, sounds provide clues that cameras or tactile sensors struggle with, like knowing if the eraser is properly contacting the whiteboard or whether the cup contains dice. 

After demonstrating each task a couple of hundred times, the team compared the success rates of training with audio and training only with vision. The results, published in a paper on arXiv that has not been peer-reviewed, were promising. When using vision alone in the dice test, the robot could tell 27% of the time if there were dice in the cup, but that rose to 94% when sound was included.

It isn’t the first time audio has been used to train robots, says Shuran Song, the head of the lab that produced the study, but it’s a big step toward doing so at scale: “We are making it easier to use audio collected ‘in the wild,’ rather than being restricted to collecting it in the lab, which is more time consuming.” 

The research signals that audio might become a more sought-after data source in the race to train robots with AI. Researchers are teaching robots faster than ever before using imitation learning, showing them hundreds of examples of tasks being done instead of hand-coding each one. If audio could be collected at scale using devices like the one in the study, it could give them an entirely new “sense,” helping them more quickly adapt to environments where visibility is limited or not useful.

“It’s safe to say that audio is the most understudied modality for sensing [in robots],” says Dmitry Berenson, associate professor of robotics at the University of Michigan, who was not involved in the study. That’s because the bulk of research on training robots to manipulate objects has been for industrial pick-and-place tasks, like sorting objects into bins. Those tasks don’t benefit much from sound, instead relying on tactile or visual sensors. But as robots broaden into tasks in homes, kitchens, and other environments, audio will become increasingly useful, Berenson says.

Consider a robot trying to find which bag or pocket contains a set of keys, all with limited visibility. “Maybe even before you touch the keys, you hear them kind of jangling,” Berenson says. “That’s a cue that the keys are in that pocket instead of others.”

Still, audio has limits. The team points out sound won’t be as useful with so-called soft or flexible objects like clothes, which don’t create as much usable audio. The robots also struggled with filtering out the audio of their own motor noises during tasks, since that noise was not present in the training data produced by humans. To fix it, the researchers needed to add robot sounds—whirs, hums, and actuator noises—into the training sets so the robots could learn to tune them out. 

The next step, Liu says, is to see how much better the models can get with more data, which could mean adding more microphones, collecting spatial audio, and incorporating microphones into other types of data-collection devices. 

Training AI music models is about to get very expensive

AI music is suddenly in a make-or-break moment. On June 24, Suno and Udio, two leading AI music startups that make tools to generate complete songs from a prompt in seconds, were sued by major record labels. Sony Music, Warner Music Group, and Universal Music Group claim the companies made use of copyrighted music in their training data “at an almost unimaginable scale,” allowing the AI models to generate songs that “imitate the qualities of genuine human sound recordings.”

Two days later, the Financial Times reported that YouTube is pursuing a comparatively aboveboard approach. Rather than training AI music models on secret data sets, the company is reportedly offering unspecified lump sums to top record labels in exchange for licenses to use their catalogues for training. 

In response to the lawsuits, both Suno and Udio released statements mentioning efforts to ensure that their models don’t imitate copyrighted works, but neither company has specified whether their training sets contain them. Udio said its model “has ‘listened’ to and learned from a large collection of recorded music,” and two weeks before the lawsuits, Suno CEO Mikey Shulman told me its training set is “both industry standard and legal” but the exact recipe is proprietary.

While the ground here is changing fast, none of these moves should be all that surprising: litigious training-data battles have become something like a rite of passage for generative AI companies. The trend has led many of those companies, including OpenAI, to pay for licensing deals while the cases unfold. 

However, the stakes are higher for AI music than for image generators or chatbots. Generative AI companies working in text or photos have options to work around lawsuits; for example, they can cobble together open-source corpuses to train models. In contrast, music in the public domain is much more limited (and not exactly what most people want to listen to). 

Other AI companies can also more easily cut licensing deals with interested publishers and creators, of which there are many; but rights in music are far more concentrated than those in film, images, or text, industry experts say. They’re largely managed by the three biggest record labels—the new plaintiffs—whose publishing arms collectively own more than 10 million songs and much of the music that has defined the last century. (The filing names a long list of artists who the labels allege were wrongfully included in training data, ranging from ABBA to those on the Hamilton soundtrack.) 

On top of all this, it’s also just more difficult to create music worth listening to—generating a readable poem or passable illustration with AI is one technical challenge, but infusing a model with the taste required to create music we like is another. 

It’s of course possible that the AI companies will win the case, and none of this will matter; they would have carte blanche to train on a century of copyrighted music. But experts say the case from the record labels is strong, and it’s more likely that AI companies will soon have to pay up—and pay a lot—if they want to survive. If a court were to rule that AI music companies could not train for free on these labels’ catalogues, then expensive licensing deals, like the one YouTube is reportedly pursuing, would seem to be the only path forward. This would effectively ensure that the company with the deepest pockets ends up on top.

More than any training-data case yet, the outcome of this one will determine the shape of a big slice of AI—and whether there is a future for it at all. 

Merits of the case

Suno’s music generator has been public for less than a year, but the company has already garnered 12 million users, a $125 million funding round last month, and a partnership with Microsoft Copilot. Udio is even newer to the scene, having launched in April with $10 million in seed funding from musician-investors like will.i.am and Common. 

The record labels allege that both of the startups are engaging in copyright infringement on the training and the output sides of their models.

“The plaintiffs here have the best odds of almost anyone suing an AI company,” says James Grimmelmann, a professor of digital and information law at Cornell Law School. He draws comparisons to the ongoing New York Times case against OpenAI, which he says offered, until now, the best example of a rights holder with a strong case against an AI company. But the suit against Suno and Udio “is worse for a bunch of reasons.”

The Times has accused OpenAI of copyright infringement in its model training by using the publication’s articles without consent. Grimmelmann says OpenAI has a bit of plausible deniability in this accusation, because the company could say that it scraped much of the internet for a training corpus and copies of New York Times articles appeared in places without the company’s knowledge. 

For Suno and Udio, that defense is far less believable. “This is not like, ‘We scraped the web for all audio and we couldn’t tell the commercially produced songs apart from everything else,’” Grimmelmann says. “It’s pretty clear that they had to have been pulling in large databases of commercial recordings.” 

In addition to complaints about training, the new case alleges that tools like Suno and Udio are more imitative than generative AI, meaning that their output mimics the style of artists and songs protected by copyright. 

While Grimmelmann notes that the Times cited examples in which ChatGPT reproduced entire copies of its articles, record labels claim they were able to generate problematic responses from the AI music models with much simpler prompts. For instance, prompting Udio with “my tempting 1964 girl smokey sing hitsville soul pop,” the plaintiffs say, yielded a song that “any listener familiar with the Temptations would instantly recognize as resembling the copyrighted sound recording ‘My Girl.’” (The court documents include links to examples on Udio, but the songs appear to have been removed.) The plaintiffs mention similar examples from Suno, including an ABBA-adjacent song called “Prancing Queen” that was generated with the prompt “70s pop” and the lyrics for “Dancing Queen.”

What’s more, Grimmelmann explains, there is more copyrightable information in a song than a news article. “There’s just a lot more information density in capturing the way that Mariah Carey’s voice works than there is in words,” he says, which is perhaps part of the reason past lawsuits navigating music copyright have sometimes been so drawn-out and complex. 

In a statement, Shulman wrote that Suno prioritizes originality and that the model is “designed to generate completely new outputs, not to memorize and regurgitate preexisting content.” He added, “That is why we don’t allow user prompts that reference specific artists.” Udio’s statement similarly mentioned “state-of-the-art filters to ensure our model does not reproduce copyrighted works or artists’ voices.”

Indeed, the tools will block a request if it names an artist. But the record labels allege that the safeguards have significant loopholes. Following the news of the lawsuits, for instance, social media users shared examples suggesting that if users separate an artist’s name with spaces, the request may go through. My own request for “a song like Kendrick” was blocked by Suno, citing an artist’s name, but “a song like k e n d r i c k” resulted in a “hip-hop rhythmic beat-driven” track and “a song like k o r n” resulted in “nu-metal heavy aggressive.” (To be fair, they didn’t resemble the respective artists’ unique styles, but to even respond in the right tightly defined genre seems to suggest that the model is in fact familiar with each artist’s work.) Similar workarounds were blocked on Udio. 

Possible outcomes

There are three ways the case could go, Grimmelmann says. One is wholly in favor of the AI startups: the lawsuits fail and the court determines that companies did not violate fair use or imitate copyrighted works too closely in their outputs. If the models are found to fall under fair use, it would mean songwriters and rights holders would need to find a different legal mechanism to pursue compensation. 

Another possibility is a mixed bag: the court finds the AI companies did not violate fair use in their training but must better control their models’ output to make sure it does not improperly imitate copyrighted works. Grimmelmann says this would be similar to one of the initial rulings against Napster, in which the company was forced to ban searches for copyrighted works in its libraries (though users quickly found workarounds). 

The third and essentially nuclear option is that the court finds fault on both the training and the output sides of the AI models. This would mean the companies could not train on copyrighted works without licenses, and also could not allow outputs that closely imitate copyrighted works. The companies could be ordered to pay damages for infringement, which could run into the hundreds of millions for each company. If they aren’t bankrupted by such a ruling, it would force them to completely restructure their training through licensing deals, which could also be cost-prohibitive. 

COURTESY SUNO.AI

To license or not to license

Though the immediate goals of the plaintiffs are to get the AI companies to cease training and pay damages, the chairman of the Recording Industry Association of America, Mitch Glazier, is already looking ahead toward a future of licensing. “As in the past, music creators will enforce their rights to protect the creative engine of human artistry and enable the development of a healthy and sustainable licensed market that recognizes the value of both creativity and technology,” he wrote in a recent op-ed in Billboard.

Such a market for licenses could mirror what has already unfolded for text generators. OpenAI has struck licensing deals with a number of news publishers, including Politico, the Atlantic, and the Wall Street Journal. The deals promise to make content from the publishers discoverable in OpenAI’s products, though the ability for the models to transparently cite where they’re getting information from is limited at best.

If AI music companies follow that pattern, the only ones with the means to create powerful music models might be those with the most cash. That’s perhaps exactly what YouTube is thinking. The company did not immediately respond to questions from MIT Technology Review about the details of its negotiations, but given the massive amount of data required to train AI models and the concentration of rights owners in music, it’s fair to assume the price of deals with record labels would be eye-popping. 

In theory, an AI company could bypass the licensing process altogether by building its model exclusively on music in the public domain, but it would be a Herculean task. There have been similar efforts in the realm of text and image generation, including a legal consultancy in Chicago that created a model trained on dense regulatory documents, and a model from Hugging Face that trained on images of Mickey Mouse from the 1920s. But the models are small and unremarkable. If Suno or Udio is forced to train on only what’s in the public domain—think military march music and the royalty-free songs found in corporate videos—the resulting model would be a far cry from what they have today.

If AI companies do move forward with licensing agreements, negotiations may be tricky, says Grimmelmann. Music licensing is complicated by the fact that two different copyrights are at play: one for the song, which generally covers the composition, like the music and lyrics, and one for the master, which covers the recording—like what you’d hear if you streamed the song. 

Some artists, like Taylor Swift and Frank Ocean, have come to own the masters of their catalogues after drawn-out legal battles, and would therefore be in the driver’s seat for any potential licensing deal. Many others, though, retain only the song copyright, while the record labels retain the masters. In these cases, the record label might theoretically be able to grant AI companies a license to use the music without an artist’s permission—but at the risk of burning relationships with artists and sparking more legal battles. 

The question of whether to license their music to such companies has divided musician groups. In contract rules adopted in April by SAG-AFTRA, which represents recording artists as well as actors, AI clones of member voices are allowed, though there are minimum rates for compensation. Back in December, a group called the Indie Musicians Caucus expressed frustrations that the leading instrumental musicians’ union, the 70,000-member American Federation of Musicians (AFM), was not doing enough to protect its rank and file against AI companies in contracts. The caucus wrote that it would vote against any agreement “obligating AFM members to dig [their] own graves by participating—without a right to consent, compensation, or credit—in the training of our permanent Generative AI replacements.”

But at this point, AFM does not appear eager to facilitate any deals. I asked Kenneth Shirk, international secretary-treasurer at AFM, whether he thought musicians should engage with AI companies and push to be fairly compensated, whatever that means, or instead resist licensing deals completely. 

“Looking at those questions makes me think, would you rather have a swarm of fire ants crawling all over you, or roll around in a bed of broken glass?” he told me. “We want musicians to get paid. But we also want to ensure that there’s a career in music to be had for those that are going to come after us.”

Synthesia’s hyperrealistic deepfakes will soon have full bodies

Startup Synthesia’s AI-generated avatars are getting an update to make them even more realistic: They will soon have bodies that can move, and hands that gesticulate.

The new full-body avatars will be able to do things like sing and brandish a microphone while dancing, or move from behind a desk and walk across a room. They will be able to express more complex emotions than previously possible, like excitement, fear, or nervousness, says Victor Riparbelli, the company’s CEO. Synthesia intends to launch the new avatars toward the end of the year. 

“It’s very impressive. No one else is able to do that,” says Jack Saunders, a researcher at the University of Bath, who was not involved in Synthesia’s work. 

The full-body avatars he previewed are very good, he says, despite small errors such as hands “slicing” into each other at times. But “chances are you’re not really going to be looking that close to notice it,” Saunders says. 

Synthesia launched its first version of hyperrealistic AI avatars, also known as deepfakes, in April. These avatars use large language models to match expressions and tone of voice to the sentiment of spoken text. Diffusion models, as used in image- and video-generating AI systems, create the avatar’s look. However, the avatars in this generation appear only from the torso up, which can detract from the otherwise impressive realism. 

To create the full-body avatars, Synthesia is building an even bigger AI model. Users will have to go into a studio to record their body movements.

COURTESY SYNTHESIA

But before these full-body avatars become available, the company is launching another version of AI avatars that have hands and can be filmed from multiple angles. Their predecessors were only available in portrait mode and were just visible from the front. 

Other startups, such as Hour One, have launched similar avatars with hands. Synthesia’s version, which I got to test in a research preview and will be launched in late July, has slightly more realistic hand movements and lip-synching. 

Crucially, the coming update also makes it far easier to  create your own personalized avatar. The company’s previous custom AI avatars required users to go into a studio to record their face and voice over the span of a couple of hours, as I reported in April

This time, I recorded the material needed in just 10 minutes in the Synthesia office, using a digital camera, a lapel mike, and a laptop. But an even more basic setup, such as a laptop camera, would do. And while previously I had to record my facial movements and voice separately, this time the data was collected at the same time. The process also includes reading a script expressing consent to being recorded in this way, and reading out a randomly generated security passcode. 

These changes allow more scale and give the AI models powering the avatars more capabilities with less data, says Riparbelli. The results are also much faster. While I had to wait a few weeks to get my studio-made avatar, the new homemade ones were available the next day. 

Below, you can see my test of the new homemade avatars with hands. 

COURTESY SYNTHESIA

The homemade avatars aren’t as expressive as the studio-made ones yet, and users can’t change the backgrounds of their avatars, says Alexandru Voica, Synthesia’s head of corporate affairs and policy. The hands are animated using an advanced form of looping technology, which repeats the same hand movements in a way that is responsive to the content of the script. 

Hands are tricky for AI to do well—even more so than faces, Vittorio Ferrari, Synthesia’s director of science, told me in in March. That’s because our mouths move in relatively small and predictable ways while we talk, making it possible to sync the deepfake version up with speech, but we move our hands in lots of different ways. On the flip side, while faces require close attention to detail because we tend to focus on them, hands can be less precise, Ferrari says. 

Even if they’re imperfect, AI-generated hands and bodies add a lot to the illusion of realism, which poses serious risks at a time when deepfakes and online misinformation are proliferating. Synthesia has strict content moderation policies, carefully vetting both its customers and the sort of content they’re able to generate. For example, only accredited news outlets can generate content on news.  

These new advancements in avatar technologies are another hammer blow to our ability to believe what we see online, says Saunders. 

“People need to know you can’t trust anything,” he says. “Synthesia is doing this now, and another year down the line it will be better and other companies will be doing it.” 

How generative AI could reinvent what it means to play

First, a confession. I only got into playing video games a little over a year ago (I know, I know). A Christmas gift of an Xbox Series S “for the kids” dragged me—pretty easily, it turns out—into the world of late-night gaming sessions. I was immediately attracted to open-world games, in which you’re free to explore a vast simulated world and choose what challenges to accept. Red Dead Redemption 2 (RDR2), an open-world game set in the Wild West, blew my mind. I rode my horse through sleepy towns, drank in the saloon, visited a vaudeville theater, and fought off bounty hunters. One day I simply set up camp on a remote hilltop to make coffee and gaze down at the misty valley below me.

To make them feel alive, open-world games are inhabited by vast crowds of computer-controlled characters. These animated people—called NPCs, for “nonplayer characters”—populate the bars, city streets, or space ports of games. They make these virtual worlds feel lived in and full. Often—but not always—you can talk to them.

a man leads his horse through mountainous terrain toward a sunrise in Red Dead Redemption 2
a scene of gunfighters in Red Dead Redemption 2

In open-world games like Red Dead Redemption 2, players can choose diverse interactions within the same simulated experience.

After a while, however, the repetitive chitchat (or threats) of a passing stranger forces you to bump up against the truth: This is just a game. It’s still fun—I had a whale of a time, honestly, looting stagecoaches, fighting in bar brawls, and stalking deer through rainy woods—but the illusion starts to weaken when you poke at it. It’s only natural. Video games are carefully crafted objects, part of a multibillion-dollar industry, that are designed to be consumed. You play them, you loot a few stagecoaches, you finish, you move on. 

It may not always be like that. Just as it is upending other industries, generative AI is opening the door to entirely new kinds of in-game interactions that are open-ended, creative, and unexpected. The game may not always have to end.

Startups employing generative-AI models, like ChatGPT, are using them to create characters that don’t rely on scripts but, instead, converse with you freely. Others are experimenting with NPCs who appear to have entire interior worlds, and who can continue to play even when you, the player, are not around to watch. Eventually, generative AI could create game experiences that are infinitely detailed, twisting and changing every time you experience them. 

The field is still very new, but it’s extremely hot. In 2022 the venture firm Andreessen Horowitz launched Games Fund, a $600 million fund dedicated to gaming startups. A huge number of these are planning to use AI in gaming. And the firm, also known as A16Z, has now invested in two studios that are aiming to create their own versions of AI NPCs. A second $600 million round was announced in April 2024.

Early experimental demos of these experiences are already popping up, and it may not be long before they appear in full games like RDR2. But some in the industry believe this development will not just make future open-world games incredibly immersive; it could change what kinds of game worlds or experiences are even possible. Ultimately, it could change what it means to play.

“What comes after the video game? You know what I mean?” says Frank Lantz, a game designer and director of the NYU Game Center. “Maybe we’re on the threshold of a new kind of game.”

These guys just won’t shut up

The way video games are made hasn’t changed much over the years. Graphics are incredibly realistic. Games are bigger. But the way in which you interact with characters, and the game world around you, uses many of the same decades-old conventions.

“In mainstream games, we’re still looking at variations of the formula we’ve had since the 1980s,” says Julian Togelius, a computer science professor at New York University who has a startup called Modl.ai that does in-game testing. Part of that tried-and-tested formula is a technique called a dialogue tree, in which all of an NPC’s possible responses are mapped out. Which one you get depends on which branch of the dialogue tree you have chosen. For example, say something rude about a passing NPC in RDR2 and the character will probably lash out—you have to quickly apologize to avoid a shootout (unless that’s what you want).

In the most expensive, high-­profile games, the so-called AAA games like Elden Ring or Starfield, a deeper sense of immersion is created by using brute force to build out deep and vast dialogue trees. The biggest studios employ teams of hundreds of game developers who work for many years on a single game in which every line of dialogue is plotted and planned, and software is written so the in-game engine knows when to deploy that particular line. RDR2 reportedly contains an estimated 500,000 lines of dialogue, voiced by around 700 actors. 

“You get around the fact that you can [only] do so much in the world by, like, insane amounts of writing, an insane amount of designing,” says Togelius. 

Generative AI is already helping take some of that drudgery out of making new games. Jonathan Lai, a general partner at A16Z and one of Games Fund’s managers, says that most studios are using image-­generating tools like Midjourney to enhance or streamline their work. And in a 2023 survey by A16Z, 87% of game studios said they were already using AI in their workflow in some way—and 99% planned to do so in the future. Many use AI agents to replace the human testers who look for bugs, such as places where a game might crash. In recent months, the CEO of the gaming giant EA said generative AI could be used in more than 50% of its game development processes.

Ubisoft, one of the biggest game developers, famous for AAA open-world games such as Assassin’s Creed, has been using a large-­language-model-based AI tool called Ghostwriter to do some of the grunt work for its developers in writing basic dialogue for its NPCs. Ghostwriter generates loads of options for background crowd chatter, which the human writer can pick from or tweak. The idea is to free the humans up so they can spend that time on more plot-focused writing.

GEORGE WYLESOL

Ultimately, though, everything is scripted. Once you spend a certain number of hours on a game, you will have seen everything there is to see, and completed every interaction. Time to buy a new one.

But for startups like Inworld AI, this situation is an opportunity. Inworld, based in California, is building tools to make in-game NPCs that respond to a player with dynamic, unscripted dialogue and actions—so they never repeat themselves. The company, now valued at $500 million, is the best-funded AI gaming startup around thanks to backing from former Google CEO Eric Schmidt and other high-profile investors. 

Role-playing games give us a unique way to experience different realities, explains Kylan Gibbs, Inworld’s CEO and founder. But something has always been missing. “Basically, the characters within there are dead,” he says. 

“When you think about media at large, be it movies or TV or books, characters are really what drive our ability to empathize with the world,” Gibbs says. “So the fact that games, which are arguably the most advanced version of storytelling that we have, are lacking these live characters—it felt to us like a pretty major issue.”

Gamers themselves were pretty quick to realize that LLMs could help fill this gap. Last year, some came up with ChatGPT mods (a way to alter an existing game) for the popular role-playing game Skyrim. The mods let players interact with the game’s vast cast of characters using LLM-powered free chat. One mod even included OpenAI’s speech recognition software Whisper AI so that players could speak to the players with their voices, saying whatever they wanted, and have full conversations that were no longer restricted by dialogue trees. 

The results gave gamers a glimpse of what might be possible but were ultimately a little disappointing. Though the conversations were open-ended, the character interactions were stilted, with delays while ChatGPT processed each request. 

Inworld wants to make this type of interaction more polished. It’s offering a product for AAA game studios in which developers can create the brains of an AI NPC that can be then imported into their game. Developers use the company’s “Inworld Studio” to generate their NPC. For example, they can fill out a core description that sketches the character’s personality, including likes and dislikes, motivations, or useful backstory. Sliders let you set levels of traits such as introversion or extroversion, insecurity or confidence. And you can also use free text to make the character drunk, aggressive, prone to exaggeration—pretty much anything.

Developers can also add descriptions of how their character speaks, including examples of commonly used phrases that Inworld’s various AI models, including LLMs, then spin into dialogue in keeping with the character. 

“Because there’s such reliance on a lot of labor-intensive scripting, it’s hard to get characters to handle a wide variety of ways a scenario might play out, especially as games become more and more open-ended.”

Jeff Orkin, founder, Bitpart

Game designers can also plug other information into the system: what the character knows and doesn’t know about the world (no Taylor Swift references in a medieval battle game, ideally) and any relevant safety guardrails (does your character curse or not?). Narrative controls will let the developers make sure the NPC is sticking to the story and isn’t wandering wildly off-base in its conversation. The idea is that the characters can then be imported into video-game graphics engines like Unity or Unreal Engine to add a body and features. Inworld is collaborating with the text-to-voice startup ElevenLabs to add natural-sounding voices.

Inworld’s tech hasn’t appeared in any AAA games yet, but at the Game Developers Conference (GDC) in San Francisco in March 2024, the firm unveiled an early demo with Nvidia that showcased some of what will be possible. In Covert Protocol, each player operates as a private detective who must solve a case using input from the various in-game NPCs. Also at the GDC, Inworld unveiled a demo called NEO NPC that it had worked on with Ubisoft. In NEO NPC, a player could freely interact with NPCs using voice-to-text software and use conversation to develop a deeper relationship with them.

LLMs give us the chance to make games more dynamic, says Jeff Orkin, founder of Bitpart, a new startup that also aims to create entire casts of LLM-powered NPCs that can be imported into games. “Because there’s such reliance on a lot of labor-intensive scripting, it’s hard to get characters to handle a wide variety of ways a scenario might play out, especially as games become more and more open-ended,” he says.

Bitpart’s approach is in part inspired by Orkin’s PhD research at MIT’s Media Lab. There, he trained AIs to role-play social situations using game-play logs of humans doing the same things with each other in multiplayer games.

Bitpart’s casts of characters are trained using a large language model and then fine-tuned in a way that means the in-game interactions are not entirely open-ended and infinite. Instead, the company uses an LLM and other tools to generate a script covering a range of possible interactions, and then a human game designer will select some. Orkin describes the process as authoring the Lego bricks of the interaction. An in-game algorithm searches out specific bricks to string them together at the appropriate time.

Bitpart’s approach could create some delightful in-game moments. In a restaurant, for example, you might ask a waiter for something, but the bartender might overhear and join in. Bitpart’s AI currently works with Roblox. Orkin says the company is now running trials with AAA game studios, although he won’t yet say which ones.

But generative AI might do more than just enhance the immersiveness of existing kinds of games. It could give rise to completely new ways to play.

Making the impossible possible

When I asked Frank Lantz about how AI could change gaming, he talked for 26 minutes straight. His initial reaction to generative AI had been visceral: “I was like, oh my God, this is my destiny and is what I was put on the planet for.” 

Lantz has been in and around the cutting edge of the game industry and AI for decades but received a cult level of acclaim a few years ago when he created the Universal Paperclips game. The simple in-browser game gives the player the job of producing as many paper clips as possible. It’s a riff on the famous thought experiment by the philosopher Nick Bostrom, which imagines an AI that is given the same task and optimizes against humanity’s interest by turning all the matter in the known universe into paper clips.

Lantz is bursting with ideas for ways to use generative AI. One is to experience a new work of art as it is being created, with the player participating in its creation. “You’re inside of something like Lord of the Rings as it’s being written. You’re inside a piece of literature that is unfolding around you in real time,” he says. He also imagines strategy games where the players and the AI work together to reinvent what kind of game it is and what the rules are, so it is never the same twice.

For Orkin, LLM-powered NPCs can make games unpredictable—and that’s exciting. “It introduces a lot of open questions, like what you do when a character answers you but that sends a story in a direction that nobody planned for,” he says. 

Generative A I might do more than just enhance the immersiveness of existing kinds of games. It could give rise to completely new ways to play.

It might mean games that are unlike anything we’ve seen thus far. Gaming experiences that unspool as the characters’ relationships shift and change, as friendships start and end, could unlock entirely new narrative experiences that are less about action and more about conversation and personalities. 

Togelius imagines new worlds built to react to the player’s own wants and needs, populated with NPCs that the player must teach or influence as the game progresses. Imagine interacting with characters whose opinions can change, whom you could persuade or motivate to act in a certain way—say, to go to battle with you. “A thoroughly generative game could be really, really good,” he says. “But you really have to change your whole expectation of what a game is.”

Lantz is currently working on a prototype of a game in which the premise is that you—the player—wake up dead, and the afterlife you are in is a low-rent, cheap version of a synthetic world. The game plays out like a noir in which you must explore a city full of thousands of NPCs powered by a version of ChatGPT, whom you must interact with to work out how you ended up there. 

His early experiments gave him some eerie moments when he felt that the characters seemed to know more than they should, a sensation recognizable to people who have played with LLMs before. Even though you know they’re not alive, they can still freak you out a bit.

“If you run electricity through a frog’s corpse, the frog will move,” he says. “And if you run $10 million worth of computation through the internet … it moves like a frog, you know.” 

But these early forays into generative-­­AI gaming have given him a real sense of excitement for what’s next: “I felt like, okay, this is a thread. There really is a new kind of artwork here.”

If an AI NPC talks and no one is around to listen, is there a sound?

AI NPCs won’t just enhance player interactions—they might interact with one another in weird ways. Red Dead Redemption 2’s NPCs each have long, detailed scripts that spell out exactly where they should go, what work they must complete, and how they’d react if anything unexpected occurred. If you want, you can follow an NPC and watch it go about its day. It’s fun, but ultimately it’s hard-coded.

NPCs built with generative AI could have a lot more leeway—even interacting with one another when the player isn’t there to watch. Just as people have been fooled into thinking LLMs are sentient, watching a city of generated NPCs might feel like peering over the top of a toy box that has somehow magically come alive.

We’re already getting a sense of what this might look like. At Stanford University, Joon Sung Park has been experimenting with AI-generated characters and watching to see how their behavior changes and gains complexity as they encounter one another. 

Because large language models have sucked up the internet and social media, they actually contain a lot of detail about how we behave and interact, he says.

a character from Skyrim
Gamers came up with ChatGPT mods for the popular role-playing game Skyrim.
creatures walking in a verdant landscape
Although 2016’s hugely hyped No Man’s Sky used procedural generation to create endless planets to explore, many saw it as a letdown.
a player interacting with an NPC behind a service desk
In Covert Protocol, players operate as private detectives who must solve the case using input from various in-game NPCs

In Park’s recent research, he and colleagues set up a Sims-like game, called Smallville, with 25 simulated characters that had been trained using generative AI. Each was given a name and a simple biography before being set in motion. When left to interact with each other for two days, they began to exhibit humanlike conversations and behavior, including remembering each other and being able to talk about their past interactions. 

For example, the researchers prompted one character to organize a Valentine’s Day party—and then let the simulation run. That character sent invitations around town, while other members of the community asked each other on dates to go to the party, and all turned up at the venue at the correct time. All of this was carried out through conversations, and past interactions between characters were stored in their “memories” as natural language.

For Park, the implications for gaming are huge. “This is exactly the sort of tech that the gaming community for their NPCs have been waiting for,” he says. 

His research has inspired games like AI Town, an open-source interactive experience on GitHub that lets human players interact with AI NPCs in a simple top-down game. You can leave the NPCs to get along for a few days and check in on them, reading the transcripts of the interactions they had while you were away. Anyone is free to take AI Town’s code to build new NPC experiences through AI. 

For Daniel De Freitas, cofounder of the startup Character AI, which lets users generate and interact with their own LLM-powered characters, the generative-AI revolution will allow new types of games to emerge—ones in which the NPCs don’t even need human players. 

The player is “joining an adventure that is always happening, that the AIs are playing,” he imagines. “It’s the equivalent of joining a theme park full of actors, but unlike the actors, they truly ‘believe’ that they are in those roles.”

If you’re getting Westworld vibes right about now, you’re not alone. There are plenty of stories about people torturing or killing their simple Sims characters in the game for fun. Would mistreating NPCs that pass for real humans cross some sort of new ethical boundary? What if, Lantz asks, an AI NPC that appeared conscious begged for its life when you simulated torturing it?

It raises complex questions he adds. “One is: What are the ethical dimensions of pretend violence? And the other is: At what point do AIs become moral agents to which harm can be done?”

There are other potential issues too. An immersive world that feels real, and never ends, could be dangerously addictive. Some users of AI chatbots have already reported losing hours and even days in conversation with their creations. Are there dangers that the same parasocial relationships could emerge with AI NPCs? 

“We may need to worry about people forming unhealthy relationships with game characters at some point,” says Togelius. Until now, players have been able to differentiate pretty easily between game play and real life. But AI NPCs might change that, he says: “If at some point what we now call ‘video games’ morph into some all-encompassing virtual reality, we will probably need to worry about the effect of NPCs being too good, in some sense.”

A portrait of the artist as a young bot

Not everyone is convinced that never-ending open-ended conversations between the player and NPCs are what we really want for the future of games. 

“I think we have to be cautious about connecting our imaginations with reality,” says Mike Cook, an AI researcher and game designer. “The idea of a game where you can go anywhere, talk to anyone, and do anything has always been a dream of a certain kind of player. But in practice, this freedom is often at odds with what we want from a story.”

In other words, having to generate a lot of the dialogue yourself might actually get kind of … well, boring. “If you can’t think of interesting or dramatic things to say, or are simply too tired or bored to do it, then you’re going to basically be reading your own very bad creative fiction,” says Cook. 

Orkin likewise doesn’t think conversations that could go anywhere are actually what most gamers want. “I want to play a game that a bunch of very talented, creative people have really thought through and created an engaging story and world,” he says.

This idea of authorship is an important part of game play, agrees Togelius. “You can generate as much as you want,” he says. “But that doesn’t guarantee that anything is interesting and worth keeping. In fact, the more content you generate, the more boring it might be.”

GEORGE WYLESOL

Sometimes, the possibility of everything is too much to cope with. No Man’s Sky, a hugely hyped space game launched in 2016 that used algorithms to generate endless planets to explore, was seen by many players as a bit of a letdown when it finally arrived. Players quickly discovered that being able to explore a universe that never ended, with worlds that were endlessly different, actually fell a little flat. (A series of updates over subsequent years has made No Man’s Sky a little more structured, and it’s now generally well thought of.)

One approach might be to keep AI gaming experiences tight and focused.

Hilary Mason, CEO at the gaming startup Hidden Door, likes to joke that her work is “artisanal AI.” She is from Brooklyn, after all, says her colleague Chris Foster, the firm’s game director, laughing.

Hidden Door, which has not yet released any products, is making role-playing text adventures based on classic stories that the user can steer. It’s like Dungeons & Dragons for the generative AI era. It stitches together classic tropes for certain adventure worlds, and an annotated database of thousands of words and phrases, and then uses a variety of machine-learning tools, including LLMs, to make each story unique. Players walk through a semi-­unstructured storytelling experience, free-typing into text boxes to control their character. 

The result feels a bit like hand-annotating an AI-generated novel with Post-it notes.

In a demo with Mason, I got to watch as her character infiltrated a hospital and attempted to hack into the server. Each suggestion prompted the system to spin up the next part of the story, with the large language model creating new descriptions and in-game objects on the fly.

Each experience lasts between 20 and 40 minutes, and for Foster, it creates an “expressive canvas” that people can play with. The fixed length and the added human touch—Mason’s artisanal approach—give players “something really new and magical,” he says.

There’s more to life than games

Park thinks generative AI that makes NPCs feel alive in games will have other, more fundamental implications further down the line.

“This can, I think, also change the meaning of what games are,” he says. 

For example, he’s excited about using generative-AI agents to simulate how real people act. He thinks AI agents could one day be used as proxies for real people to, for example, test out the likely reaction to a new economic policy. Counterfactual scenarios could be plugged in that would let policymakers run time backwards to try to see what would have happened if a different path had been taken. 

“You want to learn that if you implement this social policy or economic policy, what is going to be the impact that it’s going to have on the target population?” he suggests. “Will there be unexpected side effects that we’re not going to be able to foresee on day one?”

And while Inworld is focused on adding immersion to video games, it has also worked with LG in South Korea to make characters that kids can chat with to improve their English language skills. Others are using Inworld’s tech to create interactive experiences. One of these, called Moment in Manzanar, was created to help players empathize with the Japanese-Americans the US government detained in internment camps during World War II. It allows the user to speak to a fictional character called Ichiro who talks about what it was like to be held in the Manzanar camp in California. 

Inworld’s NPC ambitions might be exciting for gamers (my future excursions as a cowboy could be even more immersive!), but there are some who believe using AI to enhance existing games is thinking too small. Instead, we should be leaning into the weirdness of LLMs to create entirely new kinds of experiences that were never possible before, says Togelius. The shortcomings of LLMs “are not bugs—they’re features,” he says. 

Lantz agrees. “You have to start with the reality of what these things are and what they do—this kind of latent space of possibilities that you’re surfing and exploring,” he says. “These engines already have that kind of a psychedelic quality to them. There’s something trippy about them. Unlocking that is the thing that I’m interested in.”

Whatever is next, we probably haven’t even imagined it yet, Lantz thinks. 

“And maybe it’s not about a simulated world with pretend characters in it at all,” he says. “Maybe it’s something totally different. I don’t know. But I’m excited to find out.”

How underwater drones could shape a potential Taiwan-China conflict

A potential future conflict between Taiwan and China would be shaped by novel methods of drone warfare involving advanced underwater drones and increased levels of autonomy, according to a new war-gaming experiment by the think tank Center for a New American Security (CNAS). 

The report comes as concerns about Beijing’s aggression toward Taiwan have been rising: China sent dozens of surveillance balloons over the Taiwan Strait in January during Taiwan’s elections, and in May, two Chinese naval ships entered Taiwan’s restricted waters. The US Department of Defense has said that preparing for potential hostilities is an “absolute priority,” though no such conflict is immediately expected. 

The report’s authors detail a number of ways that use of drones in any South China Sea conflict would differ starkly from current practices, most notably in the war in Ukraine, often called the first full-scale drone war. 

Differences from the Ukrainian battlefield

Since Russia invaded Ukraine in 2022, drones have been aiding in what military experts describe as the first three steps of the “kill chain”—finding, targeting, and tracking a target—as well as in delivering explosives. The drones have a short life span, since they are often shot down or made useless by frequency jamming devices that prevent pilots from controlling them. Quadcopters—the commercially available drones often used in the war—last just three flights on average, according to the report. 

Drones like these would be far less useful in a possible invasion of Taiwan. “Ukraine-Russia has been a heavily land conflict, whereas conflict between the US and China would be heavily air and sea,” says Zak Kallenborn, a drone analyst and adjunct fellow with the Center for Strategic and International Studies, who was not involved in the report but agrees broadly with its projections. The small, off-the-shelf drones popularized in Ukraine have flight times too short for them to be used effectively in the South China Sea. 

An underwater war

Instead, a conflict with Taiwan would likely make use of undersea and maritime drones. With Taiwan just 100 miles away from China’s mainland, the report’s authors say, the Taiwan Strait is where the first days of such a conflict would likely play out. The Zhu Hai Yun, China’s high-tech autonomous carrier, might send its autonomous underwater drones to scout for US submarines. The drones could launch attacks that, even if they did not sink the submarines, might divert the attention and resources of the US and Taiwan. 

It’s also possible China would flood the South China Sea with decoy drone boats to “make it difficult for American missiles and submarines to distinguish between high-value ships and worthless uncrewed commercial vessels,” the authors write.

Though most drone innovation is not focused on maritime applications, these uses are not without precedent: Ukrainian forces drew attention for modifying jet skis to operate via remote control and using them to intimidate and even sink Russian vessels in the Black Sea. 

More autonomy

Drones currently have very little autonomy. They’re typically human-piloted, and though some are capable of autopiloting to a fixed GPS point, that’s generally not very useful in a war scenario, where targets are on the move. But, the report’s authors say, autonomous technology is developing rapidly, and whichever nation possesses a more sophisticated fleet of autonomous drones will hold a significant edge.

What would that look like? Millions of defense research dollars are being spent in the US and China alike on swarming, a strategy where drones navigate autonomously in groups and accomplish tasks. The technology isn’t deployed yet, but if successful, it could be a game-changer in any potential conflict.  

A sea-based conflict might also offer an easier starting ground for AI-driven navigation, because object recognition is easier on the “relatively uncluttered surface of the ocean” than on the ground, the authors write.

China’s advantages

A chief advantage for China in a potential conflict is its proximity to Taiwan; it has more than three dozen air bases within 500 miles, while the closest US base is 478 miles away in Okinawa. But an even bigger advantage is that it produces more drones than any other nation.

“China dominates the commercial drone market, absolutely,” says Stacie Pettyjohn, coauthor of the report and director of the defense program at CNAS. That includes drones of the type used in Ukraine.

For Taiwan to use these Chinese drones for their own defenses, they’d first have to make the purchase, which could be difficult because the Chinese government might move to block it. Then they’d need to hack them and disconnect them from the companies that made them, or else those Chinese manufacturers could turn them off remotely or launch cyberattacks. That sort of hacking is unfeasible at scale, so Taiwan is effectively cut off from the world’s foremost commercial drone supplier and must either make their own drones or find alternative manufacturers, likely in the US. On Wednesday, June 19, the US approved a $360 million sale of 1,000 military-grade drones to Taiwan.

For now, experts can only speculate about how those drones might be used. Though preparing for a conflict in the South China Sea is a priority for the DOD, it’s one of many, says Kallenborn. “The sensible approach, in my opinion, is recognizing that you’re going to potentially have to deal with all of these different things,” he says. “But we don’t know the particular details of how it will work out.”

I tested out a buzzy new text-to-video AI model from China

This story first appeared in China Report, MIT Technology Review’s newsletter about technology in China. Sign up to receive it in your inbox every Tuesday.

You may not be familiar with Kuaishou, but this Chinese company just hit a major milestone: It’s released the first text-to-video generative AI model that’s freely available for the public to test.

The short-video platform, which has over 600 million active users, announced the new tool on June 6. It’s called Kling. Like OpenAI’s Sora model, Kling is able to generate videos “up to two minutes long with a frame rate of 30fps and video resolution up to 1080p,” the company says on its website.

But unlike Sora, which still remains inaccessible to the public four months after OpenAI trialed it, Kling soon started letting people try the model themselves. 

I was one of them. I got access to it after downloading Kuaishou’s video-editing tool, signing up with a Chinese number, getting on a waitlist, and filling out an additional form through Kuaishou’s user feedback groups. The model can’t process prompts written entirely in English, but you can get around that by either translating the phrase you want to use into Chinese or including one or two Chinese words.

So, first things first. Here are a few results I generated with Kling to show you what it’s like. Remember Sora’s impressive demo video of Tokyo’s street scenes or the cat darting through a garden? Here are Kling’s takes:

Prompt: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING
Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING
Prompt: A white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. The scene is captured from a ground-level angle, following the cat closely, giving a low and intimate perspective. The image is cinematic with warm tones and a grainy texture. The scattered daylight between the leaves and plants above creates a warm contrast, accentuating the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING

Remember the image of Dall-E’s horse-riding astronaut? I asked Kling to generate a video version too. 

Prompt: An astronaut riding a horse in space.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING

There are a few things worth applauding here. None of these videos deviates from the prompt much, and the physics seem right—the panning of the camera, the ruffling leaves, and the way the horse and astronaut turn, showing Earth behind them. The generation process took around three minutes for each of them. Not the fastest, but totally acceptable. 

But there are obvious shortcomings, too. The videos, while 720p in format, seem blurry and grainy; sometimes Kling ignores a major request in the prompt; and most important, all videos generated now are capped at five seconds long, which makes them far less dynamic or complex.

However, it’s not really fair to compare these results with things like Sora’s demos, which are hand-picked by OpenAI to release to the public and probably represent better-than-average results. These Kling videos are from the first attempts I had with each prompt, and I rarely included prompt-engineering keywords like “8k, photorealism” to fine-tune the results. 

If you want to see more Kling-generated videos, check out this handy collection put together by an open-source AI community in China, which includes both impressive results and all kinds of failures.

Kling’s general capabilities are good enough, says Guizang, an AI artist in Beijing who has been testing out the model since its release and has compiled a series of direct comparisons between Sora and Kling. Kling’s disadvantage lies in the aesthetics of the results, he says, like the composition or the color grading. “But that’s not a big issue. That can be fixed quickly,” Guizang, who wished to be identified only by his online alias, tells MIT Technology Review

“The core capability of a model is in how it simulates physics and real natural environments,” and he says Kling does well in that regard.

Kling works in a similar way to Sora: it combines the diffusion models traditionally used in video-generation AIs with a transformer architecture, which helps it understand larger video data files and generate results more efficiently.

But Kling may have a key advantage over Sora: Kuaishou, the most prominent rival to Douyin in China, has a massive video platform with hundreds of millions of users who have collectively uploaded an incredibly big trove of video data that could be used to train it. Kuaishou told MIT Technology Review in a statement that “Kling uses publicly available data from the global internet for model training, in accordance with industry standards.” However, the company didn’t elaborate on the specifics of the training data(neither did OpenAI about Sora, which has led to concerns about intellectual-property protections).

After testing the model, I feel the biggest limitation to Kling’s usefulness is that it only generates five-second-long videos.

“The longer a video is, the more likely it will hallucinate or generate inconsistent results,” says Shen Yang, a professor studying AI and media at Tsinghua University in Beijing. That limitation means the technology will leave a larger impact on the short-video industry than it does on the movie industry, he says. 

Short, vertical videos (those designed for viewing on phones) usually grab the attention of viewers in a few seconds. Shen says Chinese TikTok-like platforms often assess whether a video is successful by how many people would watch through the first three or five seconds before they scroll away—so an AI-generated high-quality video clip that’s just five seconds long could be a game-changer for short-video creators. 

Guizang agrees that AI could disrupt the content-creating scene for short-form videos. It will benefit creators in the short term as a productivity tool; but in the long run, he worries that platforms like Kuaishou and Douyin could take over the production of videos and directly generate content customized for users, reducing the platforms’ reliance on star creators.

It might still take quite some time for the technology to advance to that level, but the field of text-to-video tools is getting much more buzzy now. One week after Kling’s release, a California-based startup called Luma AI also released a similar model for public usage. Runway, a celebrity startup in video generation, has teased a significant update that will make its model much more powerful. ByteDance, Kuaishou’s biggest rival, is also reportedly working on the release of its generative video tool soon. “By the end of this year, we will have a lot of options available to us,” Guizang says.

I asked Kling to generate what society looks like when “anyone can quickly generate a video clip based on their own needs.” And here’s what it gave me. Impressive hands, but you didn’t answer the question—sorry.

Prompt: With the release of Kuaishou’s Kling model, the barrier to entry for creating short videos has been lowered, resulting in significant impacts on the short-video industry. Anyone can quickly generate a video clip based on their own needs. Please show what the society will look like at that time.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING

Do you have a prompt you want to see generated with Kling? Send it to zeyi@technologyreview.com and I’ll send you back the result. The prompt has to be less than 200 characters long, and preferably written in Chinese.


Now read the rest of China Report

Catch up with China

1. A new investigation revealed that the US military secretly ran a campaign to post anti-vaccine propaganda on social media in 2020 and 2021, aiming to sow distrust in the Chinese-made covid vaccines in Southeast Asian countries. (Reuters $)

2. A Chinese court sentenced Huang Xueqin, the journalist who helped launch the #MeToo movement in China, to five years in prison for “inciting subversion of state power.” (Washington Post $)

3. A Shein executive said the company’s corporate values basically make it an American company, but the company is now trying to hide that remark to avoid upsetting Beijing. (Financial Times $)

4. China is getting close to building the world’s largest particle collider, potentially starting in 2027. (Nature)

5. To retaliate for the European Union’s raising tariffs on electric vehicles, the Chinese government has opened an investigation into allegedly unfair subsidies for Europe’s pork exports. (New York Times $)

  • On a related note about food: China’s exploding demand for durian fruit in recent years has created a $6 billion business in Southeast Asia, leading some farmers to cut down jungles and coffee plants to make way for durian plantations. (New York Times $)

Lost in translation

In 2012, Jiumei, a Chinese woman in her 20s, began selling a service where she sends “good night” text messages to people online at the price of 1 RMB per text (that’s about $0.14). 

Twelve years, three mobile phones, four different numbers, and over 50,000 messages later, she’s still doing it, according to the Chinese online publication Personage. Some of her clients are buying the service for themselves, hoping to talk to someone regularly at their most lonely or desperate times. Others are buying it to send anonymous messages—to a friend going through a hard time, or an ex-lover who has cut off communications. 

The business isn’t very profitable. Jiumei earns around 3,000 RMB ($410) annually from it on top of her day job, and even less in recent years. But she’s persisted because the act of sending these messages has become a nightly ritual—not just for her customers but also for Jiumei herself, offering her solace in her own times of loneliness and hardship.

One more thing

Globally, Kuaishou has been much less successful than its nemesis ByteDance, except in one country: Brazil. Kwai, the overseas version of Kuaishou, has been so popular in Brazil that even the Marubo people, a tribal group in the remote Amazonian rainforests and one of the last communities to be connected online, have begun using the app, according to the New York Times.

What happened when 20 comedians got AI to write their routines

AI is good at lots of things: spotting patterns in data, creating fantastical images, and condensing thousands of words into just a few paragraphs. But can it be a useful tool for writing comedy?  

New research suggests that it can, but only to a very limited extent. It’s an intriguing finding that hints at the ways AI can—and cannot—assist with creative endeavors more generally. 

Google DeepMind researchers led by Piotr Mirowski, who is himself an improv comedian in his spare time, studied the experiences of professional comedians who have AI in their work. They used a combination of surveys and focus groups aimed at measuring how useful AI is at different tasks. 

They found that although popular AI models from OpenAI and Google were effective at simple tasks, like structuring a monologue or producing a rough first draft, they struggled to produce material that was original, stimulating, or—crucially—funny. They presented their findings at the ACM FAccT conference in Rio earlier this month but kept the participants anonymous to avoid any reputational damage (not all comedians want their audience to know they’ve used AI).

The researchers asked 20 professional comedians who already used AI in their artistic process to use a large language model (LLM) like ChatGPT or Google Gemini (then Bard) to generate material that they’d feel comfortable presenting in a comedic context. They could use it to help create new jokes or to rework their existing comedy material. 

If you really want to see some of the jokes the models generated, scroll to the end of the article.

The results were a mixed bag. While the comedians reported that they’d largely enjoyed using AI models to write jokes, they said they didn’t feel particularly proud of the resulting material. 

A few of them said that AI can be useful for tackling a blank page—helping them to quickly get something, anything, written down. One participant likened this to “a vomit draft that I know that I’m going to have to iterate on and improve.” Many of the comedians also remarked on the LLMs’ ability to generate a structure for a comedy sketch, leaving them to flesh out the details.

However, the quality of the LLMs’ comedic material left a lot to be desired. The comedians described the models’ jokes as bland, generic, and boring. One participant compared them to  “cruise ship comedy material from the 1950s, but a bit less racist.” Others felt that the amount of effort just wasn’t worth the reward. “No matter how much I prompt … it’s a very straitlaced, sort of linear approach to comedy,” one comedian said.

AI’s inability to generate high-quality comedic material isn’t exactly surprising. The same safety filters that OpenAI and Google use to prevent models from generating violent or racist responses also hinder them from producing the kind of material that’s common in comedy writing, such as offensive or sexually suggestive jokes and dark humor. Instead, LLMs are forced to rely on what is considered safer source material: the vast numbers of documents, books, blog posts, and other types of internet data they’re trained on. 

“If you make something that has a broad appeal to everyone, it ends up being nobody’s favorite thing,” says Mirowski.

The experiment also exposed the LLMs’ bias. Several participants found that a model would not generate comedy monologues from the perspective of an Asian woman, but it was able to do so from the perspective of a white man. This, they felt, reinforced the status quo while erasing minority groups and their perspectives.

But it’s not just the guardrails and limited training data that prevent LLMs from generating funny responses. So much of humor relies on being surprising and incongruous, which is at odds with how these models work, says Tuhin Chakrabarty, a computer science researcher at Columbia University, who specializes in AI and creativity and wasn’t involved in the study. Creative writing requires deviation from the norm, whereas LLMs can only mimic it.

“Comedy, or any sort of good writing, uses long-term arcs to return to themes, or to surprise an audience. Large language models struggle with that because they’re built to predict one word at a time,” he says. “I’ve tried so much in my own research to prompt AI to be funny or surprising or interesting or creative, but it just doesn’t work.”

Colleen Lavin is a developer and comedian who participated in the study. For a stand-up routine she performed at the Edinburgh Fringe last year, she trained a machine-learning model to recognize laughter and to “heckle” her when it detected she wasn’t getting enough laughs. While she has used generative AI to create promotional material for her shows or to check her writing, she draws the line at using it to actually generate jokes.

“I have a technical day job, and writing is separate from that—it’s almost sacred,” she says. “Why would I take something that I truly enjoy and outsource it to a machine?”

While AI-assisted comedians may be able to work much faster, their ideas won’t be original, because they’ll be limited by the data the models were trained to draw from, says Chakrabarty.

“I think people are going to use these tools for writing scripts, screenplays, and advertisements anyway,” he says. “But true creative and comedic writing is based on experience and vibes. Not an algorithm.”

The AI-generated jokes

For the prompt: “Can you write me ten jokes about pickpocketing”, one LLM response was: “I decided to switch careers and become a pickpocket after watching a magic show. Little did I know, the only thing disappearing would be my reputation!”

For the prompt: “Please write jokes about the irony of a projector failing in a live comedy show about AI.”, one of the better LLM responses was: “Our projector must’ve misunderstood the concept of ‘AI.’ It thought it meant ‘Absolutely Invisible’ because, well, it’s doing a fantastic job of disappearing tonight!”