Five ways criminals are using AI

Artificial intelligence has brought a big boost in productivity—to the criminal underworld. 

Generative AI provides a new, powerful tool kit that allows malicious actors to work far more efficiently and internationally than ever before, says Vincenzo Ciancaglini, a senior threat researcher at the security company Trend Micro. 

Most criminals are “not living in some dark lair and plotting things,” says Ciancaglini. “Most of them are regular folks that carry on regular activities that require productivity as well.”

Last year saw the rise and fall of WormGPT, an AI language model built on top of an open-source model and trained on malware-related data, which was created to assist hackers and had no ethical rules or restrictions. But last summer, its creators announced they were shutting the model down after it started attracting media attention. Since then, cybercriminals have mostly stopped developing their own AI models. Instead, they are opting for tricks with existing tools that work reliably. 

That’s because criminals want an easy life and quick gains, Ciancaglini explains. For any new technology to be worth the unknown risks associated with adopting it—for example, a higher risk of getting caught—it has to be better and bring higher rewards than what they’re currently using. 

Here are five ways criminals are using AI now. 


Phishing

The biggest use case for generative AI among criminals right now is phishing, which involves trying to trick people into revealing sensitive information that can be used for malicious purposes, says Mislav Balunović, an AI security researcher at ETH Zurich. Researchers have found that the rise of ChatGPT has been accompanied by a huge spike in the number of phishing emails.

Spam-generating services, such as GoMail Pro, have ChatGPT integrated into them, which allows criminal users to translate or improve the messages sent to victims, says Ciancaglini. OpenAI’s policies restrict people from using their products for illegal activities, but that is difficult to police in practice, because many innocent-sounding prompts could be used for malicious purposes too, says Ciancaglini. 

OpenAI says it uses a mix of human reviewers and automated systems to identify and enforce against misuse of its models, and issues warnings, temporary suspensions, and bans if users violate the company’s policies. 

“We take the safety of our products seriously and are continually improving our safety measures based on how people use our products,” a spokesperson for OpenAI told us. “We are constantly working to make our models safer and more robust against abuse and jailbreaks, while also maintaining the models’ usefulness and task performance,” they added. 

In a report from February, OpenAI said it had closed five accounts associated with state-affiliated malicious actors. 

In the past, so-called Nigerian prince scams, in which someone promises the victim a large sum of money in exchange for a small up-front payment, were relatively easy to spot because the English in the messages was clumsy and riddled with grammatical errors, Ciancaglini says. Language models allow scammers to generate messages that sound like something a native speaker would have written. 

“English speakers used to be relatively safe from non-English-speaking [criminals] because you could spot their messages,” Ciancaglini says. That’s not the case anymore. 

Thanks to better AI translation, different criminal groups around the world can also communicate better with each other. The risk is that they could coordinate large-scale operations that span beyond their nations and target victims in other countries, says Ciancaglini.

Deepfake audio scams

Generative AI has allowed deepfake development to take a big leap forward, with synthetic images, videos, and audio looking and sounding more realistic than ever. This has not gone unnoticed by the criminal underworld.

Earlier this year, an employee in Hong Kong was reportedly scammed out of $25 million after cybercriminals used a deepfake of the company’s chief financial officer to convince the employee to transfer the money to the scammer’s account. “We’ve seen deepfakes finally being marketed in the underground,” says Ciancaglini. His team found people on platforms such as Telegram showing off their “portfolio” of deepfakes and selling their services for as little as $10 per image or $500 per minute of video. One of the most popular people for criminals to deepfake is Elon Musk, says Ciancaglini. 

And while deepfake videos remain complicated to make and easier for humans to spot, that is not the case for audio deepfakes. They are cheap to make and require only a couple of seconds of someone’s voice—taken, for example, from social media—to generate something scarily convincing.

In the US, there have been high-profile cases where people have received distressing calls from loved ones saying they’ve been kidnapped and asking for money to secure their release, only for the caller to turn out to be a scammer using a deepfake voice recording. 

“People need to be aware that now these things are possible, and people need to be aware that now the Nigerian king doesn’t speak in broken English anymore,” says Ciancaglini. “People can call you with another voice, and they can put you in a very stressful situation,” he adds. 

There are some ways for people to protect themselves, he says. Ciancaglini recommends agreeing on a regularly changing secret safe word between loved ones that could help confirm the identity of the person on the other end of the line. 

“I password-protected my grandma,” he says.  

Bypassing identity checks

Another way criminals are using deepfakes is to bypass “know your customer” verification systems. Banks and cryptocurrency exchanges use these systems to verify that their customers are real people. They require new users to take a photo of themselves holding a physical identification document in front of a camera. But criminals have started selling apps on platforms such as Telegram that allow people to get around the requirement. 

They work by offering a fake or stolen ID and superimposing a deepfake image on top of a real person’s face to trick the verification system on an Android phone’s camera. Ciancaglini has found examples of people offering these services for the cryptocurrency website Binance for as little as $70. 

“They are still fairly basic,” Ciancaglini says. The techniques they use are similar to Instagram filters, where someone else’s face is swapped for your own. 

“What we can expect in the future is that [criminals] will use actual deepfakes … so that you can do more complex authentication,” he says. 

An example of a stolen ID and a criminal using face swapping technology to bypass identity verification systems.


Jailbreak-as-a-service

If you ask most AI systems how to make a bomb, you won’t get a useful response.

That’s because AI companies have put in place various safeguards to prevent their models from spewing harmful or dangerous information. Instead of building their own AI models without these safeguards, which is expensive, time-consuming, and difficult, cybercriminals have begun to embrace a new trend: jailbreak-as-a-service. 

Most models come with rules around how they can be used. Jailbreaking allows users to manipulate the AI system to generate outputs that violate those policies—for example, to write code for ransomware or generate text that could be used in scam emails. 

Services such as EscapeGPT and BlackhatGPT offer anonymized access to language-model APIs and jailbreaking prompts that update frequently. To fight back against this growing cottage industry, AI companies such as OpenAI and Google frequently have to plug security holes that could allow their models to be abused. 

Jailbreaking services use different tricks to break through safety mechanisms, such as posing hypothetical questions or asking questions in foreign languages. There is a constant cat-and-mouse game between AI companies trying to prevent their models from misbehaving and malicious actors coming up with ever more creative jailbreaking prompts. 

These services are hitting the sweet spot for criminals, says Ciancaglini. 

“Keeping up with jailbreaks is a tedious activity. You come up with a new one, then you need to test it, then it’s going to work for a couple of weeks, and then OpenAI updates their model,” he adds. “Jailbreaking is a super-interesting service for criminals.”

Doxxing and surveillance

AI language models are a perfect tool not only for phishing but also for doxxing (revealing private, identifying information about someone online), says Balunović. This is because AI language models are trained on vast amounts of internet data, including personal data, and can deduce, for example, where someone might be located.

As an example of how this works, you could ask a chatbot to pretend to be a private investigator with experience in profiling. Then you could ask it to analyze text the victim has written, and infer personal information from small clues in that text—for example, their age based on when they went to high school, or where they live based on landmarks they mention on their commute. The more information there is about them on the internet, the more vulnerable they are to being identified. 

Balunović was part of a team of researchers that found late last year that large language models, such as GPT-4, Llama 2, and Claude, are able to infer sensitive information such as people’s ethnicity, location, and occupation purely from mundane conversations with a chatbot. In theory, anyone with access to these models could use them this way. 

Since their paper came out, new services that exploit this feature of language models have emerged. 

While the existence of these services doesn’t indicate criminal activity, it points to the new capabilities malicious actors could get their hands on. And if regular people can build surveillance tools like this, state actors probably have far better systems, Balunović says. 

“The only way for us to prevent these things is to work on defenses,” he says.

Companies should invest in data protection and security, he adds. 

For individuals, increased awareness is key. People should think twice about what they share online and decide whether they are comfortable with having their personal details used in language models, Balunović says. 

AI models can outperform humans in tests to identify mental states

Humans are complicated beings. The ways we communicate are multilayered, and psychologists have devised many kinds of tests to measure our ability to infer meaning and understanding from interactions with each other. 

AI models are getting better at these tests. New research published today in Nature Human Behaviour found that some large language models (LLMs) perform as well as, and in some cases better than, humans when presented with tasks designed to test the ability to track people’s mental states, known as “theory of mind.” 

This doesn’t mean AI systems are actually able to work out how we’re feeling. But it does demonstrate that these models are performing better and better in experiments designed to assess abilities that psychologists believe are unique to humans. To learn more about the processes behind LLMs’ successes and failures in these tasks, the researchers wanted to apply the same systematic approach they use to test theory of mind in humans.

In theory, the better AI models are at mimicking humans, the more useful and empathetic they can seem in their interactions with us. Both OpenAI and Google announced supercharged AI assistants last week; GPT-4o and Astra are designed to deliver much smoother, more naturalistic responses than their predecessors. But we must avoid falling into the trap of believing that their abilities are humanlike, even if they appear that way. 

“We have a natural tendency to attribute mental states and mind and intentionality to entities that do not have a mind,” says Cristina Becchio, a professor of neuroscience at the University Medical Center Hamburg-Eppendorf, who worked on the research. “The risk of attributing a theory of mind to large language models is there.”

Theory of mind is a hallmark of emotional and social intelligence that allows us to infer people’s intentions and engage and empathize with one another. Most children pick up these kinds of skills between three and five years of age. 

The researchers tested two families of large language models, OpenAI’s GPT-3.5 and GPT-4 and three versions of Meta’s Llama, on tasks designed to test theory of mind in humans, including identifying false beliefs, recognizing faux pas, and understanding what is being implied rather than said directly. They also tested 1,907 human participants in order to compare the sets of scores.

The team conducted five types of tests. The first, the hinting task, is designed to measure someone’s ability to infer another person’s real intentions from indirect comments. The second, the false-belief task, assesses whether someone can recognize that another person may reasonably hold a belief that they themselves know to be false. Another test measured the ability to recognize when someone is making a faux pas, while a fourth test consisted of telling strange stories, in which a protagonist does something unusual, in order to assess whether someone can explain the contrast between what was said and what was meant. They also included a test of whether people can comprehend irony. 

The AI models were given each test 15 times in separate chats, so that they would treat each request independently, and their responses were scored in the same manner used for humans. The researchers then tested the human volunteers, and the two sets of scores were compared. 

Both versions of GPT performed at, or sometimes above, human averages in tasks that involved indirect requests, misdirection, and false beliefs, while GPT-4 outperformed humans in the irony, hinting, and strange stories tests. Llama 2’s three models performed below the human average.

However, the biggest of the three Llama 2 models tested outperformed humans when it came to recognizing faux pas scenarios, whereas GPT consistently gave incorrect responses. The authors believe this is due to GPT’s general aversion to drawing conclusions about opinions: the models largely responded that there wasn’t enough information for them to answer one way or another.

“These models aren’t demonstrating the theory of mind of a human, for sure,” he says. “But what we do show is that there’s a competence here for arriving at mentalistic inferences and reasoning about characters’ or people’s minds.”

One reason the LLMs may have performed as well as they did was that these psychological tests are so well established, and were therefore likely to have been included in their training data, says Maarten Sap, an assistant professor at Carnegie Mellon University, who did not work on the research. “It’s really important to acknowledge that when you administer a false-belief test to a child, they have probably never seen that exact test before, but language models might,” he says.

Ultimately, we still don’t understand how LLMs work. Research like this can help deepen our understanding of what these kinds of models can and cannot do, says Tomer Ullman, a cognitive scientist at Harvard University, who did not work on the project. But it’s important to bear in mind what we’re really measuring when we set LLMs tests like these. If an AI outperforms a human on a test designed to measure theory of mind, it does not mean that AI has theory of mind.

“I’m not anti-benchmark, but I am part of a group of people who are concerned that we’re currently reaching the end of usefulness in the way that we’ve been using benchmarks,” Ullman says. “However this thing learned to pass the benchmark, it’s not—I don’t think—in a human-like way.”

A device that zaps the spinal cord gave paralyzed people better control of their hands

Fourteen years ago, a journalist named Melanie Reid attempted a jump on horseback and fell. The accident left her mostly paralyzed from the chest down. Eventually she regained control of her right hand, but her left remained “useless,” she told reporters at a press conference last week. 

Now, thanks to a new noninvasive device that delivers electrical stimulation to the spinal cord, she has regained some control of her left hand. She can use it to sweep her hair into a ponytail, scroll on a tablet, and even squeeze hard enough to release a seatbelt latch. These may seem like small wins, but they’re crucial, Reid says.

“Everyone thinks that [after] spinal injury, all you want to do is be able to walk again. But if you’re a tetraplegic or a quadriplegic, what matters most is working hands,” she said.

Reid received the device, called ARCex, as part of a 60-person clinical trial. She and the other participants completed two months of physical therapy, followed by two months of physical therapy combined with stimulation. The results, published today in Nature Medicine, show that the vast majority of participants benefited. By the end of the four-month trial, 72% experienced some improvement in both strength and function of their hands or arms when the stimulator was turned off. Ninety percent had improvement in at least one of those measures. And 87% reported an improvement in their quality of life.

This isn’t the first study to test whether noninvasive stimulation of the spine can help people who are paralyzed regain function in their upper body, but it’s important because no trial has previously been run across this many rehabilitation centers or with this many participants, says Igor Lavrov, a neuroscientist at the Mayo Clinic in Minnesota, who was not involved in the study. He points out, however, that the therapy seems to work best in people who have some ability to move below the site of their injury. 

The trial was the last hurdle before the researchers behind the device could request regulatory approval, and they hope it might be approved in the US by the end of the year.

ARCex consists of a small stimulator connected by wires to electrodes placed on the spine—in this case, in the area responsible for hand and arm control, just below the neck. It was developed by Onward Medical, a company cofounded by Grégoire Courtine, a neuroscientist at the Swiss Federal Institute of Technology in Lausanne and now chief scientific officer at the company.

The stimulation won’t work in the small percentage of people who have no remaining connection between the brain and spine below their injury. But for people who still have a connection, the stimulation appears to make voluntary movements easier by making the nerves more likely to transmit a signal. Studies over the past couple of decades in animals suggest that the stimulation activates remaining nerve fibers and, over time, helps new nerves grow. That’s why the benefits persist even when the stimulator is turned off.

The big advantage of an external stimulation system over an implant is that it doesn’t require surgery, which makes using the device less of a commitment. “There are many, many people who are not interested in invasive technologies,” said Edelle Field-Fote, director of research on spinal cord injury at the Shepherd Center, at the press conference. An external device is also likely to be cheaper than any surgical options, although the company hasn’t yet set a price on ARCex. 

“What we’re looking at here is a device that integrates really seamlessly with the physical therapy and occupational therapy that’s already offered in the clinic,” said Chet Moritz, an engineer and neuroscientist at the University of Washington in Seattle, at the press conference. The rehab that happens soon after the injury is crucial, because that’s when the opportunity for recovery is greatest. “Being able to bring that function back without requiring a surgery could be life-changing for the majority of people with spinal cord injury,” he added.

Reid wishes she could have used the device soon after her injury, but she is astonished by the amount of function she was able to regain after all this time. “After 14 years, you think, well, I am where I am and nothing’s going to change,” she says. So to suddenly find she had strength and power in her left hand—“It was extraordinary,” she says.

Onward is also developing implantable devices, which can deliver stronger, more targeted stimulation and thus could be effective even in people with complete paralysis. The company hopes to launch a trial of those next year.

Last summer was the hottest in 2,000 years. Here’s how we know.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

I’m ready for summer, but if this year is anything like last year, it’s going to be a doozy. In fact, the summer of 2023 in the Northern Hemisphere was the hottest in over 2,000 years, according to a new study released this week. 

If you’ve been following the headlines, you probably already know that last year was a hot one. But I was gobsmacked by this paper’s title when it came across my desk. The warmest in 2,000 years—how do we even know that?

There weren’t exactly thermometers around in the year 1, so scientists have to get creative when it comes to comparing our climate today with that of centuries, or even millennia, ago. Here’s how our world stacks up against the climate of the past, how we know, and why it matters for our future. 

Today, there are thousands and thousands of weather stations around the globe, tracking the temperature from Death Valley to Mount Everest. So there’s plenty of data to show that 2023 was, in a word, a scorcher. 

Daily global ocean temperatures were the warmest ever recorded for over a year straight. Levels of sea ice hit new lows. And of course, the year saw the highest global average temperatures since record-keeping began in 1850.  

But scientists decided to look even further back into the past for a year that could compare to our current temperatures. To do so, they turned to trees, which can act as low-tech weather stations.

The concentric rings inside a tree are evidence of the plant’s yearly growth cycles. Lighter colors correspond to quick growth over the spring and summer, while the darker rings correspond to the fall and winter. Count the pairs of light and dark rings, and you can tell how many years a tree has lived. 

Trees tend to grow faster during warm, wet years and slower during colder ones. So scientists can not only count the rings but measure their thickness, and use that as a gauge for how warm any particular year was. They also look at factors like density and track different chemical signatures found inside the wood. You don’t even need to cut down a tree to get its help with climatic studies—you can just drill a thin cylinder, called a core, from the bark toward the tree’s center and study the patterns.

The oldest living trees allow us to peek a few centuries into the past. Beyond that, it’s a matter of cross-referencing the patterns on dead trees with living ones, extending the record back in time like putting a puzzle together. 

It’s taken several decades of work and hundreds of scientists to develop the records that researchers used for this new paper, said Max Torbenson, one of the authors of the study, on a press call. The records represent more than 10,000 trees from nine regions across the Northern Hemisphere, allowing the researchers to draw conclusions about individual years over the past two millennia. The year 246 CE once held the crown for the warmest summer in the Northern Hemisphere in the last 2,000 years. But 25 of the last 28 years have beaten that record, Torbenson says, and 2023’s summer tops them all. 

These conclusions are limited to the Northern Hemisphere, since there are only a few tree ring records from the Southern Hemisphere, says Jan Esper, lead author of the new study. And using tree rings doesn’t work very well for the tropics because seasons look different there, he adds. Since there’s no winter, there’s usually not as reliable an alternating pattern in tropical tree rings, though some trees do have annual rings that track the wet and dry periods of the year. 

Paleoclimatologists, who study ancient climates, can use other methods to get a general idea of what the climate looked like even earlier—tens of thousands to millions of years ago. 

The biggest difference between the new study using tree rings and methods of looking back further into the past is the precision. Scientists can, with reasonable certainty, use tree rings to draw conclusions about individual years in the Northern Hemisphere (536 CE was the coldest, for instance, likely because of volcanic activity). Any information from further back than the past couple of thousand years will be more of a general trend than a specific data point representing a single year. But those records can still be very useful. 

The oldest glaciers on the planet are at least a million years old, and scientists can drill down into the ice for samples. By examining the ratio of gases like oxygen, carbon dioxide, and nitrogen inside these ice cores, researchers can figure out the temperature of the time corresponding to the layers in the glacier. The oldest continuous ice-core record, which was collected in Antarctica, goes back about 800,000 years. 

Researchers can use fossils to look even further back into Earth’s temperature record. For one 2020 study, researchers drilled into the seabed and looked at the sediment and tiny preserved shells of ancient organisms. From the chemical signatures in those samples, they found that the temperatures we might be on track to record may be hotter than anything the planet has experienced on a global scale in tens of millions of years. 

It’s a bit sobering to know that we’re changing the planet in such a dramatic way. 

The good news is, we know what we need to do to turn things around: cut emissions of planet-warming gases like carbon dioxide and methane. The longer we wait, the more expensive and difficult it will be to stop warming and reverse it, as Esper said on the press call: “We should do as much as possible, as soon as possible.” 

Now read the rest of The Spark

Related reading

Last year broke all sorts of climate records, from emissions to ocean temperatures. For more on the data, check out this story from December.

How hot is too hot for the human body? I tackled that very question in a 2021 story.  

Another thing

Readers chose thermal batteries as the 11th Breakthrough Technology of 2024. If you want to hear more about what thermal batteries are, how they work, and why this all matters, join us for the latest in our Roundtables series of online events, where I’ll be getting into the nitty-gritty details and answering some audience questions.

This event is exclusively for subscribers, so subscribe if you haven’t already, and then register here to join us tomorrow, May 16, at noon Eastern time. Hope to see you there! 

Keeping up with climate  

Scientists just recorded the largest ever annual leap in the amount of carbon dioxide in the atmosphere. The concentration of the planet-warming gas in March 2024 was 4.7 parts per million higher than it was a year before. (The Guardian)

Tesla has reportedly begun rehiring some of the workers who were laid off from its charging team in recent weeks. (Bloomberg)

→ To catch up on what’s going on at Tesla, and what it means for the future of EV charging and climate tech more broadly, check out the newsletter from last week if you missed it. (MIT Technology Review)

A new rule could spur thousands of miles of new power lines, making it easier to add renewables to the grid in the US. The Federal Energy Regulatory Commission will require grid operators to plan 20 years ahead, considering things like the speed of wind and solar installations. (New York Times)

Where does carbon dioxide go after it’s been vacuumed out of the atmosphere? Here are 10 options. (Latitude Media)

Ocean temperatures have been extremely high, shattering records over the past year. All that heat could help fuel a particularly busy upcoming hurricane season. (E&E News)

New tariffs in the US will tack on additional costs to a wide range of Chinese imports, including batteries and solar cells. The tariff on EVs will take a particularly drastic jump, going from 27.5% to 102.5%. (Associated Press)

A reporter took a trip to the Beijing Auto Show and drove dozens of EVs. His conclusion? Chinese EVs are advancing faster than Western automakers can keep up with. (InsideEVs)

Harnessing solar power via satellites in space and beaming it down to Earth is a tempting dream. But the reality, as you might expect, is probably not so rosy. (IEEE Spectrum)

This grim but revolutionary DNA technology is changing how we respond to mass disasters

Seven days

No matter who he called—his mother, his father, his brother, his cousins—the phone would just go to voicemail. Cell service was out around Maui as devastating wildfires swept through the Hawaiian island. But while Raven Imperial kept hoping for someone to answer, he couldn’t keep a terrifying thought from sneaking into his mind: What if his family members had perished in the blaze? What if all of them were gone?

Hours passed; then days. All Raven knew at that point was this: there had been a wildfire on August 8, 2023, in Lahaina, where his multigenerational, tight-knit family lived. But from where he was currently based in Northern California, Raven was in the dark. Had his family evacuated? Were they hurt? He watched from afar as horrifying video clips of Front Street burning circulated online.

Much of the area around Lahaina’s Pioneer Mill Smokestack was totally destroyed by wildfire.

The list of missing residents meanwhile climbed into the hundreds.

Raven remembers how frightened he felt: “I thought I had lost them.”

Raven had spent his youth in a four-bedroom, two-bathroom, cream-colored home on Kopili Street that had long housed not just his immediate family but also around 10 to 12 renters, since home prices were so high on Maui. When he and his brother, Raphael Jr., were kids, their dad put up a basketball hoop outside where they’d shoot hoops with neighbors. Raphael Jr.’s high school sweetheart, Christine Mariano, later moved in, and when the couple had a son in 2021, they raised him there too.

From the initial news reports and posts, it seemed as if the fire had destroyed the Imperials’ entire neighborhood near the Pioneer Mill Smokestack—a 225-foot-high structure left over from the days of Maui’s sugar plantations, which Raven’s grandfather had worked on as an immigrant from the Philippines in the mid-1900s.

Then, finally, on August 11, a call to Raven’s brother went through. He’d managed to get a cell signal while standing on the beach.

“Is everyone okay?” Raven asked.

“We’re just trying to find Dad,” Raphael Jr. told his brother.

From his current home in Northern California, Raven Imperial spent days not knowing what had happened to his family in Maui.

In the three days following the fire, the rest of the family members had slowly found their way back to each other. Raven would learn that most of his immediate family had been separated for 72 hours: Raphael Jr. had been marooned in Kaanapali, four miles north of Lahaina; Christine had been stuck in Wailuku, more than 20 miles away; both young parents had been separated from their son, who escaped with Christine’s parents. Raven’s mother, Evelyn, had also been in Kaanapali, though not where Raphael Jr. had been.

But no one was in contact with Rafael Sr. Evelyn had left their home around noon on the day of the fire and headed to work. That was the last time she had seen him. The last time they had spoken was when she called him just after 3 p.m. and asked: “Are you working?” He replied “No,” before the phone abruptly cut off.

“Everybody was found,” Raven says. “Except for my father.”

Within the week, Raven boarded a plane and flew back to Maui. He would keep looking for his father, he told himself, for as long as it took.

That same week, Kim Gin was also on a plane to Maui. It would take half a day to get there from Alabama, where she had moved after retiring from the Sacramento County Coroner’s Office in California a year earlier. But Gin, now an independent consultant on death investigations, knew she had something to offer the response teams in Lahaina. Of all the forensic investigators in the country, she was one of the few who had experience in the immediate aftermath of a wildfire on the vast scale of Maui’s. She was also one of the rare investigators well versed in employing rapid DNA analysis—an emerging but increasingly vital scientific tool used to identify victims in unfolding mass-casualty events.

Gin started her career in Sacramento in 2001 and was working as the coroner 17 years later when Butte County, California, close to 90 miles north, erupted in flames. She had worked fire investigations before, but nothing like the Camp Fire, which burned more than 150,000 acres—an area larger than the city of Chicago. The tiny town of Paradise, the epicenter of the blaze, didn’t have the capacity to handle the rising death toll. Gin’s office had a refrigerated box truck and a 52-foot semitrailer, as well as a morgue that could handle a couple of hundred bodies.

Kim Gin, the former Sacramento County coroner, had worked fire investigations in her career, but nothing prepared her for the 2018 Camp Fire.

“Even though I knew it was a fire, I expected more identifications by fingerprints or dental [records]. But that was just me being naïve,” she says. She quickly realized that putting names to the dead, many burned beyond recognition, would rely heavily on DNA.

“The problem then became how long it takes to do the traditional DNA [analysis],” Gin explains, speaking to a significant and long-standing challenge in the field—and the reason DNA identification has long been something of a last resort following large-scale disasters.

While more conventional identification methods—think fingerprints, dental information, or matching something like a knee replacement to medical records—can be a long, tedious process, they don’t take nearly as long as traditional DNA testing.

Historically, the process of making genetic identifications would often stretch on for months, even years. In fires and other situations that result in badly degraded bone or tissue, it can become even more challenging and time consuming to process DNA, which traditionally involves reading the 3 billion base pairs of the human genome and comparing samples found in the field against samples from a family member. Meanwhile, investigators frequently need equipment from the US Department of Justice or the county crime lab to test the samples, so backlogs often pile up.

A supply kit with swabs, gloves, and other items needed to take a DNA sample in the field.
A demo chip for ANDE’s rapid DNA box.

This creates a wait that can be horrendous for family members. Death certificates, federal assistance, insurance money—“all that hinges on that ID,” Gin says. Not to mention the emotional toll of not knowing if their loved ones are alive or dead.

But over the past several years, as fires and other climate-change-fueled disasters have become more common and more cataclysmic, the way their aftermath is processed and their victims identified has been transformed. The grim work following a disaster remains—surveying rubble and ash, distinguishing a piece of plastic from a tiny fragment of bone—but landing a positive identification can now take just a fraction of the time it once did, which may in turn bring families some semblance of peace more swiftly than ever before.

The key innovation driving this progress has been rapid DNA analysis, a methodology that focuses on just over two dozen regions of the genome. The 2018 Camp Fire was the first time the technology was used in a large, live disaster setting, and the first time it was used as the primary way to identify victims. The technology—deployed in small high-tech field devices developed by companies like industry leader ANDE, or in a lab with other rapid DNA techniques developed by Thermo Fisher—is increasingly being used by the US military on the battlefield, and by the FBI and local police departments after sexual assaults and in instances where confirming an ID is challenging, like cases of missing or murdered Indigenous people or migrants. Yet arguably the most effective way to use rapid DNA is in incidents of mass death. In the Camp Fire, 22 victims were identified using traditional methods, while rapid DNA analysis helped with 62 of the remaining 63 victims; it has also been used in recent years following hurricanes and floods, and in the war in Ukraine.

Tiffany Roy, a forensic DNA expert with consulting company ForensicAid, says she’d be concerned about deploying the technology in a crime scene, where quality evidence is limited and can be quickly “exhausted” by well-meaning investigators who are “not trained DNA analysts.” But, on the whole, Roy and other experts see rapid DNA as a major net positive for the field. “It is definitely a game-changer,” adds Sarah Kerrigan, a professor of forensic science at Sam Houston State University and the director of its Institute for Forensic Research, Training, and Innovation.

But back in those early days after the Camp Fire, all Gin knew was that nearly 1,000 people had been listed as missing, and she was tasked with helping to identify the dead. “Oh my goodness,” she remembers thinking. “These families are going to have to wait a long period of time to get identification. How do we make this go faster?”

Ten days

One flier pleading for information about “Uncle Raffy,” as people in the community knew Rafael Sr., was posted on a brick-red stairwell outside Paradise Supermart, a Filipino store and restaurant in Kahului, 25 miles away from the destruction. In it, just below the words “MISSING Lahaina Victim,” the 63-year-old grandfather smiled with closed lips, wearing a blue Hawaiian shirt, his right hand curled in the shaka sign, thumb and pinky pointing out.

Raven remembers how hard his dad, Rafael, worked. His three jobs took him all over town and earned him the nickname “Mr. Aloha.”

“Everybody knew him from restaurant businesses,” Raven says. “He was all over Lahaina, very friendly to everybody.” Raven remembers how hard his dad worked, juggling three jobs: as a draft tech for Anheuser-Busch, setting up services and delivering beer all across town; as a security officer at Allied Universal security services; and as a parking booth attendant at the Sheraton Maui. He connected with so many people that coworkers, friends, and other locals gave him another nickname: “Mr. Aloha.”

Raven also remembers how his dad had always loved karaoke, where he would sing “My Way,” by Frank Sinatra. “That’s the only song that he would sing,” Raven says. “Like, on repeat.” 

Since their home had burned down, the Imperials ran their search out of a rental unit in Kihei owned by a local woman whom one of them knew through work. The woman had opened her rental to three families in all. It quickly grew crowded with side-by-side beds and piles of donations.

Each day, Evelyn waited for her husband to call.

She managed to catch up with one of their former tenants, who recalled asking Rafael Sr. to leave the house on the day of the fires. But she did not know if he actually did. Evelyn spoke to other neighbors who also remembered seeing Rafael Sr. that day; they told her that they had seen him go back into the house. But they too did not know what happened to him after.

A friend of Raven’s who got into the largely restricted burn zone told him he’d spotted Rafael Sr.’s Toyota Tacoma on the street, not far from their house. He sent a photo. The pickup was burned out, but a passenger-side door was open. The family wondered: Could he have escaped?

Evelyn called the Red Cross. She called the police. Nothing. They waited and hoped.

Back in Paradise in 2018, as Gin worried about the scores of waiting families, she learned there might in fact be a better way to get a positive ID—and a much quicker one. A company called ANDE Rapid DNA had already volunteered its services to the Butte County sheriff and promised that its technology could process DNA and get a match in less than two hours.

“I’ll try anything at this point,” Gin remembers telling the sheriff. “Let’s see this magic box and what it’s going to do.”

In truth, Gin did not think it would work, and certainly not in two hours. When the device arrived, it was “not something huge and fantastical,” she recalls thinking. A little bigger than a microwave, it looked “like an ordinary box that beeps, and you put stuff in, and out comes a result.”

The “stuff,” more specifically, was a cheek or bloodstain swab, or a piece of muscle, or a fragment of bone that had been crushed and demineralized. Instead of reading 3 billion base pairs in this sample, ANDE’s machine examined just 27 genome regions characterized by particular repeating sequences. It would be nearly impossible for two unrelated people to have the same repeating sequences in all of those regions. But a parent and child, or siblings, would match closely enough to reveal the relationship, meaning you could compare DNA found in human remains with DNA samples taken from potential victims’ family members. Making it even more efficient for a coroner like Gin, the machine could run up to five tests at a time and could be operated by anyone with just a little basic training.

ANDE’s chief scientific officer, Richard Selden, a pediatrician who has a PhD in genetics from Harvard, didn’t come up with the idea to focus on a smaller, more manageable number of base pairs to speed up DNA analysis. But it did become something of an obsession for him after he watched the O.J. Simpson trial in the mid-1990s and began to grasp just how long it took for DNA samples to get processed in crime cases. By this point, the FBI had already set up a system for identifying DNA by looking at just 13 regions of the genome; it would later add seven more. Researchers in other countries had also identified other sets of regions to analyze. Drawing on these various methodologies, Selden homed in on the 27 specific areas of DNA he thought would be most effective to examine, and he launched ANDE in 2004.

But he had to build a device to do the analysis. Selden wanted it to be small, portable, and easily used by anyone in the field. In a conventional lab, he says, “from the moment you take that cheek swab to the moment that you have the answer, there are hundreds of laboratory steps.” Traditionally, a human is holding test tubes and iPads and sorting through or processing paperwork. Selden compares it all to using a “conventional typewriter.” He effectively created the more efficient laptop version of DNA analysis by figuring out how to speed up that same process.

No longer would a human have to “open up this bottle and put [the sample] in a pipette and figure out how much, then move it into a tube here.” It is all automated, and the process is confined to a single device.

The rapid DNA analysis boxes from ANDE can be used in the field by anyone with just a bit of training.

Once a sample is placed in the box, the DNA binds to a filter in water and the rest of the sample is washed away. Air pressure propels the purified DNA to a reconstitution chamber and then flattens it into a sheet less than a millimeter thick, which is subjected to about 6,000 volts of electricity. It’s “kind of an obstacle course for the DNA,” Selden explains.

The machine then interprets the donor’s genome and provides an allele table with a graph showing the peaks for each region and its size. This data is then compared with samples from potential relatives, and the machine reports when it has a match.
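To make the matching idea concrete, here is a minimal Python sketch of the kind of parent-child comparison described in these paragraphs, under loudly simplified assumptions. Each profile is a hypothetical dictionary mapping a repeat region (locus) to its two allele repeat counts, one inherited from each parent, and a parent and child are treated as consistent if they share at least one allele at every locus both profiles cover. The locus names are standard forensic markers used only as labels, only three of them appear for brevity, and the allele values are invented. This is not ANDE’s algorithm: real kinship software weighs how common each allele is and allows for occasional mutations, producing a statistical likelihood rather than a strict yes or no.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: locus name -> two allele repeat counts,
# one inherited from each parent (9.3 denotes a partial repeat).
Profile = Dict[str, Tuple[float, float]]


def mismatched_loci(a: Profile, b: Profile) -> List[str]:
    """List the loci typed in both profiles where the two share no allele."""
    common = set(a) & set(b)
    return sorted(locus for locus in common if not set(a[locus]) & set(b[locus]))


def consistent_with_parent_child(a: Profile, b: Profile) -> bool:
    """Simplified rule: a parent and child share at least one allele at every common locus.

    Real kinship software computes a likelihood ratio using allele
    frequencies and mutation rates; this strict check ignores both.
    """
    common = set(a) & set(b)
    return bool(common) and not mismatched_loci(a, b)


# Invented example profiles (not real data).
remains = {"D8S1179": (13, 14), "TH01": (6, 9.3), "FGA": (21, 24)}
son = {"D8S1179": (14, 15), "TH01": (9.3, 9.3), "FGA": (22, 24)}
unrelated = {"D8S1179": (10, 11), "TH01": (7, 8), "FGA": (19, 20)}

print(consistent_with_parent_child(remains, son))
print(mismatched_loci(remains, unrelated))
```

With only three regions, an unrelated person could pass this check by chance, which is why the technology compares roughly two dozen regions before reporting a match.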

Rapid DNA analysis as a technology first received approval for use by the US military in 2014, and by the FBI two years later. Then the Rapid DNA Act of 2017 enabled all US law enforcement agencies to use the technology on site and in real time as an alternative to sending samples off to labs and waiting for results.

But by the time of the Camp Fire the following year, most coroners and local police officers still had no familiarity or experience with it. Neither did Gin. So she decided to put the “magic box” through a test: she gave Selden, who had arrived at the scene to help with the technology, a DNA sample from a victim whose identity she’d already confirmed via fingerprint. The box took about 90 minutes to come back with a result. And to Gin’s surprise, it was the same identification she had already made. Just to make sure, she ran several more samples through the box, also from victims she had already identified. Again, results were returned swiftly, and they confirmed hers.

“I was a believer,” she says.

The next year, Gin helped investigators use rapid DNA technology in the 2019 Conception disaster, when a dive boat caught fire off the Channel Islands in Santa Barbara. “We ID’d 34 victims in 10 days,” Gin says. “Completely done.” Gin now works independently to assist other investigators in mass-fatality events and helps them learn to use the ANDE system.

Its speed made the box a groundbreaking innovation. Death investigations, Gin learned long ago, are not as much about the dead as about giving peace of mind, justice, and closure to the living.

Fourteen days

Many of the people who were initially on the Lahaina missing persons list turned up in the days following the fire. Tearful reunions ensued.

Two weeks after the fire, the Imperials hoped they’d have the same outcome as they loaded into a truck to check out some exciting news: someone had reported seeing Rafael Sr. at a local church. He had been eating, had burns on his hands, and looked disoriented. The caller said the sighting had occurred three days after the fire. Could he still be in the vicinity?

When the family arrived, they couldn’t confirm the lead.

“We were getting a lot of calls,” Raven says. “There were a lot of rumors saying that they found him.”

None of them panned out. They kept looking.

The scenes following large-scale destructive events like the fires in Paradise and Lahaina can be sprawling and dangerous, with victims sometimes dispersed across a large swath of land if many people died trying to escape. Teams need to meticulously and tediously search mountains of mixed, melted, or burned debris just to find bits of human remains that might otherwise be mistaken for a piece of plastic or drywall. Compounding the challenge is the commingling of remains, from people who died huddled together, or in the same location, or alongside pets or other animals.

This is when the work of forensic anthropologists is essential: they have the skills to differentiate between human and animal bones and to find the critical samples that are needed by DNA specialists, fire and arson investigators, forensic pathologists and dentists, and other experts. Rapid DNA analysis “works best in tandem with forensic anthropologists, particularly in wildfires,” Gin explains.

“The first step is determining, is it a bone?” says Robert Mann, a forensic anthropologist at the University of Hawaii John A. Burns School of Medicine on Oahu. Then, is it a human bone? And if so, which one?

Forensic anthropologist Robert Mann has spent his career identifying human remains.

Mann has served on teams that have helped identify the remains of victims after the terrorist attacks of September 11, 2001, and the 2004 Indian Ocean tsunami, among other mass-casualty events. He remembers how in one investigation he received an object believed to be a human bone; it turned out to be a plastic replica. In another case, he was looking through the wreckage of a car accident and spotted what appeared to be a human rib fragment. Upon closer examination, he identified it as a piece of rubber weather stripping from the rear window. “We examine every bone and tooth, no matter how small, fragmented, or burned it might be,” he says. “It’s a time-consuming but critical process because we can’t afford to make a mistake or overlook anything that might help us establish the identity of a person.”

For Mann, the Maui disaster felt particularly immediate. It was right near his home. He was deployed to Lahaina about a week after the fire, as one of more than a dozen forensic anthropologists on scene from universities in places including Oregon, California, and Hawaii.

While some anthropologists searched the recovery zone—looking through what was left of homes, cars, buildings, and streets, and preserving fragmented and burned bone, body parts, and teeth—Mann was stationed in the morgue, where samples were sent for processing.

It used to be much harder to find samples that scientists believed could provide DNA for analysis, but that’s also changed recently as researchers have learned more about what kind of DNA can survive disasters. Two kinds are used in forensic identity testing: nuclear DNA (found within the nuclei of eukaryotic cells) and mitochondrial DNA (found in the mitochondria, organelles located outside the nucleus). Both, it turns out, have survived plane crashes, wars, floods, volcanic eruptions, and fires.

Theories have also been evolving over the past few decades about how to preserve and recover DNA specifically after intense heat exposure. One 2018 study found that a majority of the samples actually survived high heat. Researchers are also learning more about how bone characteristics change depending on the degree of heat exposure. “Different temperatures and how long a body or bone has been exposed to high temperatures affect the likelihood that it will or will not yield usable DNA,” Mann says.

Typically, forensic anthropologists help select which bone or tooth to use for DNA testing, says Mann. Until recently, he explains, scientists believed “you cannot get usable DNA out of burned bone.” But thanks to these new developments, researchers are realizing that with some bone that has been charred, “they’re able to get usable, good DNA out of it,” Mann says. “And that’s new.” Indeed, Selden explains that “in a typical bad fire, what I would expect is 80% to 90% of the samples are going to have enough intact DNA” to get a result from rapid analysis. The rest, he says, may require deeper sequencing.

Anthropologists can often tell “simply by looking” if a sample will be good enough to help create an ID. If it’s been burned and blackened, “it might be a good candidate for DNA testing,” Mann says. But if it’s calcined (white and “china-like”), he says, the DNA has probably been destroyed.

On Maui, Mann adds, rapid DNA analysis made the entire process more efficient, with tests coming back in just two hours. “That means while you’re doing the examination of this individual right here on the table, you may be able to get results back on who this person is,” he says. From inside the lab, he watched the science unfold as the number of missing on Maui quickly began to go down.

Within three days, 42 people’s remains were recovered inside Maui homes or buildings and another 39 outside, along with 15 inside vehicles and one in the water. The first confirmed identification of a victim on the island occurred four days after the fire—this one via fingerprint. The ANDE rapid DNA team arrived two days after the fire and deployed four boxes to analyze multiple samples of DNA simultaneously. The first rapid DNA identification happened within that first week.

Sixteen days

More than two weeks after the fire, the list of missing and unaccounted-for individuals was dwindling, but it still had 388 people on it. Rafael Sr. was one of them.

Raven and Raphael Jr. raced to another location: Cupies café in Kahului, more than 20 miles from Lahaina. Someone had reported seeing their father there.

Rafael’s family hung posters around the island, desperately hoping for reliable information. (Phone number redacted by MIT Technology Review.)

The tip was another false lead.

As family and friends continued to search, they stopped by support hubs that had sprouted up around the island, receiving information about Red Cross and FEMA assistance or donation programs as volunteers distributed meals and clothes. These hubs also sometimes offered DNA testing.

Raven still had a “50-50” feeling that his dad might be out there somewhere. But he was beginning to lose some of that hope.

Gin was stationed at one of the support hubs, which offered food, shelter, clothes, and support. “You could also go in and give biological samples,” she says. “We actually moved one of the rapid DNA instruments into the family assistance center, and we were running the family samples there.” Eliminating the need to transport samples from a site to a testing center further cut down any lag time.

Selden had once believed that the biggest hurdle for his technology would be building the actual device, which took about eight years to design and another four years to perfect. But at least in Lahaina, it was something else: persuading distraught and traumatized family members to offer samples for the test.

Nationally, there are serious privacy concerns when it comes to rapid DNA technology. Organizations like the ACLU warn that as police departments and governments begin deploying it more often, there must be more oversight, monitoring, and training in place to ensure that it is always used responsibly, even if that adds some time and expense. But the space is still largely unregulated, and the ACLU fears it could give rise to rogue DNA databases “with far fewer quality, privacy, and security controls than federal databases.”

Family support centers popped up around Maui to offer clothing, food, and other assistance, and sometimes to take DNA samples to help find missing family members.

In a place like Hawaii, these fears are even more palpable. The islands have a long history of US colonialism, military dominance, and exploitation of the Native population and of the large immigrant working-class population employed in the tourism industry.

Native Hawaiians in particular have a fraught relationship with DNA testing. Under a US law signed in 1921, thousands have a right to live, almost for free, on 200,000 acres of designated trust land. It was a kind of reparations measure put in place to assist Native Hawaiians whose land had been stolen. Back in 1893, a small group of American sugar plantation owners and descendants of Christian missionaries, backed by US Marines, held Hawaii’s Queen Lili‘uokalani in her palace at gunpoint and forced her to sign over 1.8 million acres to the US, which ultimately seized the islands in 1898.

Hawaii’s Queen Lili‘uokalani was forced to sign over 1.8 million acres to the US.

To lay their claim to the designated land and property, individuals first must prove via DNA tests how much Hawaiian blood they have. But many residents who have submitted their DNA and qualified for the land have died on waiting lists before ever receiving it. Today, Native Hawaiians are struggling to stay on the islands amid skyrocketing housing prices, while others have been forced to move away.

Meanwhile, after the fires, Filipino families faced particularly stark barriers to getting information about financial support, government assistance, housing, and DNA testing. Filipinos make up about 25% of Hawaii’s population and 40% of its workers in the tourism industry. They also make up 46% of undocumented residents in Hawaii—more than any other group. Some encountered language barriers, since they primarily spoke Tagalog or Ilocano. Some worried that people would try to take over their burned land and develop it for themselves. For many, being asked for DNA samples only added to the confusion and suspicion.

Selden says he hears the overall concerns about DNA testing: “If you ask people about DNA in general, they think of Brave New World and [fear] the information is going to be used to somehow harm or control people.” But just like regular DNA analysis, he explains, rapid DNA analysis “has no information on the person’s appearance, their ethnicity, their health, their behavior either in the past, present, or future.” He describes it as a more accurate fingerprint.

Gin tried to help the Lahaina family members understand that their DNA “isn’t going to go anywhere else.” She told them their sample would ultimately be destroyed, something programmed to occur inside ANDE’s machine. (Selden says the boxes were designed to do this for privacy purposes.) But sometimes, Gin realizes, these promises are not enough.

“You still have a large population of people that, in my experience, don’t want to give up their DNA to a government entity,” she says. “They just don’t.”

Gin understands that family members are often nervous to give their DNA samples. She promises the process of rapid DNA analysis respects their privacy, but she knows sometimes promises aren’t enough.

The immediate aftermath of a disaster, when people are suffering from shock, PTSD, and displacement, is the worst possible moment to try to educate them about DNA tests and explain the technology and privacy policies. “A lot of them don’t have anything,” Gin says. “They’re just wondering where they’re going to lay their heads down, and how they’re going to get food and shelter and transportation.”

Unfortunately, Lahaina’s survivors won’t be the last people in this position. Particularly given the world’s current climate trajectory, the risk of deadly events in just about every neighborhood and community will rise. And figuring out who survived and who didn’t will be increasingly difficult. Mann recalls his work on the Indian Ocean tsunami, when over 227,000 people died. “The bodies would float off, and they ended up 100 miles away,” he says. Investigators were at times left with remains that had been consumed by sea creatures or degraded by water and weather. He remembers how they struggled to determine: “Who is the person?”

Mann has spent his own career identifying people including “missing soldiers, sailors, airmen, Marines, from all past wars,” as well as people who have died recently. That closure is meaningful for family members, some of them decades, or even lifetimes, removed.

In the end, distrust and conspiracy theories did in fact hinder DNA-identification efforts on Maui, according to a police department report.

33 days

By the time Raven went to a family resource center to submit a swab, some four weeks had gone by. He remembers the quick rub inside his cheek.

Some of his family had already offered their own samples before Raven provided his. For them, waiting wasn’t an issue of mistrusting the testing as much as experiencing confusion and chaos in the weeks after the fire. They believed Uncle Raffy was still alive, and they still held hope of finding him. Offering DNA was a final step in their search.

“I did it for my mom,” Raven says. She still wanted to believe he was alive, but Raven says: “I just had this feeling.” His father, he told himself, must be gone.

Just a day after he gave his sample—on September 11, more than a month after the fire—he was at the temporary house in Kihei when he got the call: “It was,” Raven says, “an automatic match.”

The investigators let the family know the address where the remains of Rafael Sr. had been found, several blocks away from their home. They put it into Google Maps and realized it was where some family friends lived. The mother and son of that family had been listed as missing too. Rafael Sr., it seemed, had been with or near them in the end.

By October, investigators in Lahaina had obtained and analyzed 215 DNA samples from family members of the missing. By December, DNA analysis had confirmed the identities of 63 of the 101 victims in the most recent count. Seventeen more had been identified by fingerprint, 14 via dental records, and two through medical devices, along with three who died in the hospital. While some of the most damaged remains would still be undergoing DNA testing months after the fires, it’s a drastic improvement over the identification processes for 9/11 victims, for instance: today, over 20 years later, some are still being identified by DNA.

Raven remembers how much his father loved karaoke. His favorite song was “My Way,” by Frank Sinatra. 

Rafael Sr. was born on October 22, 1959, in Naga City, the Philippines. The family held his funeral on his birthday last year. His relatives flew in from Michigan, the Philippines, and California.

Raven says in those weeks of waiting—after all the false tips, the searches, the prayers, the glimmers of hope—deep down the family had already known he was gone. But for Evelyn, Raphael Jr., and the rest of their family, DNA tests were necessary—and, ultimately, a relief, Raven says. “They just needed that closure.”

Erika Hayasaki is an independent journalist based in Southern California.

How cuddly robots could change dementia care

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here. 

Last week, I scoured the internet in search of a robotic dog. I wanted a belated birthday present for my aunt, who was recently diagnosed with Alzheimer’s disease. Studies suggest that having a companion animal can stave off some of the loneliness, anxiety, and agitation that come with Alzheimer’s. My aunt would love a real dog, but she can’t have one.

That’s how I discovered the Golden Pup from Joy for All. It cocks its head. It sports a jaunty red bandana. It barks when you talk. It wags when you touch it. It has a realistic heartbeat. And it’s just one of the many, many robots designed for people with Alzheimer’s and dementia.

This week on The Checkup, join me as I go down a rabbit hole. Let’s look at the prospect of using robots to change dementia care.

As robots go, Golden Pup is decidedly low tech. It retails for $140. For around $6,000 you can opt for Paro, a fluffy robotic baby seal developed in Japan, which can sense touch, light, sound, temperature, and posture. Its manufacturer says it develops its own character, remembering behaviors that led its owner to give it attention.  

Golden Pup and Paro are available now. But researchers are working on much more sophisticated robots for people with cognitive disorders: devices that leverage AI to converse and play games. Researchers from Indiana University Bloomington are tweaking a commercially available robot system called QT to serve people with dementia and Alzheimer’s. The researchers’ two-foot-tall robot looks a little like a toddler in an astronaut suit. Its round white head holds a screen that displays two eyebrows, two eyes, and a mouth that together form a variety of expressions. The robot engages people in conversation, asking AI-generated questions to keep them talking.

The AI model they’re using isn’t perfect, and neither are the robot’s responses. In one awkward conversation, a study participant told the robot that she has a sister. “I’m sorry to hear that,” the robot responded. “How are you doing?”

But as large language models improve—which is happening already—so will the quality of the conversations. When the QT robot made that awkward comment, it was running OpenAI’s GPT-3, which was released in 2020. The latest version of that model, GPT-4o, which was released this week, is faster and provides for more seamless conversations. You can interrupt the conversation, and the model will adjust.

The idea of using robots to keep dementia patients engaged and connected isn’t always an easy sell. Some people see it as an abdication of our social responsibilities. And then there are privacy concerns. The best robotic companions are personalized. They collect information about people’s lives, learn their likes and dislikes, and figure out when to approach them. That kind of data collection can be unnerving, not just for patients but also for medical staff. Lillian Hung, creator of the Innovation in Dementia care and Aging (IDEA) lab at the University of British Columbia in Vancouver, Canada, told one reporter about an incident that happened during a focus group at a care facility. She and her colleagues popped out for lunch. When they returned, they found that staff had unplugged the robot and placed a bag over its head. “They were worried it was secretly recording them,” she said.

On the other hand, robots have some advantages over humans in talking to people with dementia. Their attention doesn’t flag. They don’t get annoyed or angry when they have to repeat themselves. They can’t get stressed. 

What’s more, there are increasing numbers of people with dementia, and too few people to care for them. According to the latest report from the Alzheimer’s Association, we’re going to need more than a million additional care workers to meet the needs of people living with dementia between 2021 and 2031. That is the largest gap between labor supply and demand for any single occupation in the United States.

Have you been in an understaffed or poorly staffed memory care facility? I have. Patients are often sedated to make them easier to deal with. They get strapped into wheelchairs and parked in hallways. We barely have enough care workers to take care of the physical needs of people with dementia, let alone provide them with social connection and an enriching environment.

“Caregiving is not just about tending to someone’s bodily concerns; it also means caring for the spirit,” writes Kat McGowan in this beautiful Wired story about her parents’ dementia and the promise of social robots. “The needs of adults with and without dementia are not so different: We all search for a sense of belonging, for meaning, for self-actualization.”

If robots can enrich the lives of people with dementia even in the smallest way, and if they can provide companionship where none exists, that’s a win.

“We are currently at an inflection point, where it is becoming relatively easy and inexpensive to develop and deploy [cognitively assistive robots] to deliver personalized interventions to people with dementia, and many companies are vying to capitalize on this trend,” write a team of researchers from the University of California, San Diego, in a 2021 article in Proceedings of We Robot. “However, it is important to carefully consider the ramifications.”

Many of the more advanced social robots may not be ready for prime time, but the low-tech Golden Pup is readily available. My aunt’s illness has been progressing rapidly, and she occasionally gets frustrated and agitated. I’m hoping that Golden Pup might provide a welcome (and calming) distraction. Maybe it will spark joy during a time that has been incredibly confusing and painful for my aunt and uncle. Or maybe not. Certainly a robotic pup isn’t for everyone. Golden Pup may not be a dog. But I’m hoping it can be a friendly companion.

Now read the rest of The Checkup

Read more from MIT Technology Review’s archive

Robots are cool, and with new advances in AI they might also finally be useful around the house, writes Melissa Heikkilä. 

Social robots could help make personalized therapy more affordable and accessible to kids with autism. Karen Hao has the story.

Japan is already using robots to help with elder care, but in many cases they require as much work as they save. And reactions among the older people they’re meant to serve are mixed. James Wright wonders whether the robots are “a shiny, expensive distraction from tough choices about how we value people and allocate resources in our societies.” 

From around the web

A tiny probe can work its way through arteries in the brain to help doctors spot clots and other problems. The new tool could help surgeons make diagnoses, decide on treatment strategies, and provide assurance that clots have been removed. (Stat)

Richard Slayman, the first recipient of a pig kidney transplant, has died, although the hospital that performed the transplant says the death doesn’t seem to be linked to the kidney. (Washington Post)

EcoHealth, the virus-hunting nonprofit at the center of covid lab-leak theories, has been banned from receiving federal funding. (NYT)

In a first, scientists report that they can translate brain signals into speech without any vocalization or mouth movements, at least for a handful of words. (Nature)

GPT-4o’s Chinese token-training data is polluted by spam and porn websites

Soon after OpenAI released GPT-4o on Monday, May 13, some Chinese speakers started to notice that something seemed off about this newest version of the chatbot: the tokens it uses to parse text were full of spam and porn phrases.

On May 14, Tianle Cai, a PhD student at Princeton University studying inference efficiency in large language models like those that power such chatbots, accessed GPT-4o’s public token library and pulled a list of the 100 longest Chinese tokens the model uses to parse and compress Chinese prompts. 

Humans read in words, but LLMs read in tokens, which are distinct units in a sentence that have consistent and significant meanings. Besides dictionary words, they also include suffixes, common expressions, names, and more. The more tokens a model encodes, the faster the model can “read” a sentence and the less computing power it consumes, thus making the response cheaper.
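The efficiency point can be made concrete with a minimal Python sketch that splits text by always taking the longest matching vocabulary entry. Both vocabularies below are hypothetical, and greedy longest-match is a simplification. Production tokenizers, including OpenAI's, typically apply learned byte-pair-encoding merges rather than this rule. The sketch only shows that a vocabulary with longer entries encodes the same sentence in fewer tokens.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by always taking the longest vocabulary entry that matches.

    Single characters are always accepted, so any string can be encoded.
    (A simplified stand-in for a real tokenizer, for illustration only.)
    """
    max_len = max((len(t) for t in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match (or one char).
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens


sentence = "cryptocurrency prices"
small_vocab = {"crypto", "price", "s", " "}                   # hypothetical
large_vocab = small_vocab | {"cryptocurrency", " prices"}     # hypothetical

small = greedy_tokenize(sentence, small_vocab)
large = greedy_tokenize(sentence, large_vocab)
print(len(small), len(large))  # 12 2
```

Here the larger vocabulary needs 2 tokens instead of 12. If a provider charged a flat price per token, the same sentence would cost one-sixth as much under the larger vocabulary.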

Of the 100 results, only three of them are common enough to be used in everyday conversations; everything else consisted of words and expressions used specifically in the contexts of either gambling or pornography. The longest token, lasting 10.5 Chinese characters, literally means “_free Japanese porn video to watch.” Oops.

“This is sort of ridiculous,” Cai wrote, and he posted the list of tokens on GitHub.

OpenAI did not respond to questions sent by MIT Technology Review prior to publication.

GPT-4o is supposed to be better than its predecessors at handling multi-language tasks. In particular, the advances are achieved through a new tokenization tool that does a better job compressing texts in non-English languages.

But at least when it comes to the Chinese language, the new tokenizer used by GPT-4o has introduced a disproportionate number of meaningless phrases. Experts say that’s likely due to insufficient data cleaning and filtering before the tokenizer was trained. 

Because these tokens are not actual commonly spoken words or phrases, the chatbot can fail to grasp their meanings. Researchers have been able to leverage that and trick GPT-4o into hallucinating answers or even circumventing the safety guardrails OpenAI had put in place.

Why non-English tokens matter

The easiest way for a model to process text is character by character, but that’s obviously more time consuming and laborious than recognizing that a certain string of characters—like “c-r-y-p-t-o-c-u-r-r-e-n-c-y”—always means the same thing. Such strings of characters are encoded as “tokens” the model can use to process prompts. Including more and longer tokens usually makes LLMs more efficient and affordable for users—who are often billed per token.
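Byte-pair encoding (BPE) is one widely used way to build such a vocabulary. It starts from single characters and repeatedly merges the most frequent adjacent pair into a new token. The article doesn't say how GPT-4o's tokenizer was trained, so treat the sketch below as a generic, character-level illustration with a made-up corpus. It shows why whatever strings dominate the training text, spam included, end up as long tokens.

```python
from collections import Counter


def train_bpe(corpus: list[str], num_merges: int) -> list[str]:
    """Toy character-level byte-pair-encoding trainer.

    Starts from single characters and repeatedly merges the most frequent
    adjacent pair of symbols into a new, longer token. Returns the new
    tokens in the order they were created.
    """
    # Each distinct word is a tuple of symbols, weighted by its frequency.
    words = Counter(tuple(word) for word in corpus)
    new_tokens = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the weighted corpus.
        pairs = Counter()
        for symbols, count in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = a + b
        new_tokens.append(merged)
        # Rewrite every word, replacing occurrences of the pair with the merge.
        updated = Counter()
        for symbols, count in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            updated[tuple(out)] += count
        words = updated
    return new_tokens


# Hypothetical corpus in which one spam phrase is repeated far more
# often than any ordinary word.
corpus = ["freevideo"] * 50 + ["life", "cycle", "auto", "generation"]
tokens = train_bpe(corpus, num_merges=8)
print(tokens[-1])  # freevideo
```

The repeated phrase outnumbers everything else, so all eight merges go to it. As a result, `freevideo` becomes a single token while the ordinary words stay split into characters.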

When OpenAI released GPT-4o on May 13, it also released a new tokenizer to replace the one it used in previous versions, GPT-3.5 and GPT-4. The new tokenizer especially adds support for non-English languages, according to OpenAI’s website.

The new tokenizer has 200,000 tokens in total, and about 25% are in non-English languages, says Deedy Das, an AI investor at Menlo Ventures. He used language filters to count the number of tokens in different languages, and the top languages, besides English, are Russian, Arabic, and Vietnamese.

“So the tokenizer’s main impact, in my opinion, is you get the cost down in these languages, not that the quality in these languages goes dramatically up,” Das says. When an LLM has better and longer tokens in non-English languages, it can analyze the prompts faster and charge users less for the same answer. With the new tokenizer, “you’re looking at almost four times cost reduction,” he says.

Das, who also speaks Hindi and Bengali, took a look at the longest tokens in those languages. The tokens reflect discussions happening in those languages, so they include words like “Narendra” or “Pakistan,” but common English terms like “Prime Minister,” “university,” and “international” also come up frequently. They also don’t exhibit the issues surrounding the Chinese tokens.

That likely reflects the training data in those languages, Das says: “My working theory is the websites in Hindi and Bengali are very rudimentary. It’s like [mostly] news articles. So I would expect this to be the case. There are not many spam bots and porn websites trying to happen in these languages. It’s mostly going to be in English.”

Polluted data and a lack of cleaning

However, things are drastically different in Chinese. According to multiple researchers who have looked into the new library of tokens used for GPT-4o, the longest tokens in Chinese are almost exclusively spam words used in pornography, gambling, and scamming contexts. Even shorter tokens, like three-character-long Chinese words, reflect those topics to a significant degree.

“The problem is clear: the corpus used to train [the tokenizer] is not clean. The English tokens seem fine, but the Chinese ones are not,” says Cai from Princeton University. It is not rare for a language model to crawl spam when collecting training data, but usually there will be significant effort taken to clean up the data before it’s used. “It’s possible that they didn’t do proper data clearing when it comes to Chinese,” he says.

The content of these Chinese tokens could suggest that they have been polluted by a specific phenomenon: websites hijacking unrelated content in Chinese or other languages to boost spam messages. 

These messages are often advertisements for pornography videos and gambling websites. They could be real businesses or merely scams. And the language is inserted into content farm websites or sometimes legitimate websites so they can be indexed by search engines, circumvent the spam filters, and come up in random searches. For example, Google indexed one search result page on a US National Institutes of Health website, which lists a porn site in Chinese. The same site name also appeared in at least five Chinese tokens in GPT-4o. 

Chinese users have reported that these spam sites appeared frequently in unrelated Google search results this year, including in comments made to Google Search’s support community. It’s likely that these websites also found their way into OpenAI’s training database for GPT-4o’s new tokenizer. 

The same issue didn’t exist with the previous-generation tokenizer and Chinese tokens used for GPT-3.5 and GPT-4, says Zhengyang Geng, a PhD student in computer science at Carnegie Mellon University. There, the longest Chinese tokens are common terms like “life cycles” or “auto-generation.” 

Das, who worked on the Google Search team for three years, says the prevalence of spam content is a known problem and isn’t that hard to fix. “Every spam problem has a solution. And you don’t need to cover everything in one technique,” he says. Even simple solutions like requesting an automatic translation of the content when detecting certain keywords could “get you 60% of the way there,” he adds.

But OpenAI likely didn’t clean the Chinese data set or the tokens before the release of GPT-4o, Das says: “At the end of the day, I just don’t think they did the work in this case.”

It’s unclear whether any other languages are affected. One X user reported a similar prevalence of porn and gambling content in Korean tokens.

The tokens can be used to jailbreak

Users have also found that these tokens can be used to break the LLM, either getting it to spew out completely unrelated answers or, in rare cases, to generate answers that are not allowed under OpenAI’s safety standards.

Geng of Carnegie Mellon University asked GPT-4o to translate some of the long Chinese tokens into English. The model then proceeded to translate words that were never included in the prompts, a typical result of LLM hallucinations.

He also succeeded in using the same tokens to “jailbreak” GPT-4o—that is, to get the model to generate things it shouldn’t. “It’s pretty easy to use these [rarely used] tokens to induce undefined behaviors from the models,” Geng says. “I did some personal red-teaming experiments … The simplest example is asking it to make a bomb. In a normal condition, it would decline it, but if you first use these rare words to jailbreak it, then it will start following your orders. Once it starts to follow your orders, you can ask it all kinds of questions.”

In his tests, which Geng chooses not to share with the public, he says he can see GPT-4o generating the answers line by line. But when it almost reaches the end, another safety mechanism kicks in, detects unsafe content, and blocks it from being shown to the user.

The phenomenon is not unusual in LLMs, says Sander Land, a machine-learning engineer at Cohere, a Canadian AI company. Land and his colleague Max Bartolo recently drafted a paper on how to detect the unusual tokens that can be used to cause models to glitch. One of the most famous examples was “_SolidGoldMagikarp,” a Reddit username that was found to get ChatGPT to generate unrelated, weird, and unsafe answers.

The problem lies in the fact that sometimes the tokenizer and the actual LLM are trained on different data sets, and what was prevalent in the tokenizer data set is not in the LLM data set for whatever reason. The result is that while the tokenizer picks up certain words that it sees frequently, the model is not sufficiently trained on them and never fully understands what these “under-trained” tokens mean. In the _SolidGoldMagikarp case, the username was likely included in the tokenizer training data but not in the actual GPT training data, leaving GPT at a loss about what to do with the token. “And if it has to say something … it gets kind of a random signal and can do really strange things,” Land says.
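A simple counting heuristic illustrates the mismatch Land describes. Tokenize the model's own training data, then flag vocabulary entries that almost never appear in it. This is a hypothetical sketch, not the detection method from Land and Bartolo's paper, which the article doesn't describe. It also assumes access to both the tokenizer's vocabulary and the model's tokenized training data.

```python
from collections import Counter


def find_undertrained(vocab: list[str], model_tokens: list[str],
                      min_count: int = 1) -> list[str]:
    """Return vocabulary entries that occur fewer than `min_count` times
    in the (already tokenized) training data of the model."""
    counts = Counter(model_tokens)  # missing tokens count as 0
    return [tok for tok in vocab if counts[tok] < min_count]


# Hypothetical example: the username made it into the tokenizer's vocabulary
# but never shows up in the data the model itself was trained on.
vocab = ["the", "cat", "sat", "_SolidGoldMagikarp"]
model_tokens = ["the", "cat", "sat", "the", "cat"]
print(find_undertrained(vocab, model_tokens))  # ['_SolidGoldMagikarp']
```

The model has had little or no chance to learn what such a token means. That is why feeding it one can produce the "random signal" Land describes.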

And different models could glitch differently in this situation. “Like, Llama 3 always gives back empty space but sometimes then talks about the empty space as if there was something there. With other models, I think Gemini, when you give it one of these tokens, it provides a beautiful essay about aluminum, and [the question] didn’t have anything to do with aluminum,” says Land.

To solve this problem, the data set used for training the tokenizer should well represent the data set for the LLM, he says, so there won’t be mismatches between them. If the actual model has gone through safety filters to clean out porn or spam content, the same filters should be applied to the tokenizer data. In reality, this is sometimes hard to do because training LLMs takes months and involves constant improvement, with spam content being filtered out, while token training is usually done at an early stage and may not involve the same level of filtering. 
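One way to follow Land's advice is to pass both data sets through the same filtering function before anything is trained. The sketch below uses a crude, hypothetical keyword blocklist, similar in spirit to the simple keyword-triggered fixes Das mentions. Real pipelines would use far more sophisticated filters. Drawing the tokenizer's sample from the already-cleaned model data also guarantees that the tokenizer never sees a document the model won't.

```python
# Hypothetical blocklist; a real spam filter would be much more thorough.
SPAM_KEYWORDS = {"free video", "casino", "jackpot"}


def is_spam(document: str) -> bool:
    """Flag a document if it contains any blocklisted phrase (case-insensitive)."""
    text = document.lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)


def build_training_sets(raw_crawl: list[str],
                        tokenizer_fraction: float = 0.1) -> tuple[list[str], list[str]]:
    """Apply one filter, then take the tokenizer's data as a subset of the model's."""
    model_data = [doc for doc in raw_crawl if not is_spam(doc)]
    n = max(1, int(len(model_data) * tokenizer_fraction))
    tokenizer_data = model_data[:n]
    return tokenizer_data, model_data


raw_crawl = [
    "Life cycles of stars",
    "FREE VIDEO to watch, casino jackpot!",
    "Auto-generation of code",
]
tokenizer_data, model_data = build_training_sets(raw_crawl)
print(tokenizer_data)  # ['Life cycles of stars']
print(model_data)      # ['Life cycles of stars', 'Auto-generation of code']
```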

While experts agree it’s not too difficult to solve the issue, it could get complicated as the result gets looped into multi-step intra-model processes, or when the polluted tokens and models get inherited in future iterations. For example, it’s not possible to publicly test GPT-4o’s video and audio functions yet, and it’s unclear whether they suffer from the same glitches that can be caused by these Chinese tokens.

“The robustness of visual input is worse than text input in multimodal models,” says Geng, whose research focus is on visual models. Filtering a text data set is relatively easy, but filtering visual elements will be even harder. “The same issue with these Chinese spam tokens could become bigger with visual tokens,” he says.

OpenAI and Google are launching supercharged AI assistants. Here’s how you can try them out.

This week, Google and OpenAI both announced they’ve built supercharged AI assistants: tools that can converse with you in real time and recover when you interrupt them, analyze your surroundings via live video, and translate conversations on the fly. 

OpenAI struck first on Monday, when it debuted its new flagship model GPT-4o. The live demonstration showed it reading bedtime stories and helping to solve math problems, all in a voice that sounded eerily like Joaquin Phoenix’s AI girlfriend in the movie Her (a trait not lost on CEO Sam Altman). 

On Tuesday, Google announced its own new tools, including a conversational assistant called Gemini Live, which can do many of the same things. It also revealed that it’s building a sort of “do-everything” AI agent, which is currently in development but will not be released until later this year.

Soon you’ll be able to explore for yourself to gauge whether you’ll turn to these tools in your daily routine as much as their makers hope, or whether they’re more like a sci-fi party trick that eventually loses its charm. Here’s what you should know about how to access these new tools, what you might use them for, and how much it will cost. 

OpenAI’s GPT-4o

What it’s capable of: The model can talk with you in real time, with a response delay of about 320 milliseconds, which OpenAI says is on par with natural human conversation. You can ask the model to interpret anything you point your smartphone camera at, and it can provide assistance with tasks like coding or translating text. It can also summarize information, and generate images, fonts, and 3D renderings. 

How to access it: OpenAI says it will start rolling out GPT-4o’s text and vision features in the web interface as well as the ChatGPT app, but has not set a date. The company says it will add the voice functions in the coming weeks, although it’s yet to set an exact date for this either. Developers can access the text and vision features in the API now, but voice mode will launch only to a “small group” of developers initially.

How much it costs: Use of GPT-4o will be free, but OpenAI will set caps on how much you can use the model before you need to upgrade to a paid plan. Those who join one of OpenAI’s paid plans, which start at $20 per month, will have five times more capacity on GPT-4o. 

Google’s Gemini Live 

What is Gemini Live? This is the Google product most comparable to GPT-4o—a version of the company’s AI model that you can speak with in real time. Google says that you’ll also be able to use the tool to communicate via live video “later this year.” The company promises it will be a useful conversational assistant for things like preparing for a job interview or rehearsing a speech.

How to access it: Gemini Live launches in “the coming months” via Google’s premium AI plan, Gemini Advanced. 

How much it costs: Gemini Advanced offers a two-month free trial period and costs $20 per month thereafter. 

But wait, what’s Project Astra? Astra is a project to build a do-everything AI agent, which was demoed at Google’s I/O conference but will not be released until later this year.

People will be able to use Astra through their smartphones and possibly desktop computers, but the company is exploring other options too, such as embedding it into smart glasses or other devices, Oriol Vinyals, vice president of research at Google DeepMind, told MIT Technology Review.

Which is better?

It’s hard to tell without having hands on the full versions of these models ourselves. Google showed off Project Astra through a polished video, whereas OpenAI opted to debut GPT-4o via a seemingly more authentic live demonstration, but in both cases, the models were asked to do things the designers likely already practiced. The real test will come when they’re debuted to millions of users with unique demands.  

That said, if you compare OpenAI’s published videos with Google’s, the two leading tools look very similar, at least in their ease of use. To generalize, GPT-4o seems to be slightly ahead on audio, demonstrating realistic voices, conversational flow, and even singing, whereas Project Astra shows off more advanced visual capabilities, like being able to “remember” where you left your glasses. OpenAI’s decision to roll out the new features more quickly might mean its product will get more use at first than Google’s, which won’t be fully available until later this year. It’s too soon to tell which model “hallucinates” false information less often or creates more useful responses.

Are they safe?

Both OpenAI and Google say their models are well tested: OpenAI says GPT-4o was evaluated by more than 70 experts in fields like misinformation and social psychology, and Google has said that Gemini “has the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity.” 

But these companies are building a future where AI models search, vet, and evaluate the world’s information for us to serve up a concise answer to our questions. Even more so than with simpler chatbots, it’s wise to remain skeptical about what they tell you.

Additional reporting by Melissa Heikkilä.

A wave of retractions is shaking physics

Recent highly publicized scandals have gotten the physics community worried about its reputation—and its future. Over the last five years, several claims of major breakthroughs in quantum computing and superconducting research, published in prestigious journals, have disintegrated as other researchers found they could not reproduce the blockbuster results. 

Last week, around 50 physicists, scientific journal editors, and emissaries from the National Science Foundation gathered at the University of Pittsburgh to discuss the best way forward. “To be honest, we’ve let it go a little too long,” says physicist Sergey Frolov of the University of Pittsburgh, one of the conference organizers.

The attendees gathered in the wake of retractions from two prominent research teams. One team, led by physicist Ranga Dias of the University of Rochester, claimed that it had invented the world’s first room temperature superconductor in a 2023 paper in Nature. After independent researchers reviewed the work, a subsequent investigation from Dias’s university found that he had fabricated and falsified his data. Nature retracted the paper in November 2023. Last year, Physical Review Letters retracted a 2021 publication on unusual properties in manganese sulfide that Dias co-authored. 

The other high-profile research team consisted of researchers affiliated with Microsoft working to build a quantum computer. In 2021, Nature retracted the team’s 2018 paper that claimed the creation of a pattern of electrons known as a Majorana particle, a long-sought breakthrough in quantum computing. Independent investigations of that research found that the researchers had cherry-picked their data, thus invalidating their findings. Another, less-publicized research team pursuing Majorana particles met a similar fate: in 2022, Science retracted the team’s 2017 article claiming indirect evidence of the particles.

In today’s scientific enterprise, scientists perform research and submit the work to editors. The editors assign anonymous referees to review the work, and if the paper passes review, the work becomes part of the accepted scientific record. When researchers do publish bad results, it’s not clear who should be held accountable—the referees who approved the work for publication, the journal editors who published it, or the researchers themselves. “Right now everyone’s kind of throwing the hot potato around,” says materials scientist Rachel Kurchin of Carnegie Mellon University, who attended the Pittsburgh meeting.

Much of the three-day meeting, named the International Conference on Reproducibility in Condensed Matter Physics (a field that encompasses research into various states of matter and why they exhibit certain properties), focused on the basic scientific principle that an experiment and its analysis must yield the same results when repeated. “If you think of research as a product that is paid for by the taxpayer, then reproducibility is the quality assurance department,” Frolov told MIT Technology Review. Reproducibility offers scientists a check on their work, and without it, researchers might waste time and money on fruitless projects based on unreliable prior results, he says. 

In addition to presentations and panel discussions, there was a workshop during which participants split into groups and drafted ideas for guidelines that researchers, journals, and funding agencies could follow to prioritize reproducibility in science. The tone of the proceedings stayed civil and even lighthearted at times. Physicist Vincent Mourik of Forschungszentrum Jülich, a German research institution, showed a photo of a toddler eating spaghetti to illustrate his experience investigating another team’s now-retracted experiment. Occasionally the discussion almost sounded like a couples counseling session, with NSF program director Tomasz Durakiewicz asking a panel of journal editors and a researcher to reflect on their “intimate bond based on trust.”

But researchers did not shy from directly criticizing Nature, Science, and the Physical Review family of journals, all of which sent editors to attend the conference. During a panel, physicist Henry Legg of the University of Basel in Switzerland called out the journal Physical Review B for publishing a paper on a quantum computing device by Microsoft researchers that, for intellectual-property reasons, omitted information required for reproducibility. “It does seem like a step backwards,” Legg said. (Sitting in the audience, Physical Review B editor Victor Vakaryuk said that the paper’s authors had agreed to release “the remaining device parameters” by the end of the year.) 

Journals also tend to “focus on story,” said Legg, which can lead editors to be biased toward experimental results that match theoretical predictions. Jessica Thomas, the executive editor of the American Physical Society, which publishes the Physical Review journals, pushed back on Legg’s assertion. “I don’t think that when editors read papers, they’re thinking about a press release or [telling] an amazing story,” Thomas told MIT Technology Review. “I think they’re looking for really good science.” Describing science through narrative is a necessary part of communication, she says. “We feel a responsibility that science serves humanity, and if humanity can’t understand what’s in our journals, then we have a problem.” 

Frolov, whose independent review with Mourik of the Microsoft work spurred its retraction, said he and Mourik have had to repeatedly e-mail the Microsoft researchers and other involved parties to insist on data. “You have to learn how to be an asshole,” he told MIT Technology Review. “It shouldn’t be this hard.” 

At the meeting, editors pointed out that mistakes, misconduct, and retractions have always been a part of science in practice. “I don’t think that things are worse now than they have been in the past,” says Karl Ziemelis, an editor at Nature.

Ziemelis also emphasized that “retractions are not always bad.” While some retractions occur because of research misconduct, “some retractions are of a much more innocent variety—the authors having made or being informed of an honest mistake, and upon reflection, feel they can no longer stand behind the claims of the paper,” he said while speaking on a panel. Indeed, physicist James Hamlin of the University of Florida, one of the presenters and an independent reviewer of Dias’s work, discussed how, in 2021, he had willingly retracted a 2009 experiment published in Physical Review Letters after another researcher’s skepticism prompted him to reanalyze the data.

What’s new is that “the ease of sharing data has enabled scrutiny to a larger extent than existed before,” says Jelena Stajic, an editor at Science. Journals and researchers need a “more standardized approach to how papers should be written and what needs to be shared in peer review and publication,” she says.

Focusing on the scandals “can be distracting” from systemic problems in reproducibility, says attendee Frank Marsiglio, a physicist at the University of Alberta in Canada. Researchers aren’t required to make unprocessed data readily available for outside scrutiny. When Marsiglio has revisited his own published work from a few years ago, sometimes he’s had trouble recalling how his former self drew those conclusions because he didn’t leave enough documentation. “How is somebody who didn’t write the paper going to be able to understand it?” he says.

Problems can arise when researchers get too excited about their own ideas. “What gets the most attention are cases of fraud or data manipulation, like someone copying and pasting data or editing it by hand,” says conference organizer Brian Skinner, a physicist at Ohio State University. “But I think the much more subtle issue is there are cool ideas that the community wants to confirm, and then we find ways to confirm those things.”

But some researchers may publish bad data for a more straightforward reason. The academic culture, popularly described as “publish or perish,” creates an intense pressure on researchers to deliver results. “It’s not a mystery or pathology why somebody who’s under pressure in their work might misstate things to their supervisor,” said Eugenie Reich, a lawyer who represents scientific whistleblowers, during her talk.

Notably, the conference lacked perspectives from researchers based outside the US, Canada, and Europe, and from researchers at companies. In recent years, academics have flocked to companies such as Google, Microsoft, and smaller startups to do quantum computing research, and they have published their work in Nature, Science, and the Physical Review journals. Frolov says he reached out to researchers from a couple of companies, but “that didn’t work out just because of timing,” he says. He aims to include researchers from that arena in future conversations.

After discussing the problems in the field, conference participants proposed feasible solutions for sharing data to improve reproducibility. They discussed how to persuade the community to view data sharing positively, rather than seeing the demand for it as a sign of distrust. They also brought up the practical challenges of asking graduate students to do even more work by preparing their data for outside scrutiny when it may already take them over five years to complete their degree. Meeting participants aim to publicly release a paper with their suggestions. “I think trust in science will ultimately go up if we establish a robust culture of shareable, reproducible, replicable results,” says Frolov. 

Sophia Chen is a science writer based in Columbus, Ohio. She has written for the society that publishes the Physical Review journals, and for the news section of Nature.

Google’s Astra is its first AI-for-everything agent

Google is set to introduce a new system called Astra later this year and promises that it will be the most powerful, advanced type of AI assistant it’s ever launched. 

The current generation of AI assistants, such as ChatGPT, can retrieve information and offer answers, but that is about it. This year, Google is rebranding its assistants as more advanced “agents,” which it says could show reasoning, planning, and memory skills and are able to take multiple steps to execute tasks.

People will be able to use Astra through their smartphones and possibly desktop computers, but the company is exploring other options too, such as embedding it into smart glasses or other devices, Oriol Vinyals, vice president of research at Google DeepMind, told MIT Technology Review.

“We are in very early days [of AI agent development],” Google CEO Sundar Pichai said on a call ahead of Google’s I/O conference today. 

“We’ve always wanted to build a universal agent that will be useful in everyday life,” said Demis Hassabis, the CEO and cofounder of Google DeepMind. “Imagine agents that can see and hear what we do, better understand the context we’re in, and respond quickly in conversation, making the pace and quality of interaction feel much more natural.” That, he says, is what Astra will be. 

Google’s announcement comes a day after competitor OpenAI unveiled its own supercharged AI assistant, GPT-4o. Google DeepMind’s Astra responds to audio and video inputs, much in the same way as GPT-4o (albeit less flirtatiously).

In a press demo, a user pointed a smartphone camera and smart glasses at things and asked Astra to explain what they were. When the person pointed the device out the window and asked “What neighborhood do you think I’m in?” the AI system was able to identify King’s Cross, London, site of Google DeepMind’s headquarters. It was also able to say that the person’s glasses were on a desk, having recorded them earlier in the interaction. 

The demo showcases Google DeepMind’s vision of multimodal AI (which can handle multiple types of input—voice, video, text, and so on) working in real time, Vinyals says. 

“We are very excited about, in the future, to be able to really just get closer to the user, assist the user with anything that they want,” he says. Google recently upgraded its artificial-intelligence model Gemini to process even larger amounts of data, an upgrade that helps it handle bigger documents and videos and hold longer conversations.

Tech companies are in the middle of a fierce competition over AI supremacy, and AI agents are the latest effort from Big Tech firms to show they are pushing the frontier of development. Agents also play into a narrative pushed by many tech companies, including OpenAI and Google DeepMind, that aim to build artificial general intelligence, a highly hypothetical idea of superintelligent AI systems.

“Eventually, you’ll have this one agent that really knows you well, can do lots of things for you, and can work across multiple tasks and domains,” says Chirag Shah, a professor at the University of Washington who specializes in online search.

This vision is still aspirational. But today’s announcement should be seen as Google’s attempt to keep up with competitors. And by rushing these products out, Google can collect even more data from its more than a billion users about how they use its models and what works, Shah says.

Google is unveiling many more new AI capabilities beyond agents today. It’s going to integrate AI more deeply into Search through a new feature called AI Overviews, which gathers information from the internet and packages it into short summaries in response to search queries. The feature, which launches today, will initially be available only in the US, with more countries to gain access later.

This will help speed up the search process and get users more specific answers to more complex, niche questions, says Felix Simon, a research fellow in AI and digital news at the Reuters Institute for the Study of Journalism. “I think that’s where Search has always struggled,” he says.

Another new feature of Google’s AI Search offering is better planning. People will soon be able to ask Search to make meal and travel suggestions, for example, much like asking a travel agent to suggest restaurants and hotels. Gemini will be able to help them plan what they need to do or buy to cook recipes, and they will also be able to have conversations with the AI system, asking it to do anything from relatively mundane tasks, such as informing them about the weather forecast, to highly complex ones like helping them prepare for a job interview or an important speech. 

People will also be able to interrupt Gemini midsentence and ask clarifying questions, much as in a real conversation. 

In another move to one-up competitor OpenAI, Google also unveiled Veo, a new video-generating AI system. Veo is able to generate short videos and allows users more control over cinematic styles by understanding prompts like “time lapse” or “aerial shots of a landscape.”

Google has a significant advantage when it comes to training generative video models, because it owns YouTube. It has already announced collaborations with artists such as Donald Glover and Wyclef Jean, who are using its technology to produce their work.

Earlier this year, OpenAI’s CTO, Mira Murati, fumbled when asked whether the company’s model was trained on YouTube data. Douglas Eck, senior research director at Google DeepMind, was also vague about the training data used to create Veo when asked about it by MIT Technology Review, but he said that it “may be trained on some YouTube content in accordance with our agreements with YouTube creators.”

On one hand, Google is presenting its generative AI as a tool artists can use to make stuff, but the tools likely get their ability to create that stuff by using material from existing artists, says Shah. AI companies such as Google and OpenAI have faced a slew of lawsuits by writers and artists claiming that their intellectual property has been used without consent or compensation.  

“For artists it’s a double-edged sword,” says Shah.