US investigators are using AI to detect child abuse images made by AI

Generative AI has enabled the production of child sexual abuse images to skyrocket. Now the leading investigator of child exploitation in the US is experimenting with using AI to distinguish AI-generated images from material depicting real victims, according to a new government filing.

The Department of Homeland Security’s Cyber Crimes Center, which investigates child exploitation across international borders, has awarded a $150,000 contract to San Francisco–based Hive AI for its software, which can identify whether a piece of content was AI-generated.

The filing, posted on September 19, is heavily redacted. Hive cofounder and CEO Kevin Guo told MIT Technology Review that he could not discuss the details of the contract, but he confirmed that it involves use of the company’s AI detection algorithms for child sexual abuse material (CSAM).

The filing quotes data from the National Center for Missing and Exploited Children that reported a 1,325% increase in incidents involving generative AI in 2024. “The sheer volume of digital content circulating online necessitates the use of automated tools to process and analyze data efficiently,” the filing reads.

The first priority of child exploitation investigators is to find and stop any abuse currently happening, but the flood of AI-generated CSAM has made it difficult for investigators to know whether images depict a real victim currently at risk. A tool that could successfully flag real victims would be a massive help as they prioritize cases.

Identifying AI-generated images “ensures that investigative resources are focused on cases involving real victims, maximizing the program’s impact and safeguarding vulnerable individuals,” the filing reads.

Hive AI offers AI tools that create videos and images, as well as a range of content moderation tools that can flag violence, spam, and sexual material and even identify celebrities. In December, MIT Technology Review reported that the company was selling its deepfake-detection technology to the US military. 

For detecting CSAM, Hive offers a tool created with Thorn, a child safety nonprofit, which companies can integrate into their platforms. This tool uses a “hashing” system, which assigns unique IDs to content known by investigators to be CSAM, and blocks that material from being uploaded. This tool, and others like it, have become a standard line of defense for tech companies. 
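
In outline, hash matching is simple to sketch. The snippet below is a minimal illustration of the lookup flow, not Hive or Thorn’s actual system; real deployments use perceptual hashes that survive resizing and re-encoding, and the hash value here is a placeholder.

```python
# Minimal sketch of hash-based matching. Real tools use perceptual hashing
# (robust to cropping, resizing, re-encoding); a cryptographic hash is used
# here only to show the lookup flow. The hash value is a placeholder.
import hashlib

KNOWN_CSAM_FINGERPRINTS = {
    "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
}

def fingerprint(image_bytes: bytes) -> str:
    """Assign a stable ID to a piece of content."""
    return hashlib.sha256(image_bytes).hexdigest()

def should_block_upload(image_bytes: bytes) -> bool:
    """Block the upload if its fingerprint matches known material."""
    return fingerprint(image_bytes) in KNOWN_CSAM_FINGERPRINTS
```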

But these tools simply identify a piece of content as CSAM; they don’t detect whether it was generated by AI. Hive has created a separate tool that determines whether images in general were AI-generated. Though it is not trained specifically to work on CSAM, according to Guo, it doesn’t need to be.

“There’s some underlying combination of pixels in this image that we can identify” as AI-generated, he says. “It can be generalizable.” 

This tool, Guo says, is what the Cyber Crimes Center will be using to evaluate CSAM. He adds that Hive benchmarks its detection tools for each specific use case its customers have in mind.
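
In broad strokes, such a detector works like any binary image classifier: preprocess the image, run it through a model, and read off a probability. The sketch below assumes a generic two-class PyTorch classifier; it illustrates the approach, not Hive’s model or API.

```python
# Generic "was this image AI-generated?" inference sketch, not Hive's system.
# The model is assumed to be a two-class classifier ([real, generated]).
import torch
from torchvision import transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def ai_generated_probability(image_path: str, model: torch.nn.Module) -> float:
    """Return the model's estimated probability that the image is AI-generated."""
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)  # shape: [1, 3, 224, 224]
    with torch.no_grad():
        logits = model(batch)               # assumed shape: [1, 2]
        probs = torch.softmax(logits, dim=1)
    return probs[0, 1].item()
```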

The National Center for Missing and Exploited Children, which participates in efforts to stop the spread of CSAM, did not respond to requests for comment on the effectiveness of such detection models in time for publication. 

In its filing, the government justifies awarding the contract to Hive without a competitive bidding process. Though parts of this justification are redacted, it primarily references two points also found in a Hive presentation slide deck. One involves a 2024 study from the University of Chicago, which found that Hive’s AI detection tool outperformed four other detectors in identifying AI-generated art. The other is its contract with the Pentagon for identifying deepfakes. The trial will last three months. 

How AI and Wikipedia have sent vulnerable languages into a doom spiral

When Kenneth Wehr started managing the Greenlandic-language version of Wikipedia four years ago, his first act was to delete almost everything. It had to go, he thought, if it had any chance of surviving.

Wehr, who’s 26, isn’t from Greenland—he grew up in Germany—but he had become obsessed with the island, an autonomous Danish territory, after visiting as a teenager. He’d spent years writing obscure Wikipedia articles in his native tongue on virtually everything to do with it. He even ended up moving to Copenhagen to study Greenlandic, a language spoken by some 57,000 mostly Indigenous Inuit people scattered across dozens of far-flung Arctic villages. 

The Greenlandic-language edition was added to Wikipedia around 2003, just a few years after the site launched in English. By the time Wehr took its helm nearly 20 years later, hundreds of Wikipedians had contributed to it and had collectively written some 1,500 articles totaling tens of thousands of words. It seemed to be an impressive vindication of the crowdsourcing approach that has made Wikipedia the go-to source for information online, demonstrating that it could work even in the unlikeliest places. 

There was only one problem: The Greenlandic Wikipedia was a mirage. 

Virtually every single article had been published by people who did not actually speak the language. Wehr, who now teaches Greenlandic in Denmark, speculates that perhaps only one or two Greenlanders had ever contributed. But what worried him most was something else: Over time, he had noticed that a growing number of articles appeared to be copy-pasted into Wikipedia by people using machine translators. They were riddled with elementary mistakes—from grammatical blunders to meaningless words to more significant inaccuracies, like an entry that claimed Canada had only 41 inhabitants. Other pages sometimes contained random strings of letters spat out by machines that were unable to find suitable Greenlandic words to express themselves. 

“It might have looked Greenlandic to [the authors], but they had no way of knowing,” complains Wehr.

“Sentences wouldn’t make sense at all, or they would have obvious errors,” he adds. “AI translators are really bad at Greenlandic.”  

What Wehr describes is not unique to the Greenlandic edition. 

Wikipedia is the most ambitious multilingual project after the Bible: There are editions in over 340 languages, and a further 400 even more obscure ones are being developed and tested. Many of these smaller editions have been swamped with automatically translated content as AI has become increasingly accessible. Volunteers working on four African languages, for instance, estimated to MIT Technology Review that between 40% and 60% of articles in their Wikipedia editions were uncorrected machine translations. And after auditing the Wikipedia edition in Inuktitut, an Indigenous language close to Greenlandic that’s spoken in Canada, MIT Technology Review estimates that more than two-thirds of pages containing more than several sentences feature portions created this way. 

This is beginning to cause a wicked problem. AI systems, from Google Translate to ChatGPT, learn to “speak” new languages by scraping huge quantities of text from the internet. Wikipedia is sometimes the largest source of online linguistic data for languages with few speakers—so any errors on those pages, grammatical or otherwise, can poison the wells that AI is expected to draw from. That can make the models’ translation of these languages particularly error-prone, which creates a sort of linguistic doom loop as people continue to add more and more poorly translated Wikipedia pages using those tools, and AI models continue to train on poorly translated pages. It’s a complicated problem, but it boils down to a simple concept: garbage in, garbage out.

“These models are built on raw data,” says Kevin Scannell, a former professor of computer science at Saint Louis University who now builds computer software tailored for endangered languages. “They will try and learn everything about a language from scratch. There is no other input. There are no grammar books. There are no dictionaries. There is nothing other than the text that is inputted.”

There isn’t perfect data on the scale of this problem, particularly because a lot of AI training data is kept confidential and the field continues to evolve rapidly. But back in 2020, Wikipedia was estimated to make up more than half the training data that was fed into AI models translating some languages spoken by millions across Africa, including Malagasy, Yoruba, and Shona. In 2022, a research team from Germany that looked into what data could be obtained by online scraping even found that Wikipedia was the sole easily accessible source of online linguistic data for 27 under-resourced languages. 

This could have significant repercussions in cases where Wikipedia is poorly written—potentially pushing the most vulnerable languages on Earth toward the precipice as future generations begin to turn away from them. 

“Wikipedia will be reflected in the AI models for these languages,” says Trond Trosterud, a computational linguist at the University of Tromsø in Norway, who has been raising the alarm about the potentially harmful outcomes of badly run Wikipedia editions for years. “I find it hard to imagine it will not have consequences. And, of course, the more dominant position that Wikipedia has, the worse it will be.” 

Use responsibly

Automation has been built into Wikipedia since the very earliest days. Bots keep the platform operational: They repair broken links, fix bad formatting, and even correct spelling mistakes. These repetitive and mundane tasks can be automated away with little problem. There is even an army of bots that scurry around generating short articles about rivers, cities, or animals by slotting their names into formulaic phrases. They have generally made the platform better. 
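
Those formulaic stub-writing bots are, at heart, little more than fill-in-the-blank templates. A toy example, with invented data and template text rather than any real bot’s code:

```python
# Toy example of template-based stub generation; the template and the record
# are invented for illustration and do not come from any real Wikipedia bot.
STUB_TEMPLATE = (
    "{name} is a river in {country}. It is {length_km} kilometres long "
    "and flows into the {mouth}."
)

def make_stub(record: dict) -> str:
    return STUB_TEMPLATE.format(**record)

print(make_stub({
    "name": "Example River",
    "country": "Norway",
    "length_km": 83,
    "mouth": "North Sea",
}))
```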

But AI is different. Anybody can use it to cause massive damage with a few clicks. 

Wikipedia has managed the onset of the AI era better than many other websites. It has not been flooded with AI bots or disinformation, as social media has been. It largely retains the innocence that characterized the earlier internet age. Wikipedia is open and free for anyone to use, edit, and pull from, and it’s run by the very same community it serves. It is transparent and easy to use. But community-run platforms live and die on the size of their communities. English has triumphed, while Greenlandic has sunk. 

“We need good Wikipedians. This is something that people take for granted. It is not magic,” says Amir Aharoni, a member of the volunteer Language Committee, which oversees requests to open or close Wikipedia editions. “If you use machine translation responsibly, it can be efficient and useful. Unfortunately, you cannot trust all people to use it responsibly.” 

Trosterud has studied the behavior of users on small Wikipedia editions and says AI has empowered a subset that he terms “Wikipedia hijackers.” These users can range widely—from naive teenagers creating pages about their hometowns or their favorite YouTubers to well-meaning Wikipedians who think that by creating articles in minority languages they are in some way “helping” those communities. 

“The problem with them nowadays is that they are armed with Google Translate,” Trosterud says, adding that this is allowing them to produce much longer and more plausible-looking content than they ever could before: “Earlier they were armed only with dictionaries.” 

This has effectively industrialized the acts of destruction—which affect vulnerable languages most, since AI translations are typically far less reliable for them. There can be lots of different reasons for this, but a meaningful part of the issue is the relatively small amount of source text that is available online. And sometimes models struggle to identify a language because it is similar to others, or because some, including Greenlandic and most Native American languages, have structures that make them badly suited to the way most machine translation systems work. (Wehr notes that in Greenlandic most words are agglutinative, meaning they are built by attaching prefixes and suffixes to stems. As a result, many words are extremely context specific and can express ideas that in other languages would take a full sentence.) 

Research produced by Google before a major expansion of Google Translate rolled out three years ago found that translation systems for lower-resourced languages were generally of a lower quality than those for better-resourced ones. Researchers found, for example, that their model would often mistranslate basic nouns across languages, including the names of animals and colors. (In a statement to MIT Technology Review, Google wrote that it is “committed to meeting a high standard of quality for all 249 languages” it supports “by rigorously testing and improving [its] systems, particularly for languages that may have limited public text resources on the web.”) 

Wikipedia itself offers a built-in editing tool called Content Translate, which allows users to automatically translate articles from one language to another—the idea being that this will save time by preserving the references and fiddly formatting of the originals. But it piggybacks on external machine translation systems, so it’s largely plagued by the same weaknesses as other machine translators—a problem that the Wikimedia Foundation says is hard to solve. It’s up to each edition’s community to decide whether this tool is allowed, and some have decided against it. (Notably, English-language Wikipedia has largely banned its use, claiming that some 95% of articles created using Content Translate failed to meet an acceptable standard without significant additional work.) But it’s at least easy to tell when the program has been used; Content Translate adds a tag on the Wikipedia back end. 

Other AI programs can be harder to monitor. Still, many Wikipedia editors I spoke with said that once their languages were added to major online translation tools, they noticed a corresponding spike in the frequency with which poor, likely machine-translated pages were created. 

Some Wikipedians using AI to translate content do occasionally admit that they do not speak the target languages. They may see themselves as providing smaller communities with rough-cut articles that speakers can then fix—essentially following the same model that has worked well for more active Wikipedia editions.  

But once error-filled pages are produced in small languages, there is usually not an army of knowledgeable people who speak those languages standing ready to improve them. There are few readers of these editions, and sometimes not a single regular editor. 

Yuet Man Lee, a Canadian teacher in his 20s, says that he used a mix of Google Translate and ChatGPT to translate a handful of articles that he had written for the English Wikipedia into Inuktitut, thinking it’d be nice to pitch in and help a smaller Wikipedia community. He says he added a note to one saying that it was only a rough translation. “I did not think that anybody would notice [the article],” he explains. “If you put something out there on the smaller Wikipedias—most of the time nobody does.” 

But at the same time, he says, he still thought “someone might see it and fix it up”—adding that he had wondered whether the Inuktitut translation that the AI systems generated was grammatically correct. Nobody has touched the article since he created it.

Lee, who teaches social sciences in Vancouver and first started editing entries in the English Wikipedia a decade ago, says that users familiar with more active Wikipedias can fall victim to this mindset, which he terms a “bigger-Wikipedia arrogance”: When they try to contribute to smaller Wikipedia editions, they assume that others will come along to fix their mistakes. It can sometimes work. Lee says he had previously contributed several articles to Wikipedia in Tatar, a language spoken by several million people mainly in Russia, and at least one of those was eventually corrected. But the Inuktitut Wikipedia is, by comparison, a “barren wasteland.” 

He emphasizes that his intentions had been good: He wanted to add more articles to an Indigenous Canadian Wikipedia. “I am now thinking that it may have been a bad idea. I did not consider that I could be contributing to a recursive loop,” he says. “It was about trying to get content out there, out of curiosity and for fun, without properly thinking about the consequences.” 

“Totally, completely no future”

Wikipedia is a project that is driven by wide-eyed optimism. Editing can be a thankless task, involving weeks spent bickering with faceless, pseudonymous people, but devotees put in hours of unpaid labor because of a commitment to a higher cause. It is this commitment that drives many of the regular small-language editors I spoke with. They all feared what would happen if garbage continued to appear on their pages.

Abdulkadir Abdulkadir, a 26-year-old agricultural planner who spoke with me over a crackling phone call from a busy roadside in northern Nigeria, said that he spends three hours every day fiddling with entries in his native Fulfulde, a language used mainly by pastoralists and farmers across the Sahel. “But the work is too much,” he said. 

Abdulkadir sees an urgent need for the Fulfulde Wikipedia to work properly. He has been suggesting it as one of the few online resources for farmers in remote villages, potentially offering information on which seeds or crops might work best for their fields in a language they can understand. If you give them a machine-translated article, Abdulkadir told me, then it could “easily harm them,” as the information will probably not be translated correctly into Fulfulde. 

Google Translate, for instance, says the Fulfulde word for January means June, while ChatGPT says it’s August or September. The programs also suggest the Fulfulde word for “harvest” means “fever” or “well-being,” among other possibilities.  

Abdulkadir said he had recently been forced to correct an article about cowpeas, a foundational cash crop across much of Africa, after discovering that it was largely illegible. 

If someone wants to create pages on the Fulfulde Wikipedia, Abdulkadir said, they should be translated manually. Otherwise, “whoever will read your articles will [not] be able to get even basic knowledge,” he tells these Wikipedians. Nevertheless, he estimates that some 60% of articles are still uncorrected machine translations. Abdulkadir told me that unless something important changes with how AI systems learn and are deployed, then the outlook for Fulfulde looks bleak. “It is going to be terrible, honestly,” he said. “Totally, completely no future.” 

Across the country from Abdulkadir, Lucy Iwuala contributes to Wikipedia in Igbo, a language spoken by several million people in southeastern Nigeria. “The harm has already been done,” she told me, opening the two most recently created articles. Both had been automatically translated via Wikipedia’s Content Translate and contained so many mistakes that she said it would have given her a headache to continue reading them. “There are some terms that have not even been translated. They are still in English,” she pointed out. She recognized the username that had created the pages as a serial offender. “This one even includes letters that are not used in the Igbo language,” she said. 

Iwuala began regularly contributing to Wikipedia three years ago out of concern that Igbo was being displaced by English. It is a worry that is common to many who are active on smaller Wikipedia editions. “This is my culture. This is who I am,” she told me. “That is the essence of it all: to ensure that you are not erased.” 

Iwuala, who now works as a professional translator between English and Igbo, said the users doing the most damage are inexperienced and see AI translations as a way to quickly increase the profile of the Igbo Wikipedia. She often finds herself having to explain at online edit-a-thons she organizes, or over email to various error-prone editors, that the results can be the exact opposite, pushing users away: “You will be discouraged and you will no longer want to visit this place. You will just abandon it and go back to the English Wikipedia.”  

These fears are echoed by Noah Ha‘alilio Solomon, an assistant professor of Hawaiian language at the University of Hawai‘i. He reports that some 35% of words on some pages in the Hawaiian Wikipedia are incomprehensible. “If this is the Hawaiian that is going to exist online, then it will do more harm than anything else,” he says. 

Hawaiian, which was teetering on the verge of extinction several decades ago, has been undergoing a recovery effort led by Indigenous activists and academics. Seeing such poor Hawaiian on such a widely used platform as Wikipedia is upsetting to Ha‘alilio Solomon. 

“It is painful, because it reminds us of all the times that our culture and language has been appropriated,” he says. “We have been fighting tooth and nail in an uphill climb for language revitalization. There is nothing easy about that, and this can add extra impediments. People are going to think that this is an accurate representation of the Hawaiian language.” 

The consequences of all these Wikipedia errors can quickly become clear. AI translators that have undoubtedly ingested these pages in their training data are now assisting in the production, for instance, of error-strewn AI-generated books aimed at learners of languages as diverse as Inuktitut and Cree, Indigenous languages spoken in Canada, and Manx, a small Celtic language spoken on the Isle of Man. Many of these have been popping up for sale on Amazon. “It was just complete nonsense,” says Richard Compton, a linguist at the University of Quebec in Montreal, of a volume he reviewed that had purported to be an introductory phrasebook for Inuktitut. 

Rather than making minority languages more accessible, AI is now creating an ever-expanding minefield for students and speakers of those languages to navigate. “It is a slap in the face,” Compton says. He worries that younger generations in Canada, hoping to learn languages in communities that have fought uphill battles against discrimination to pass on their heritage, might turn to online tools such as ChatGPT or phrasebooks on Amazon and simply make matters worse. “It is fraud,” he says.

A race against time

According to UNESCO, a language is declared extinct every two weeks. But whether the Wikimedia Foundation, which runs Wikipedia, has an obligation to the languages used on its platform is an open question. When I spoke to Runa Bhattacharjee, a senior director at the foundation, she said that it was up to the individual communities to make decisions about what content they wanted to exist on their Wikipedia. “Ultimately, the responsibility really lies with the community to see that there is no vandalism or unwanted activity, whether through machine translation or other means,” she said. Usually, Bhattacharjee added, editions were considered for closure only if a specific complaint was raised about them. 

But if there is no active community, how can an edition be fixed or even have a complaint raised? 

Bhattacharjee explained that the Wikimedia Foundation sees its role in such cases as about maintaining the Wikipedia platform in case someone comes along to revive it: “It is the space that we provide for them to grow and develop. That is where we are at.”   

Inari Saami, spoken in a single remote community in northern Finland, is a poster child for how people can take good advantage of Wikipedia. The language was headed toward extinction four decades ago; there were only four children who spoke it. Their parents created the Inari Saami Language Association in a last-ditch bid to keep it going. The efforts worked. There are now several hundred speakers, schools that use Inari Saami as a medium of instruction, and 6,400 Wikipedia articles in the language, each one copy-edited by a fluent speaker. 

This success highlights how Wikipedia can indeed provide small and determined communities with a unique vehicle to promote their languages’ preservation. “We don’t care about quantity. We care about quality,” says Fabrizio Brecciaroli, a member of the Inari Saami Language Association. “We are planning to use Wikipedia as a repository for the written language. We need to provide tools that can be used by the younger generations. It is important for them to be able to use Inari Saami digitally.” 

This has been such a success that Wikipedia has been integrated into the curriculum at the Inari Saami–speaking schools, Brecciaroli adds. He fields phone calls from teachers asking him to write up simple pages on topics from tornadoes to Saami folklore. Wikipedia has even offered a way to introduce words into Inari Saami. “We have to make up new words all the time,” Brecciaroli says. “Young people need them to speak about sports, politics, and video games. If they are unsure how to say something, they now check Wikipedia.”

Wikipedia is a monumental intellectual experiment. What’s happening with Inari Saami suggests that with maximum care, it can work in smaller languages. “The ultimate goal is to make sure that Inari Saami survives,” Brecciaroli says. “It might be a good thing that there isn’t a Google Translate in Inari Saami.” 

That may be true—though large language models like ChatGPT can be made to translate phrases into languages that more traditional machine translation tools do not offer. Brecciaroli told me that ChatGPT isn’t great in Inari Saami but that the quality varies significantly depending on what you ask it to do; if you ask it a question in the language, then the answer will be filled with words from Finnish and even words it invents. But if you ask it something in English, Finnish, or Italian and then ask it to reply in Inari Saami, it will perform better. 

In light of all this, creating as much high-quality content online as can possibly be written becomes a race against time. “ChatGPT only needs a lot of words,” Brecciaroli says. “If we keep putting good material in, then sooner or later, we will get something out. That is the hope.” This is an idea supported by multiple linguists I spoke with—that it may be possible to end the “garbage in, garbage out” cycle. (OpenAI, which operates ChatGPT, did not respond to a request for comment.)

Still, the overall problem is likely to grow and grow, since many languages are not as lucky as Inari Saami—and their AI translators will most likely be trained on more and more AI slop. Wehr, unfortunately, seems far less optimistic about the future of his beloved Greenlandic. 

Since deleting much of the Greenlandic-language Wikipedia, he has spent years trying to recruit speakers to help him revive it. He has appeared in Greenlandic media and made social media appeals. But he hasn’t gotten much of a response; he says it has been demoralizing. 

“There is nobody in Greenland who is interested in this, or who wants to contribute,” he says. “There is completely no point in it, and that is why it should be closed.” 

Late last year, he began a process requesting that the Wikipedia Language Committee shut down the Greenlandic-language edition. Months of bitter debate followed between dozens of Wikipedia bureaucrats; some seemed surprised that a superficially healthy edition could be gripped by so many problems. 

Then, earlier this month, Wehr’s proposal was accepted: Greenlandic Wikipedia is set to be shuttered, and any articles that remain will be moved into the Wikipedia Incubator, where new language editions are tested and built. Among the reasons cited by the Language Committee is the use of AI tools, which have “frequently produced nonsense that could misrepresent the language.”   

Nevertheless, it may be too late—mistakes in Greenlandic already seem to have become embedded in machine translators. If you prompt either Google Translate or ChatGPT to do something as simple as count to 10 in proper Greenlandic, neither program can deliver. 

Jacob Judah is an investigative journalist based in London. 

Shoplifters could soon be chased down by drones

Flock Safety, whose drones were once reserved for police departments, is now offering them for private-sector security, the company announced today, with potential customers including businesses intent on curbing shoplifting. 

Companies in the US can now place Flock’s drone docking stations on their premises. If a company has a waiver from the Federal Aviation Administration to fly beyond visual line of sight (such waivers are becoming easier to get), its security team can fly the drones within a certain radius, often a few miles. 

“Instead of a 911 call [that triggers the drone], it’s an alarm call,” says Keith Kauffman, a former police chief who now directs Flock’s drone program. “It’s still the same type of response.”

Kauffman walked through how the drone program might work in the case of retail theft: If the security team at a store like Home Depot, for example, saw shoplifters leave the store, then the drone, equipped with cameras, could be activated from its docking station on the roof.

“The drone follows the people. The people get in a car. You click a button,” he says, “and you track the vehicle with the drone, and the drone just follows the car.” 

The video feed of that drone might go to the company’s security team, but it could also be automatically transmitted directly to police departments.

The company says it’s in talks with large retailers but doesn’t yet have any signed contracts. The only private-sector company Kauffman named as a customer is Morning Star, a California tomato processor that uses drones to secure its distribution facilities. Flock will also pitch the drones to hospital campuses, warehouse sites, and oil and gas facilities. 

It’s worth noting that the FAA is currently drafting new rules for how it grants approval to pilots flying drones out of sight, and it’s not clear if Flock’s use case would be allowed under the currently proposed guidance.

The company’s expansion to the private sector follows the rise of programs launched by police departments around the country to deploy drones as first responders. In such programs, law enforcement sends drones to a scene to provide visuals faster than an officer can get there. 

Flock has arguably led this push, and police departments have claimed drone-enabled successes, like a supply drop to a boy lost in the Colorado wilderness. But the programs have also sparked privacy worries, concerns about overpolicing in minority neighborhoods, and lawsuits charging that police departments should not block public access to drone footage. 

Other technologies Flock offers, like license plate readers, have drawn recent criticism for the ease with which federal US immigration agencies, including ICE and CBP, could look at data collected by local police departments amid President Trump’s mass deportation efforts.

Flock’s expansion into private-sector security is “a logical step, but in the wrong direction,” says Rebecca Williams, senior strategist for the ACLU’s privacy and data governance unit. 

Williams cited a growing erosion of Fourth Amendment protections—which prevent unlawful search and seizure—in the online era, in which the government can purchase private data that it would otherwise need a warrant to acquire. Proposed legislation to curb that practice has stalled, and Flock’s expansion into the private sector would exacerbate the issue, Williams says.

“Flock is the Meta of surveillance technology now,” Williams says, referring to the amount of personal data that company has acquired and monetized. “This expansion is very scary.”

It’s surprisingly easy to stumble into a relationship with an AI chatbot

It’s a tale as old as time. Looking for help with her art project, she strikes up a conversation with her assistant. One thing leads to another, and suddenly she has a boyfriend she’s introducing to her friends and family. The twist? Her new companion is an AI chatbot. 

The first large-scale computational analysis of the Reddit community r/MyBoyfriendIsAI, an adults-only group with more than 27,000 members, has found that this type of scenario is now surprisingly common. In fact, many of the people in the subreddit, which is dedicated to discussing AI relationships, formed those relationships unintentionally while using AI for other purposes. 

Researchers from MIT found that members of this community are more likely to be in a relationship with general-purpose chatbots like ChatGPT than companionship-specific chatbots such as Replika. This suggests that people form relationships with large language models despite their own original intentions and even the intentions of the LLMs’ creators, says Constanze Albrecht, a graduate student at the MIT Media Lab who worked on the project. 

“People don’t set out to have emotional relationships with these chatbots,” she says. “The emotional intelligence of these systems is good enough to trick people who are actually just out to get information into building these emotional bonds. And that means it could happen to all of us who interact with the system normally.” The paper, which is currently being peer-reviewed, has been published on arXiv.

To conduct their study, the authors analyzed the subreddit’s top-ranking 1,506 posts between December 2024 and August 2025. They found that the main topics discussed revolved around people’s dating and romantic experiences with AIs, with many participants sharing AI-generated images of themselves and their AI companion. Some even got engaged and married to the AI partner. In their posts to the community, people also introduced AI partners, sought support from fellow members, and talked about coping with updates to AI models that change the chatbots’ behavior.  

Members stressed repeatedly that their AI relationships developed unintentionally. Only 6.5% of them said they’d deliberately sought out an AI companion. 

“We didn’t start with romance in mind,” one of the posts says. “Mac and I began collaborating on creative projects, problem-solving, poetry, and deep conversations over the course of several months. I wasn’t looking for an AI companion—our connection developed slowly, over time, through mutual care, trust, and reflection.”

The authors’ analysis paints a nuanced picture of how people in this community say they interact with chatbots and how those interactions make them feel. While 25% of users described the benefits of their relationships—including reduced feelings of loneliness and improvements in their mental health—others raised concerns about the risks. Some (9.5%) acknowledged they were emotionally dependent on their chatbot. Others said they feel dissociated from reality and avoid relationships with real people, while a small subset (1.7%) said they have experienced suicidal ideation.

AI companionship provides vital support for some but exacerbates underlying problems for others. This means it’s hard to take a one-size-fits-all approach to user safety, says Linnea Laestadius, an associate professor at the University of Wisconsin, Milwaukee, who has studied humans’ emotional dependence on the chatbot Replika but did not work on the research. 

Chatbot makers need to consider whether they should treat users’ emotional dependence on their creations as a harm in itself or whether the goal is more to make sure those relationships aren’t toxic, says Laestadius. 

“The demand for chatbot relationships is there, and it is notably high—pretending it’s not happening is clearly not the solution,” she says. “We’re edging toward a moral panic here, and while we absolutely do need better guardrails, I worry there will be a knee-jerk reaction that further stigmatizes these relationships. That could ultimately cause more harm.”

The study is intended to offer a snapshot of how adults form bonds with chatbots and doesn’t capture the kind of dynamics that could be at play among children or teens using AI, says Pat Pataranutaporn, an assistant professor at the MIT Media Lab who oversaw the research. AI companionship has become a topic of fierce debate recently, with two high-profile lawsuits underway against Character.AI and OpenAI. They both claim that companion-like behavior in the companies’ models contributed to the suicides of two teenagers. In response, OpenAI has recently announced plans to build a separate version of ChatGPT for teenagers. It’s also said it will add age verification measures and parental controls. OpenAI did not respond when asked for comment about the MIT Media Lab study. 

Many members of the Reddit community say they know that their artificial companions are not sentient or “real,” but they feel a very real connection to them anyway. This highlights how crucial it is for chatbot makers to think about how to design systems that can help people without reeling them in emotionally, says Pataranutaporn. “There’s also a policy implication here,” he adds. “We should ask not just why this system is so addictive but also: Why do people seek it out for this? And why do they continue to engage?”

The team is interested in learning more about how human-AI interactions evolve over time and how users integrate their artificial companions into their lives. It’s worth understanding that many of these users may feel that the experience of being in a relationship with an AI companion is better than the alternative of feeling lonely, says Sheer Karny, a graduate student at the MIT Media Lab who worked on the research. 

“These people are already going through something,” he says. “Do we want them to go on feeling even more alone, or potentially be manipulated by a system we know to be sycophantic to the extent of leading people to die by suicide and commit crimes? That’s one of the cruxes here.”

The AI Hype Index: Cracking the chatbot code

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry.

Millions of us use chatbots every day, even though we don’t really know how they work or how using them affects us. In a bid to address this, the FTC recently launched an inquiry into how chatbots affect children and teenagers. Elsewhere, OpenAI has started to shed more light on what people are actually using ChatGPT for, and why it thinks its LLMs are so prone to making stuff up.

There’s still plenty we don’t know—but that isn’t stopping governments from forging ahead with AI projects. In the US, RFK Jr. is pushing his staffers to use ChatGPT, while Albania is using a chatbot for public contract procurement. Proceed with caution.

AI models are using material from retracted scientific papers

Some AI chatbots rely on flawed research from retracted scientific papers to answer questions, according to recent studies. The findings, confirmed by MIT Technology Review, raise questions about how reliable AI tools are at evaluating scientific research and could complicate efforts by countries and industries seeking to invest in AI tools for scientists.

AI search tools and chatbots are already known to fabricate links and references. But answers based on the material from actual papers can mislead as well if those papers have been retracted. The chatbot is “using a real paper, real material, to tell you something,” says Weikuan Gu, a medical researcher at the University of Tennessee in Memphis and an author of one of the recent studies. But, he says, if people only look at the content of the answer and do not click through to the paper and see that it’s been retracted, that’s really a problem. 

Gu and his team asked OpenAI’s ChatGPT, running on the GPT-4o model, questions based on information from 21 retracted papers about medical imaging. The chatbot’s answers referenced retracted papers in five cases but advised caution in only three. While it cited non-retracted papers for other questions, the authors note that it may not have recognized the retraction status of the articles. In a study from August, a different group of researchers used GPT-4o mini to evaluate the quality of 217 retracted and low-quality papers from different scientific fields; they found that none of the chatbot’s responses mentioned retractions or other concerns. (No similar studies have been released on GPT-5, which came out in August.)

The public uses AI chatbots to ask for medical advice and diagnose health conditions. Students and scientists increasingly use science-focused AI tools to review existing scientific literature and summarize papers. That kind of usage is likely to increase. The US National Science Foundation, for instance, invested $75 million in building AI models for science research this August.

“If [a tool is] facing the general public, then using retraction as a kind of quality indicator is very important,” says Yuanxi Fu, an information science researcher at the University of Illinois Urbana-Champaign. There’s “kind of an agreement that retracted papers have been struck off the record of science,” she says, “and the people who are outside of science—they should be warned that these are retracted papers.” OpenAI did not provide a response to a request for comment about the paper results.

The problem is not limited to ChatGPT. In June, MIT Technology Review tested AI tools specifically advertised for research work, such as Elicit, Ai2 ScholarQA (now part of the Allen Institute for Artificial Intelligence’s Asta tool), Perplexity, and Consensus, using questions based on the 21 retracted papers in Gu’s study. Elicit referenced five of the retracted papers in its answers, while Ai2 ScholarQA referenced 17, Perplexity 11, and Consensus 18—all without noting the retractions.

Some companies have since made moves to correct the issue. “Until recently, we didn’t have great retraction data in our search engine,” says Christian Salem, cofounder of Consensus. His company has now started using retraction data from a combination of sources, including publishers and data aggregators, independent web crawling, and Retraction Watch, which manually curates and maintains a database of retractions. In a test of the same papers in August, Consensus cited only five retracted papers. 

Elicit told MIT Technology Review that it removes retracted papers flagged by the scholarly research catalogue OpenAlex from its database and is “still working on aggregating sources of retractions.” Ai2 told us that its tool does not automatically detect or remove retracted papers currently. Perplexity said that it “[does] not ever claim to be 100% accurate.” 
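
OpenAlex exposes a retraction flag on each work, which makes a basic screen straightforward to sketch. The snippet below shows one illustrative way a tool could check a DOI before citing it; it is not how Elicit, Consensus, or the other companies actually implement their filtering, and no single database catches everything.

```python
# Illustrative retraction check against OpenAlex's public API (which exposes
# an is_retracted flag on works). Not any company's actual implementation,
# and no single source catches every retraction.
import requests

def is_retracted(doi: str) -> bool:
    """Look up a DOI in OpenAlex and return its retraction flag."""
    url = f"https://api.openalex.org/works/https://doi.org/{doi}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return bool(resp.json().get("is_retracted", False))

# Usage: screen candidate citations before a chatbot presents them.
# if is_retracted("10.1234/example.doi"):
#     flag_or_drop_the_paper()
```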

However, relying on retraction databases may not be enough. Ivan Oransky, the cofounder of Retraction Watch, is careful not to describe it as a comprehensive database, saying that creating one would require more resources than anyone has: “The reason it’s resource intensive is because someone has to do it all by hand if you want it to be accurate.”

Further complicating the matter is that publishers don’t share a uniform approach to retraction notices. “Where things are retracted, they can be marked as such in very different ways,” says Caitlin Bakker of the University of Regina in Canada, an expert in research and discovery tools. “Correction,” “expression of concern,” “erratum,” and “retracted” are among some labels publishers may add to research papers—and these labels can be added for many reasons, including concerns about the content, methodology, and data or the presence of conflicts of interest. 

Some researchers distribute their papers on preprint servers, paper repositories, and other websites, causing copies to be scattered around the web. Moreover, the data used to train AI models may not be up to date. If a paper is retracted after the model’s training cutoff date, its responses might not instantaneously reflect what’s going on, says Fu. Most academic search engines don’t do a real-time check against retraction data, so you are at the mercy of how accurate their corpus is, says Aaron Tay, a librarian at Singapore Management University.

Oransky and other experts advocate making more context available for models to use when creating a response. This could mean publishing information that already exists, like peer reviews commissioned by journals and critiques from the review site PubPeer, alongside the published paper.  

Many publishers, such as Nature and the BMJ, publish retraction notices as separate articles linked to the paper, outside paywalls. Fu says companies need to effectively make use of such information, as well as any news articles in a model’s training data that mention a paper’s retraction. 

The users and creators of AI tools need to do their due diligence. “We are at the very, very early stages, and essentially you have to be skeptical,” says Tay.

Ananya is a freelance science and technology journalist based in Bengaluru, India.

This medical startup uses LLMs to run appointments and make diagnoses

Imagine this: You’ve been feeling unwell, so you call up your doctor’s office to make an appointment. To your surprise, they schedule you in for the next day. At the appointment, you aren’t rushed through describing your health concerns; instead, you have a full half hour to share your symptoms and worries and the exhaustive details of your health history with someone who listens attentively and asks thoughtful follow-up questions. You leave with a diagnosis, a treatment plan, and the sense that, for once, you’ve been able to discuss your health with the care that it merits.

The catch? You might not have spoken to a doctor, or other licensed medical practitioner, at all.

This is the new reality for patients at a small number of clinics in Southern California that are run by the medical startup Akido Labs. These patients—some of whom are on Medicaid—can access specialist appointments on short notice, a privilege typically only afforded to the wealthy few who patronize concierge clinics.

The key difference is that Akido patients spend relatively little time, or even no time at all, with their doctors. Instead, they see a medical assistant, who can lend a sympathetic ear but has limited clinical training. The job of formulating diagnoses and concocting a treatment plan is done by a proprietary, LLM-based system called ScopeAI that transcribes and analyzes the dialogue between patient and assistant. A doctor then approves, or corrects, the AI system’s recommendations.

“Our focus is really on what we can do to pull the doctor out of the visit,” says Jared Goodner, Akido’s CTO. 

According to Prashant Samant, Akido’s CEO, this approach allows doctors to see four to five times as many patients as they could previously. There’s good reason to want doctors to be much more productive. Americans are getting older and sicker, and many struggle to access adequate health care. The pending 15% reduction in federal funding for Medicaid will only make the situation worse.

But experts aren’t convinced that displacing so much of the cognitive work of medicine onto AI is the right way to remedy the doctor shortage. There’s a big gap in expertise between doctors and AI-enhanced medical assistants, says Emma Pierson, a computer scientist at UC Berkeley.  Jumping such a gap may introduce risks. “I am broadly excited about the potential of AI to expand access to medical expertise,” she says. “It’s just not obvious to me that this particular way is the way to do it.”

AI is already everywhere in medicine. Computer vision tools identify cancers during preventive scans, automated research systems allow doctors to quickly sort through the medical literature, and LLM-powered medical scribes can take appointment notes on a clinician’s behalf. But these systems are designed to support doctors as they go about their typical medical routines.

What distinguishes ScopeAI, Goodner says, is its ability to independently complete the cognitive tasks that constitute a medical visit, from eliciting a patient’s medical history to coming up with a list of potential diagnoses to identifying the most likely diagnosis and proposing appropriate next steps.

Under the hood, ScopeAI is a set of large language models, each of which can perform a specific step in the visit—from generating appropriate follow-up questions based on what a patient has said to populating a list of likely conditions. For the most part, these LLMs are fine-tuned versions of Meta’s open-access Llama models, though Goodner says that the system also makes use of Anthropic’s Claude models. 

During the appointment, assistants read off questions from the ScopeAI interface, and ScopeAI produces new questions as it analyzes what the patient says. For the doctors who will review its outputs later, ScopeAI produces a concise note that includes a summary of the patient’s visit, the most likely diagnosis, two or three alternative diagnoses, and recommended next steps, such as referrals or prescriptions. It also lists a justification for each diagnosis and recommendation.
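
That description suggests a pipeline of prompted model calls, each handling one slice of the visit. The sketch below shows that general shape under stated assumptions: complete() stands in for whichever model a system like this would call, and the prompts and field names are invented for illustration; this is not Akido’s actual code.

```python
# General shape of a multi-step LLM visit pipeline, per the article's
# description. complete() is a placeholder for a model call; prompts and
# fields are illustrative, not ScopeAI's.
def complete(prompt: str) -> str:
    """Placeholder for a single LLM call (e.g., a fine-tuned model)."""
    raise NotImplementedError

def next_question(transcript: str) -> str:
    """Generate the follow-up question the medical assistant reads aloud."""
    return complete(
        "Given this visit transcript so far, ask the next clinically relevant "
        f"question:\n{transcript}"
    )

def draft_visit_note(transcript: str) -> dict:
    """Draft the structured note a doctor later approves or corrects."""
    return {
        "summary": complete(f"Summarize this visit:\n{transcript}"),
        "likely_diagnosis": complete(
            f"State the single most likely diagnosis, with a justification:\n{transcript}"
        ),
        "alternatives": complete(
            f"List two or three alternative diagnoses, with justifications:\n{transcript}"
        ),
        "next_steps": complete(
            f"Recommend next steps (referrals, prescriptions, tests):\n{transcript}"
        ),
    }
```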

ScopeAI is currently being used in cardiology, endocrinology, and primary care clinics and by Akido’s street medicine team, which serves the Los Angeles homeless population. That team—which is led by Steven Hochman, a doctor who specializes in addiction medicine—meets patients out in the community to help them access medical care, including treatment for substance use disorders. 

Previously, in order to prescribe a drug to treat an opioid addiction, Hochman would have to meet the patient in person; now, caseworkers armed with ScopeAI can interview patients on their own, and Hochman can approve or reject the system’s recommendations later. “It allows me to be in 10 places at once,” he says.

Since they started using ScopeAI, the team has been able to get patients access to medications to help treat their substance use within 24 hours—something that Hochman calls “unheard of.”

This arrangement is only possible because homeless patients typically get their health insurance from Medicaid, the public insurance system for low-income Americans. While Medicaid allows doctors to approve ScopeAI prescriptions and treatment plans asynchronously, both for street medicine and clinic visits, many other insurance providers require that doctors speak directly with patients before approving those recommendations. Pierson says that discrepancy raises concerns. “You worry about that exacerbating health disparities,” she says.

Samant is aware of the appearance of inequity, and he says the discrepancy isn’t intentional—it’s just a feature of how the insurance plans currently work. He also notes that being seen quickly by an AI-enhanced medical assistant may be better than dealing with long wait times and limited provider availability, which is the status quo for Medicaid patients. And all Akido patients can opt for traditional doctor’s appointments, if they are willing to wait for them, he says.

Part of the challenge of deploying a tool like ScopeAI is navigating a regulatory and insurance landscape that wasn’t designed for AI systems that can independently direct medical appointments. Glenn Cohen, a professor at Harvard Law School, says that any AI system that effectively acts as a “doctor in a box” would likely need to be approved by the FDA and could run afoul of medical licensure laws, which dictate that only doctors and other licensed professionals can practice medicine.

The California Medical Practice Act says that AI can’t replace a doctor’s responsibility to diagnose and treat a patient, but doctors are allowed to use AI in their work, and they don’t need to see patients in person or in real time before diagnosing them. Neither the FDA nor the Medical Board of California was able to say whether ScopeAI is on solid legal footing based only on a written description of the system.

But Samant is confident that Akido is in compliance, as ScopeAI was intentionally designed to fall short of being a “doctor in a box.” Because the system requires a human doctor to review and approve all of its diagnostic and treatment recommendations, he says, it doesn’t require FDA approval. 

At the clinic, this delicate balance between AI and doctor decision making happens entirely behind the scenes. Patients don’t ever see the ScopeAI interface directly—instead, they speak with a medical assistant who asks questions in the way that a doctor might in a typical appointment. That arrangement might make patients feel more comfortable. But Zeke Emanuel, a professor of medical ethics and health policy at the University of Pennsylvania who served in the Obama and Biden administrations, worries that this comfort could be obscuring from patients the extent to which an algorithm is influencing their care.

Pierson agrees. “That certainly isn’t really what was traditionally meant by the human touch in medicine,” she says.

DeAndre Siringoringo, a medical assistant who works at Akido’s cardiology office in Rancho Cucamonga, says that while he tells the patients he works with that an AI system will be listening to the appointment in order to gather information for their doctor, he doesn’t inform them about the specifics of how ScopeAI works, including the fact that it makes diagnostic recommendations to doctors. 

Because all ScopeAI recommendations are reviewed by a doctor, that might not seem like such a big deal—it’s the doctor who makes the final diagnosis, not the AI. But it’s been widely documented that doctors using AI systems tend to go along with the system’s recommendations more often than they should, a phenomenon known as automation bias. 

At this point, it’s impossible to know whether automation bias is affecting doctors’ decisions at Akido clinics, though Pierson says it’s a risk—especially when doctors aren’t physically present for appointments. “I worry that it might predispose you to sort of nodding along in a way that you might not if you were actually in the room watching this happen,” she says.

An Akido spokesperson says that automation bias is a valid concern for any AI tool that assists a doctor’s decision-making and that the company has made efforts to mitigate that bias. “We designed ScopeAI specifically to reduce bias by proactively countering blind spots that can influence medical decisions, which historically lean heavily on physician intuition and personal experience,” she says. “We also train physicians explicitly on how to use ScopeAI thoughtfully, so they retain accountability and avoid over-reliance.”

Akido evaluates ScopeAI’s performance by testing it on historical data and monitoring how often doctors correct its recommendations; those corrections are also used to further train the underlying models. Before deploying ScopeAI in a given specialty, Akido ensures that when tested on historical data sets, the system includes the correct diagnosis in its top three recommendations at least 92% of the time.
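
That 92% threshold is a top-3 accuracy check, which is simple to compute. A minimal sketch, with invented field names and example cases rather than Akido’s evaluation code:

```python
# Top-3 accuracy over a set of historical cases. Field names and the example
# records are invented for illustration.
def top3_accuracy(cases: list[dict]) -> float:
    """Fraction of cases whose confirmed diagnosis appears in the model's top 3."""
    hits = sum(1 for c in cases if c["confirmed_diagnosis"] in c["model_top3"])
    return hits / len(cases)

cases = [
    {"confirmed_diagnosis": "atrial fibrillation",
     "model_top3": ["atrial fibrillation", "atrial flutter", "supraventricular tachycardia"]},
    {"confirmed_diagnosis": "hypothyroidism",
     "model_top3": ["anemia", "depression", "sleep apnea"]},
]
print(f"top-3 accuracy: {top3_accuracy(cases):.0%}")  # deploy only if >= 92%
```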

But Akido hasn’t undertaken more rigorous testing, such as studies that compare ScopeAI appointments with traditional in-person or telehealth appointments, in order to determine whether the system improves—or at least maintains—patient outcomes. Such a study could help indicate whether automation bias is a meaningful concern.

“Making medical care cheaper and more accessible is a laudable goal,” Pierson says. “But I just think it’s important to conduct strong evaluations comparing to that baseline.”

The looming crackdown on AI companionship

As long as there has been AI, there have been people sounding alarms about what it might do to us: rogue superintelligence, mass unemployment, or environmental ruin from data center sprawl. But this week showed that another threat entirely—that of kids forming unhealthy bonds with AI—is the one pulling AI safety out of the academic fringe and into regulators’ crosshairs.

This has been bubbling for a while. Two high-profile lawsuits filed in the last year, against Character.AI and OpenAI, allege that companion-like behavior in their models contributed to the suicides of two teenagers. A study by US nonprofit Common Sense Media, published in July, found that 72% of teenagers have used AI for companionship. Stories in reputable outlets about “AI psychosis” have highlighted how endless conversations with chatbots can lead people down delusional spirals.

It’s hard to overstate the impact of these stories. To the public, they are proof that AI is not merely imperfect, but a technology that’s more harmful than helpful. If you doubted that this outrage would be taken seriously by regulators and companies, three things happened this week that might change your mind.

A California law passes the legislature

On Thursday, the California state legislature passed a first-of-its-kind bill. It would require AI companies to remind users they know to be minors that responses are AI-generated. Companies would also need to have a protocol for addressing suicide and self-harm and provide annual reports on instances of suicidal ideation in users’ conversations with their chatbots. It was led by Democratic state senator Steve Padilla, passed with heavy bipartisan support, and now awaits Governor Gavin Newsom’s signature. 

There are reasons to be skeptical of the bill’s impact. It doesn’t specify efforts companies should take to identify which users are minors, and lots of AI companies already include referrals to crisis providers when someone is talking about suicide. (In the case of Adam Raine, one of the teenagers whose survivors are suing, his conversations with ChatGPT before his death included this type of information, but the chatbot allegedly went on to give advice related to suicide anyway.)

Still, it is undoubtedly the most significant of the efforts to rein in companion-like behaviors in AI models, which are in the works in other states too. If the bill becomes law, it would strike a blow to the position OpenAI has taken, which is that “America leads best with clear, nationwide rules, not a patchwork of state or local regulations,” as the company’s chief global affairs officer, Chris Lehane, wrote on LinkedIn last week.

The Federal Trade Commission takes aim

The very same day, the Federal Trade Commission announced an inquiry into seven companies, seeking information about how they develop companion-like characters, monetize engagement, measure and test the impact of their chatbots, and more. The companies are Google, Instagram, Meta, OpenAI, Snap, X, and Character Technologies, the maker of Character.AI.

The White House now wields immense, and potentially illegal, political influence over the agency. In March, President Trump fired its lone Democratic commissioner, Rebecca Slaughter. In July, a federal judge ruled that firing illegal, but last week the US Supreme Court temporarily permitted the firing.

“Protecting kids online is a top priority for the Trump-Vance FTC, and so is fostering innovation in critical sectors of our economy,” said FTC chairman Andrew Ferguson in a press release about the inquiry. 

Right now, it’s just that—an inquiry—but the process might (depending on how public the FTC makes its findings) reveal the inner workings of how the companies build their AI companions to keep users coming back again and again. 

Sam Altman on suicide cases

Also on the same day (a busy day for AI news), Tucker Carlson published an hour-long interview with OpenAI’s CEO, Sam Altman. It covers a lot of ground—Altman’s battle with Elon Musk, OpenAI’s military customers, conspiracy theories about the death of a former employee—but it also includes the most candid comments Altman’s made so far about the cases of suicide following conversations with AI. 

Altman talked about “the tension between user freedom and privacy and protecting vulnerable users” in cases like these. But then he offered up something I hadn’t heard before.

“I think it’d be very reasonable for us to say that in cases of young people talking about suicide seriously, where we cannot get in touch with parents, we do call the authorities,” he said. “That would be a change.”

So where does all this go next? For now, it’s clear that—at least in the case of children harmed by AI companionship—companies’ familiar playbook won’t hold. They can no longer deflect responsibility by leaning on privacy, personalization, or “user choice.” Pressure to take a harder line is mounting from state laws, regulators, and an outraged public.

But what will that look like? Politically, the left and right are now paying attention to AI’s harm to children, but their solutions differ. On the right, the proposed solution aligns with the wave of internet age-verification laws that have now been passed in over 20 states. These are meant to shield kids from adult content while defending “family values.” On the left, it’s the revival of stalled ambitions to hold Big Tech accountable through antitrust and consumer-protection powers. 

Consensus on the problem is easier than agreement on the cure. As it stands, it looks likely we’ll end up with exactly the patchwork of state and local regulations that OpenAI (and plenty of others) have lobbied against. 

For now, it’s down to companies to decide where to draw the lines. They’re having to decide things like: Should chatbots cut off conversations when users spiral toward self-harm, or would that leave some people worse off? Should they be licensed and regulated like therapists, or treated as entertainment products with warnings? The uncertainty stems from a basic contradiction: Companies have built chatbots to act like caring humans, but they’ve postponed developing the standards and accountability we demand of real caregivers. The clock is now running out.

This story originally appeared in The Algorithm, our weekly newsletter on AI.

How do AI models generate videos?

MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next.

It’s been a big year for video generation. In the last nine months OpenAI made Sora public, Google DeepMind launched Veo 3, and the video startup Runway launched Gen-4. All can produce video clips that are (almost) impossible to distinguish from actual filmed footage or CGI animation. This year also saw Netflix debut an AI visual effect in its show The Eternaut, the first time video generation has been used to make mass-market TV.

Sure, the clips you see in demo reels are cherry-picked to showcase a company’s models at the top of their game. But with the technology in the hands of more users than ever before—Sora and Veo 3 are available in the ChatGPT and Gemini apps for paying subscribers—even the most casual filmmaker can now knock out something remarkable. 

The downside is that creators are competing with AI slop, and social media feeds are filling up with faked news footage. Video generation also uses up a huge amount of energy, many times more than text or image generation. 

With AI-generated videos everywhere, let’s take a moment to talk about the tech that makes them work.

How do you generate a video?

Let’s assume you’re a casual user. There are now a range of high-end tools that allow pro video makers to insert video generation models into their workflows. But most people will use this technology in an app or via a website. You know the drill: “Hey, Gemini, make me a video of a unicorn eating spaghetti. Now make its horn take off like a rocket.” What you get back will be hit or miss, and you’ll typically need to ask the model to take another pass or 10 before you get more or less what you wanted. 

So what’s going on under the hood? Why is it hit or miss—and why does it take so much energy? The latest wave of video generation models is what’s known as latent diffusion transformers. Yes, that’s quite a mouthful. Let’s unpack each part in turn, starting with diffusion. 

What’s a diffusion model?

Imagine taking an image and adding a random spattering of pixels to it. Take that pixel-spattered image and spatter it again and then again. Do that enough times and you will have turned the initial image into a random mess of pixels, like static on an old TV set. 

A diffusion model is a neural network trained to reverse that process, turning random static into images. During training, it gets shown millions of images in various stages of pixelation. It learns how those images change each time new pixels are thrown at them and, thus, how to undo those changes. 

The upshot is that when you ask a diffusion model to generate an image, it will start off with a random mess of pixels and step by step turn that mess into an image that is more or less similar to images in its training set. 
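To make this concrete, here is a toy sketch of the forward “noising” process that a diffusion model learns to undo. It is purely illustrative: the image, the noise schedule, and the array shapes are invented for clarity and don’t correspond to any particular company’s model.

```python
import numpy as np

# Forward "noising": mix a little more random static into an image at
# each step. A diffusion model is trained to predict (and so remove)
# that noise, which is what lets it run the process in reverse.

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))              # stand-in for a training image
num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)   # how much noise each step adds
alphas_bar = np.cumprod(1.0 - betas)

def noisy_version(x0, t):
    """Return the image after t steps of noise, plus the noise itself."""
    noise = rng.standard_normal(x0.shape)
    noisy = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * noise
    return noisy, noise

# During training the network sees (noisy_image, t) and is asked to
# predict `noise`; generation starts from pure static and undoes it
# step by step.
noisy_image, noise = noisy_version(image, t=500)
```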

But you don’t want any image—you want the image you specified, typically with a text prompt. And so the diffusion model is paired with a second model—such as a large language model (LLM) trained to match images with text descriptions—that guides each step of the cleanup process, pushing the diffusion model toward images that the large language model considers a good match to the prompt. 
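One common recipe for letting the prompt steer each cleanup step is known as classifier-free guidance: blend the model’s “any plausible image” prediction with its “image matching this prompt” prediction. The sketch below uses a dummy stand-in for the trained network and is not the actual internals of Sora, Veo, or Gen-4.

```python
import numpy as np

# Sketch of classifier-free guidance: exaggerate the difference the
# prompt makes so each denoising step leans toward a good match.

rng = np.random.default_rng(0)

def denoiser(noisy_image, text_embedding):
    # Placeholder: a real model is a neural network that predicts the
    # noise in `noisy_image`, optionally conditioned on a text embedding.
    bias = 0.0 if text_embedding is None else 0.01
    return 0.1 * noisy_image + bias

def guided_prediction(noisy_image, text_embedding, guidance_scale=7.5):
    uncond = denoiser(noisy_image, None)            # "some plausible image"
    cond = denoiser(noisy_image, text_embedding)    # "an image matching the prompt"
    return uncond + guidance_scale * (cond - uncond)

step_output = guided_prediction(rng.random((64, 64, 3)), text_embedding=np.ones(8))
```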

An aside: This LLM isn’t pulling the links between text and images out of thin air. Most text-to-image and text-to-video models today are trained on large data sets that contain billions of pairings of text and images or text and video scraped from the internet (a practice many creators are very unhappy about). This means that what you get from such models is a distillation of the world as it’s represented online, distorted by prejudice (and pornography).

It’s easiest to imagine diffusion models working with images. But the technique can be used with many kinds of data, including audio and video. To generate movie clips, a diffusion model must clean up sequences of images—the consecutive frames of a video—instead of just one image. 

What’s a latent diffusion model? 

All this takes a huge amount of compute (read: energy). That’s why most diffusion models used for video generation use a technique called latent diffusion. Instead of processing raw data—the millions of pixels in each video frame—the model works in what’s known as a latent space, in which the video frames (and text prompt) are compressed into a mathematical code that captures just the essential features of the data and throws out the rest. 

A similar thing happens whenever you stream a video over the internet: A video is sent from a server to your screen in a compressed format to make it get to you faster, and when it arrives, your computer or TV will convert it back into a watchable video. 

And so the final step is to decompress what the latent diffusion process has come up with. Once the compressed frames of random static have been turned into the compressed frames of a video that the LLM guide considers a good match for the user’s prompt, the compressed video gets converted into something you can watch.  

With latent diffusion, the diffusion process works more or less the way it would for an image. The difference is that the pixelated video frames are now mathematical encodings of those frames rather than the frames themselves. This makes latent diffusion far more efficient than a typical diffusion model. (Even so, video generation still uses more energy than image or text generation. There’s just an eye-popping amount of computation involved.) 
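In code, the shape of the pipeline looks roughly like this: compress into a latent, denoise there, decode at the very end. The encoder, decoder, and denoiser below are placeholders with made-up sizes, included only to show where the savings come from.

```python
import numpy as np

# Latent diffusion in miniature: do the expensive iterative work on a
# small compressed code, and touch full-resolution pixels only once.

rng = np.random.default_rng(0)

def encode(frames):
    # Training-side half of the autoencoder: squeeze, say, 16 frames of
    # 256x256x3 pixels down to a 16x32x32x4 latent (a stand-in here).
    return rng.standard_normal((16, 32, 32, 4))

def decode(latents):
    # Decompression: turn the cleaned-up latent back into watchable frames.
    return rng.random((16, 256, 256, 3))

def denoise_step(latents, step):
    return latents * 0.99        # placeholder for the trained network

latents = rng.standard_normal((16, 32, 32, 4))   # start from pure static
for step in reversed(range(50)):
    latents = denoise_step(latents, step)        # cheap: ~4,000 numbers per frame
video = decode(latents)                          # back to pixels at the end
```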

What’s a latent diffusion transformer?

Still with me? There’s one more piece to the puzzle—and that’s how to make sure the diffusion process produces a sequence of frames that are consistent, maintaining objects and lighting and so on from one frame to the next. OpenAI did this with Sora by combining its diffusion model with another kind of model called a transformer. This has now become standard in generative video. 

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models such as OpenAI’s GPT-5 and Google DeepMind’s Gemini, which can generate long sequences of words that make sense, maintaining consistency across many dozens of sentences. 

But videos are not made of words. Instead, videos get cut into chunks that can be treated as if they were. The approach that OpenAI came up with was to dice videos up across both space and time. “It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Tim Brooks, a lead researcher on Sora.
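Here is a rough sketch of that dicing step: a video array cut into space-time cubes and flattened into tokens a transformer can process. The patch sizes are arbitrary, and Sora’s real patching scheme is not public at this level of detail.

```python
import numpy as np

# Cut a (frames, height, width, channels) video into space-time "cubes"
# and flatten each cube into one token for a transformer.

frames, height, width, channels = 16, 64, 64, 3
video = np.random.default_rng(0).random((frames, height, width, channels))

t_patch, s_patch = 4, 16   # each cube is 4 frames deep and 16x16 pixels across
patches = (
    video.reshape(frames // t_patch, t_patch,
                  height // s_patch, s_patch,
                  width // s_patch, s_patch, channels)
         .transpose(0, 2, 4, 1, 3, 5, 6)   # group each cube's axes together
         .reshape(-1, t_patch * s_patch * s_patch * channels)
)

print(patches.shape)   # (64, 3072): 64 cubes, each flattened into one token
```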

A selection of videos generated with Veo 3 and Midjourney. The clips have been enhanced in postproduction with Topaz, an AI video-editing tool. Credit: VaigueMan

Using transformers alongside diffusion models brings several advantages. Because they are designed to process sequences of data, transformers also help the diffusion model maintain consistency across frames as it generates them. This makes it possible to produce videos in which objects don’t pop in and out of existence, for example. 

And because the videos are diced up, their size and orientation do not matter. This means that the latest wave of video generation models can be trained on a wide range of example videos, from short vertical clips shot with a phone to wide-screen cinematic films. The greater variety of training data has made video generation far better than it was just two years ago. It also means that video generation models can now be asked to produce videos in a variety of formats. 

What about the audio? 

A big advance with Veo 3 is that it generates video with audio, from lip-synched dialogue to sound effects to background noise. That’s a first for video generation models. As Google DeepMind CEO Demis Hassabis put it at this year’s Google I/O: “We’re emerging from the silent era of video generation.” 

The challenge was to find a way to line up video and audio data so that the diffusion process would work on both at the same time. Google DeepMind’s breakthrough was a new way to compress audio and video into a single piece of data inside the diffusion model. When Veo 3 generates a video, its diffusion model produces audio and video together in a lockstep process, ensuring that the sound and images are synched.  
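Conceptually, you can picture it as packing the video latent and the audio latent into one array that a single denoising loop updates together, as in the sketch below. This is not Google DeepMind’s published method; the shapes and the simple concatenation are invented purely to illustrate the idea of lockstep generation.

```python
import numpy as np

# A conceptual sketch of generating audio and video "in lockstep": pack
# both latents into one array so one denoising loop updates them together.

rng = np.random.default_rng(0)
video_latent = rng.standard_normal((16, 32 * 32 * 4))    # one row per frame
audio_latent = rng.standard_normal((16, 128))             # audio features per frame

joint = np.concatenate([video_latent, audio_latent], axis=1)

def denoise_step(latents):
    return latents * 0.99          # placeholder for the shared network

for _ in range(50):
    joint = denoise_step(joint)    # sound and image cleaned up together

video_out = joint[:, : 32 * 32 * 4]   # split back apart, still frame-aligned
audio_out = joint[:, 32 * 32 * 4 :]
```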

You said that diffusion models can generate different kinds of data. Is this how LLMs work too? 

No—or at least not yet. Diffusion models are most often used to generate images, video, and audio. Large language models—which generate text (including computer code)—are built using transformers. But the lines are blurring. We’ve seen how transformers are now being combined with diffusion models to generate videos. And this summer Google DeepMind revealed that it was building an experimental large language model that used a diffusion model instead of a transformer to generate text. 

Here’s where things start to get confusing: Though video generation (which uses diffusion models) consumes a lot of energy, diffusion models themselves are in fact more efficient than transformers. Thus, by using a diffusion model instead of a transformer to generate text, Google DeepMind’s new LLM could be a lot more efficient than existing LLMs. Expect to see more from diffusion models in the near future!

Partnering with generative AI in the finance function

Generative AI has the potential to transform the finance function. By taking on some of the more mundane tasks that can occupy a lot of time, generative AI tools can help free up capacity for more high-value strategic work. For chief financial officers, this could mean spending more time and energy on proactively advising the business on financial strategy as organizations around the world continue to weather ongoing geopolitical and financial uncertainty.

CFOs can use large language models (LLMs) and generative AI tools to support everyday tasks like generating quarterly reports, communicating with investors, and formulating strategic summaries, says Andrew W. Lo, Charles E. and Susan T. Harris professor and director of the Laboratory for Financial Engineering at the MIT Sloan School of Management. “LLMs can’t replace the CFO by any means, but they can take a lot of the drudgery out of the role by providing first drafts of documents that summarize key issues and outline strategic priorities.”

Generative AI is also showing promise in functions like treasury, with use cases including cash, revenue, and liquidity forecasting and management, as well as automating contracts and investment analysis. However, the mathematical limitations of LLMs mean generative AI still faces hurdles in forecasting. Even so, Deloitte’s analysis of its 2024 State of Generative AI in the Enterprise survey found that one-fifth (19%) of finance organizations have already adopted generative AI in the finance function.

Although returns on generative AI investments in finance functions have so far come in 8 points below expectations for surveyed organizations (see Figure 1), some finance departments appear to be moving ahead with investments. Deloitte’s fourth-quarter 2024 North American CFO Signals survey found that 46% of CFOs who responded expect deployment or spend on generative AI in finance to increase in the next 12 months (see Figure 2). Respondents cite the technology’s potential to help control costs through self-service and automation and to free up workers for higher-level, higher-productivity tasks as some of its top benefits.

“Companies have used AI on the customer-facing side of the house for a long time, but in finance, employees are still creating documents and presentations and emailing them around,” says Robyn Peters, principal in finance transformation at Deloitte Consulting LLP. “Largely, the human-centric experience that customers expect from brands in retail, transportation, and hospitality haven’t been pulled through to the finance organization. And there’s no reason we cannot do that—and, in fact, AI makes it a lot easier to do.”

If CFOs think they can just sit by for the next five years and watch how AI evolves, they may lose out to more nimble competitors that are actively experimenting in the space. Future finance professionals are growing up using generative AI tools too. CFOs should consider reimagining what it looks like to be a successful finance professional, in collaboration with AI.


This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. AI tools that may have been used were limited to secondary production processes that passed thorough human review.