How do AI models generate videos?

MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

It’s been a big year for video generation. In the last nine months OpenAI made Sora public, Google DeepMind launched Veo 3, the video startup Runway launched Gen-4. All can produce video clips that are (almost) impossible to distinguish from actual filmed footage or CGI animation. This year also saw Netflix debut an AI visual effect in its show The Eternaut, the first time video generation has been used to make mass-market TV.

Sure, the clips you see in demo reels are cherry-picked to showcase a company’s models at the top of their game. But with the technology in the hands of more users than ever before—Sora and Veo 3 are available in the ChatGPT and Gemini apps for paying subscribers—even the most casual filmmaker can now knock out something remarkable. 

The downside is that creators are competing with AI slop, and social media feeds are filling up with faked news footage. Video generation also uses up a huge amount of energy, many times more than text or image generation. 

With AI-generated videos everywhere, let’s take a moment to talk about the tech that makes them work.

How do you generate a video?

Let’s assume you’re a casual user. There are now a range of high-end tools that allow pro video makers to insert video generation models into their workflows. But most people will use this technology in an app or via a website. You know the drill: “Hey, Gemini, make me a video of a unicorn eating spaghetti. Now make its horn take off like a rocket.” What you get back will be hit or miss, and you’ll typically need to ask the model to take another pass or 10 before you get more or less what you wanted. 

So what’s going on under the hood? Why is it hit or miss—and why does it take so much energy? The latest wave of video generation models are what’s known as latent diffusion transformers. Yes, that’s quite a mouthful. Let’s unpack each part in turn, starting with diffusion. 

What’s a diffusion model?

Imagine taking an image and adding a random spattering of pixels to it. Take that pixel-spattered image and spatter it again and then again. Do that enough times and you will have turned the initial image into a random mess of pixels, like static on an old TV set. 

A diffusion model is a neural network trained to reverse that process, turning random static into images. During training, it gets shown millions of images in various stages of pixelation. It learns how those images change each time new pixels are thrown at them and, thus, how to undo those changes. 

The upshot is that when you ask a diffusion model to generate an image, it will start off with a random mess of pixels and step by step turn that mess into an image that is more or less similar to images in its training set. 

But you don’t want any image—you want the image you specified, typically with a text prompt. And so the diffusion model is paired with a second model—such as a large language model (LLM) trained to match images with text descriptions—that guides each step of the cleanup process, pushing the diffusion model toward images that the large language model considers a good match to the prompt. 

An aside: This LLM isn’t pulling the links between text and images out of thin air. Most text-to-image and text-to-video models today are trained on large data sets that contain billions of pairings of text and images or text and video scraped from the internet (a practice many creators are very unhappy about). This means that what you get from such models is a distillation of the world as it’s represented online, distorted by prejudice (and pornography).

It’s easiest to imagine diffusion models working with images. But the technique can be used with many kinds of data, including audio and video. To generate movie clips, a diffusion model must clean up sequences of images—the consecutive frames of a video—instead of just one image. 

What’s a latent diffusion model? 

All this takes a huge amount of compute (read: energy). That’s why most diffusion models used for video generation use a technique called latent diffusion. Instead of processing raw data—the millions of pixels in each video frame—the model works in what’s known as a latent space, in which the video frames (and text prompt) are compressed into a mathematical code that captures just the essential features of the data and throws out the rest. 

A similar thing happens whenever you stream a video over the internet: A video is sent from a server to your screen in a compressed format to make it get to you faster, and when it arrives, your computer or TV will convert it back into a watchable video. 

And so the final step is to decompress what the latent diffusion process has come up with. Once the compressed frames of random static have been turned into the compressed frames of a video that the LLM guide considers a good match for the user’s prompt, the compressed video gets converted into something you can watch.  

With latent diffusion, the diffusion process works more or less the way it would for an image. The difference is that the pixelated video frames are now mathematical encodings of those frames rather than the frames themselves. This makes latent diffusion far more efficient than a typical diffusion model. (Even so, video generation still uses more energy than image or text generation. There’s just an eye-popping amount of computation involved.) 

What’s a latent diffusion transformer?

Still with me? There’s one more piece to the puzzle—and that’s how to make sure the diffusion process produces a sequence of frames that are consistent, maintaining objects and lighting and so on from one frame to the next. OpenAI did this with Sora by combining its diffusion model with another kind of model called a transformer. This has now become standard in generative video. 

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models such as OpenAI’s GPT-5 and Google DeepMind’s Gemini, which can generate long sequences of words that make sense, maintaining consistency across many dozens of sentences. 

But videos are not made of words. Instead, videos get cut into chunks that can be treated as if they were. The approach that OpenAI came up with was to dice videos up across both space and time. “It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Tim Brooks, a lead researcher on Sora.

A selection of videos generated with Veo 3 and Midjourney. The clips have been enhanced in postproduction with Topaz, an AI video-editing tool. Credit: VaigueMan

Using transformers alongside diffusion models brings several advantages. Because they are designed to process sequences of data, transformers also help the diffusion model maintain consistency across frames as it generates them. This makes it possible to produce videos in which objects don’t pop in and out of existence, for example. 

And because the videos are diced up, their size and orientation do not matter. This means that the latest wave of video generation models can be trained on a wide range of example videos, from short vertical clips shot with a phone to wide-screen cinematic films. The greater variety of training data has made video generation far better than it was just two years ago. It also means that video generation models can now be asked to produce videos in a variety of formats. 

What about the audio? 

A big advance with Veo 3 is that it generates video with audio, from lip-synched dialogue to sound effects to background noise. That’s a first for video generation models. As Google DeepMind CEO Demis Hassabis put it at this year’s Google I/O: “We’re emerging from the silent era of video generation.” 

The challenge was to find a way to line up video and audio data so that the diffusion process would work on both at the same time. Google DeepMind’s breakthrough was a new way to compress audio and video into a single piece of data inside the diffusion model. When Veo 3 generates a video, its diffusion model produces audio and video together in a lockstep process, ensuring that the sound and images are synched.  

You said that diffusion models can generate different kinds of data. Is this how LLMs work too? 

No—or at least not yet. Diffusion models are most often used to generate images, video, and audio. Large language models—which generate text (including computer code)—are built using transformers. But the lines are blurring. We’ve seen how transformers are now being combined with diffusion models to generate videos. And this summer Google DeepMind revealed that it was building an experimental large language model that used a diffusion model instead of a transformer to generate text. 

Here’s where things start to get confusing: Though video generation (which uses diffusion models) consumes a lot of energy, diffusion models themselves are in fact more efficient than transformers. Thus, by using a diffusion model instead of a transformer to generate text, Google DeepMind’s new LLM could be a lot more efficient than existing LLMs. Expect to see more from diffusion models in the near future!

Partnering with generative AI in the finance function

Generative AI has the potential to transform the finance function. By taking on some of the more mundane tasks that can occupy a lot of time, generative AI tools can help free up capacity for more high-value strategic work. For chief financial officers, this could mean spending more time and energy on proactively advising the business on financial strategy as organizations around the world continue to weather ongoing geopolitical and financial uncertainty.

CFOs can use large language models (LLMs) and generative AI tools to support everyday tasks like generating quarterly reports, communicating with investors, and formulating strategic summaries, says Andrew W. Lo, Charles E. and Susan T. Harris professor and director of the Laboratory for Financial Engineering at the MIT Sloan School of Management. “LLMs can’t replace the CFO by any means, but they can take a lot of the drudgery out of the role by providing first drafts of documents that summarize key issues and outline strategic priorities.”

Generative AI is also showing promise in functions like treasury, with use cases including cash, revenue, and liquidity forecasting and management, as well as automating contracts and investment analysis. However, challenges still remain for generative AI to contribute to forecasting due to the mathematical limitations of LLMs. Regardless, Deloitte’s analysis of its 2024 State of Generative AI in the Enterprise survey found that one-fifth (19%) of finance organizations have already adopted generative AI in the finance function.

Despite return on generative AI investments in finance functions being 8 points below expectations so far for surveyed organizations (see Figure 1), some finance departments appear to be moving ahead with investments. Deloitte’s fourth-quarter 2024 North American CFO Signals survey found that 46% of CFOs who responded expect deployment or spend on generative AI in finance to increase in the next 12 months (see Figure 2). Respondents cite the technology’s potential to help control costs through self-service and automation and free up workers for higher-level, higher-productivity tasks as some of the top benefits of the technology.

“Companies have used AI on the customer-facing side of the house for a long time, but in finance, employees are still creating documents and presentations and emailing them around,” says Robyn Peters, principal in finance transformation at Deloitte Consulting LLP. “Largely, the human-centric experience that customers expect from brands in retail, transportation, and hospitality haven’t been pulled through to the finance organization. And there’s no reason we cannot do that—and, in fact, AI makes it a lot easier to do.”

If CFOs think they can just sit by for the next five years and watch how AI evolves, they may lose out to more nimble competitors that are actively experimenting in the space. Future finance professionals are growing up using generative AI tools too. CFOs should consider reimagining what it looks like to be a successful finance professional, in collaboration with AI.

Download the report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Help! My therapist is secretly using ChatGPT

In Silicon Valley’s imagined future, AI models are so empathetic that we’ll use them as therapists. They’ll provide mental-health care for millions, unimpeded by the pesky requirements for human counselors, like the need for graduate degrees, malpractice insurance, and sleep. Down here on Earth, something very different has been happening. 

Last week, we published a story about people finding out that their therapists were secretly using ChatGPT during sessions. In some cases it wasn’t subtle; one therapist accidentally shared his screen during a virtual appointment, allowing the patient to see his own private thoughts being typed into ChatGPT in real time. The model then suggested responses that his therapist parroted. 

It’s my favorite AI story as of late, probably because it captures so well the chaos that can unfold when people actually use AI the way tech companies have all but told them to.

As the writer of the story, Laurie Clarke, points out, it’s not a total pipe dream that AI could be therapeutically useful. Early this year, I wrote about the first clinical trial of an AI bot built specifically for therapy. The results were promising! But the secretive use by therapists of AI models that are not vetted for mental health is something very different. I had a conversation with Clarke to hear more about what she found. 

I have to say, I was really fascinated that people called out their therapists after finding out they were covertly using AI. How did you interpret the reactions of these therapists? Were they trying to hide it?

In all the cases mentioned in the piece, the therapist hadn’t provided prior disclosure of how they were using AI to their patients. So whether or not they were explicitly trying to conceal it, that’s how it ended up looking when it was discovered. I think for this reason, one of my main takeaways from writing the piece was that therapists should absolutely disclose when they’re going to use AI and how (if they plan to use it). If they don’t, it raises all these really uncomfortable questions for patients when it’s uncovered and risks irrevocably damaging the trust that’s been built.

In the examples you’ve come across, are therapists turning to AI simply as a time-saver? Or do they think AI models can genuinely give them a new perspective on what’s bothering someone?

Some see AI as a potential time-saver. I heard from a few therapists that notes are the bane of their lives. So I think there is some interest in AI-powered tools that can support this. Most I spoke to were very skeptical about using AI for advice on how to treat a patient. They said it would be better to consult supervisors or colleagues, or case studies in the literature. They were also understandably very wary of inputting sensitive data into these tools.

There is some evidence AI can deliver more standardized, “manualized” therapies like CBT [cognitive behavioral therapy] reasonably effectively. So it’s possible it could be more useful for that. But that is AI specifically designed for that purpose, not general-purpose tools like ChatGPT.

What happens if this goes awry? What attention is this getting from ethics groups and lawmakers?

At present, professional bodies like the American Counseling Association advise against using AI tools to diagnose patients. There could also be more stringent regulations preventing this in future. Nevada and Illinois, for example, have recently passed laws prohibiting the use of AI in therapeutic decision-making. More states could follow.

OpenAI’s Sam Altman said last month that “a lot of people effectively use ChatGPT as a sort of therapist,” and that to him, that’s a good thing. Do you think tech companies are overpromising on AI’s ability to help us?

I think that tech companies are subtly encouraging this use of AI because clearly it’s a route through which some people are forming an attachment to their products. I think the main issue is that what people are getting from these tools isn’t really “therapy” by any stretch. Good therapy goes far beyond being soothing and validating everything someone says. I’ve never in my life looked forward to a (real, in-person) therapy session. They’re often highly uncomfortable, and even distressing. But that’s part of the point. The therapist should be challenging you and drawing you out and seeking to understand you. ChatGPT doesn’t do any of these things. 

Read the full story from Laurie Clarke

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Three big things we still don’t know about AI’s energy burden

Earlier this year, when my colleague Casey Crownhart and I spent six months researching the climate and energy burden of AI, we came to see one number in particular as our white whale: how much energy the leading AI models, like ChatGPT or Gemini, use up when generating a single response. 

This fundamental number remained elusive even as the scramble to power AI escalated to the White House and the Pentagon, and as projections showed that in three years AI could use as much electricity as 22% of all US households. 

The problem with finding that number, as we explain in our piece published in May, was that AI companies are the only ones who have it. We pestered Google, OpenAI, and Microsoft, but each company refused to provide its figure. Researchers we spoke to who study AI’s impact on energy grids compared it to trying to measure the fuel efficiency of a car without ever being able to drive it, making guesses based on rumors of its engine size and what it sounds like going down the highway.


This story is a part of MIT Technology Review’s series “Power Hungry: AI and our energy future,” on the energy demands and carbon costs of the artificial-intelligence revolution.


But then this summer, after we published, a strange thing started to happen. In June, OpenAI’s Sam Altman wrote that an average ChatGPT query uses 0.34 watt-hours of energy. In July, the French AI startup Mistral didn’t publish a number directly but released an estimate of the emissions generated. In August, Google revealed that answering a question to Gemini uses about 0.24 watt-hours of energy. The figures from Google and OpenAI were similar to what Casey and I estimated for medium-size AI models. 

So with this newfound transparency, is our job complete? Did we finally harpoon our white whale, and if so, what happens next for people studying the climate impact of AI? I reached out to some of our old sources, and some new ones, to find out.

The numbers are vague and chat-only

The first thing they told me is that there’s a lot missing from the figures tech companies published this summer. 

OpenAI’s number, for example, did not appear in a detailed technical paper but rather in a blog post by Altman that leaves lots of unanswered questions, such as which model he was referring to, how the energy use was measured, and how much it varies. Google’s figure, as Crownhart points out, refers to the median amount of energy per query, which doesn’t give us a sense of the more energy-demanding Gemini responses, like when it uses a reasoning model to “think” through a hard problem or generates a really long response. 

The numbers also refer only to interactions with chatbots, not the other ways that people are becoming increasingly reliant on generative AI. 

“As video and image becomes more prominent and used by more and more people, we need the numbers from different modalities and how they measure up,” says Sasha Luccioni, AI and climate lead at the AI platform Hugging Face. 

This is also important because the figures for asking a question to a chatbot are, as expected, undoubtedly small—the same amount of electricity used by a microwave in just seconds. That’s part of the reason AI and climate researchers don’t suggest that any one individual’s AI use creates a significant climate burden. 

A full accounting of AI’s energy demands—one that goes beyond what’s used to answer an individual query to help us understand its full net impact on the climate—would require application-specific information on how all this AI is being used. Ketan Joshi, an analyst for climate and energy groups, acknowledges that researchers don’t usually get such specific information from other industries but says it might be justified in this case.

“The rate of data center growth is inarguably unusual,” Joshi says. “Companies should be subject to significantly more scrutiny.”

We have questions about energy efficiency

Companies making billion-dollar investments into AI have struggled to square this growth in energy demand with their sustainability goals. In May, Microsoft said that its emissions have soared by over 23% since 2020, owing largely to AI, while the company has promised to be carbon negative by 2030. “It has become clear that our journey towards being carbon negative is a marathon, not a sprint,” Microsoft wrote.

Tech companies often justify this emissions burden by arguing that soon enough, AI itself will unlock efficiencies that will make it a net positive for the climate. Perhaps the right AI system, the thinking goes, could design more efficient heating and cooling systems for a building, or help discover the minerals required for electric-vehicle batteries. 

But there are no signs that AI has been usefully used to do these things yet. Companies have shared anecdotes about using AI to find methane emission hot spots, for example, but they haven’t been transparent enough to help us know if these successes outweigh the surges in electricity demand and emissions that Big Tech has produced in the AI boom. In the meantime, more data centers are planned, and AI’s energy demand continues to rise and rise. 

The ‘bubble’ question

One of the big unknowns in the AI energy equation is whether society will ever adopt AI at the levels that figure into tech companies’ plans. OpenAI has said that ChatGPT receives 2.5 billion prompts per day. It’s possible that this number, and the equivalent numbers for other AI companies, will continue to soar in the coming years. Projections released last year by the Lawrence Berkeley National Laboratory suggest that if they do, AI alone could consume as much electricity annually as 22% of all US households by 2028.

But this summer also saw signs of a slowdown that undercut the industry’s optimism. OpenAI’s launch of GPT-5 was largely considered a flop, even by the company itself, and that flop led critics to wonder if AI may be hitting a wall. When a group at MIT found that 95% of businesses are seeing no return on their massive AI investments, stocks floundered. The expansion of AI-specific data centers might be an investment that’s hard to recoup, especially as revenues for AI companies remain elusive. 

One of the biggest unknowns about AI’s future energy burden isn’t how much a single query consumes, or any other figure that can be disclosed. It’s whether demand will ever reach the scale companies are building for or whether the technology will collapse under its own hype. The answer will determine whether today’s buildout becomes a lasting shift in our energy system or a short-lived spike.

How Yichao “Peak” Ji became a global AI app hitmaker

Yichao “Peak” Ji is one of MIT Technology Review’s 2025 Innovators Under 35. Meet the rest of this year’s honorees. 

When Yichao Ji—also known as “Peak”—appeared in a launch video for Manus in March, he didn’t expect it to go viral. Speaking in fluent English, the 32-year-old introduced the AI agent built by Chinese startup Butterfly Effect, where he serves as chief scientist. 

The video was not an elaborate production—it was directed by cofounder Zhang Tao and filmed in a corner of their Beijing office. But something about Ji’s delivery, and the vision behind the product, cut through the noise. The product, then still an early preview available only through invite codes, spread across the Chinese internet to the world in a matter of days. Within a week of its debut, Manus had attracted a waiting list of around 2 million people. 

At first sight, Manus works like most chatbots: Users can ask it questions in a chat window. However, besides providing answers, it can also carry out tasks (for example, finding an apartment that meets specified criteria within a certain budget). It does this by breaking tasks down into steps, then using a cloud-based virtual machine equipped with a browser and other tools to execute them—perusing websites, filling in forms, and so on.

Ji is the technical core of the team. Now based in Singapore, he leads product and infrastructure development as the company pushes forward with its global expansion. 

Despite his relative youth, Ji has over a decade of experience building products that merge technical complexity with real-world usability. That earned him credibility among both engineers and investors—and put him at the forefront of a rising class of Chinese technologists with AI products and global ambitions. 

Serial builder

The son of a professor and an IT professional, Ji moved to Boulder, Colorado, at age four for his father’s visiting scholar post, returning to Beijing in second grade.

His fluent English set him apart early on, but it was an elementary school robotics team that sparked his interest in programming. By high school, he was running the computer club, teaching himself how to build operating systems, and drawing inspiration from Bill Gates, Linux, and open-source culture. He describes himself as a lifelong Apple devotee, and it was Apple’s launch of the App Store in 2008 that ignited his passion for development.

In 2010, as a high school sophomore, Ji created the Mammoth browser, a customizable third-party iPhone browser. It quickly became the most-downloaded third-party browser developed by an individual in China and earned him the Macworld Asia Grand Prize in 2011. International tech site AppAdvice called it a product that “redefined the way you browse the internet.” At age 20, he was on the cover of Forbes magazine and made its “30 Under 30” list. 

Meet the rest of this year’s 
Innovators Under 35.

During his teenage years, Ji developed several other iOS apps, including a budgeting tool designed for Hasbro’s Monopoly game, which sold well—until it attracted a legal notice for using the trademarked name. But Ji wasn’t put off a career in tech by that early brush with a multinational legal team. If anything, he says, it sharpened his instincts for both product and risk. 

In 2012, Ji launched his own company, Peak Labs, and later led the development of Magi, a search engine. The tool extracted information from across the web to answer queries—conceptually similar to today’s AI-powered search, but powered by a custom language model. 

​​Magi was briefly popular, drawing millions of users in its first month, but consumer adoption didn’t stick. It did, however, attract enterprise interest, and Ji adapted it for B2B use, before selling it in 2022. 

AI acumen 

Manus would become his next act—and a more ambitious one. His cofounders, Zhang Tao and Xiao Hong, complement Ji’s technical core with product know-how, storytelling, and organizational savvy. Both Xiao and Ji are serial entrepreneurs who have been backed by venture capital firm ZhenFund multiple times. Together, they represent the kind of long-term collaboration and international ambition that increasingly defines China’s next wave of entrepreneurs.

JULIANA TAN

People who have worked with Ji describe him as a clear thinker, a fast talker, and a tireless, deeply committed builder who thinks in systems, products, and user flows. He represents a new generation of Chinese technologists: equally at home coding or in pitch meetings, fluent in both building and branding. He’s also a product of open-source culture, and remains an active contributor whose projects regularly garner attention—and GitHub stars—across developer communities.

With new funding led by US venture capital firm Benchmark, Ji and his team are taking Manus to the wider world, relocating operations outside of China, to Singapore, and actively targeting consumers around the world. The product is built on US-based infrastructure, drawing on technologies like Claude Sonnet, Microsoft Azure, and open-source tools such as Browser Use. It’s a distinctly global setup: an AI agent developed by a Chinese team, powered by Western platforms, and designed for international users. That isn’t incidental; it reflects the more fluid nature of AI entrepreneurship today, where talent, infrastructure, and ambition move across borders just as quickly as the technology itself.

For Ji, the goal isn’t just building a global company—it’s building a legacy. “I hope Manus is the last product I’ll ever build,” Ji says. “Because if I ever have another wild idea—(I’ll just) leave it to Manus!”

Synthesia’s AI clones are more expressive than ever. Soon they’ll be able to talk back.

Earlier this summer, I walked through the glassy lobby of a fancy office in London, into an elevator, and then along a corridor into a clean, carpeted room. Natural light flooded in through its windows, and a large pair of umbrella-like lighting rigs made the room even brighter. I tried not to squint as I took my place in front of a tripod equipped with a large camera and a laptop displaying an autocue. I took a deep breath and started to read out the script.

I’m not a newsreader or an actor auditioning for a movie—I was visiting the AI company Synthesia to give it what it needed to create a hyperrealistic AI-generated avatar of me. The company’s avatars are a decent barometer of just how dizzying progress has been in AI over the past few years, so I was curious just how accurately its latest AI model, introduced last month, could replicate me. 

When Synthesia launched in 2017, its primary purpose was to match AI versions of real human faces—for example, the former footballer David Beckham—with dubbed voices speaking in different languages. A few years later, in 2020, it started giving the companies that signed up for its services the opportunity to make professional-level presentation videos starring either AI versions of staff members or consenting actors. But the technology wasn’t perfect. The avatars’ body movements could be jerky and unnatural, their accents sometimes slipped, and the emotions indicated by their voices didn’t always match their facial expressions.

Now Synthesia’s avatars have been updated with more natural mannerisms and movements, as well as expressive voices that better preserve the speaker’s accent—making them appear more humanlike than ever before. For Synthesia’s corporate clients, these avatars will make for slicker presenters of financial results, internal communications, or staff training videos.

I found the video demonstrating my avatar as unnerving as it is technically impressive. It’s slick enough to pass as a high-definition recording of a chirpy corporate speech, and if you didn’t know me, you’d probably think that’s exactly what it was. This demonstration shows how much harder it’s becoming to distinguish the artificial from the real. And before long, these avatars will even be able to talk back to us. But how much better can they get? And what might interacting with AI clones do to us?  

The creation process

When my former colleague Melissa visited Synthesia’s London studio to create an avatar of herself last year, she had to go through a long process of calibrating the system, reading out a script in different emotional states, and mouthing the sounds needed to help her avatar form vowels and consonants. As I stand in the brightly lit room 15 months later, I’m relieved to hear that the creation process has been significantly streamlined. Josh Baker-Mendoza, Synthesia’s technical supervisor, encourages me to gesture and move my hands as I would during natural conversation, while simultaneously warning me not to move too much. I duly repeat an overly glowing script that’s designed to encourage me to speak emotively and enthusiastically. The result is a bit as if if Steve Jobs had been resurrected as a blond British woman with a low, monotonous voice. 

It also has the unfortunate effect of making me sound like an employee of Synthesia.“I am so thrilled to be with you today to show off what we’ve been working on. We are on the edge of innovation, and the possibilities are endless,” I parrot eagerly, trying to sound lively rather than manic. “So get ready to be part of something that will make you go, ‘Wow!’ This opportunity isn’t just big—it’s monumental.”

Just an hour later, the team has all the footage it needs. A couple of weeks later I receive two avatars of myself: one powered by the previous Express-1 model and the other made with the latest Express-2 technology. The latter, Synthesia claims, makes its synthetic humans more lifelike and true to the people they’re modeled on, complete with more expressive hand gestures, facial movements, and speech. You can see the results for yourself below. 

COURTESY SYNTHESIA

Last year, Melissa found that her Express-1-powered avatar failed to match her transatlantic accent. Its range of emotions was also limited—when she asked her avatar to read a script angrily, it sounded more whiny than furious. In the months since, Synthesia has improved Express-1, but the version of my avatar made with the same technology blinks furiously and still struggles to synchronize body movements with speech.

By way of contrast, I’m struck by just how much my new Express-2 avatar looks like me: Its facial features mirror my own perfectly. Its voice is spookily accurate too, and although it gesticulates more than I do, its hand movements generally marry up with what I’m saying. 

But the tiny telltale signs of AI generation are still there if you know where to look. The palms of my hands are bright pink and as smooth as putty. Strands of hair hang stiffly around my shoulders instead of moving with me. Its eyes stare glassily ahead, rarely blinking. And although the voice is unmistakably mine, there’s something slightly off about my digital clone’s intonations and speech patterns. “This is great!” my avatar randomly declares, before slipping back into a saner register.

Anna Eiserbeck, a postdoctoral psychology researcher at the Humboldt University of Berlin who has studied how humans react to perceived deepfake faces, says she isn’t sure she’d have been able to identify my avatar as a deepfake at first glance.

But she would eventually have noticed something amiss. It’s not just the small details that give it away—my oddly static earring, the way my body sometimes moves in small, abrupt jerks. It’s something that runs much deeper, she explains.

“Something seemed a bit empty. I know there’s no actual emotion behind it— it’s not a conscious being. It does not feel anything,” she says. Watching the video gave her “this kind of uncanny feeling.” 

My digital clone, and Eiserbeck’s reaction to it, make me wonder how realistic these avatars really need to be. 

I realize that part of the reason I feel disconcerted by my avatar is that it behaves in a way I rarely have to. Its oddly upbeat register is completely at odds with how I normally speak; I’m a die-hard cynical Brit who finds it difficult to inject enthusiasm into my voice even when I’m genuinely thrilled or excited. It’s just the way I am. Plus, watching the videos on a loop makes me question if I really do wave my hands about that way, or move my mouth in such a weird manner. If you thought being confronted with your own face on a Zoom call was humbling, wait until you’re staring at a whole avatar of yourself. 

When Facebook was first taking off in the UK almost 20 years ago, my friends and I thought illicitly logging into each other’s accounts and posting the most outrageous or rage-inducing status updates imaginable was the height of comedy. I wonder if the equivalent will soon be getting someone else’s avatar to say something truly embarrassing: expressing support for a disgraced politician or (in my case) admitting to liking Ed Sheeran’s music. 

Express-2 remodels every person it’s presented with into a polished professional speaker with the body language of a hyperactive hype man. And while this makes perfect sense for a company focused on making glossy business videos, watching my avatar doesn’t feel like watching me at all. It feels like something else entirely.

How it works

The real technical challenge these days has less to do with creating avatars that match our appearance than with getting them to replicate our behavior, says Björn Schuller, a professor of artificial intelligence at Imperial College London. “There’s a lot to consider to get right; you have to have the right micro gesture, the right intonation, the sound of voice and the right word,” he says. “I don’t want an AI [avatar] to frown at the wrong moment—that could send an entirely different message.”

To achieve an improved level of realism, Synthesia developed a number of new audio and video AI models. The team created a voice cloning model to preserve the human speaker’s accent, intonation, and expressiveness—unlike other voice models, which can flatten speakers’ distinctive accents into generically American-sounding voices.

When a user uploads a script to Express-1, its system analyzes the words to infer the correct tone to use. That information is then fed into a diffusion model, which renders the avatar’s facial expressions and movements to match the speech. 

Alongside the voice model, Express-2 uses three other models to create and animate the avatars. The first generates an avatar’s gestures to accompany the speech fed into it by the Express-Voice model. A second evaluates how closely the input audio aligns with the multiple versions of the corresponding generated motion before selecting the best one. Then a final model renders the avatar with that chosen motion. 

This third rendering model is significantly more powerful than its Express-1 predecessor. Whereas the previous model had a few hundred million parameters, Express-2’s rendering model’s parameters number in the billions. This means it takes less time to create the avatar, says Youssef Alami Mejjati, Synthesia’s head of research and development:

“With Express-1, it needed to first see someone expressing emotions to be able to render them. Now, because we’ve trained it on much more diverse data and much larger data sets, with much more compute, it just learns these associations automatically without needing to see them.” 

Narrowing the uncanny valley

Although humanlike AI-generated avatars have been around for years, the recent boom in generative AI is making it increasingly easier and more affordable to create lifelike synthetic humans—and they’re already being put to work. Synthesia isn’t alone: AI avatar companies like Yuzu Labs, Creatify, Arcdads, and Vidyard give businesses the tools to quickly generate and edit videos starring either AI actors or artificial versions of members of staff, promising cost-effective ways to make compelling ads that audiences connect with. Similarly, AI-generated clones of livestreamers have exploded in popularity across China in recent years, partly because they can sell products 24/7 without getting tired or needing to be paid. 

For now at least, Synthesia is “laser focused” on the corporate sphere. But it’s not ruling out expanding into new sectors such as entertainment or education, says Peter Hill, the company’s chief technical officer. In an apparent step toward this, Synthesia recently partnered with Google to integrate Google’s powerful new generative video model Veo 3 into its platform, allowing users to directly generate and embed clips into Synthesia’s videos. It suggests that in the future, these hyperrealistic artificial humans could take up starring roles in detailed universes with ever-changeable backdrops. 

At present this could, for example, involve using Veo 3 to generate a video of meat-processing machinery, with a Synthesia avatar next to the machines talking about how to use them safely. But future versions of Synthesia’s technology could result in educational videos customizable to an individual’s level of knowledge, says Alex Voica, head of corporate affairs and policy at Synthesia. For example, a video about the evolution of life on Earth could be tweaked for someone with a biology degree or someone with high-school-level knowledge. “It’s going to be such a much more engaging and personalized way of delivering content that I’m really excited about,” he says. 

The next frontier, according to Synthesia, will be avatars that can talk back, “understanding” conversations with users and responding in real time Think ChatGPT, but with a lifelike digital human attached. 

Synthesia has already added an interactive element by letting users click through on-screen questions during quizzes presented by its avatars. But it’s also exploring making them truly interactive: Future users could ask their avatar to pause and expand on a point, or ask it a question. “We really want to make the best learning experience, and that means through video that’s entertaining but also personalized and interactive,” says Alami Mejjati. “This, for me, is the missing part in online learning experiences today. And I know we’re very close to solving that.”

We already know that humans can—and do—form deep emotional bonds with AI systems, even with basic text-based chatbots. Combining agentic technology—which is already capable of navigating the web, coding, and playing video games unsupervised—with a realistic human face could usher in a whole new kind of AI addiction, says Pat Pataranutaporn, an assistant professor at the MIT Media Lab.  

“If you make the system too realistic, people might start forming certain kinds of relationships with these characters,” he says. “We’ve seen many cases where AI companions have influenced dangerous behavior even when they are basically texting. If an avatar had a talking head, it would be even more addictive.”

Schuller agrees that avatars in the near future will be perfectly optimized to adjust their projected levels of emotion and charisma so that their human audiences will stay engaged for as long as possible. “It will be very hard [for humans] to compete with charismatic AI of the future; it’s always present, always has an ear for you, and is always understanding,” he says. “Al will change that human-to-human connection.”

As I pause and replay my Express-2 avatar, I imagine holding conversations with it—this uncanny, permanently upbeat, perpetually available product of pixels and algorithms that looks like me and sounds like me, but fundamentally isn’t me. Virtual Rhiannon has never laughed until she’s cried, or fallen in love, or run a marathon, or watched the sun set in another country. 

But, I concede, she could deliver a damned good presentation about why Ed Sheeran is the greatest musician ever to come out of the UK. And only my closest friends and family would know that it’s not the real me.

Imagining the future of banking with agentic AI

Agentic AI is coming of age. And with it comes new opportunities in the financial services sector. Banks are increasingly employing agentic AI to optimize processes, navigate complex systems, and sift through vast quantities of unstructured data to make decisions and take actions—with or without human involvement. “With the maturing of agentic AI, it is becoming a lot more technologically possible for large-scale process automation that was not possible with rules-based approaches like robotic process automation before,” says Sameer Gupta, Americas financial services AI leader at EY. “That moves the needle in terms of cost, efficiency, and customer experience impact.”

From responding to customer services requests, to automating loan approvals, adjusting bill payments to align with regular paychecks, or extracting key terms and conditions from financial agreements, agentic AI has the potential to transform the customer experience—and how financial institutions operate too.

Adapting to new and emerging technologies like agentic AI is essential for an organization’s survival, says Murli Buluswar, head of US personal banking analytics at Citi. “A company’s ability to adopt new technical capabilities and rearchitect how their firm operates is going to make the difference between the firms that succeed and those that get left behind,” says Buluswar. “Your people and your firm must recognize that how they go about their work is going to be meaningfully different.”

The emerging landscape

Agentic AI is already being rapidly adopted in the banking sector. A 2025 survey of 250 banking executives by MIT Technology Review Insights found that 70% of leaders say their firm uses agentic AI to some degree, either through existing deployments (16%) or pilot projects (52%). And it is already proving effective in a range of different functions. More than half of executives say agentic AI systems are highly capable of improving fraud detection (56%) and security (51%). Other strong use cases include reducing cost and increasing efficiency (41%) and improving the customer experience (41%).

Download the report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written entirely by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Building the AI-enabled enterprise of the future

Artificial intelligence is fundamentally reshaping how the world operates. With its potential to automate repetitive tasks, analyze vast datasets, and augment human capabilities, the use of AI technologies is already driving changes across industries.

In health care and pharmaceuticals, machine learning and AI-powered tools are advancing disease diagnosis, reducing drug discovery timelines by as much as 50%, and heralding a new era of personalized medicine. In supply chain and logistics, AI models can help prevent or mitigate disruptions, allowing businesses to make informed decisions and enhance resilience amid geopolitical uncertainty. Across sectors, AI in research and development cycles may reduce time-to-market by 50% and lower costs in industries like automotive and aerospace by as much as 30%.

“This is one of those inflection points where I don’t think anybody really has a full view of the significance of the change this is going to have on not just companies but society as a whole,” says Patrick Milligan, chief information security officer at Ford, which is making AI an important part of its transformation efforts and expanding its use across company operations.

Given its game-changing potential—and the breakneck speed with which it is evolving—it is perhaps not surprising that companies are feeling the pressure to deploy AI as soon as possible: 98% say they feel an increased sense of urgency in the last year. And 85% believe they have less than 18 months to deploy an AI strategy or they will see negative business effects.

Companies that take a “wait and see” approach will fall behind, says Jeetu Patel, president and chief product officer at Cisco. “If you wait for too long, you risk becoming irrelevant,” he says. “I don’t worry about AI taking my job, but I definitely worry about another person that uses AI better than me or another company that uses AI better taking my job or making my company irrelevant.”

But despite the urgency, just 13% of companies globally say they are ready to leverage AI to its full potential. IT infrastructure is an increasing challenge as workloads grow ever larger. Two-thirds (68%) of organizations say their infrastructure is moderately ready at best to adopt and scale AI technologies.

Essential capabilities include adequate compute power to process complex AI models, optimized network performance across the organization and in data centers, and enhanced cybersecurity capabilities to detect and prevent sophisticated attacks. This must be combined with observability, which ensures the reliable and optimized performance of infrastructure, models, and the overall AI system by providing continuous monitoring and analysis of their behavior. Good quality, well-managed enterprise-wide data is also essential—after all, AI is only as good as the data it draws on. All of this must be supported by AI-focused company culture and talent development.

Download the report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written entirely by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

The connected customer

As brands compete for increasingly price conscious consumers, customer experience (CX) has become a decisive differentiator. Yet many struggle to deliver, constrained by outdated systems, fragmented data, and organizational silos that limit both agility and consistency.

The current wave of artificial intelligence, particularly agentic AI that can reason and act across workflows, offers a powerful opportunity to reshape service delivery. Organizations can now provide fast, personalized support at scale while improving workforce productivity and satisfaction. But realizing that potential requires more than isolated tools; it calls for a unified platform that connects people, data, and decisions across the service lifecycle. This report explores how leading organizations are navigating that shift, and what it takes to move from AI potential to CX impact.

Key findings include:

  • AI is transforming customer experience (CX). Customer service has evolved from the era of voicebased support through digital commerce and cloud to today’s AI revolution. Powered by large language models (LLMs) and a growing pool of data, AI can handle more diverse customer queries, produce highly personalized communication at scale, and help staff and senior management with decision support. Customers are also warming to AI-powered platforms as performance and reliability improves. Early adopters report improvements including more satisfied customers, more productive staff, and richer performance insights.
  • Legacy infrastructure and data fragmentation are hindering organizations from maximizing the value of AI. While customer service and IT departments are early adopters of AI, the broader organizations across industries are often riddled with outdated infrastructure. This impinges the ability of autonomous AI tools to move freely across workflows and data repositories to deliver goal-based tasks. Creating a unified platform and orchestration architecture will be key to unlock AI’s potential. The transition can be a catalyst for streamlining and rationalizing the business as a whole.
  • High-performing organizations use AI without losing the human touch. While consumers are warming to AI, rollout should include some discretion. Excessive personalization could make customers uncomfortable about their personal data, while engineered “empathy” from bots may be received as insincere. Organizations should not underestimate the unique value their workforce offers. Sophisticated adopters strike the right balance between human and machine capabilities. Their leaders are proactive in addressing job displacement worries through transparent communication, comprehensive training, and clear delineation between AI and human roles. The most effective organizations treat AI as a collaborative tool that enhances rather than replaces human connection and expertise.

Download the full report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

This content was researched, designed, and written entirely by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Therapists are secretly using ChatGPT. Clients are triggered.

Declan would never have found out his therapist was using ChatGPT had it not been for a technical mishap. The connection was patchy during one of their online sessions, so Declan suggested they turn off their video feeds. Instead, his therapist began inadvertently sharing his screen.

“Suddenly, I was watching him use ChatGPT,” says Declan, 31, who lives in Los Angeles. “He was taking what I was saying and putting it into ChatGPT, and then summarizing or cherry-picking answers.”

Declan was so shocked he didn’t say anything, and for the rest of the session he was privy to a real-time stream of ChatGPT analysis rippling across his therapist’s screen. The session became even more surreal when Declan began echoing ChatGPT in his own responses, preempting his therapist. 

“I became the best patient ever,” he says, “because ChatGPT would be like, ‘Well, do you consider that your way of thinking might be a little too black and white?’ And I would be like, ‘Huh, you know, I think my way of thinking might be too black and white,’ and [my therapist would] be like, ‘Exactly.’ I’m sure it was his dream session.”

Among the questions racing through Declan’s mind was, “Is this legal?” When Declan raised the incident with his therapist at the next session—“It was super awkward, like a weird breakup”—the therapist cried. He explained he had felt they’d hit a wall and had begun looking for answers elsewhere. “I was still charged for that session,” Declan says, laughing.

The large language model (LLM) boom of the past few years has had unexpected ramifications for the field of psychotherapy, mostly due to the growing number of people substituting the likes of ChatGPT for human therapists. But less discussed is how some therapists themselves are integrating AI into their practice. As in many other professions, generative AI promises tantalizing efficiency savings, but its adoption risks compromising sensitive patient data and undermining a relationship in which trust is paramount.

Suspicious sentiments

Declan is not alone, as I can attest from personal experience. When I received a recent email from my therapist that seemed longer and more polished than usual, I initially felt heartened. It seemed to convey a kind, validating message, and its length made me feel that she’d taken the time to reflect on all of the points in my (rather sensitive) email.

On closer inspection, though, her email seemed a little strange. It was in a new font, and the text displayed several AI “tells,” including liberal use of the Americanized em dash (we’re both from the UK), the signature impersonal style, and the habit of addressing each point made in the original email line by line.

My positive feelings quickly drained away, to be replaced by disappointment and mistrust, once I realized ChatGPT likely had a hand in drafting the message—which my therapist confirmed when I asked her.

Despite her assurance that she simply dictates longer emails using AI, I still felt uncertainty over the extent to which she, as opposed to the bot, was responsible for the sentiments expressed. I also couldn’t entirely shake the suspicion that she might have pasted my highly personal email wholesale into ChatGPT.

When I took to the internet to see whether others had had similar experiences, I found plenty of examples of people receiving what they suspected were AI-generated communiqués from their therapists. Many, including Declan, had taken to Reddit to solicit emotional support and advice.

So had Hope, 25, who lives on the east coast of the US, and had direct-messaged her therapist about the death of her dog. She soon received a message back. It would have been consoling and thoughtful—expressing how hard it must be “not having him by your side right now”—were it not for the reference to the AI prompt accidentally preserved at the top: “Here’s a more human, heartfelt version with a gentle, conversational tone.”

Hope says she felt “honestly really surprised and confused.” “It was just a very strange feeling,” she says. “Then I started to feel kind of betrayed. … It definitely affected my trust in her.” This was especially problematic, she adds, because “part of why I was seeing her was for my trust issues.”

Hope had believed her therapist to be competent and empathetic, and therefore “never would have suspected her to feel the need to use AI.” Her therapist was apologetic when confronted, and she explained that because she’d never had a pet herself, she’d turned to AI for help expressing the appropriate sentiment. 

A disclosure dilemma 

Betrayal or not, there may be some merit to the argument that AI could help therapists better communicate with their clients. A 2025 study published in PLOS Mental Health asked therapists to use ChatGPT to respond to vignettes describing problems of the kind patients might raise in therapy. Not only was a panel of 830 participants unable to distinguish between the human and AI responses, but AI responses were rated as conforming better to therapeutic best practice. 

However, when participants suspected responses to have been written by ChatGPT, they ranked them lower. (Responses written by ChatGPT but misattributed to therapists received the highest ratings overall.) 

Similarly, Cornell University researchers found in a 2023 study that AI-generated messages can increase feelings of closeness and cooperation between interlocutors, but only if the recipient remains oblivious to the role of AI. The mere suspicion of its use was found to rapidly sour goodwill.

“People value authenticity, particularly in psychotherapy,” says Adrian Aguilera, a clinical psychologist and professor at the University of California, Berkeley. “I think [using AI] can feel like, ‘You’re not taking my relationship seriously.’ Do I ChatGPT a response to my wife or my kids? That wouldn’t feel genuine.”

In 2023, in the early days of generative AI, the online therapy service Koko conducted a clandestine experiment on its users, mixing in responses generated by GPT-3 with ones drafted by humans. They discovered that users tended to rate the AI-generated responses more positively. The revelation that users had unwittingly been experimented on, however, sparked outrage.

The online therapy provider BetterHelp has also been subject to claims that its therapists have used AI to draft responses. In a Medium post, photographer Brendan Keen said his BetterHelp therapist admitted to using AI in their replies, leading to “an acute sense of betrayal” and persistent worry, despite reassurances, that his data privacy had been breached. He ended the relationship thereafter. 

A BetterHelp spokesperson told us the company “prohibits therapists from disclosing any member’s personal or health information to third-party artificial intelligence, or using AI to craft messages to members to the extent it might directly or indirectly have the potential to identify someone.”

All these examples relate to undisclosed AI usage. Aguilera believes time-strapped therapists can make use of LLMs, but transparency is essential. “We have to be up-front and tell people, ‘Hey, I’m going to use this tool for X, Y, and Z’ and provide a rationale,” he says. People then receive AI-generated messages with that prior context, rather than assuming their therapist is “trying to be sneaky.”

Psychologists are often working at the limits of their capacity, and levels of burnout in the profession are high, according to 2023 research conducted by the American Psychological Association. That context makes the appeal of AI-powered tools obvious. 

But lack of disclosure risks permanently damaging trust. Hope decided to continue seeing her therapist, though she stopped working with her a little later for reasons she says were unrelated. “But I always thought about the AI Incident whenever I saw her,” she says.

Risking patient privacy

Beyond the transparency issue, many therapists are leery of using LLMs in the first place, says Margaret Morris, a clinical psychologist and affiliate faculty member at the University of Washington.

“I think these tools might be really valuable for learning,” she says, noting that therapists should continue developing their expertise over the course of their career. “But I think we have to be super careful about patient data.” Morris calls Declan’s experience “alarming.” 

Therapists need to be aware that general-purpose AI chatbots like ChatGPT are not approved by the US Food and Drug Administration and are not HIPAA compliant, says Pardis Emami-Naeini, assistant professor of computer science at Duke University, who has researched the privacy and security implications of LLMs in a health context. (HIPAA is a set of US federal regulations that protect people’s sensitive health information.)

“This creates significant risks for patient privacy if any information about the patient is disclosed or can be inferred by the AI,” she says.

In a recent paper, Emami-Naeini found that many users wrongly believe ChatGPT is HIPAA compliant, creating an unwarranted sense of trust in the tool. “I expect some therapists may share this misconception,” she says.

As a relatively open person, Declan says, he wasn’t completely distraught to learn how his therapist was using ChatGPT. “Personally, I am not thinking, ‘Oh, my God, I have deep, dark secrets,’” he said. But it did still feel violating: “I can imagine that if I was suicidal, or on drugs, or cheating on my girlfriend … I wouldn’t want that to be put into ChatGPT.”

When using AI to help with email, “it’s not as simple as removing obvious identifiers such as names and addresses,” says Emami-Naeini. “Sensitive information can often be inferred from seemingly nonsensitive details.”

She adds, “Identifying and rephrasing all potential sensitive data requires time and expertise, which may conflict with the intended convenience of using AI tools. In all cases, therapists should disclose their use of AI to patients and seek consent.” 

A growing number of companies, including Heidi Health, Upheal, Lyssn, and Blueprint, are marketing specialized tools to therapists, such as AI-assisted note-taking, training, and transcription services. These companies say they are HIPAA compliant and store data securely using encryption and pseudonymization where necessary. But many therapists are still wary of the privacy implications—particularly of services that necessitate the recording of entire sessions.

“Even if privacy protections are improved, there is always some risk of information leakage or secondary uses of data,” says Emami-Naeini.

A 2020 hack on a Finnish mental health company, which resulted in tens of thousands of clients’ treatment records being accessed, serves as a warning. People on the list were blackmailed, and subsequently the entire trove was publicly released, revealing extremely sensitive details such as peoples’ experiences of child abuse and addiction problems.

What therapists stand to lose

In addition to violation of data privacy, other risks are involved when psychotherapists consult LLMs on behalf of a client. Studies have found that although some specialized therapy bots can rival human-delivered interventions, advice from the likes of ChatGPT can cause more harm than good.

A recent Stanford University study, for example, found that chatbots can fuel delusions and psychopathy by blindly validating a user rather than challenging them, as well as suffer from biases and engage in sycophancy. The same flaws could make it risky for therapists to consult chatbots on behalf of their clients. They could, for example, baselessly validate a therapist’s hunch, or lead them down the wrong path.

Aguilera says he has played around with tools like ChatGPT while teaching mental health trainees, such as by entering hypothetical symptoms and asking the AI chatbot to make a diagnosis. The tool will produce lots of possible conditions, but it’s rather thin in its analysis, he says. The American Counseling Association recommends that AI not be used for mental health diagnosis at present.

A study published in 2024 of an earlier version of ChatGPT similarly found it was too vague and general to be truly useful in diagnosis or devising treatment plans, and it was heavily biased toward suggesting people seek cognitive behavioral therapy as opposed to other types of therapy that might be more suitable.

Daniel Kimmel, a psychiatrist and neuroscientist at Columbia University, conducted experiments with ChatGPT where he posed as a client having relationship troubles. He says he found the chatbot was a decent mimic when it came to “stock-in-trade” therapeutic responses, like normalizing and validating, asking for additional information, or highlighting certain cognitive or emotional associations.

However, “it didn’t do a lot of digging,” he says. It didn’t attempt “to link seemingly or superficially unrelated things together into something cohesive … to come up with a story, an idea, a theory.”

“I would be skeptical about using it to do the thinking for you,” he says. Thinking, he says, should be the job of therapists.

Therapists could save time using AI-powered tech, but this benefit should be weighed against the needs of patients, says Morris: “Maybe you’re saving yourself a couple of minutes. But what are you giving away?”