Researchers taught robots to run. Now they’re teaching them to walk

We’ve all seen videos over the past few years demonstrating how agile humanoid robots have become, running and jumping with ease. We’re no longer surprised by this kind of agility—in fact, we’ve grown to expect it.

The problem is, these shiny demos lack real-world applications. When it comes to creating robots that are useful and safe around humans, the fundamentals of movement are more important. As a result, researchers are using the same techniques to train humanoid robots to achieve much more modest goals. 

Alan Fern, a professor of computer science at Oregon State University, and a team of researchers have successfully trained a humanoid robot called Digit V3 to stand, walk, pick up a box, and move it from one location to another. Meanwhile, a separate group of researchers from the University of California, Berkeley, has focused on teaching Digit to walk in unfamiliar environments while carrying different loads, without toppling over. Their research is published in Science Robotics today.

Both groups are using an AI technique called sim-to-real reinforcement learning, a burgeoning method of training two-legged robots like Digit. Researchers believe it will lead to more robust, reliable two-legged machines capable of interacting with their surroundings more safely—as well as learning much more quickly.

Sim-to-real reinforcement learning involves training AI models to complete certain tasks in simulated environments billions of times before a robot powered by the model attempts to complete them in the real world. What would take years for a robot to learn in real life can take just days thanks to repeated trial-and-error testing in simulations.

A neural network guides the robot using a mathematical reward function: the robot receives a large positive number every time it moves closer to its target location or completes its goal behavior, and it is “punished” with a negative number when it does something it’s not supposed to do, like falling down. Over time, it learns to avoid those motions.
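As a rough illustration of the idea, here is a minimal sketch of what such a reward function can look like inside a simulator. It is purely illustrative: the names, weights, and penalty values are hypothetical and are not taken from either team’s system.

```python
import numpy as np

def walking_reward(state, prev_state, target, fell_down):
    """Toy reward for a simulated bipedal walker (illustrative only)."""
    # Reward progress: how much closer the robot got to the target this step.
    prev_dist = np.linalg.norm(target - prev_state["position"])
    dist = np.linalg.norm(target - state["position"])
    progress = prev_dist - dist

    # Mild penalty on joint torques, nudging the policy toward smooth, efficient motion.
    effort_penalty = 0.01 * np.sum(np.square(state["joint_torques"]))

    # Large negative "punishment" if the robot falls over.
    fall_penalty = 100.0 if fell_down else 0.0

    return 10.0 * progress - effort_penalty - fall_penalty
```

During training, the simulator evaluates a function like this at every step of billions of simulated attempts, and the neural network gradually adjusts the robot’s movements to maximize the total reward it collects.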

In previous projects, researchers from Oregon State University had used the same reinforcement learning technique to teach a two-legged robot named Cassie to run. The approach paid off—Cassie became the first robot to run an outdoor 5K before setting a Guinness World Record for the fastest bipedal robot to run 100 meters and mastering the ability to jump from one location to another with ease.

Training robots to behave in athletic ways requires them to develop really complex skills in very narrow environments, says Ilija Radosavovic, a PhD student at Berkeley who trained Digit to carry a wide range of loads and stabilize itself when poked with a stick. “We’re sort of the opposite—focusing on fairly simple skills in broad environments.”

This new wave of research in humanoid robotics is less concerned with speed and ability, and more focused on making machines robust and able to adapt—which is ultimately what’s needed to make them useful in the real world. Humanoid robots remain a relative rarity in work environments, as they often struggle to balance while carrying heavy objects. This is why most robots designed to lift objects of varying weights in factories and warehouses tend to have four legs or larger, more stable bases. But researchers hope to change that by making humanoid robots more reliable using AI techniques. 

Reinforcement learning will usher in a “new, much more flexible and faster way for training these types of manipulation skills,” Fern says. He and his team are due to present their findings at ICRA, the International Conference on Robotics and Automation, in Japan next month.

The ultimate goal is for a human to be able to show the robot a video of the desired task, like picking up a box from one shelf and pushing it onto another higher shelf, and then have the robot do it without requiring any further instruction, says Fern.

Getting robots to observe, copy, and quickly learn these kinds of behaviors would be really useful, but it still remains a challenge, says Lerrel Pinto, an assistant professor of computer science at New York University, who was not involved in the research. “If that could be done, I would be very impressed by that,” he says. “These are hard problems.”

Is robotics about to have its own ChatGPT moment?

Silent. Rigid. Clumsy.

Henry and Jane Evans are used to awkward houseguests. For more than a decade, the couple, who live in Los Altos Hills, California, have hosted a slew of robots in their home. 

In 2002, at age 40, Henry had a massive stroke, which left him with quadriplegia and an inability to speak. Since then, he’s learned how to communicate by moving his eyes over a letter board, but he is highly reliant on caregivers and his wife, Jane. 

Henry got a glimmer of a different kind of life when he saw Charlie Kemp on CNN in 2010. Kemp, a robotics professor at Georgia Tech, was on TV talking about PR2, a robot developed by the company Willow Garage. PR2 was a massive two-armed machine on wheels that looked like a crude metal butler. Kemp was demonstrating how the robot worked, and talking about his research on how health-care robots could help people. He showed how the PR2 robot could hand some medicine to the television host.    

“All of a sudden, Henry turns to me and says, ‘Why can’t that robot be an extension of my body?’ And I said, ‘Why not?’” Jane says. 

There was a solid reason why not. While engineers have made great progress in getting robots to work in tightly controlled environments like labs and factories, the home has proved difficult to design for. Out in the real, messy world, furniture and floor plans differ wildly; children and pets can jump in a robot’s way; and clothes that need folding come in different shapes, colors, and sizes. Managing such unpredictable settings and varied conditions has been beyond the capabilities of even the most advanced robot prototypes. 

That seems to finally be changing, in large part thanks to artificial intelligence. For decades, roboticists have more or less focused on controlling robots’ “bodies”—their arms, legs, levers, wheels, and the like—via purpose-driven software. But a new generation of scientists and inventors believes that the previously missing ingredient of AI can give robots the ability to learn new skills and adapt to new environments faster than ever before. This new approach, just maybe, can finally bring robots out of the factory and into our homes.

Progress won’t happen overnight, though, as the Evanses know far too well from their many years of using various robot prototypes. 

PR2 was the first robot they brought in, and it opened up entirely new possibilities for Henry. It would hold a beard shaver and Henry would move his face against it, allowing him to shave and scratch an itch by himself for the first time in a decade. But at 450 pounds (200 kilograms) or so and $400,000, the robot was difficult to have around. “It could easily take out a wall in your house,” Jane says. “I wasn’t a big fan.”

More recently, the Evanses have been testing out a smaller robot called Stretch, which Kemp developed through his startup Hello Robot. The first iteration launched during the pandemic with a much more reasonable price tag of around $18,000. 

Stretch weighs about 50 pounds. It has a small mobile base, a stick with a camera dangling off it, and an adjustable arm featuring a gripper with suction cups at the ends. It can be controlled with a console controller. Henry controls Stretch using a laptop, with a tool that tracks his head movements to move a cursor around. He is able to move his thumb and index finger enough to click a computer mouse. Last summer, Stretch was with the couple for more than a month, and Henry says it gave him a whole new level of autonomy. “It was practical, and I could see using it every day,” he says.

Henry Evans used the Stretch robot to brush his hair, eat, and even play with his granddaughter.
PETER ADAMS

Using his laptop, he could get the robot to brush his hair and have it hold fruit kebabs for him to snack on. It also opened up Henry’s relationship with his granddaughter Teddie. Before, they barely interacted. “She didn’t hug him at all goodbye. Nothing like that,” Jane says. But “Papa Wheelie” and Teddie used Stretch to play, engaging in relay races, bowling, and magnetic fishing. 

Stretch doesn’t have much in the way of smarts: it comes with some preinstalled software, such as the web interface that Henry uses to control it, and other capabilities such as AI-enabled navigation. The main benefit of Stretch is that people can plug in their own AI models and use them to do experiments. But it offers a glimpse of what a world with useful home robots could look like. Robots that can do many of the things humans do in the home—tasks such as folding laundry, cooking meals, and cleaning—have been a dream of robotics research since the inception of the field in the 1950s. For a long time, it’s been just that: “Robotics is full of dreamers,” says Kemp.

But the field is at an inflection point, says Ken Goldberg, a robotics professor at the University of California, Berkeley. Previous efforts to build a useful home robot, he says, have emphatically failed to meet the expectations set by popular culture—think the robotic maid from The Jetsons. Now things are very different. Thanks to cheap hardware like Stretch, along with efforts to collect and share data and advances in generative AI, robots are getting more competent and helpful faster than ever before. “We’re at a point where we’re very close to getting capability that is really going to be useful,” Goldberg says. 

Folding laundry, cooking shrimp, wiping surfaces, unloading shopping baskets—today’s AI-powered robots are learning to do tasks that for their predecessors would have been extremely difficult. 

Missing pieces

There’s a well-known observation among roboticists: What is hard for humans is easy for machines, and what is easy for humans is hard for machines. Called Moravec’s paradox, it was first articulated in the 1980s by Hans Moravec, then a roboticist at the Robotics Institute of Carnegie Mellon University. A robot can play chess or hold an object still for hours on end with no problem. Tying a shoelace, catching a ball, or having a conversation is another matter.

There are three reasons for this, says Goldberg. First, robots lack precise control and coordination. Second, their understanding of the surrounding world is limited because they are reliant on cameras and sensors to perceive it. Third, they lack an innate sense of practical physics. 

“Pick up a hammer, and it will probably fall out of your gripper, unless you grab it near the heavy part. But you don’t know that if you just look at it, unless you know how hammers work,” Goldberg says. 

On top of these basic considerations, there are many other technical things that need to be just right, from motors to cameras to Wi-Fi connections, and hardware can be prohibitively expensive. 

Mechanically, we’ve been able to do fairly complex things for a while. In a video from 1957, two large robotic arms are dexterous enough to pinch a cigarette, place it in the mouth of a woman at a typewriter, and reapply her lipstick. But the intelligence and the spatial awareness of that robot came from the person who was operating it. 

In a video from 1957, a man operates two large robotic arms and uses the machine to apply a woman’s lipstick. Robots have come a long way since.
“LIGHTER SIDE OF THE NEWS –ATOMIC ROBOT A HANDY GUY” (1957) VIA YOUTUBE

“The missing piece is: How do we get software to do [these things] automatically?” says Deepak Pathak, an assistant professor of computer science at Carnegie Mellon.  

Researchers training robots have traditionally approached this problem by planning everything the robot does in excruciating detail. Robotics giant Boston Dynamics used this approach when it developed its boogying and parkouring humanoid robot Atlas. Cameras and computer vision are used to identify objects and scenes. Researchers then use that data to make models that can be used to predict with extreme precision what will happen if a robot moves a certain way. Using these models, roboticists plan the motions of their machines by writing a very specific list of actions for them to take. The engineers then test these motions in the laboratory many times and tweak them to perfection. 

This approach has its limits. Robots trained like this are strictly choreographed to work in one specific setting. Take them out of the laboratory and into an unfamiliar location, and they are likely to topple over. 

Compared with other fields, such as computer vision, robotics has been in the dark ages, Pathak says. But that might not be the case for much longer, because the field is seeing a big shake-up. Thanks to the AI boom, he says, the focus is now shifting from feats of physical dexterity to building “general-purpose robot brains” in the form of neural networks. Much as the human brain is adaptable and can control different aspects of the human body, these networks can be adapted to work in different robots and different scenarios. Early signs of this work show promising results. 

Robots, meet AI 

For a long time, robotics research was an unforgiving field, plagued by slow progress. At the Robotics Institute at Carnegie Mellon, where Pathak works, he says, “there used to be a saying that if you touch a robot, you add one year to your PhD.” Now, he says, students get exposure to many robots and see results in a matter of weeks.

What separates this new crop of robots is their software. Instead of the traditional painstaking planning and training, roboticists have started using deep learning and neural networks to create systems that learn from their environment on the go and adjust their behavior accordingly. At the same time, new, cheaper hardware, such as off-the-shelf components and robots like Stretch, is making this sort of experimentation more accessible. 

Broadly speaking, there are two popular ways researchers are using AI to train robots. Pathak has been using reinforcement learning, an AI technique that allows systems to improve through trial and error, to get robots to adapt their movements in new environments. This is a technique that Boston Dynamics has also started using in its robot “dogs” called Spot.

Deepak Pathak’s team at Carnegie Mellon has used an AI technique called reinforcement learning to create a robotic dog that can do extreme parkour with minimal pre-programming.

In 2022, Pathak’s team used this method to create four-legged robot “dogs” capable of scrambling up steps and navigating tricky terrain. The robots were first trained to move around in a general way in a simulator. Then they were set loose in the real world, with a single built-in camera and computer vision software to guide them. Other similar robots rely on tightly prescribed internal maps of the world and cannot navigate beyond them.

Pathak says the team’s approach was inspired by human navigation. Humans receive information about the surrounding world from their eyes, and this helps them instinctively place one foot in front of the other to get around in an appropriate way. Humans don’t typically look down at the ground under their feet when they walk, but a few steps ahead, at a spot where they want to go. Pathak’s team trained its robots to take a similar approach to walking: each one used the camera to look ahead. The robot was then able to memorize what was in front of it for long enough to guide its leg placement. The robots learned about the world in real time, without internal maps, and adjusted their behavior accordingly. At the time, experts told MIT Technology Review the technique was a “breakthrough in robot learning and autonomy” and could allow researchers to build legged robots capable of being deployed in the wild.   

Pathak’s robot dogs have since leveled up. The team’s latest algorithm allows a quadruped robot to do extreme parkour. The robot was again trained to move around in a general way in a simulation. But using reinforcement learning, it was then able to teach itself new skills on the go, such as how to jump long distances, walk on its front legs, and clamber up tall boxes twice its height. These behaviors were not something the researchers programmed. Instead, the robot learned through trial and error and visual input from its front camera. “I didn’t believe it was possible three years ago,” Pathak says. 

In the other popular technique, called imitation learning, models learn to perform tasks by, for example, imitating the actions of a human teleoperating a robot or using a VR headset to collect data on a robot. It’s a technique that has gone in and out of fashion over decades but has recently become more popular with robots that do manipulation tasks, says Russ Tedrake, vice president of robotics research at the Toyota Research Institute and an MIT professor.
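In its simplest form, imitation learning is behavior cloning: a neural network is trained to map what the robot senses to the action a human demonstrator took in the same situation. The sketch below shows that bare-bones idea with made-up data and dimensions; it is only a generic illustration, not the diffusion-policy approach described a little further on.

```python
import torch
import torch.nn as nn

# Hypothetical demonstration data: observations (camera features, joint angles, etc.)
# paired with the commands a human teleoperator issued at the same moment.
obs = torch.randn(5000, 64)      # 5,000 recorded observations, 64 values each
actions = torch.randn(5000, 7)   # matching commands for a 7-joint arm

policy = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 7),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Behavior cloning: regress the demonstrator's action from the observation.
for epoch in range(100):
    predicted = policy(obs)
    loss = nn.functional.mse_loss(predicted, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once trained, the same network runs on the robot, turning each new observation into a command.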

By pairing this technique with generative AI, researchers at the Toyota Research Institute, Columbia University, and MIT have been able to quickly teach robots to do many new tasks. They believe they have found a way to extend the technology propelling generative AI from the realm of text, images, and videos into the domain of robot movements. 

The idea is to start with a human, who manually controls the robot to demonstrate behaviors such as whisking eggs or picking up plates. Using a technique called diffusion policy, the robot is then able to use the data fed into it to learn skills. The researchers have taught robots more than 200 skills, such as peeling vegetables and pouring liquids, and say they are working toward teaching 1,000 skills by the end of the year. 

Many others have taken advantage of generative AI as well. Covariant, a robotics startup that spun off from OpenAI’s now-shuttered robotics research unit, has built a multimodal model called RFM-1. It can accept prompts in the form of text, image, video, robot instructions, or measurements. Generative AI allows the robot to both understand instructions and generate images or videos relating to those tasks. 

The Toyota Research Institute team hopes this will one day lead to “large behavior models,” which are analogous to large language models, says Tedrake. “A lot of people think behavior cloning is going to get us to a ChatGPT moment for robotics,” he says. 

In a similar demonstration, earlier this year a team at Stanford managed to use a relatively cheap off-the-shelf robot costing $32,000 to do complex manipulation tasks such as cooking shrimp and cleaning stains. It learned those new skills quickly with AI. 

Called Mobile ALOHA (a loose acronym for “a low-cost open-source hardware teleoperation system”), the robot learned to cook shrimp with the help of just 20 human demonstrations and data from other tasks, such as tearing off a paper towel or piece of tape. The Stanford researchers found that AI can help robots acquire transferable skills: training on one task can improve a robot’s performance on others.

While the current generation of generative AI works with images and language, researchers at the Toyota Research Institute, Columbia University, and MIT believe the approach can extend to the domain of robot motion.

This is all laying the groundwork for robots that can be useful in homes. Human needs change over time, and teaching robots to reliably do a wide range of tasks is important, as it will help them adapt to us. That is also crucial to commercialization—first-generation home robots will come with a hefty price tag, and the robots need to have enough useful skills for regular consumers to want to invest in them. 

For a long time, a lot of the robotics community was very skeptical of these kinds of approaches, says Chelsea Finn, an assistant professor of computer science and electrical engineering at Stanford University and an advisor for the Mobile ALOHA project. Finn says that nearly a decade ago, learning-based approaches were rare at robotics conferences and disparaged in the robotics community. “The [natural-language-processing] boom has been convincing more of the community that this approach is really, really powerful,” she says. 

There is one catch, however. In order to imitate new behaviors, the AI models need plenty of data. 

More is more

Unlike chatbots, which can be trained by using billions of data points hoovered from the internet, robots need data specifically created for robots. They need physical demonstrations of how washing machines and fridges are opened, dishes picked up, or laundry folded, says Lerrel Pinto, an assistant professor of computer science at New York University. Right now that data is very scarce, and it takes a long time for humans to collect.

A person records themself opening a kitchen drawer with a grabber (top), and a robot attempts the same action (bottom).
“ON BRINGING ROBOTS HOME,” NUR MUHAMMAD (MAHI) SHAFIULLAH, ET AL.

Some researchers are trying to use existing videos of humans doing things to train robots, hoping the machines will be able to copy the actions without the need for physical demonstrations. 

Pinto’s lab has also developed a neat, cheap data collection approach that connects robotic movements to desired actions. Researchers took a reacher-grabber stick, similar to ones used to pick up trash, and attached an iPhone to it. Human volunteers can use this system to film themselves doing household chores, mimicking the robot’s view of the end of its robotic arm. Using this stand-in for Stretch’s robotic arm and an open-source system called DOBB-E, Pinto’s team was able to get a Stretch robot to learn tasks such as pouring from a cup and opening shower curtains with just 20 minutes of iPhone data.  

But for more complex tasks, robots would need even more data and more demonstrations.  

The requisite scale would be hard to reach with DOBB-E, says Pinto, because you’d basically need to persuade every human on Earth to buy the reacher-grabber system, collect data, and upload it to the internet.

A new initiative kick-started by Google DeepMind, called the Open X-Embodiment Collaboration, aims to change that. Last year, the company partnered with 34 research labs and about 150 researchers to collect data from 22 different robots, including Hello Robot’s Stretch. The resulting data set, which was published in October 2023, consists of robots demonstrating 527 skills, such as picking, pushing, and moving.  

Sergey Levine, a computer scientist at UC Berkeley who participated in the project, says the goal was to create a “robot internet” by collecting data from labs around the world. This would give researchers access to bigger, more scalable, and more diverse data sets. The deep-learning revolution that led to the generative AI of today started in 2012 with the rise of ImageNet, a vast online data set of images. The Open X-Embodiment Collaboration is an attempt by the robotics community to do something similar for robot data. 

Early signs show that more data is leading to smarter robots. The researchers built two versions of a model for robots, called RT-X, that could be either run locally on individual labs’ computers or accessed via the web. The larger, web-accessible model was pretrained with internet data to develop a “visual common sense,” or a baseline understanding of the world, from the large language and image models. 

When the researchers ran the RT-X model on many different robots, they discovered that the robots were able to learn skills 50% more successfully than in the systems each individual lab was developing.

“I don’t think anybody saw that coming,” says Vincent Vanhoucke, Google DeepMind’s head of robotics. “Suddenly there is a path to basically leveraging all these other sources of data to bring about very intelligent behaviors in robotics.”

Many roboticists think that large vision-language models, which are able to analyze image and language data, might offer robots important hints as to how the surrounding world works, Vanhoucke says. They offer semantic clues about the world and could help robots with reasoning, deducing things, and learning by interpreting images. To test this, researchers took a robot that had been trained on the larger model and asked it to point to a picture of Taylor Swift. The researchers had not shown the robot pictures of Swift, but it was still able to identify the pop star because it had a web-scale understanding of who she was even without photos of her in its data set, says Vanhoucke.

RT-2, a recent model for robotic control, was trained on online text and images as well as interactions with the real world.
KELSEY MCCLELLAN

Vanhoucke says Google DeepMind is increasingly using techniques similar to those it would use for machine translation to translate from English to robotics. Last summer, Google introduced a vision-language-action model called RT-2. This model gets its general understanding of the world from online text and images it has been trained on, as well as its own interactions in the real world. It translates that data into robotic actions. Each robot has a slightly different way of translating English into action, he adds.

“We increasingly feel like a robot is essentially a chatbot that speaks robotese,” Vanhoucke says. 

Baby steps

Despite the fast pace of development, robots still face many challenges before they can be released into the real world. They are still way too clumsy for regular consumers to justify spending tens of thousands of dollars on them. Robots also still lack the sort of common sense that would allow them to multitask. And they need to move from just picking things up and placing them somewhere to putting things together, says Goldberg—for example, putting a deck of cards or a board game back in its box and then into the games cupboard. 

But to judge from the early results of integrating AI into robots, roboticists are not wasting their time, says Pinto. 

“I feel fairly confident that we will see some semblance of a general-purpose home robot. Now, will it be accessible to the general public? I don’t think so,” he says. “But in terms of raw intelligence, we are already seeing signs right now.” 

The next generation of robots might not just assist humans with their everyday chores or help people like Henry Evans live more independent lives. For researchers like Pinto, there is an even bigger goal in sight.

Home robotics offers one of the best benchmarks for human-level machine intelligence, he says. The fact that a human can operate intelligently in the home environment, he adds, means we know this is a level of intelligence that can be reached. 

“It’s something which we can potentially solve. We just don’t know how to solve it,” he says. 

Thanks to Stretch, Henry Evans was able to hold his own playing cards for the first time in two decades.
VY NGUYEN

For Henry and Jane Evans, a big win would be to get a robot that simply works reliably. The Stretch robot that the Evanses experimented with is still too buggy to use without researchers present to troubleshoot, and their home doesn’t always have the dependable Wi-Fi connectivity Henry needs in order to communicate with Stretch using a laptop.

Even so, Henry says, one of the greatest benefits of his experiment with robots has been independence: “All I do is lay in bed, and now I can do things for myself that involve manipulating my physical environment.”

Thanks to Stretch, for the first time in two decades, Henry was able to hold his own playing cards during a match. 

“I kicked everyone’s butt several times,” he says. 

“Okay, let’s not talk too big here,” Jane says, and laughs.

This US startup makes a crucial chip material and is taking on a Japanese giant

It can be dizzying to try to understand all the complex components of a single computer chip: layers of microscopic components linked to one another through highways of copper wires, some barely wider than a few strands of DNA. Nestled between those wires is an insulating material called a dielectric, ensuring that the wires don’t touch and short out. Zooming in further, there’s one particular dielectric placed between the chip and the structure beneath it; this material, called dielectric film, is produced in sheets as thin as white blood cells. 

For 30 years, a single Japanese company called Ajinomoto has made billions producing this particular film. Competitors have struggled to outdo them, and today Ajinomoto has more than 90% of the market in the product, which is used in everything from laptops to data centers. 

But now, a startup based in Berkeley, California, is embarking on a herculean effort to dethrone Ajinomoto and bring this small slice of the chipmaking supply chain back to the US.

Thintronics is promising a product purpose-built for the computing demands of the AI era—a suite of new materials that the company claims have higher insulating properties and, if adopted, could mean data centers with faster computing speeds and lower energy costs. 

The company is at the forefront of a coming wave of new US-based companies, spurred by the $280 billion CHIPS and Science Act, that is seeking to carve out a portion of the semiconductor sector, which has become dominated by just a handful of international players. But to succeed, Thintronics and its peers will have to overcome a web of challenges—solving technical problems, disrupting long-standing industry relationships, and persuading global semiconductor titans to accommodate new suppliers. 

“Inventing new materials platforms and getting them into the world is very difficult,” Thintronics founder and CEO Stefan Pastine says. It is “not for the faint of heart.”

The insulator bottleneck

If you recognize the name Ajinomoto, you’re probably surprised to hear it plays a critical role in the chip sector: the company is better known as the world’s leading supplier of MSG seasoning powder. In the 1990s, Ajinomoto discovered that a by-product of MSG made a great insulator, and it has enjoyed a near monopoly in the niche material ever since. 

But Ajinomoto doesn’t make any of the other parts that go into chips. In fact, the insulating materials in chips rely on dispersed supply chains: one layer uses materials from Ajinomoto, another uses material from another company, and so on, with none of the layers optimized to work in tandem. The resulting system works okay when data is being transmitted over short paths, but over longer distances, like between chips, weak insulators act as a bottleneck, wasting energy and slowing down computing speeds. That’s recently become a growing concern, especially as the scale of AI training gets more expensive and consumes eye-popping amounts of energy. (Ajinomoto did not respond to requests for comment.) 

None of this made much sense to Pastine, a chemist who sold his previous company, which specialized in recycling hard plastics, to an industrial chemicals company in 2019. Around that time, he started to believe that the chemicals industry could be slow to innovate, and he thought the same pattern was keeping chipmakers from finding better insulating materials. In the chip industry, he says, insulators have “kind of been looked at as the redheaded stepchild”—they haven’t seen the progress made with transistors and other chip components. 

He launched Thintronics that same year, with the hope that cracking the code on a better insulator could provide data centers with faster computing speeds at lower costs. That idea wasn’t groundbreaking—new insulators are constantly being researched and deployed—but Pastine believed that he could find the right chemistry to deliver a breakthrough. 

Thintronics says it will manufacture different insulators for all layers of the chip, for a system designed to swap into existing manufacturing lines. Pastine tells me the materials are now being tested with a number of industry players. But he declined to provide names, citing nondisclosure agreements, and similarly would not share details of the formula. 

Without more details, it’s hard to say exactly how well the Thintronics materials compare with competing products. The company recently tested its materials’ Dk values, which are a measure of how effective an insulator a material is. Venky Sundaram, a researcher who has founded multiple semiconductor startups but is not involved with Thintronics, reviewed the results. Some of Thintronics’ numbers were fairly average, he says, but their most impressive Dk value is far better than anything available today.
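Some background the article leaves implicit: Dk is the material’s dielectric constant, or relative permittivity. In simple transmission-line models, a signal traveling through wiring surrounded by a dielectric slows down roughly with the square root of Dk, which is why a lower-Dk film can translate into faster signaling between chips:

$$ v \approx \frac{c}{\sqrt{D_k}}, \qquad t_{\text{delay}} \approx \frac{\ell\,\sqrt{D_k}}{c} $$

where c is the speed of light and ℓ is the length of the interconnect.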

A rocky road ahead

Thintronics’ vision has already garnered some support. The company received a $20 million Series A funding round in March, led by venture capital firms Translink and Maverick, as well as a grant from the US National Science Foundation. 

The company is also seeking funding from the CHIPS Act. Signed into law by President Joe Biden in 2022, it’s designed to boost companies like Thintronics in order to bring semiconductor manufacturing back to American companies and reduce reliance on foreign suppliers. A year after it became law, the administration said that more than 450 companies had submitted statements of interest to receive CHIPS funding for work across the sector. 

The bulk of funding from the legislation is destined for large-scale manufacturing facilities, like those operated by Intel in New Mexico and Taiwan Semiconductor Manufacturing Company (TSMC) in Arizona. But US Secretary of Commerce Gina Raimondo has said she’d like to see smaller companies receive funding as well, especially in the materials space. In February, applications opened for a pool of $300 million earmarked specifically for materials innovation. While Thintronics declined to say how much funding it was seeking or from which programs, the company does see the CHIPS Act as a major tailwind.

But building a domestic supply chain for chips—a product that currently depends on dozens of companies around the globe—will mean reversing decades of specialization by different countries. And industry experts say it will be difficult to challenge today’s dominant insulator suppliers, who have often had to adapt to fend off new competition. 

“Ajinomoto has been a 90-plus-percent-market-share material for more than two decades,” says Sundaram. “This is unheard-of in most businesses, and you can imagine they didn’t get there by not changing.”

One big challenge is that the dominant manufacturers have decades-long relationships with chip designers like Nvidia or Advanced Micro Devices, and with manufacturers like TSMC. Asking these players to swap out materials is a big deal.

“The semiconductor industry is very conservative,” says Larry Zhao, a semiconductor researcher who has worked in the dielectrics industry for more than 25 years. “They like to use the vendors they already know very well, where they know the quality.” 

Another obstacle facing Thintronics is technical: insulating materials, like other chip components, are held to manufacturing standards so precise they are difficult to comprehend. The layers where Ajinomoto dominates are thinner than a human hair. The material must also be able to accept tiny holes, which house wires running vertically through the film. Every new iteration is a massive R&D effort in which incumbent companies have the upper hand given their years of experience, says Sundaram.

If all this is completed successfully in a lab, yet another hurdle lies ahead: the material has to retain those properties in a high-volume manufacturing facility, which is where Sundaram has seen past efforts fail.

“I have advised several material suppliers over the years that tried to break into [Ajinomoto’s] business and couldn’t succeed,” he says. “They all ended up having the problem of not being as easy to use in a high-volume production line.” 

Despite all these challenges, one thing may be working in Thintronics’ favor: US-based tech giants like Microsoft and Meta are making headway in designing their own chips for the first time. The plan is to use these chips for in-house AI training as well as for the cloud computing capacity that they rent out to customers, both of which would reduce the industry’s reliance on Nvidia. 

Though Microsoft, Google, and Meta declined to comment on whether they are pursuing advancements in materials like insulators, Sundaram says these firms could be more willing to work with new US startups rather than defaulting to the old ways of making chips: “They have a lot more of an open mind about supply chains than the existing big guys.”

Taking AI to the next level in manufacturing

Few technological advances have generated as much excitement as AI. In particular, generative AI seems to have taken business discourse to a fever pitch. Many manufacturing leaders express optimism: Research conducted by MIT Technology Review Insights found ambitions for AI development to be stronger in manufacturing than in most other sectors.


Manufacturers rightly view AI as integral to the creation of the hyper-automated intelligent factory. They see AI’s utility in enhancing product and process innovation, reducing cycle time, wringing ever more efficiency from operations and assets, improving maintenance, and strengthening security, while reducing carbon emissions. Some manufacturers that have invested to develop AI capabilities are still striving to achieve their objectives.

This study from MIT Technology Review Insights seeks to understand how manufacturers are generating benefits from AI use cases—particularly in engineering and design and in factory operations. The survey included 300 manufacturers that have begun working with AI. Most of these (64%) are currently researching or experimenting with AI. Some 35% have begun to put AI use cases into production. Many executives who responded to the survey indicate they intend to boost AI spending significantly during the next two years. Those who haven’t yet put AI into production are moving more gradually. To facilitate use-case development and scaling, these manufacturers must address challenges with talent, skills, and data.

Following are the study’s key findings:

  • Talent, skills, and data are the main constraints on AI scaling. In both engineering and design and factory operations, manufacturers cite a deficit of talent and skills as their toughest challenge in scaling AI use cases. The closer use cases get to production, the harder this deficit bites. Many respondents say inadequate data quality and governance also hamper use-case development. Insufficient access to cloud-based compute power is another oft-cited constraint in engineering and design.
  • The biggest players do the most spending, and have the highest expectations. In engineering and design, 58% of executives expect their organizations to increase AI spending by more than 10% during the next two years. And 43% say the same when it comes to factory operations. The largest manufacturers are far more likely to make big increases in investment than those in smaller—but still large—size categories.
  • Desired AI gains are specific to manufacturing functions. The most common use cases deployed by manufacturers involve product design, conversational AI, and content creation. Knowledge management and quality control are the use cases most frequently cited at the pilot stage. In engineering and design, manufacturers chiefly seek AI gains in speed, efficiency, reduced failures, and security. In the factory, the most desired gains are better innovation, improved safety, and a reduced carbon footprint.
  • Scaling can stall without the right data foundations. Respondents are clear that AI use-case development is hampered by inadequate data quality (57%), weak data integration (54%), and weak governance (47%). Only about one in five manufacturers surveyed have production assets with data ready for use in existing AI models. That figure dwindles as manufacturers put use cases into production. The bigger the manufacturer, the greater the problem of unsuitable data is.
  • Fragmentation must be addressed for AI to scale. Most manufacturers find some modernization of data architecture, infrastructure, and processes is needed to support AI, along with other technology and business priorities. A modernization strategy that improves interoperability of data systems between engineering and design and the factory, and between operational technology (OT) and information technology (IT), is a sound priority.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

Open-sourcing generative AI

The views expressed in this video are those of the speakers, and do not represent any endorsement or sponsorship.

Is the open-source approach, which has democratized access to software, ensured transparency, and improved security for decades, now poised to have a similar impact on AI? We dissect the balance between collaboration and control, legal ramifications, ethical considerations, and innovation barriers as the AI industry seeks to democratize the development of large language models.

Explore more from Booz Allen Hamilton on the future of AI


About the speakers

Alison Smith, Director of Generative AI, Booz Allen Hamilton

Alison Smith is a Director of Generative AI at Booz Allen Hamilton, where she helps clients address their missions with innovative solutions. Leading Booz Allen’s investments in Generative AI and grounding them in real business needs, Alison employs a pragmatic approach to designing, implementing, and deploying Generative AI that blends existing tools with additional customization. She is also responsible for disseminating best practices and key solutions throughout the firm to ensure that all teams are up-to-date on the latest available tools, solutions, and approaches to common client problems.

In addition to her role at Booz Allen, which balances technical solutions and business growth, Alison also enjoys staying connected to and serving her local community. From 2017 to 2021, Alison served on the board of a non-profit, the DC Open Government Coalition (DCOGC), a group that seeks to enhance public access to government information and ensure transparent government operations; in November 2021, Alison was recognized as a Power Woman in Code by DCFemTech.

Alison has an MBA from The University of Chicago Booth School of Business and a BA from Middlebury College.

Tackling AI risks: Your reputation is at stake

Forget Skynet: One of the biggest risks of AI is your organization’s reputation. That means it’s time to put science-fiction catastrophizing to one side and begin thinking seriously about what AI actually means for us in our day-to-day work.

This isn’t to advocate for navel-gazing at the expense of the bigger picture: It’s to urge technologists and business leaders to recognize that if we’re to address the risks of AI as an industry—maybe even as a society—we need to closely consider its immediate implications and outcomes. If we fail to do that, taking action will be practically impossible.

Risk is all about context

Risk is all about context. In fact, one of the biggest risks is failing to acknowledge or understand your context: That’s why you need to begin there when evaluating risk.

This is particularly important in terms of reputation. Think, for instance, about your customers and their expectations. How might they feel about interacting with an AI chatbot? How damaging might it be to provide them with false or misleading information? Maybe minor customer inconvenience is something you can handle, but what if it has a significant health or financial impact?

Even if implementing AI seems to make sense, there are clearly some downstream reputation risks that need to be considered. We’ve spent years talking about the importance of user experience and being customer-focused: While AI might help us here, it could also undermine those things.

There’s a similar question to be asked about your teams. AI may have the capacity to drive efficiency and make people’s work easier, but used in the wrong way it could seriously disrupt existing ways of working. The industry has been talking a lot about developer experience recently—it’s something I wrote about for this publication—and the decisions organizations make about AI need to improve the experiences of teams, not undermine them.

In the latest edition of the Thoughtworks Technology Radar—a biannual snapshot of the software industry based on our experiences working with clients around the world—we talk about precisely this point. We call out AI team assistants as one of the most exciting emerging areas in software engineering, but we also note that the focus has to be on enabling teams, not individuals. “You should be looking for ways to create AI team assistants to help create the ‘10x team,’ as opposed to a bunch of siloed AI-assisted 10x engineers,” we say in the latest report.

Failing to heed the working context of your teams could cause significant reputational damage. Some bullish organizations might see this as part and parcel of innovation—it’s not. It’s showing potential employees—particularly highly technical ones—that you don’t really understand or care about the work they do.

Tackling risk through smarter technology implementation

There are lots of tools that can be used to help manage risk. Thoughtworks helped put together the Responsible Technology Playbook, a collection of tools and techniques that organizations can use to make more responsible decisions about technology (not just AI).

However, it’s important to note that managing risks—particularly those around reputation—requires real attention to the specifics of technology implementation. This was particularly clear in work we did with an assortment of Indian civil society organizations, developing a social welfare chatbot that citizens can interact with in their native languages. The risks here were not unlike those discussed earlier: The context in which the chatbot was being used (as support for accessing vital services) meant that inaccurate or “hallucinated” information could stop people from getting the resources they depend on.

This contextual awareness informed technology decisions. We implemented a version of something called retrieval-augmented generation to reduce the risk of hallucinations and improve the accuracy of the model the chatbot was running on.
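In outline, retrieval-augmented generation works by first fetching passages from a trusted knowledge base and then asking the language model to answer only from what was retrieved, which narrows the room for hallucination. The sketch below is a generic, minimal illustration of that pattern; the embed and call_llm functions are hypothetical stand-ins for a real embedding model and LLM, and none of this is Thoughtworks' actual implementation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM API call; a real system would query a model here."""
    return "[answer generated from the provided context]"

# A small trusted knowledge base, e.g. official descriptions of welfare schemes.
documents = [
    "Scheme A provides food subsidies to families below the poverty line.",
    "Scheme B offers scholarships for students from rural districts.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(question: str, k: int = 1) -> list:
    """Return the k documents most similar to the question (cosine similarity)."""
    q = embed(question)
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer(question: str) -> str:
    """Ground the model's answer in retrieved, vetted text."""
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using ONLY the context below. If the answer is not there, "
        "say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

The key design choice is that the chatbot’s answer is grounded in retrieved, vetted text rather than in whatever the model happens to remember from its training data.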

Retrieval-augmented generation features on the latest edition of the Technology Radar. It might be viewed as part of a wave of emerging techniques and tools in this space that are helping developers tackle some of the risks of AI. These range from NeMo Guardrails—an open-source tool that puts limits on chatbots to increase accuracy—to the technique of running large language models (LLMs) locally with tools like Ollama, to ensure privacy and avoid sharing data with third parties. This wave also includes tools that aim to improve transparency in LLMs (which are notoriously opaque), such as Langfuse.

It’s worth pointing out, however, that it’s not just a question of what you implement, but also what you avoid doing. That’s why, in this Radar, we caution readers about the dangers of overenthusiastic LLM use and rushing to fine-tune LLMs.

Rethinking risk

A new wave of AI risk assessment frameworks aims to help organizations consider risk. There is also legislation (including the AI Act in Europe) that organizations must pay attention to. But addressing AI risk isn’t just a question of applying a framework or even following a static set of good practices. In a dynamic and changing environment, it’s about being open-minded and adaptive, paying close attention to the ways that technology choices shape human actions and social outcomes on both a micro and macro scale.

One useful framework is Dominique Shelton Leipzig’s traffic light framework. A red light signals something prohibited—such as discriminatory surveillance—while a green light signals low risk and a yellow light signals caution. I like the fact it’s so lightweight: For practitioners, too much legalese or documentation can make it hard to translate risk to action.

However, I also think it’s worth flipping the framework to see risks as embedded in contexts, not in the technologies themselves. That way, you’re not trying to make a solution adapt to a given situation; you’re responding to a situation and addressing it as it actually exists. If organizations take that approach to AI—and, indeed, to technology in general—they will meet the needs of stakeholders and keep their reputations safe.

This content was produced by Thoughtworks. It was not written by MIT Technology Review’s editorial staff.

Scaling customer experiences with data and AI

Today, interactions matter more than ever. According to research from NICE, once a consumer makes a buying decision for a product or service, 80% of their decision to keep doing business with that brand hinges on the quality of their customer service experience. Enter AI.

“I think AI is becoming a really integral part of every business today because it is finding that sweet spot in allowing businesses to grow while finding key efficiencies to manage that bottom line and really do that at scale,” says Andy Traba, vice president of product marketing at NICE.

When many think of AI and customer experiences, chatbots that give customers more headaches than help often come to mind. However, emerging AI use cases are enabling greater efficiencies than ever. From sentiment analysis to co-pilots to integration throughout the entire customer journey, the evolving era of AI is reducing friction and building better relationships between enterprises and both their employees and customers.

“When we think about bolstering AI capabilities, it’s really about getting the right data to train my models on so that they have those best outcomes.”

Deploying any technology requires a delicate balance between delivering quality solutions without compromising the bottom line. AI integration offers investment returns by scaling customer and employee capabilities, automating tedious and redundant tasks, and offering consistent experiences based on collected and specialized data.

“I think as you’re hopefully venturing into leveraging AI more to improve your business, the key recommendation I would provide is just to focus on those crystal clear high-probability use cases and get those early wins and then reinvest back into the business,” says Traba.

While artificial intelligence has increasingly grabbed headlines in recent years, augmented intelligence—where AI tools are used to enhance human capabilities rather than automate them—is worthy of similar buzz for its potential in the customer experience space, says Traba.

Currently, the customer experience landscape is highly reactive. Looking ahead, Traba foresees a shift to proactive and predictive customer experiences that blend both AI and augmented intelligence. Say a customer’s device is reaching its end-of-life state. Rather than the customer reaching out to a chatbot or contact center, AI tools would flag the device’s issue early and direct the customer to a live chat with a representative, offering both the efficiency of automation and personalized help from a human representative.

“Where I see the future evolving in terms of customer experiences, is being much more proactive with the convergence of data, these advancements of technology, and certainly generative AI,” says Traba.

This episode of Business Lab is produced in partnership with NICE.

Full Transcript

Laurel Ruma: From MIT Technology Review, I’m Laurel Ruma and this is Business Lab, the show that helps business leaders make sense of new technologies coming out of the lab and into the marketplace.

Our topic is building better customer and employee experiences with artificial intelligence. Integrating data and AI solutions into everyday business can help provide insights, create efficiencies, and free up time for employees to work on more complicated issues. And all of this builds a better experience for customers.

Two words for you: augmented intelligence.

My guest is Andy Traba, vice president of product marketing at NICE.

This podcast is produced in partnership with NICE.

Welcome Andy.

Andy Traba: Hi Laurel. Thanks for having me.

Laurel: Well, thanks for being here. So to set some context, could you describe the current state of AI within customer experience? Common use cases that come to mind are chatbots, but what are some other applications for AI in this space?

Andy: Thank you. I think it’s a great question to get started, and I think first and foremost, the use of AI is growing everywhere. Certainly, we had this big boom last year where everybody started talking about AI thanks to ChatGPT and a lot of the advancements with generative AI, and we’re certainly seeing a lot more doing now, moving beyond just talking. So just growing a use case of trying to apply AI everywhere to improve experiences. One of the more popular ones, and this technology has been around for some time, is sentiment analysis. So instead of just proactively surveying customers to ask how are they doing, what was their experience like, using AI models to analyze the conversations they’re having with brands and automatically determine that. And it’s also a good use case, I think, to emphasize the importance of data that goes into the training of AI models.

As you think about sentiment analysis, you want to train those models based on the actual customer experience conversations, maybe past records or even surveys. What you want to avoid is training a sentiment model maybe based on movie reviews or Amazon reviews, something that’s not really well connected. So certainly sentiment analysis is a very popular use case that goes beyond just chatbots.

Two other ones I’ll bring up are co-pilots. We’ve seen, certainly, a lot of recent news with the launch of Microsoft Copilot and other forms for copilots within the contact center and certainly helping customer service agents. It’s a very popular use case that we see. The reason driving that demand is the types of conversations that are getting to agents today are much more complex. AI has done a good job of taking away the easy stuff. We no longer have to call into a contact center to reset our passwords, so what’s left over for the agents is much more difficult types of interactions. So being able to assist them in real time with prompts and guidance and recommending knowledge articles to make their job easier and more effective is really popular.

And then the third and final one just on this question is really the rise of AI-driven journeys. Many, many years ago, you and I would call into a contact center, and the only channel we could use was voice. Today, those channels have exploded. There’s social media, there’s messaging, there’s voice, there’s AI assistants that we can chat with. So being able to orchestrate or navigate a customer effectively through that journey and recommend the next best action or the next best channel for them to reduce that complexity is really in demand as well. And how can I even get to a point where I can proactively engage with them on the channel of their choice at the time of day that we’re likely to get a response? That is certainly an area where we see AI playing an important role today, and even more so in the future. So those are the three, really: sentiment analysis, the rise of co-pilots, and then using AI across the entire customer journey.

Laurel: So as AI becomes more popular across enterprises and across industries, why is integrating AI and customer experience then so crucial for today’s business landscape?

Andy: I think it’s so crucial today because it’s finding this sweet spot in terms of business decision-making. When we think of business decision-making, we are often challenged with, am I going to focus on revenue or cost cutting? Am I going to focus on building new products or perfecting my existing products? And rarely has there been a technology that has allowed a business to achieve all of those at once. But we’re seeing that today with AI finding a sweet spot where I can improve revenue and keep customers happy and renewing or even gain new ones without having to spend additional money. I could even do that in a more efficient way with AI. Within AI, I can take a very innovative approach and produce new products that my customers demand and save time and money through efficiencies in making my current products better. I think AI is becoming a really integral part of every business today because it is finding that sweet spot in allowing businesses to grow while finding key efficiencies to manage that bottom line and really do that at scale.

Laurel: And speaking of those efficiencies, employee experience lays that foundation for the customer. But based on your time at NICE and within business operations, how does employee experience affect the overall experience then for customers?

Andy: I think what we’ve seen at NICE is really that customer experience and employee experience go hand in glove. They’re one and the same; there is a tremendous correlation between them. Some examples, just to give some anecdotes, because customer experience really happens everywhere: if you go into a car dealership for a Tesla or a BMW, a high-end product, but you are interacting with a salesperson who’s a little pushy or maybe just having a bad day, it’s going to deteriorate the overall customer experience. That bad employee experience causes a negative effect. Same thing if you go to your favorite local restaurant but you have a new server who’s not really well trained or is still figuring out the menu and the logistics; that’s going to have a negative spillover effect. And then on the flip side of that, you can see employee experience having a positive effect on the overall customer experience.

If employees are engaged and they have the right information and the right tools, they can turn a negative into a positive. Think of airlines, a very commoditized industry right now. If you have a problem with your flight and it got canceled and you have a critical moment of need, that employee from that airline can really turn that experience around by finding a new flight, booking you, and making sure that you are on your trip and reach your destination on time or with very little delay. So when we think about experiences at large, the employee and the customer outcomes are very much tied together. We’ve done research here at NICE on this exact topic, and what we found was that once a consumer makes a buying decision for a particular product or service, after that point, 80% of that consumer’s decision to continue doing business with that brand is based on the quality of their interactions.

So how those conversations play out plays a very, very important part in whether or not they will continue doing business with that brand. Today, interactions matter more than ever. To conclude on this question, one of my favorite quotes: customer experience today isn’t just part of the business, it is the business. And I think employees play a really important front-line role in achieving that.

Laurel: That certainly makes sense. 80% is a huge number, and I think of that in my own experiences, but could you explain the difference between artificial intelligence and augmented intelligence and also how they overlap?

Andy: Yeah, it’s a great question. I think today artificial intelligence is certainly capturing all of the buzz, but what I think is just as buzzworthy is augmented intelligence. So let’s start by defining the two. Artificial intelligence refers to machines mimicking human cognition. And when we think about customer experience, there’s really no better example of that than chatbots or virtual assistants: technology that allows you to interact with the brand 24/7, 365, at any time that you need, and that mimics the conversations that you would normally have with a live human customer service representative. Augmented intelligence, on the other hand, is really about AI enhancing human capabilities, lightening the cognitive load on an individual, allowing them to do more with less, saving them time. I think in the domain of customer experience, copilots are becoming a very popular example here. How can copilots make recommendations, generate responses, and automate a lot of the mundane tasks that humans just don’t like to do and frankly aren’t good at?

So I think there’s a clear distinction, then, between artificial intelligence, really those machines taking on the human capabilities 100%, versus augmented intelligence, which isn’t replacing humans but lifting them up, allowing them to do more. And where there’s overlap, and I think we’re going to see this trend really start accelerating in the years to come in customer experience, is the blend between those two as we’re interacting with a brand. What I mean by that is maybe starting out by having a conversation with an intelligent virtual agent, a chatbot, and then seamlessly blending into a live human customer representative who plays a specialized role. So maybe as I’m researching a new product to buy online, such as a cell phone, I can ask the chatbot some questions, and it’s referring to its knowledge base and its past interactions to answer those. But when it’s time to ask a very specific question, I might be elevated to a customer service representative, or that brand might just choose to say, “Hey, when it’s time to buy, I want to ensure you’re speaking to a live individual.” So I think there’s going to be a blend, or a continuum if you will, of these types of interactions. And I think we’re going to get to a point very soon where we might not even know: is it a human on the other end of that digital interaction, or just a machine chatting back and forth? But I think those two concepts, artificial intelligence and augmented intelligence, are certainly here to stay, driving improvements in customer experience at scale with brands.

Laurel: Well, there’s the customer journey, but then there’s also the AI journey, and most of those journeys start with data. So internally, what is the process of bolstering AI capabilities in terms of data, and how does data play a role in enhancing both employee and customer experiences?

Andy: I think in today’s age, it’s common understanding really that AI is only as good as the data it’s trained on. Quick anecdote, if I’m an AI engineer and I’m trying to predict what movies people will watch, so I can drive engagement into my movie app, I’m going to want data. What movies have people watched in the past and what did they like? Similarly in customer experience, if I’m trying to predict the best outcome of that interaction, I want CX data. I want to know what’s gone well in the past on these interactions, what’s gone poorly or wrong? I don’t want data that’s just available on the public internet. I need specialized CX data for my AI models. When we think about bolstering AI capabilities, it’s really about getting the right data to train my models on so that they have those best outcomes.

And going back to the example I brought in around sentiment, I think that reinforces the need to ensure that when we’re training AI models for customer experience, it’s done off of rich CX datasets and not just publicly available information like some of the more popular large language models are using.

And I think about how data plays a role in enhancing employee and customer experiences. There’s a strategy that’s important: deriving new information, or new data, from the unstructured data sets that these contact centers and experience centers often have. When we think about a conversation, it’s very open-ended, right? It could go many ways. It is not often predictable, and it’s very hard to understand at the surface. Where AI and advanced machine learning techniques can help, though, is in deriving new information from those conversations, such as what the consumer’s sentiment level was at the beginning of the conversation versus the end. What actions did the agent take that drove positive or negative trends in that sentiment? How did all of these elements play out? Very quickly you can go from large unstructured data sets that might not have a lot of information or signals in them to very large data sets that are rich and contain a lot of signals. Deriving that new information, or understanding what I like to think of as the chemistry of that conversation, is playing a very critical role in AI-powered customer experiences today: it ensures that those experiences are trusted, they’re done right, and they’re built on consumer data that can be trusted, not public information that doesn’t really help drive a positive customer experience.

Laurel: Getting back to your idea that customer experience is the business: one of the major questions that most organizations face with technology deployment is how to deliver quality customer experiences without compromising the bottom line. So how can AI move the needle into that positive territory?

Andy: Yeah, I think if there’s one word to think about when it comes to AI moving the bottom line, it’s scale. How we think of things is really all about scale: allowing humans or employees to do more, whether that’s by expanding what they can handle, saving them time, or allowing things to be more efficient. Again, that’s referring back to that augmented intelligence. And then with artificial intelligence, it’s all about automation. So how can we offer customer experience 24/7, 365? How can allowing consumers to reach out to a brand at any time that’s convenient boost that customer experience? Doing both of those tactics in a way that moves the bottom line and drives results is important. I think there’s a third one, though, that isn’t receiving enough attention, and that’s consistency. We can allow employees to do more. We can automate their tasks to provide more capacity. But we also have to provide consistent, positive experiences.

And where AI and machine learning really help here is in finding areas of variability, and not only the areas of variability but also the root cause or driver of those variabilities, to close those gaps. A brand I’ll give a shout-out to, which I think does this incredibly well, is Starbucks. I can go to a Starbucks in any location around the world and order an iced caramel macchiato, and I’m going to get that same drink experience regardless of which of the thousands of Starbucks locations I’m in. I think that consistency plays a really powerful role in the overall customer experience of the Starbucks brand. And when you think about the logistics of doing that at scale, it’s incredibly complex and challenging. If you have the data and you have the right tools and the AI, finding those gaps and offering more consistent experiences is incredibly powerful.

Laurel: So could you share some practical strategies and best practices for organizations to leverage AI to empower employees, foster positive and productive work environments, and ultimately improve customer interactions?

Andy: Yeah, I think the overall positive, going back to earlier in our conversation, is that there are many use cases. AI has a tremendous opportunity in this space. The recommendation I would provide is to focus first on a crystal-clear, high-probability use case for your business. Auto-summary, or the automated note-taking of agents’ after-call work, is becoming an increasingly popular one that we’re seeing in the space. And I think the reasons for it are really clear. It’s a win-win-win for the employee, the customer, and the business. It’s a win for the employee because AI is going to automate something that is mundane for them, or very procedural. If you think of a customer service representative, they’re taking 40, 50, maybe upwards of 60 conversations a day and, as part of their job, taking notes of what was talked about and what the action items are. It’s complicated, mundane, tiresome even. They don’t like doing it.

So AI can offload that activity from them, which is a win for the employee. It’s a win for the customer because a lot of times the agents aren’t great at note-taking, especially when they’re doing it so often, which can lead to that unfortunate experience where you have to call back as a consumer and repeat yourself, because the agent you’re now talking to doesn’t have good information about why you called or what you interacted with previously. So from a consumer-experience standpoint, it helps them because they have to repeat themselves less often, and the agent they’re currently speaking with can offer a more personalized service because they have better notes and history of past interactions.

And then finally, the third win: it’s really good for the business, because you’re saving the time and money that agents would otherwise have to spend doing something manually. We see that 30 to 60 seconds of note-taking per conversation at a business with 1,000 employees adds up to millions of dollars every year. So there’s a clear-cut business case for the business to achieve results, improve customer experience, and improve employee experience at the same time. I think as you’re hopefully venturing into leveraging AI more to improve your business, the key recommendation I would provide is just to focus on those crystal-clear, high-probability use cases, get those early wins, and then reinvest back into the business.
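As a rough sanity check on that figure, here is a back-of-the-envelope calculation. The call volume, working days, and hourly cost are assumed placeholders for illustration, not NICE’s actual model.

```python
# Back-of-the-envelope check of the after-call note-taking cost mentioned above.
# Every figure here is an assumption for illustration.
seconds_per_call = 45          # midpoint of the 30-60 seconds of note-taking cited
calls_per_agent_per_day = 50   # midpoint of the 40-60 daily conversations cited
agents = 1_000
working_days_per_year = 250
loaded_cost_per_hour = 25.0    # assumed fully loaded hourly cost of an agent, in dollars

agent_hours_per_year = (
    seconds_per_call * calls_per_agent_per_day * agents * working_days_per_year
) / 3600
annual_cost = agent_hours_per_year * loaded_cost_per_hour

print(f"{agent_hours_per_year:,.0f} agent-hours per year")   # ~156,250 hours
print(f"${annual_cost:,.0f} per year at these assumptions")  # ~$3.9 million
```

At these assumptions the note-taking alone works out to a few million dollars a year, consistent with the order of magnitude cited above.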

Laurel: Yeah, I think those are the positive aspects of that, but concerns about job loss due to automation tend to crop up with AI deployment. So what are the opportunities that AI integration can provide for organizations and their employees so it’s a win-win for everybody?

Andy: I’m certainly empathetic to this topic. As with all new technologies, whenever there’s excitement around them, there’s also this uncertainty about what the long-term outcomes will be. But when we look back historically, all transformative technologies have boosted GDP and created more jobs, and I see no reason to believe this time around will be different. Now, those jobs might be different, and new roles will emerge. When it comes to customer experience and employee experience, one interesting theory I’m following is this: think about Apple, which had a really revolutionary model where it branded its employees geniuses. You’d go into an Apple store and you would speak to a genius, and that model carried through all of its physical flagship stores. A very positive model. Back in the day, people would actually pay money to go speak to a genius or get a priority customer service slot. But it’s a model that’s really hard to scale, and one that hasn’t been successful in a virtual environment.

I think when we see AI and a lot of these new technology advancements, though, that’s a prime example of a new job that might emerge. If AI is offloading a lot of the interactions to chatbots, what do customer service agents do? Maybe they become geniuses, playing a more proactive, high-value-add role for consumers and overall improving the service and the experience there. So I do think that AI will cause job shifts, but overall there will be a net positive, just as there has been with all past transformative technologies.

Laurel: Continuing that look ahead, how do you see the era of AI evolving in terms of customer and employee experience? What excites you about the future in this space?

Andy: This is actually what I’m most excited about. When we think about customer experience today, it’s highly reactive. As a consumer, if I have a problem, I search your website, I interact with your chatbot, I end up talking to a live customer service representative. The consumer is the driving force of everything, and the business or the brand has to be reactive to them. Where I see the future evolving, in terms of customer experience, is toward being much more proactive, with the convergence of data, these advancements in technology, and certainly generative AI. I see AI becoming smarter, more predictive, and more proactive: alerting us that there is going to be a problem before the consumer actually experiences it, and taking action proactively before that problem manifests itself.

Just a quick example: maybe there’s a media or cable company where a device is reaching its end-of-life state. Rather than have it go on the fritz the day of the Super Bowl, reach out, be proactive, contact that individual, and give them specific instructions to follow. And that’s really where we see the advancements of not only big data and AI but also the abundance of ways to reach out on preferred channels, whether that’s a simple SMS or a high-touch service representative reaching out. That’s where the future of customer experience moves: to a much more proactive state from its reactive state today.

Laurel: Well, thank you so much, Andy. I appreciate your time, and thank you for joining us on the Business Lab today.

Andy: Thanks. This was an excellent conversation, Laurel, and thanks again for having me.

Laurel: That was Andy Traba, vice president of product marketing at NICE, whom I spoke with from Cambridge, Massachusetts, the home of MIT and MIT Technology Review.

That’s it for this episode of Business Lab. I’m your host, Laurel Ruma. I’m the global director of Insights, the custom publishing division of MIT Technology Review. We were founded in 1899 at the Massachusetts Institute of Technology, and you can find us in print, on the web, and at events each year around the world. For more information about us and the show, please check out our website at technologyreview.com.

This show is available wherever you get your podcasts. If you enjoyed this episode, we hope you’ll take a moment to rate and review us. Business Lab is a production of MIT Technology Review. This episode was produced by Giro Studios. Thanks for listening.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

Four things you need to know about China’s AI talent pool 

This story first appeared in China Report, MIT Technology Review’s newsletter about technology in China. Sign up to receive it in your inbox every Tuesday.

In 2019, MIT Technology Review covered a report that shined a light on how fast China’s AI talent pool was growing. Its main finding was pretty interesting: the number of elite AI scholars of Chinese origin had grown tenfold in the previous decade, but relatively few of them stayed in China for their work. The majority moved to the US. 

Now the think tank behind the report has published an updated analysis, showing how the makeup of global AI talent has changed since—during a critical period when the industry has shifted significantly and become the hottest technology sector. 

The team at MacroPolo, the think tank of the Paulson Institute, an organization that focuses on US-China relations, studied the national origin, educational background, and current work affiliation of top researchers who gave presentations and had papers accepted at NeurIPS, a top academic conference on AI. Their analysis of the 2019 conference resulted in the first iteration of the Global AI Talent Tracker. They’ve analyzed the December 2022 NeurIPS conference for an update three years later.

I recommend you read the original report, which has a very well-designed infographic that shows the talent flow across countries. But to save you some time, I also talked to the authors and highlighted what I think are the most surprising or important takeaways from the new report. Here are the four main things you need to know about the global AI talent landscape today. 

1. China has become an even more important country for training AI talent.

Even in 2019, Chinese researchers were already a significant part of the global AI community, making up one-tenth of the most elite AI researchers. In 2022, they accounted for 26%, almost dethroning the US (American researchers accounted for 28%). 

[Chart: two pie charts showing the countries of origin of top AI researchers in 2019 and 2022.]

“Timing matters,” says Ruihan Huang, senior research associate at MacroPolo and one of the lead authors. “The last three years have seen China dramatically expand AI programs across its university system—now there are some 2,000 AI majors—because it was also building an AI industry to absorb that talent.” 

As a result of these university and industry efforts, many more students in computer science or other STEM majors have joined the AI industry, making Chinese researchers the backbone of cutting-edge AI research.

2. AI researchers now tend to stay in the country where they receive their graduate degree. 

This is perhaps intuitive, but the numbers are still surprisingly high: 80% of AI researchers who went to a graduate school in the US stayed to work in the US, while 90% of their peers who went to a graduate school in China stayed in China.

In a world where major countries are competing with each other to take the lead in AI development, this finding suggests a trick they could use to expand their research capacity: invest in graduate-level institutions and attract overseas students to them. 

This is particularly important in the US-China context, where the souring of the relationship between the two countries has affected the academic field. According to news reports, quite a few Chinese graduate students have been interrogated at the US border or even denied entry in recent years, as a Trump-era policy persisted. Along with the border restrictions imposed during the pandemic years, this hostility could have prevented more Chinese AI experts from coming to the US to learn and work. 

3. The US still overwhelmingly attracts the most AI talent, but China is catching up.

In both 2019 and 2022, the United States topped the rankings in terms of where elite AI researchers work. But it’s also clear that the distance between the US and other countries, particularly China, has shortened. In 2019, almost three-fifths of top AI researchers worked in the US; in 2022, only about two-fifths did. 

“The thing about elite talent is that they generally want to work at the most cutting-edge and dynamic places. They want to do incredible work and be rewarded for it,” says AJ Cortese, a senior research associate at MacroPolo and another of the main authors. “So far, the United States still leads the way in having that AI ecosystem—from leading institutions to companies—that appeals to top talent.”

[Chart: two pie charts showing the leading countries where AI researchers worked in 2019 and 2022.]

In 2022, 28% of the top AI researchers were working in China. This significant portion speaks to the growth of the domestic AI sector in China and the job opportunities it has created. Compared with 2019, three more Chinese universities and one company (Huawei) made it into the top tier of institutions that produce AI research. 

It’s true that most Chinese AI companies are still considered to lag behind their US peers—for example, China usually trails the US by a few months in releasing comparable generative AI models. However, it seems like they have started catching up.

4. Top-tier AI researchers now are more willing to work in their home countries.

This is perhaps the biggest and also most surprising change in the data, in my opinion. Like their Chinese peers, more Indian AI researchers ended up staying in their home country for work.

In fact, this seems to be a broader pattern across the board: it used to be that more than half of AI researchers worked in a country different from their home. Now, the balance has tipped in favor of working in their own countries. 

[Chart: two pie charts showing the share of AI researchers choosing to work abroad versus at home in 2019 and 2022.]

This is good news for countries trying to catch up with the US research lead in AI. “It goes without saying most countries would prefer ‘brain gain’ over ‘brain drain’—especially when it comes to a highly complex and technical discipline like AI,” Cortese says. 

It’s not easy to create an environment and culture that not only retains homegrown talent but also pulls in scholars from other countries, yet lots of countries are now working on it. I can only begin to imagine what the report might look like in a few years.  

Did anything else stand out to you in the report? Let me know your thoughts by writing to zeyi@technologyreview.com.


Now read the rest of China Report

Catch up with China

1. The Dutch prime minister will visit China this week to discuss with Chinese president Xi Jinping whether the Dutch chipmaking equipment company ASML can keep servicing Chinese clients. (Reuters $)

  • Here’s an inside look into ASML’s factory and how it managed to dominate advanced chipmaking. (MIT Technology Review)

2. Hong Kong passed a tough national security law that makes it more dangerous to protest Beijing’s rule. (BBC)

3. A new bill in France suggests imposing hefty fines on Shein and similar ultrafast-fashion companies for their negative environmental impact—as much as $11 per item that they sell in France. (Nikkei Asia)

4. Huawei filed a patent to make more advanced chips with a low-tech workaround. (Bloomberg $)

  • Meanwhile, a US official accused the Chinese chip foundry SMIC of breaking US law by making a chip for Huawei. (South China Morning Post $)

5. Instead of the usual six and a half days a week, Tesla has instructed its Shanghai factory to reduce production to five days a week. The slowdown of EV sales in China could be the reason. (Bloomberg $)

6. TikTok is still having plenty of troubles. A new political TV ad (paid for by a mysterious new nonprofit), playing in three US swing states, attacks Zhang Fuping, a ByteDance vice president that very few people have heard of. (Punchbowl News)

  • As TikTok still hasn’t reached a licensing deal with Universal Music Group, users have had to get creative to find alternative soundtracks for their videos. (Billboard)

7. China launched a communications satellite that will help relay signals for missions to explore the dark side of the moon. (Reuters $)

Lost in translation

The most-hyped generative AI app in China these days is Kimi, according to the Chinese publication Sina Tech. Released by Moonshot AI, a Chinese “unicorn” startup, Kimi made headlines last week when it announced it had started supporting text inputs of over 2 million Chinese characters. (For comparison, OpenAI’s GPT-4 Turbo currently supports inputs of 100,000 Chinese characters, while Claude3-200K supports about 160,000 characters.)

While some of the app’s virality can be credited to a marketing push that intensified recently, Chinese users are now busy feeding popular and classic books to the model and testing how well it can understand the context. Feeling threatened, other Chinese AI apps owned by tech giants like Baidu and Alibaba have followed suit, announcing that they will soon support 5 million or even 10 million Chinese characters. But processing large amounts of text, while impressive, is very costly in the generative AI age—and some observers worry this isn’t the commercial direction that companies ought to head in.

One more thing

Fluffy pajamas, sweatpants, outdated attire: young Chinese people are dressing themselves in “gross outfits” to work—an intentional provocation to their bosses and an expression of silent resistance to the trend that glorifies career hustle. “I just don’t think it’s worth spending money to dress up for work, since I’m just sitting there,” one of them told the New York Times.

Update: The story has been updated to clarify the affiliation of the report authors.

What’s next for generative video

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

When OpenAI revealed its new generative video model, Sora, last month, it invited a handful of filmmakers to try it out. This week the company published the results: seven surreal short films that leave no doubt that the future of generative video is coming fast. 

The first batch of models that could turn text into video appeared in late 2022, from companies including Meta, Google, and video-tech startup Runway. It was a neat trick, but the results were grainy, glitchy, and just a few seconds long.

Fast-forward 18 months, and the best of Sora’s high-definition, photorealistic output is so stunning that some breathless observers are predicting the death of Hollywood. Runway’s latest models can produce short clips that rival those made by blockbuster animation studios. Midjourney and Stability AI, the firms behind two of the most popular text-to-image models, are now working on video as well.

A number of companies are racing to make a business on the back of these breakthroughs. Most are figuring out what that business is as they go. “I’ll routinely scream, ‘Holy cow, that is wicked cool’ while playing with these tools,” says Gary Lipkowitz, CEO of Vyond, a firm that provides a point-and-click platform for putting together short animated videos. “But how can you use this at work?”

Whatever the answer to that question, it will probably upend a wide range of businesses and change the roles of many professionals, from animators to advertisers. Fears of misuse are also growing. The widespread ability to generate fake video will make it easier than ever to flood the internet with propaganda and nonconsensual porn. We can see it coming. The problem is, nobody has a good fix.

As we continue to get to grips with what’s ahead—good and bad—here are four things to think about. We’ve also curated a selection of the best videos filmmakers have made using this technology, including an exclusive reveal of “Somme Requiem,” an experimental short film by Los Angeles–based production company Myles. Read on for a taste of where AI moviemaking is headed. 

1. Sora is just the start

OpenAI’s Sora is currently head and shoulders above the competition in video generation. But other companies are working hard to catch up. The market is going to get extremely crowded over the next few months as more firms refine their technology and start rolling out Sora’s rivals.

The UK-based startup Haiper came out of stealth this month. It was founded in 2021 by former Google DeepMind and TikTok researchers who wanted to work on technology called neural radiance fields, or NeRF, which can transform 2D images into 3D virtual environments. They thought a tool that turned snapshots into scenes users could step into would be useful for making video games.

But six months ago, Haiper pivoted from virtual environments to video clips, adapting its technology to fit what CEO Yishu Miao believes will be an even bigger market than games. “We realized that video generation was the sweet spot,” says Miao. “There will be a super-high demand for it.”

“Air Head” is a short film made by Shy Kids, a pop band and filmmaking collective based in Toronto, using Sora.

Like OpenAI’s Sora, Haiper’s generative video tech uses a diffusion model to manage the visuals and a transformer (the component in large language models like GPT-4 that makes them so good at predicting what comes next) to manage the consistency between frames. “Videos are sequences of data, and transformers are the best model to learn sequences,” says Miao.

Consistency is a big challenge for generative video and the main reason existing tools produce just a few seconds of video at a time. Transformers for video generation can boost the quality and length of the clips. The downside is that transformers make stuff up, or hallucinate. In text, this is not always obvious. In video, it can result in, say, a person with multiple heads. Keeping transformers on track requires vast silos of training data and warehouses full of computers.
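To make that division of labor concrete, here is a toy sketch in PyTorch: a per-frame denoiser stands in for the diffusion model, and a transformer attends across the frame axis so the frame latents stay consistent over time. The module names, sizes, and structure are assumptions for illustration; this is not Haiper’s or OpenAI’s actual architecture.

```python
# Toy sketch of "diffusion for the visuals, transformer for frame consistency".
# Purely illustrative; real video diffusion models are far larger and operate on images.
import torch
import torch.nn as nn

class FrameDenoiser(nn.Module):
    """Per-frame denoiser standing in for the diffusion model's network."""
    def __init__(self, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, latent_dim * 2),
            nn.GELU(),
            nn.Linear(latent_dim * 2, latent_dim),
        )

    def forward(self, noisy_latents: torch.Tensor) -> torch.Tensor:
        # noisy_latents: (batch, frames, latent_dim) -> predicted noise, same shape
        return self.net(noisy_latents)

class TemporalTransformer(nn.Module):
    """Attends across frames so content stays consistent from one frame to the next."""
    def __init__(self, latent_dim: int, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, frame_latents: torch.Tensor) -> torch.Tensor:
        # frame_latents: (batch, frames, latent_dim); attention runs over the frame axis
        return self.encoder(frame_latents)

class ToyVideoDiffusion(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.denoiser = FrameDenoiser(latent_dim)
        self.temporal = TemporalTransformer(latent_dim)

    def forward(self, noisy_latents: torch.Tensor) -> torch.Tensor:
        per_frame = self.denoiser(noisy_latents)  # clean up each frame independently
        return self.temporal(per_frame)           # then smooth the result across frames

# One denoising step over an 8-frame clip of 64-dimensional latents.
model = ToyVideoDiffusion()
noisy = torch.randn(1, 8, 64)
predicted_noise = model(noisy)
print(predicted_noise.shape)  # torch.Size([1, 8, 64])
```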

That’s why Irreverent Labs, founded by former Microsoft researchers, is taking a different approach. Like Haiper, Irreverent Labs started out generating environments for games before switching to full video generation. But the company doesn’t want to follow the herd by copying what OpenAI and others are doing. “Because then it’s a battle of compute, a total GPU war,” says David Raskino, Irreverent’s cofounder and CTO. “And there’s only one winner in that scenario, and he wears a leather jacket.” (He’s talking about Jensen Huang, CEO of the trillion-dollar chip giant Nvidia.)

Instead of using a transformer, Irreverent’s tech combines a diffusion model with a model that predicts what’s in the next frame on the basis of common-sense physics, such as how a ball bounces or how water splashes on the floor. Raskino says this approach reduces both training costs and the number of hallucinations. The model still produces glitches, but they are distortions of physics (a bouncing ball not following a smooth curve, for example) with known mathematical fixes that can be applied to the video after it is generated, he says.
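And here is a minimal sketch of that physics-prior idea, again purely illustrative rather than Irreverent Labs’ actual system: a hand-written next-state predictor for a bouncing ball that a learned model would only need to correct, instead of having to discover basic mechanics from data. The time step, gravity, and restitution values are assumptions.

```python
# Illustrative only: a common-sense physics predictor for one bouncing ball (1D height).
# A learned model could predict small corrections to this baseline instead of
# hallucinating the whole motion from scratch.
def predict_next_state(height_m: float, velocity_m_s: float,
                       dt: float = 1 / 30, g: float = 9.81,
                       restitution: float = 0.8) -> tuple[float, float]:
    """Advance the ball's (height, velocity) by one video frame."""
    velocity_m_s -= g * dt
    height_m += velocity_m_s * dt
    if height_m < 0:                          # hit the floor: bounce with energy loss
        height_m = -height_m * restitution
        velocity_m_s = -velocity_m_s * restitution
    return height_m, velocity_m_s

# Roll out one second of predicted "frames" at 30 fps for a ball dropped from 2 m.
height, velocity = 2.0, 0.0
trajectory = []
for _ in range(30):
    height, velocity = predict_next_state(height, velocity)
    trajectory.append(round(height, 3))
print(trajectory)  # falls on a smooth curve, bounces around frame 20, then rises again
```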

Which approach will last remains to be seen. Miao compares today’s technology to large language models circa GPT-2. Five years ago, OpenAI’s groundbreaking early model amazed people because it showed what was possible. But it took several more years for the technology to become a game-changer.

It’s the same with video, says Miao: “We’re all at the bottom of the mountain.”

2. What will people do with generative video? 

Video is the medium of the internet. YouTube, TikTok, newsreels, ads: expect to see synthetic video popping up everywhere there’s video already.

The marketing industry is one of the most enthusiastic adopters of generative technology. Two-thirds of marketing professionals have experimented with generative AI in their jobs, according to a recent survey Adobe carried out in the US, with more than half saying they have used the technology to produce images.

Generative video is next. A few marketing firms have already put out short films to demonstrate the technology’s potential. The latest example is the 2.5-minute-long “Somme Requiem,” made by Myles. You can watch the film below in an exclusive reveal from MIT Technology Review.

“Somme Requiem” is a short film made by Los Angeles production company Myles. Every shot was generated using Runway’s Gen 2 model. The clips were then edited together by a team of video editors at Myles.

“Somme Requiem” depicts snowbound soldiers during the World War I Christmas ceasefire in 1914. The film is made up of dozens of different shots that were produced using a generative video model from Runway, then stitched together, color-corrected, and set to music by human video editors at Myles. “The future of storytelling will be a hybrid workflow,” says founder and CEO Josh Kahn.

Kahn picked the period wartime setting to make a point. He notes that the Apple TV+ series Masters of the Air, which follows a group of World War II airmen, cost $250 million. The team behind Peter Jackson’s World War I documentary They Shall Not Grow Old spent four years curating and restoring more than 100 hours of archival film. “Most filmmakers can only dream of ever having an opportunity to tell a story in this genre,” says Kahn.

“Independent filmmaking has been kind of dying,” he adds. “I think this will create an incredible resurgence.”

Raskino hopes so. “The horror movie genre is where people test new things, to try new things until they break,” he says. “I think we’re going to see a blockbuster horror movie created by, like, four people in a basement somewhere using AI.”

So is generative video a Hollywood-killer? Not yet. The scene-setting shots in “Somme Requiem”—empty woods, a desolate military camp—look great. But the people in it are still afflicted with mangled fingers and distorted faces, hallmarks of the technology. Generative video is best at wide-angle pans or lingering close-ups, which creates an eerie atmosphere but little action. If “Somme Requiem” were any longer it would get dull.

But scene-setting shots pop up all the time in feature-length movies. Most are just a few seconds long, but they can take hours to film. Raskino suggests that generative video models could soon be used to produce those in-between shots for a fraction of the cost. This could also be done on the fly in later stages of production, without requiring a reshoot.

Michal Pechoucek, CTO at Gen Digital, the cybersecurity giant behind a range of antivirus brands including Norton and Avast, agrees. “I think this is where the technology is headed,” he says. “We’ll see many different models, each specifically trained in a certain domain of movie production. These will just be tools used by talented video production teams.”

We’re not there quite yet. A big problem with generative video is the lack of control users have over the output. Producing still images can be hit and miss; producing a few seconds of video is even more risky.

“Right now it’s still fun, you get a-ha moments,” says Miao. “But generating video that is exactly what you want is a very hard technical problem. We are some way off generating long, consistent videos from a single prompt.”

That’s why Vyond’s Lipkowitz thinks the technology isn’t yet ready for most corporate clients. These users want a lot more control over the look of a video than current tools give them, he says.

Thousands of companies around the world, including around 65% of the Fortune 500 firms, use Vyond’s platform to create animated videos for in-house communications, training, marketing, and more. Vyond draws on a range of generative models, including text-to-image and text-to-voice, but provides a simple drag-and-drop interface that lets users put together a video by hand, piece by piece, rather than generate a full clip with a click.

Running a generative model is like rolling dice, says Lipkowitz. “This is a hard no for most video production teams, particularly in the enterprise sector where everything must be pixel-perfect and on brand,” he says. “If the video turns out bad—maybe the characters have too many fingers, or maybe there is a company logo that is the wrong color—well, unlucky, that’s just how gen AI works.”

The solution? More data, more training, repeat. “I wish I could point to some sophisticated algorithms,” says Miao. “But no, it’s just a lot more learning.”

3. Misinformation isn’t new, but deepfakes will make it worse.

Online misinformation has been undermining our faith in the media, in institutions, and in each other for years. Some fear that adding fake video to the mix will destroy whatever pillars of shared reality we have left.

“We are replacing trust with mistrust, confusion, fear, and hate,” says Pechoucek. “Society without ground truth will degenerate.”

Pechoucek is especially worried about the malicious use of deepfakes in elections. During last year’s elections in Slovakia, for example, attackers shared a fake video that showed the leading candidate discussing plans to manipulate voters. The video was low quality and easy to spot as a deepfake. But Pechoucek believes it was enough to turn the result in favor of the other candidate.

“Adventurous Puppies” is a short clip made by OpenAI using Sora.

John Wissinger, who leads the strategy and innovation teams at Blackbird AI, a firm that tracks and manages the spread of misinformation online, believes fake video will be most persuasive when it blends real and fake footage. Take two videos showing President Joe Biden walking across a stage. In one he stumbles, in the other he doesn’t. Who is to say which is real?

“Let’s say an event actually occurred, but the way it’s presented to me is subtly different,” says Wissinger. “That can affect my emotional response to it.” As Pechoucek noted, a fake video doesn’t even need to be that good to make an impact. A bad fake that fits existing biases will do more damage than a slick fake that doesn’t, says Wissinger.

That’s why Blackbird focuses on who is sharing what with whom. In some sense, whether something is true or false is less important than where it came from and how it is being spread, says Wissinger. His company already tracks low-tech misinformation, such as social media posts showing real images out of context. Generative technologies make things worse, but the problem of people presenting media in misleading ways, deliberately or otherwise, is not new, he says.

Throw bots into the mix, sharing and promoting misinformation on social networks, and things get messy. Just knowing that fake media is out there will sow seeds of doubt into bad-faith discourse. “You can see how pretty soon it could become impossible to discern between what’s synthesized and what’s real anymore,” says Wissinger.

4. We are facing a new online reality.

Fakes will soon be everywhere, from disinformation campaigns, to ad spots, to Hollywood blockbusters. So what can we do to figure out what’s real and what’s just fantasy? There are a range of solutions, but none will work by themselves.

The tech industry is working on the problem. Most generative tools try to enforce certain terms of use, such as preventing people from creating videos of public figures. But there are ways to bypass these filters, and open-source versions of the tools may come with more permissive policies.

Companies are also developing standards for watermarking AI-generated media and tools for detecting it. But not all tools will add watermarks, and watermarks can be stripped from a video’s metadata. No reliable detection tool exists either. Even if such tools worked, they would become part of a cat-and-mouse game of trying to keep up with advances in the models they are designed to police.

Online platforms like X and Facebook have poor track records when it comes to moderation. We should not expect them to do better once the problem gets harder. Miao used to work at TikTok, where he helped build a moderation tool that detects video uploads that violate TikTok’s terms of use. Even he is wary of what’s coming: “There’s real danger out there,” he says. “Don’t trust things that you see on your laptop.” 

Blackbird has developed a tool called Compass, which lets you fact check articles and social media posts. Paste a link into the tool and a large language model generates a blurb drawn from trusted online sources (these are always open to review, says Wissinger) that gives some context for the linked material. The result is very similar to the community notes that sometimes get attached to controversial posts on sites like X, Facebook, and Instagram. The company envisions having Compass generate community notes for anything. “We’re working on it,” says Wissinger.

But people who put links into a fact-checking website are already pretty savvy—and many others may not know such tools exist, or may not be inclined to trust them. Misinformation also tends to travel far wider than any subsequent correction.

In the meantime, people disagree on whose problem this is in the first place. Pechoucek says tech companies need to open up their software to allow for more competition around safety and trust. That would also let cybersecurity firms like his develop third-party software to police this tech. It’s what happened 30 years ago when Windows had a malware problem, he says: “Microsoft let antivirus firms in to help protect Windows. As a result, the online world became a safer place.”

But Pechoucek isn’t too optimistic. “Technology developers need to build their tools with safety as the top objective,” he says. “But more people think about how to make the technology more powerful than worry about how to make it more safe.”

Made by OpenAI using Sora.

There’s a common fatalistic refrain in the tech industry: change is coming, deal with it. “Generative AI is not going to get uninvented,” says Raskino. “This may not be very popular, but I think it’s true: I don’t think tech companies can bear the full burden. At the end of the day, the best defense against any technology is a very well-educated public. There’s no shortcut.”

Miao agrees. “It’s inevitable that we will massively adopt generative technology,” he says. “But it’s also the responsibility of the whole of society. We need to educate people.” 

“Technology will move forward, and we need to be prepared for this change,” he adds. “We need to remind our parents, our friends, that the things they see on their screen might not be authentic.” This is especially true for older generations, he says: “Our parents need to be aware of this kind of danger. I think everyone should work together.”

We’ll need to work together quickly. When Sora came out a month ago, the tech world was stunned by how quickly generative video had progressed. But the vast majority of people have no idea this kind of technology even exists, says Wissinger: “They certainly don’t understand the trend lines that we’re on. I think it’s going to catch the world by storm.”

How three filmmakers created Sora’s latest stunning videos

In the last month, a handful of filmmakers have taken Sora for a test drive. The results, which OpenAI published this week, are amazing. The short films are a big jump up even from the cherry-picked demo videos that OpenAI used to tease its new generative model just six weeks ago. Here’s how three of the filmmakers did it.

“Air Head” by Shy Kids

Shy Kids is a pop band and filmmaking collective based in Toronto that describes its style as “punk-rock Pixar.” The group has experimented with generative video tech before. Last year it made a music video for one of its songs using an open-source tool called Stable Warpfusion. It’s cool, but low-res and glitchy. The film it made with Sora, called “Air Head,” could pass for real footage—if it didn’t feature a man with a balloon for a face.

One problem with most generative video tools is that it’s hard to maintain consistency across frames. When OpenAI asked Shy Kids to try out Sora, the band wanted to see how far they could push it. “We thought a fun, interesting experiment would be—could we make a consistent character?” says Shy Kids member Walter Woodman. “We think it was mostly successful.”

Generative models can also struggle with anatomical details like hands and faces. But in the video there is a scene showing a train car full of passengers, and the faces are near perfect. “It’s mind-blowing what it can do,” says Woodman. “Those faces on the train were all Sora.”

Has generative video’s problem with faces and hands been solved? Not quite. We still get glimpses of warped body parts. And text is still a problem (in another video, by the creative agency Native Foreign, we see a bike repair shop with the sign “Biycle Repaich”). But everything in “Air Head” is raw output from Sora. After editing together many different clips produced with the tool, Shy Kids did a bunch of post-processing to make the film look even better. They used visual effects tools to fix certain shots of the main character’s balloon face, for example.

Woodman also thinks that the music (which they wrote and performed) and the voice-over (which they also wrote and performed) help to lift the quality of the film even more. Mixing these human touches in with Sora’s output is what makes the film feel alive, says Woodman. “The technology is nothing without you,” he says. “It is a powerful tool, but you are the person driving it.”

[Update: Shy Kids have posted a behind-the-scenes video for Air Head on X. Come for the pro tips, stay for the Sora bloopers: “How do you maintain a character and look consistent even though Sora is a slot machine as to what you get back?” asks Woodman.]

“Abstract” by Paul Trillo

Paul Trillo, an artist and filmmaker, wanted to stretch what Sora could do with the look of a film. His video is a mash-up of retro-style footage with shots of a figure who morphs into a glitter ball and a breakdancing trash man. He says that everything you see is raw output from Sora: “No color correction or post FX.” Even the jump-cut edits in the first part of the film were produced using the generative model.

Trillo felt that the demos that OpenAI put out last month came across too much like clips from video games. “I wanted to see what other aesthetics were possible,” he says. The result is a video that looks like something shot with vintage 16-millimeter film. “It took a fair amount of experimenting, but I stumbled upon a series of prompts that helps make the video feel more organic or filmic,” he says.

“Beyond Our Reality” by Don Allen Stevenson

Don Allen Stevenson III is a filmmaker and visual effects artist. He was one of the artists invited by OpenAI to try out DALL-E 2, its text-to-image model, a couple of years ago. Stevenson’s film is a NatGeo-style nature documentary that introduces us to a menagerie of imaginary animals, from the girafflamingo to the eel cat.

In many ways working with text-to-video is like working with text-to-image, says Stevenson. “You enter a text prompt and then you tweak your prompt a bunch of times,” he says. But there’s an added hurdle. When you’re trying out different prompts, Sora produces low-res video. When you hit on something you like, you can then increase the resolution. But going from low to high res involves another round of generation, and what you liked in the low-res version can be lost.

Sometimes the camera angle is different or the objects in the shot have moved, says Stevenson. Hallucination is still a feature of Sora, as it is in any generative model. With still images this might produce weird visual defects; with video those defects can appear across time as well, with weird jumps between frames.

Stevenson also had to figure out how to speak Sora’s language. It takes prompts very literally, he says. In one experiment he tried to create a shot that zoomed in on a helicopter. Sora produced a clip in which it mixed together a helicopter with a camera’s zoom lens. But Stevenson says that with a lot of creative prompting, Sora is easier to control than previous models.

Even so, he thinks that surprises are part of what makes the technology fun to use: “I like having less control. I like the chaos of it,” he says. There are many other video-making tools that give you control over editing and visual effects. For Stevenson, the point of a generative model like Sora is to come up with strange, unexpected material to work with in the first place.

The clips of the animals were all generated with Sora. Stevenson tried many different prompts until the tool produced something he liked. “I directed it, but it’s more like a nudge,” he says. He then went back and forth, trying out variations.

Stevenson pictured his fox crow having four legs, for example. But Sora gave it two, which worked even better. (It’s not perfect: sharp-eyed viewers will see that at one point in the video the fox crow switches from two legs to four, then back again.) Sora also produced several versions that he thought were too creepy to use.

When he had a collection of animals he really liked, he edited them together. Then he added captions and a voice-over on top. Stevenson could have created his made-up menagerie with existing tools. But it would have taken hours, even days, he says. With Sora the process was far quicker.

“I was trying to think of something that would look cool and experimented with a lot of different characters,” he says. “I have so many clips of random creatures.” Things really clicked when he saw what Sora did with the girafflamingo. “I started thinking: What’s the narrative around this creature? What does it eat, where does it live?” he says. He plans to put out a series of extended films following each of the fantasy animals in more detail.

Stevenson also hopes his fantastical animals will make a bigger point. “There’s going to be a lot of new types of content flooding feeds,” he says. “How are we going to teach people what’s real? In my opinion, one way is to tell stories that are clearly fantasy.”

Stevenson points out that his film could be the first time a lot of people see a video created by a generative model. He wants that first impression to make one thing very clear: This is not real.