Scaling customer experiences with data and AI

Today, interactions matter more than ever. According to research from NICE, once a consumer makes a buying decision for a product or service, 80% of their decision to keep doing business with that brand hinges on the quality of their customer service experience. Enter AI.

“I think AI is becoming a really integral part of every business today because it is finding that sweet spot in allowing businesses to grow while finding key efficiencies to manage that bottom line and really do that at scale,” says Andy Traba, vice president of product marketing at NICE.

When many think of AI and customer experiences, chatbots that give customers more headaches than help often come to mind. However, emerging AI use cases are enabling greater efficiencies than ever. From sentiment analysis to co-pilots to integration throughout the entire customer journey, the evolving era of AI is reducing friction and building better relationships between enterprises and both their employees and customers.

“When we think about bolstering AI capabilities, it’s really about getting the right data to train my models on so that they have those best outcomes.”

Deploying any technology requires a delicate balance between delivering quality solutions without compromising the bottom line. AI integration offers investment returns by scaling customer and employee capabilities, automating tedious and redundant tasks, and offering consistent experiences based on collected and specialized data.

“I think as you’re hopefully venturing into leveraging AI more to improve your business, the key recommendation I would provide is just to focus on those crystal clear high-probability use cases and get those early wins and then reinvest back into the business,” says Traba.

While artificial intelligence has increasingly grabbed headlines in recent years, augmented intelligence—where AI tools are used to enhance human capabilities rather than automate them—is worthy of similar buzz for its potential in the customer experience space, says Traba.

Currently, the customer experience landscape is highly reactive. Looking ahead, Traba foresees a shift to proactive and predictive customer experiences that blend both AI and augmented intelligence. Say a customer’s device is reaching its end-of-life state. Rather than the customer reaching out to a chatbot or contact center, AI tools would flag the device’s issue early and direct the customer to a live chat with a representative, offering both the efficiency of automation and personalized help from a human representative.

“Where I see the future evolving in terms of customer experiences is being much more proactive with the convergence of data, these advancements of technology, and certainly generative AI,” says Traba.

This episode of Business Lab is produced in partnership with NICE.

Full Transcript

Laurel Ruma: From MIT Technology Review, I’m Laurel Ruma and this is Business Lab, the show that helps business leaders make sense of new technologies coming out of the lab and into the marketplace.

Our topic is building better customer and employee experiences with artificial intelligence. Integrating data and AI solutions into everyday business can help provide insights, create efficiencies, and free up time for employees to work on more complicated issues. And all of this builds a better experience for customers.

Two words for you: augmented intelligence.

My guest is Andy Traba, vice president of product marketing at NICE.

This podcast is produced in partnership with NICE.

Welcome Andy.

Andy Traba: Hi Laurel. Thanks for having me.

Laurel: Well, thanks for being here. So to set some context, could you describe the current state of AI within customer experience? Common use cases that come to mind are chatbots, but what are some other applications for AI in this space?

Andy: Thank you. I think it’s a great question to get started, and I think first and foremost, the use of AI is growing everywhere. Certainly, we had this big boom last year where everybody started talking about AI thanks to ChatGPT and a lot of the advancements with generative AI, and we’re certainly seeing a lot more doing now, moving beyond just talking. So there’s just a growing set of use cases trying to apply AI everywhere to improve experiences. One of the more popular ones, and this technology has been around for some time, is sentiment analysis. So instead of just proactively surveying customers to ask how they are doing and what their experience was like, using AI models to analyze the conversations they’re having with brands and automatically determine that. And it’s also a good use case, I think, to emphasize the importance of the data that goes into the training of AI models.

As you think about sentiment analysis, you want to train those models based on the actual customer experience conversations, maybe past records or even surveys. What you want to avoid is training a sentiment model maybe based on movie reviews or Amazon reviews, something that’s not really well connected. So certainly sentiment analysis is a very popular use case that goes beyond just chatbots.
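(To make that distinction concrete, here is a minimal sketch of a sentiment classifier trained on customer-service-style text rather than movie reviews. The transcripts, labels, and scikit-learn pipeline are invented for illustration and are not NICE’s data or tooling.)

```python
# A toy sentiment model trained on invented customer-service transcripts,
# illustrating the point about domain-specific training data. Not NICE's code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical CX conversations with hand-assigned sentiment labels.
cx_transcripts = [
    "I've been on hold for an hour and nobody can fix my bill",
    "the agent sorted out my refund right away, thank you",
    "this is the third time I've called about the same outage",
    "great, the new router works and the credit came through",
]
labels = ["negative", "positive", "negative", "positive"]

# TF-IDF features plus logistic regression: a deliberately simple baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(cx_transcripts, labels)

print(model.predict(["my issue still isn't resolved after two calls"]))
```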

Two other ones I’ll bring up are co-pilots. We’ve seen, certainly, a lot of recent news with the launch of Microsoft Copilot and other forms of co-pilots within the contact center, certainly helping customer service agents. It’s a very popular use case that we see. The reason driving that demand is that the types of conversations getting to agents today are much more complex. AI has done a good job of taking away the easy stuff. We no longer have to call into a contact center to reset our passwords, so what’s left over for the agents is much more difficult types of interactions. So being able to assist them in real time with prompts and guidance and recommending knowledge articles to make their job easier and more effective is really popular.

And then the third and final one just on this question is really the rise of AI-driven journeys. Many, many years ago, you and I would call into a contact center, and the only channel we could use was voice. Today, those channels have exploded. There’s social media, there’s messaging, there’s voice, there’s AI assistants that we can chat with. So being able to orchestrate or navigate a customer effectively through that journey and recommend the next best action or the next best channel for them to reduce that complexity is really in demand as well. And how can I even get to a point where I can proactively engage with them on the channel of their choice, at the time of day that we’re likely to get a response? That is certainly an area where we see AI playing an important role today, but even more so in the future. So those are the three: sentiment analysis, the rise of co-pilots, and then using AI across the entire customer journey.

Laurel: So as AI becomes more popular across enterprises and across industries, why is integrating AI and customer experience then so crucial for today’s business landscape?

Andy: I think it’s so crucial today because it’s finding this sweet spot in terms of business decision-making. When we think of business decision-making, we are often challenged with, am I going to focus on revenue or cost cutting? Am I going to focus on building new products or perfecting my existing products? And rarely has there been a technology that has allowed a business to achieve all of those at once. But we’re seeing that today with AI finding a sweet spot where I can improve revenue and keep customers happy and renewing, or even gain new ones, without having to spend additional money. I could even do that in a more efficient way with AI. With AI, I can take a very innovative approach and produce new products that my customers demand, and save time and money through efficiencies in making my current products better. I think AI is becoming a really integral part of every business today because it is finding that sweet spot in allowing businesses to grow while finding key efficiencies to manage that bottom line and really do that at scale.

Laurel: And speaking of those efficiencies, employee experience lays that foundation for the customer. But based on your time at NICE and within business operations, how does employee experience affect the overall experience then for customers?

Andy: I think what we’ve seen at NICE is really that customer experience and employee experience are hand in glove. They’re one and the same. There is tremendous correlation between the two. To give some anecdotes, and this is customer experience really happening everywhere: if you go into a car dealership for a Tesla or a BMW, a high-end product, but you are interacting with a salesperson who’s a little pushy or maybe just having a bad day, it’s going to deteriorate the overall customer experience. So that bad employee experience causes a negative effect. Same thing if you go to your favorite local restaurant but you maybe have a new server who’s not really well trained or is still figuring out the menu and the logistics; that’s going to have a negative spillover effect. And then even on the flip side of that, you can see employee experience having a positive effect on the overall customer experience.

If employees are engaged and they have the right information and the right tools, they can turn a negative into a positive. Think of airlines, a very commoditized industry right now. But if you have a problem with your flight and it got canceled and you have a critical moment of need, that employee from that airline could really turn that experience around by finding a new flight, booking you, and making sure that you are on your trip and reaching your destination on time or with very little delay. So I think when we think about experiences at large, the employee and the customer outcomes are very much tied together. We’ve done research here at NICE on this exact topic, and what we found was once a consumer makes a buying decision for a particular product or service, after that point, 80% of that consumer’s decision to continue doing business with that brand is based on the quality of their interactions.

So how those conversations play out plays a very, very important part in whether or not they will continue doing business with that brand. Today, interactions matter more than ever. To conclude on this question, one of my favorite quotes: customer experience today isn’t just part of the business, it is the business. And I think employees play a really important front-line role in achieving that.

Laurel: That certainly makes sense. 80% is a huge number, and I think of that in my own experiences, but could you explain the difference between artificial intelligence and augmented intelligence and also how they overlap?

Andy: Yeah, it’s a great question. I think today artificial intelligence is certainly capturing all of the buzz, but what I think is just as buzzworthy is augmented intelligence. So let’s start by defining the two. Artificial intelligence refers to machines mimicking human cognition. And when we think about customer experience, there’s really no better example of that than chatbots or virtual assistants: technology that allows you to interact with the brand 24/7, 365, at any time that you need, mimicking the conversations that you would normally have with a live human customer service representative. Augmented intelligence, on the other hand, is really about AI enhancing human capabilities, increasing the cognitive capacity of an individual, allowing them to do more with less, saving them time. I think in the domain of customer experience, co-pilots are becoming a very popular example here. How can co-pilots make recommendations, generate responses, automate a lot of the mundane tasks that humans just don’t like to do and frankly aren’t good at?

So I think there’s a clear distinction, then, between artificial intelligence, really those machines taking on the human capabilities 100%, versus augmented, not replacing humans, but lifting them up, allowing them to do more. And where there’s overlap, and I think we’re going to see this trend really start accelerating in the years to come in customer experiences, is the blend between those two as we’re interacting with a brand. And what I mean by that is maybe starting out by having a conversation with an intelligent virtual agent, a chatbot, and then seamlessly blending into a live human customer representative who plays a specialized role. So maybe as I’m researching a new product to buy, such as a cell phone online, I might be able to ask the chatbot some questions, and it’s referring to its knowledge base and its past interactions to answer those. But when it’s time to ask a very specific question, I might be elevated to a customer service representative, or that brand just might choose to say, “Hey, when it’s time to buy, we want to ensure you’re speaking to a live individual.” So I think there’s going to be a blend, or a continuum, if you will, of these types of interactions you have. And I think we’re going to get to a point, very soon, where we might not even know: is it a human on the other end of that digital interaction, or just a machine chatting back and forth? But I think those two concepts, artificial intelligence and augmented intelligence, are certainly here to stay and are driving improvements in customer experience at scale with brands.

Laurel: Well, there’s the customer journey, but then there’s also the AI journey, and most of those journeys start with data. So internally, what is the process of bolstering AI capabilities in terms of data, and how does data play a role in enhancing both employee and customer experiences?

Andy: I think in today’s age, it’s common understanding really that AI is only as good as the data it’s trained on. Quick anecdote, if I’m an AI engineer and I’m trying to predict what movies people will watch, so I can drive engagement into my movie app, I’m going to want data. What movies have people watched in the past and what did they like? Similarly in customer experience, if I’m trying to predict the best outcome of that interaction, I want CX data. I want to know what’s gone well in the past on these interactions, what’s gone poorly or wrong? I don’t want data that’s just available on the public internet. I need specialized CX data for my AI models. When we think about bolstering AI capabilities, it’s really about getting the right data to train my models on so that they have those best outcomes.

And going back to the example I brought in around sentiment, I think that reinforces the need to ensure that when we’re training AI models for customer experience, it’s done off of rich CX datasets and not just publicly available information like some of the more popular large language models are using.

And when I think about how data plays a role in enhancing employee and customer experiences, there’s a strategy that’s important: deriving new information, or new data, from those unstructured data sets that contact centers and experience centers often have. So when we think about a conversation, it’s very open-ended, right? It could go many ways. It is not often predictable, and it’s very hard to understand at the surface. Where AI and advanced machine learning techniques can help, though, is in deriving new information from those conversations, such as: What was the consumer’s sentiment level at the beginning of the conversation versus the end? What actions did the agent take that drove either positive or negative trends in that sentiment? How did all of these elements play out? And very quickly you can go from large unstructured data sets that might not have a lot of information or signals in them to very large data sets that are rich and contain a lot of signals. Deriving that new information, understanding, how I like to think of it, the chemistry of that conversation, is playing a very critical role, I think, in AI powering customer experiences today: ensuring that those experiences are trusted, they’re done right, and they’re built on consumer data that can be trusted, not public information that doesn’t really help drive a positive customer experience.

Laurel: Getting back to your idea that customer experience is the business: one of the major questions that most organizations face with technology deployment is how to deliver quality customer experiences without compromising the bottom line. So how can AI move the needle into that positive territory?

Andy: Yeah, I think if there’s one word to think about when it comes to AI moving the bottom line, it’s scale. I think how we think of things is really all about scale: allowing humans or employees to do more, whether that’s by increasing their cognitive capacity, saving them time, or allowing things to be more efficient. Again, that’s referring back to that augmented intelligence. And then with artificial intelligence, we’re thinking all about automation. So how can we offer customer experience 24/7, 365? How can allowing consumers to reach out to a brand at any time that’s convenient boost that customer experience? So doing both of those tactics in a way that moves the bottom line and drives results is important. I think there’s a third one, though, that isn’t receiving enough attention, and that’s consistency. So we can allow employees to do more. We can automate their tasks to provide more capacity. But we also have to provide consistent, positive experiences.

And where AI and machine learning really help here is in finding not only the areas of variability but also the root cause or driver of those variabilities, to close those gaps. And a brand I’ll give a shout-out to that I think does this incredibly well is Starbucks. I can go to a Starbucks in any location around the world and order an iced caramel macchiato, and I’m going to get that same drink experience regardless of which of the thousands of Starbucks locations I’m in. And I think that consistency plays a really powerful role in the overall customer experience of the Starbucks brand. When you think about the logistics of doing that at scale, it’s incredibly complex and challenging. But if you have the data and you have the right tools and the AI, finding those gaps and offering more consistent experiences is incredibly powerful.

Laurel: So could you share some practical strategies and best practices for organizations looking to leverage AI to empower employees, foster positive and productive work environments, and, ultimately, improve customer interactions?

Andy: Yeah, I think the overall positive, going back to earlier in our conversation, is that there are many use cases. AI has a tremendous opportunity in this space. The recommendation I would provide is to focus first on a crystal-clear, high-probability use case for your business. Auto-summary, or the automated note-taking of agents’ after-call work, is becoming an increasingly popular one that we’re seeing in the space. And I think the reasons for it are really clear. It’s a win-win-win for the employee, the customer, and the business. It’s a win for the employee because AI is going to automate something that is mundane for them or very procedural. If you think of a customer service representative, they’re taking 40, 50, maybe upwards of 60 conversations a day during their job, taking notes of what was talked about and what the action items are. Very complicated, mundane, tiresome even. They don’t like doing it.

So AI can offload that activity from them, which is a win for the employee. It’s a win for the customer, as a lot of times the agents aren’t great at note-taking, especially when they’re doing it so often, which can lead to that unfortunate experience where you have to call back as a consumer and repeat yourself because the agent you’re now talking to can’t understand or doesn’t have good information about what you called or interacted with previously. So from a consumer experience, it helps them because they have to repeat themselves less often. The agent that they’re currently speaking with can offer a more personalized service because they have better notes or history of past interactions.

And then finally, the third win: it’s really good for the business because you’re saving time and money when the agents no longer have to manually do something. We see that 30 to 60 seconds of note-taking per call at a business with 1,000 employees adds up to millions of dollars every year. So there’s a clear-cut business case for the business to achieve results, improve customer experience, and improve employee experience at the same time. I think as you’re hopefully venturing into leveraging AI more to improve your business, the key recommendation I would provide is just to focus on those crystal-clear, high-probability use cases, get those early wins, and then reinvest back into the business.
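(As a rough back-of-the-envelope check on that claim, the arithmetic works out as sketched below. The call volume, workday count, and hourly cost are illustrative assumptions, not figures from NICE.)

```python
# Back-of-the-envelope savings from automating after-call notes.
# All inputs are illustrative assumptions, not figures from NICE.
agents = 1_000
calls_per_agent_per_day = 50   # within the 40-60 range mentioned above
seconds_saved_per_call = 45    # midpoint of the 30-60 second estimate
workdays_per_year = 250
cost_per_agent_hour = 20.0     # assumed fully loaded hourly cost

hours_saved = (agents * calls_per_agent_per_day
               * seconds_saved_per_call * workdays_per_year) / 3600
print(f"agent-hours saved per year: {hours_saved:,.0f}")             # 156,250
print(f"annual savings: ${hours_saved * cost_per_agent_hour:,.0f}")  # $3,125,000
```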

Laurel: Yeah, I think those are the positive aspects of that, but concerns about job loss due to automation tend to crop up with AI deployment. So what are the opportunities that AI integration can provide for organizations and their employees so it’s a win-win for everybody?

Andy: And I’m certainly empathetic to this topic. As with all new technologies, whenever there’s excitement around them, there’s also this uncertainty of what those long-term outcomes will be. But I think when we look back historically, all transformative technologies have boosted GDP and created more jobs. And so I see no reason to believe this time around will be different. Now, those jobs might be different, and new roles will emerge. When it comes to customer experience and the employee experience, one interesting theory I’m following is this: if you think about Apple, they had a really revolutionary model where they branded their employees geniuses. So you’d go into an Apple store and you would speak to a genius, and that model carried through all of their physical flagship stores. A very positive model. Back in the day, people would actually pay money to go speak to a genius or get a priority customer service slot. But it’s a model that’s really hard to scale, and a model that hasn’t been successful in a virtual environment.

I think when we see AI and a lot of these new technology advancements, though, that’s a prime example of maybe a new job that does emerge. If AI is offloading a lot of the interactions to chatbots, what do customer service agents do? Maybe they become geniuses, playing a more proactive, high-value-add role for consumers and overall improving the service and the experience there. So I do think that AI will cause job shifts, but overall there’ll be a net positive, just like there has been with all past transformative technologies.

Laurel: Continuing that look ahead, how do you see the era of AI evolving in terms of customer and employee experience? What excites you about the future in this space?

Andy: This is actually what I’m most excited about. When we think about customer experience today, it’s highly reactive. As a consumer, if I have a problem, I search your website, I interact with your chatbot, I end up talking to a live customer service representative. The consumer is the driving force of everything, and the business or the brand has to be reactive to them. Where I see the future evolving in terms of customer experiences is being much more proactive with the convergence of data, these advancements of technology, and certainly generative AI. I do see AI becoming smarter and being more predictive and proactive, alerting that there is going to be a problem before the consumer actually experiences it, and taking action proactively before that problem manifests itself.

And just a quick example: maybe there’s a media or cable company where a device is reaching its end-of-life state. Rather than have it go on the fritz the day of the Super Bowl, reach out, be proactive, contact that individual, and give them specific instructions to follow. And I think that’s really where we see the advancements of not only big data and AI, but just the abundance of ways to reach out in preferred channels, whether that’s a simple SMS or a high-touch service representative reaching out. That’s really where the future of customer experience moves, to a much more proactive state from its reactive state today.

Laurel: Well, thank you so much, Andy. I appreciate your time, and thank you for joining us on the Business Lab today.

Andy: Thanks. This was an excellent conversation, Laurel, and thanks again for having me.

Laurel: That was Andy Traba, vice president of product marketing at NICE, who I spoke with from Cambridge, Massachusetts, the home of MIT and MIT Technology Review.

That’s it for this episode of Business Lab. I’m your host, Laurel Ruma. I’m the global director of Insights, the custom publishing division of MIT Technology Review. We were founded in 1899 at the Massachusetts Institute of Technology, and you can find us in print, on the web, and at events each year around the world. For more information about us and the show, please check out our website at technologyreview.com.

This show is available wherever you get your podcasts. If you enjoyed this episode, we hope you’ll take a moment to rate and review us. Business Lab is a production of MIT Technology Review. This episode was produced by Giro Studios. Thanks for listening.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

Four things you need to know about China’s AI talent pool 

This story first appeared in China Report, MIT Technology Review’s newsletter about technology in China. Sign up to receive it in your inbox every Tuesday.

In 2019, MIT Technology Review covered a report that shined a light on how fast China’s AI talent pool was growing. Its main finding was pretty interesting: the number of elite AI scholars with Chinese origins had multiplied by 10 in the previous decade, but relatively few of them stayed in China for their work. The majority moved to the US. 

Now the think tank behind the report has published an updated analysis, showing how the makeup of global AI talent has changed since—during a critical period when the industry has shifted significantly and become the hottest technology sector. 

The team at MacroPolo, the think tank of the Paulson Institute, an organization that focuses on US-China relations, studied the national origin, educational background, and current work affiliation of top researchers who gave presentations and had papers accepted at NeurIPS, a top academic conference on AI. Their analysis of the 2019 conference resulted in the first iteration of the Global AI Talent Tracker. They’ve analyzed the December 2022 NeurIPS conference for an update three years later.

I recommend you read the original report, which has a very well-designed infographic that shows the talent flow across countries. But to save you some time, I also talked to the authors and highlighted what I think are the most surprising or important takeaways from the new report. Here are the four main things you need to know about the global AI talent landscape today. 

1.  China has become an even more important country for training AI talent.

Even in 2019, Chinese researchers were already a significant part of the global AI community, making up one-tenth of the most elite AI researchers. In 2022, they accounted for 26%, almost dethroning the US (American researchers accounted for 28%). 

[Two pie charts showing the countries of origin of top AI researchers in 2019 and 2022.]

“Timing matters,” says Ruihan Huang, senior research associate at MacroPolo and one of the lead authors. “The last three years have seen China dramatically expand AI programs across its university system—now there are some 2,000 AI majors—because it was also building an AI industry to absorb that talent.” 

As a result of these university and industry efforts, many more students in computer science or other STEM majors have joined the AI industry, making Chinese researchers the backbone of cutting-edge AI research.

2. AI researchers now tend to stay in the country where they receive their graduate degree. 

This is perhaps intuitive, but the numbers are still surprisingly high: 80% of AI researchers who went to a graduate school in the US stayed to work in the US, while 90% of their peers who went to a graduate school in China stayed in China.

In a world where major countries are competing with each other to take the lead in AI development, this finding suggests a trick they could use to expand their research capacity: invest in graduate-level institutions and attract overseas students to come. 

This is particularly important in the US-China context, where the souring of the relationship between the two countries has affected the academic field. According to news reports, quite a few Chinese graduate students have been interrogated at the US border or even denied entry in recent years, as a Trump-era policy persisted. Along with the border restrictions imposed during the pandemic years, this hostility could have prevented more Chinese AI experts from coming to the US to learn and work. 

3. The US still overwhelmingly attracts the most AI talent, but China is catching up.

In both 2019 and 2022, the United States topped the rankings in terms of where elite AI researchers work. But it’s also clear that the distance between the US and other countries, particularly China, has shortened. In 2019, almost three-fifths of top AI researchers worked in the US; in 2022, only two-fifths did.

“The thing about elite talent is that they generally want to work at the most cutting-edge and dynamic places. They want to do incredible work and be rewarded for it,” says AJ Cortese, a senior research associate at MacroPolo and another of the main authors. “So far, the United States still leads the way in having that AI ecosystem—from leading institutions to companies—that appeals to top talent.”

[Two pie charts showing the leading countries where AI researchers worked in 2019 and 2022.]

In 2022, 28% of the top AI researchers were working in China. This significant portion speaks to the growth of the domestic AI sector in China and the job opportunities it has created. Compared with 2019, three more Chinese universities and one company (Huawei) made it into the top tier of institutions that produce AI research. 

It’s true that most Chinese AI companies are still considered to lag behind their US peers—for example, China usually trails the US by a few months in releasing comparable generative AI models. However, it seems like they have started catching up.

4. Top-tier AI researchers now are more willing to work in their home countries.

This is perhaps the biggest and also most surprising change in the data, in my opinion. Like their Chinese peers, more Indian AI researchers ended up staying in their home country for work.

In fact, this seems to be a broader pattern across the board: it used to be that more than half of AI researchers worked in a country different from their home. Now, the balance has tipped in favor of working in their own countries. 

[Two pie charts showing the portion of AI researchers choosing to work abroad vs. at home in 2019 and 2022.]

This is good news for countries trying to catch up with the US research lead in AI. “It goes without saying most countries would prefer ‘brain gain’ over ‘brain drain’—especially when it comes to a highly complex and technical discipline like AI,” Cortese says. 

It’s not easy to create an environment and culture that not only retains homegrown talent but also pulls in scholars from other countries, but lots of countries are now working on it. I can only begin to imagine what the report might look like in a few years.

Did anything else stand out to you in the report? Let me know your thoughts by writing to zeyi@technologyreview.com.


Now read the rest of China Report

Catch up with China

1. The Dutch prime minister will visit China this week to discuss with Chinese president Xi Jinping whether the Dutch chipmaking equipment company ASML can keep servicing Chinese clients. (Reuters $)

  • Here’s an inside look into ASML’s factory and how it managed to dominate advanced chipmaking. (MIT Technology Review)

2. Hong Kong passed a tough national security law that makes it more dangerous to protest Beijing’s rule. (BBC)

3. A new bill in France suggests imposing hefty fines on Shein and similar ultrafast-fashion companies for their negative environmental impact—as much as $11 per item that they sell in France. (Nikkei Asia)

4. Huawei filed a patent to make more advanced chips with a low-tech workaround. (Bloomberg $)

  • Meanwhile, a US official accused the Chinese chip foundry SMIC of breaking US law by making a chip for Huawei. (South China Morning Post $)

5. Instead of the usual six and a half days a week, Tesla has instructed its Shanghai factory to reduce production to five days a week. The slowdown of EV sales in China could be the reason. (Bloomberg $)

6. TikTok is still having plenty of troubles. A new political TV ad (paid for by a mysterious new nonprofit), playing in three US swing states, attacks Zhang Fuping, a ByteDance vice president that very few people have heard of. (Punchbowl News)

  • As TikTok still hasn’t reached a licensing deal with Universal Music Group, users have had to get creative to find alternative soundtracks for their videos. (Billboard)

7. China launched a communications satellite that will help relay signals for missions to explore the dark side of the moon. (Reuters $)

Lost in translation

The most-hyped generative AI app in China these days is Kimi, according to the Chinese publication Sina Tech. Released by Moonshot AI, a Chinese “unicorn” startup, Kimi made headlines last week when it announced it now supports text inputs of over 2 million Chinese characters. (For comparison, OpenAI’s GPT-4 Turbo currently supports inputs of about 100,000 Chinese characters, while Claude 3-200K supports about 160,000 characters.)

While some of the app’s virality can be credited to a marketing push that intensified recently, Chinese users are now busy feeding popular and classic books to the model and testing how well it can understand the context. Feeling threatened, other Chinese AI apps owned by tech giants like Baidu and Alibaba have followed suit, announcing that they will soon support 5 million or even 10 million Chinese characters. But processing large amounts of text, while impressive, is very costly in the generative AI age—and some observers worry this isn’t the commercial direction that companies ought to head in.

One more thing

Fluffy pajamas, sweatpants, outdated attire: young Chinese people are dressing themselves in “gross outfits” to work—an intentional provocation to their bosses and an expression of silent resistance to the trend that glorifies career hustle. “I just don’t think it’s worth spending money to dress up for work, since I’m just sitting there,” one of them told the New York Times.

Update: The story has been updated to clarify the affiliation of the report authors.

What’s next for generative video

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

When OpenAI revealed its new generative video model, Sora, last month, it invited a handful of filmmakers to try it out. This week the company published the results: seven surreal short films that leave no doubt that the future of generative video is coming fast. 

The first batch of models that could turn text into video appeared in late 2022, from companies including Meta, Google, and video-tech startup Runway. It was a neat trick, but the results were grainy, glitchy, and just a few seconds long.

Fast-forward 18 months, and the best of Sora’s high-definition, photorealistic output is so stunning that some breathless observers are predicting the death of Hollywood. Runway’s latest models can produce short clips that rival those made by blockbuster animation studios. Midjourney and Stability AI, the firms behind two of the most popular text-to-image models, are now working on video as well.

A number of companies are racing to make a business on the back of these breakthroughs. Most are figuring out what that business is as they go. “I’ll routinely scream, ‘Holy cow, that is wicked cool’ while playing with these tools,” says Gary Lipkowitz, CEO of Vyond, a firm that provides a point-and-click platform for putting together short animated videos. “But how can you use this at work?”

Whatever the answer to that question, it will probably upend a wide range of businesses and change the roles of many professionals, from animators to advertisers. Fears of misuse are also growing. The widespread ability to generate fake video will make it easier than ever to flood the internet with propaganda and nonconsensual porn. We can see it coming. The problem is, nobody has a good fix.

As we continue to get to grips with what’s ahead—good and bad—here are four things to think about. We’ve also curated a selection of the best videos filmmakers have made using this technology, including an exclusive reveal of “Somme Requiem,” an experimental short film by Los Angeles–based production company Myles. Read on for a taste of where AI moviemaking is headed.

1. Sora is just the start

OpenAI’s Sora is currently head and shoulders above the competition in video generation. But other companies are working hard to catch up. The market is going to get extremely crowded over the next few months as more firms refine their technology and start rolling out Sora’s rivals.

The UK-based startup Haiper came out of stealth this month. It was founded in 2021 by former Google DeepMind and TikTok researchers who wanted to work on technology called neural radiance fields, or NeRF, which can transform 2D images into 3D virtual environments. They thought a tool that turned snapshots into scenes users could step into would be useful for making video games.

But six months ago, Haiper pivoted from virtual environments to video clips, adapting its technology to fit what CEO Yishu Miao believes will be an even bigger market than games. “We realized that video generation was the sweet spot,” says Miao. “There will be a super-high demand for it.”

“Air Head” is a short film made by Shy Kids, a pop band and filmmaking collective based in Toronto, using Sora.

Like OpenAI’s Sora, Haiper’s generative video tech uses a diffusion model to manage the visuals and a transformer (the component in large language models like GPT-4 that makes them so good at predicting what comes next) to manage the consistency between frames. “Videos are sequences of data, and transformers are the best model to learn sequences,” says Miao.

Consistency is a big challenge for generative video and the main reason existing tools produce just a few seconds of video at a time. Transformers for video generation can boost the quality and length of the clips. The downside is that transformers make stuff up, or hallucinate. In text, this is not always obvious. In video, it can result in, say, a person with multiple heads. Keeping transformers on track requires vast silos of training data and warehouses full of computers.
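To illustrate that division of labor, here is a minimal PyTorch sketch of the transformer half of the recipe. The module, dimensions, and names are assumptions for illustration, not Haiper’s or OpenAI’s actual architecture: a diffusion model would denoise each frame’s latent, while a block like this attends across frames to keep them consistent.

```python
# A minimal sketch (not any vendor's actual code) of cross-frame attention:
# a transformer runs over the sequence of frame latents so each frame
# "sees" its neighbors, which is what keeps a clip temporally consistent.
import torch
import torch.nn as nn

class TemporalTransformer(nn.Module):
    """Attends across frame latents to enforce consistency between frames."""
    def __init__(self, latent_dim=256, num_frames=16, heads=8, layers=4):
        super().__init__()
        # Learned positional embedding so the model knows frame order.
        self.pos = nn.Parameter(torch.zeros(1, num_frames, latent_dim))
        enc_layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, frame_latents):            # (batch, frames, latent_dim)
        return self.encoder(frame_latents + self.pos)

latents = torch.randn(2, 16, 256)                # 2 clips, 16 frames each
consistent = TemporalTransformer()(latents)
print(consistent.shape)                          # torch.Size([2, 16, 256])
```

In a full system, this cross-frame attention would run inside every denoising step, which is one reason training such models demands so much data and compute.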

That’s why Irreverent Labs, founded by former Microsoft researchers, is taking a different approach. Like Haiper, Irreverent Labs started out generating environments for games before switching to full video generation. But the company doesn’t want to follow the herd by copying what OpenAI and others are doing. “Because then it’s a battle of compute, a total GPU war,” says David Raskino, Irreverent’s cofounder and CTO. “And there’s only one winner in that scenario, and he wears a leather jacket.” (He’s talking about Jensen Huang, CEO of the trillion-dollar chip giant Nvidia.)

Instead of using a transformer, Irreverent’s tech combines a diffusion model with a model that predicts what’s in the next frame on the basis of common-sense physics, such as how a ball bounces or how water splashes on the floor. Raskino says this approach reduces both training costs and the number of hallucinations. The model still produces glitches, but they are distortions of physics (like a bouncing ball not following a smooth curve, for example) with known mathematical fixes that can be applied to the video after it is generated, he says.
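Here is a toy version of the kind of post-generation physics fix Raskino describes, using the bouncing-ball example. The parabola fit and all the numbers are my own illustration, not Irreverent Labs’ method: between bounces, free flight under gravity is quadratic in time, so a least-squares quadratic fit can project a jittery generated trajectory back onto a smooth arc.

```python
# A toy illustration of a post-hoc physics fix: if a generated ball's flight
# between bounces wobbles off a smooth arc, replace it with the parabola that
# best fits the frames. Numbers are invented; not Irreverent Labs' code.
import numpy as np

t = np.linspace(0.0, 1.0, 30)                         # frame timestamps (s)
true_y = 5.0 * t - 4.9 * t**2                         # ideal ballistic arc
noisy_y = true_y + np.random.normal(0, 0.05, t.size)  # "hallucinated" jitter

coeffs = np.polyfit(t, noisy_y, deg=2)                # least-squares parabola
fixed_y = np.polyval(coeffs, t)                       # smooth trajectory

print("max deviation before fix:", np.abs(noisy_y - true_y).max())
print("max deviation after fix: ", np.abs(fixed_y - true_y).max())
```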

Which approach will last remains to be seen. Miao compares today’s technology to large language models circa GPT-2. Five years ago, OpenAI’s groundbreaking early model amazed people because it showed what was possible. But it took several more years for the technology to become a game-changer.

It’s the same with video, says Miao: “We’re all at the bottom of the mountain.”

2. What will people do with generative video? 

Video is the medium of the internet. YouTube, TikTok, newsreels, ads: expect to see synthetic video popping up everywhere there’s video already.

The marketing industry is one of the most enthusiastic adopters of generative technology. Two-thirds of marketing professionals have experimented with generative AI in their jobs, according to a recent survey Adobe carried out in the US, with more than half saying they have used the technology to produce images.

Generative video is next. A few marketing firms have already put out short films to demonstrate the technology’s potential. The latest example is the 2.5-minute-long “Somme Requiem,” made by Myles. You can watch the film below in an exclusive reveal from MIT Technology Review.

“Somme Requiem” is a short film made by Los Angeles production company Myles. Every shot was generated using Runway’s Gen 2 model. The clips were then edited together by a team of video editors at Myles.

“Somme Requiem” depicts snowbound soldiers during the World War I Christmas ceasefire in 1914. The film is made up of dozens of different shots that were produced using a generative video model from Runway, then stitched together, color-corrected, and set to music by human video editors at Myles. “The future of storytelling will be a hybrid workflow,” says founder and CEO Josh Kahn.

Kahn picked the period wartime setting to make a point. He notes that the Apple TV+ series Masters of the Air, which follows a group of World War II airmen, cost $250 million. The team behind Peter Jackson’s World War I documentary They Shall Not Grow Old spent four years curating and restoring more than 100 hours of archival film. “Most filmmakers can only dream of ever having an opportunity to tell a story in this genre,” says Kahn.

“Independent filmmaking has been kind of dying,” he adds. “I think this will create an incredible resurgence.”

Raskino hopes so. “The horror movie genre is where people test new things, to try new things until they break,” he says. “I think we’re going to see a blockbuster horror movie created by, like, four people in a basement somewhere using AI.”

So is generative video a Hollywood-killer? Not yet. The scene-setting shots in “Somme Requiem”—empty woods, a desolate military camp—look great. But the people in it are still afflicted with mangled fingers and distorted faces, hallmarks of the technology. Generative video is best at wide-angle pans or lingering close-ups, which creates an eerie atmosphere but little action. If “Somme Requiem” were any longer it would get dull.

But scene-setting shots pop up all the time in feature-length movies. Most are just a few seconds long, but they can take hours to film. Raskino suggests that generative video models could soon be used to produce those in-between shots for a fraction of the cost. This could also be done on the fly in later stages of production, without requiring a reshoot.

Michal Pechoucek, CTO at Gen Digital, the cybersecurity giant behind a range of antivirus brands including Norton and Avast, agrees. “I think this is where the technology is headed,” he says. “We’ll see many different models, each specifically trained in a certain domain of movie production. These will just be tools used by talented video production teams.”

We’re not there quite yet. A big problem with generative video is the lack of control users have over the output. Producing still images can be hit and miss; producing a few seconds of video is even more risky.

“Right now it’s still fun, you get a-ha moments,” says Miao. “But generating video that is exactly what you want is a very hard technical problem. We are some way off generating long, consistent videos from a single prompt.”

That’s why Vyond’s Lipkowitz thinks the technology isn’t yet ready for most corporate clients. These users want a lot more control over the look of a video than current tools give them, he says.

Thousands of companies around the world, including around 65% of the Fortune 500 firms, use Vyond’s platform to create animated videos for in-house communications, training, marketing, and more. Vyond draws on a range of generative models, including text-to-image and text-to-voice, but provides a simple drag-and-drop interface that lets users put together a video by hand, piece by piece, rather than generate a full clip with a click.

Running a generative model is like rolling dice, says Lipkowitz. “This is a hard no for most video production teams, particularly in the enterprise sector where everything must be pixel-perfect and on brand,” he says. “If the video turns out bad—maybe the characters have too many fingers, or maybe there is a company logo that is the wrong color—well, unlucky, that’s just how gen AI works.”

The solution? More data, more training, repeat. “I wish I could point to some sophisticated algorithms,” says Miao. “But no, it’s just a lot more learning.”

3. Misinformation isn’t new, but deepfakes will make it worse.

Online misinformation has been undermining our faith in the media, in institutions, and in each other for years. Some fear that adding fake video to the mix will destroy whatever pillars of shared reality we have left.

“We are replacing trust with mistrust, confusion, fear, and hate,” says Pechoucek. “Society without ground truth will degenerate.”

Pechoucek is especially worried about the malicious use of deepfakes in elections. During last year’s elections in Slovakia, for example, attackers shared a fake video that showed the leading candidate discussing plans to manipulate voters. The video was low quality and easy to spot as a deepfake. But Pechoucek believes it was enough to turn the result in favor of the other candidate.

“Adventurous Puppies” is a short clip made by OpenAI using Sora.

John Wissinger, who leads the strategy and innovation teams at Blackbird AI, a firm that tracks and manages the spread of misinformation online, believes fake video will be most persuasive when it blends real and fake footage. Take two videos showing President Joe Biden walking across a stage. In one he stumbles, in the other he doesn’t. Who is to say which is real?

“Let’s say an event actually occurred, but the way it’s presented to me is subtly different,” says Wissinger. “That can affect my emotional response to it.” As Pechoucek noted, a fake video doesn’t even need to be that good to make an impact. A bad fake that fits existing biases will do more damage than a slick fake that doesn’t, says Wissinger.

That’s why Blackbird focuses on who is sharing what with whom. In some sense, whether something is true or false is less important than where it came from and how it is being spread, says Wissinger. His company already tracks low-tech misinformation, such as social media posts showing real images out of context. Generative technologies make things worse, but the problem of people presenting media in misleading ways, deliberately or otherwise, is not new, he says.

Throw bots into the mix, sharing and promoting misinformation on social networks, and things get messy. Just knowing that fake media is out there will sow seeds of doubt into bad-faith discourse. “You can see how pretty soon it could become impossible to discern between what’s synthesized and what’s real anymore,” says Wissinger.

4. We are facing a new online reality.

Fakes will soon be everywhere, from disinformation campaigns, to ad spots, to Hollywood blockbusters. So what can we do to figure out what’s real and what’s just fantasy? There are a range of solutions, but none will work by themselves.

The tech industry is working on the problem. Most generative tools try to enforce certain terms of use, such as preventing people from creating videos of public figures. But there are ways to bypass these filters, and open-source versions of the tools may come with more permissive policies.

Companies are also developing standards for watermarking AI-generated media and tools for detecting it. But not all tools will add watermarks, and watermarks can be stripped from a video’s metadata. No reliable detection tool exists either. Even if such tools worked, they would become part of a cat-and-mouse game of trying to keep up with advances in the models they are designed to police.

Online platforms like X and Facebook have poor track records when it comes to moderation. We should not expect them to do better once the problem gets harder. Miao used to work at TikTok, where he helped build a moderation tool that detects video uploads that violate TikTok’s terms of use. Even he is wary of what’s coming: “There’s real danger out there,” he says. “Don’t trust things that you see on your laptop.” 

Blackbird has developed a tool called Compass, which lets you fact check articles and social media posts. Paste a link into the tool and a large language model generates a blurb drawn from trusted online sources (these are always open to review, says Wissinger) that gives some context for the linked material. The result is very similar to the community notes that sometimes get attached to controversial posts on sites like X, Facebook, and Instagram. The company envisions having Compass generate community notes for anything. “We’re working on it,” says Wissinger.

But people who put links into a fact-checking website are already pretty savvy—and many others may not know such tools exist, or may not be inclined to trust them. Misinformation also tends to travel far wider than any subsequent correction.

In the meantime, people disagree on whose problem this is in the first place. Pechoucek says tech companies need to open up their software to allow for more competition around safety and trust. That would also let cybersecurity firms like his develop third-party software to police this tech. It’s what happened 30 years ago when Windows had a malware problem, he says: “Microsoft let antivirus firms in to help protect Windows. As a result, the online world became a safer place.”

But Pechoucek isn’t too optimistic. “Technology developers need to build their tools with safety as the top objective,” he says. “But more people think about how to make the technology more powerful than worry about how to make it more safe.”

Made by OpenAI using Sora.

There’s a common fatalistic refrain in the tech industry: change is coming, deal with it. “Generative AI is not going to get uninvented,” says Raskino. “This may not be very popular, but I think it’s true: I don’t think tech companies can bear the full burden. At the end of the day, the best defense against any technology is a very well-educated public. There’s no shortcut.”

Miao agrees. “It’s inevitable that we will massively adopt generative technology,” he says. “But it’s also the responsibility of the whole of society. We need to educate people.” 

“Technology will move forward, and we need to be prepared for this change,” he adds. “We need to remind our parents, our friends, that the things they see on their screen might not be authentic.” This is especially true for older generations, he says: “Our parents need to be aware of this kind of danger. I think everyone should work together.”

We’ll need to work together quickly. When Sora came out a month ago, the tech world was stunned by how quickly generative video had progressed. But the vast majority of people have no idea this kind of technology even exists, says Wissinger: “They certainly don’t understand the trend lines that we’re on. I think it’s going to catch the world by storm.”

How three filmmakers created Sora’s latest stunning videos

In the last month, a handful of filmmakers have taken Sora for a test drive. The results, which OpenAI published this week, are amazing. The short films are a big jump up even from the cherry-picked demo videos that OpenAI used to tease its new generative model just six weeks ago. Here’s how three of the filmmakers did it.

“Air Head” by Shy Kids

Shy Kids is a pop band and filmmaking collective based in Toronto that describes its style as “punk-rock Pixar.” The group has experimented with generative video tech before. Last year it made a music video for one of its songs using an open-source tool called Stable Warpfusion. It’s cool, but low-res and glitchy. The film it made with Sora, called “Air Head,” could pass for real footage—if it didn’t feature a man with a balloon for a face.

One problem with most generative video tools is that it’s hard to maintain consistency across frames. When OpenAI asked Shy Kids to try out Sora, the band wanted to see how far they could push it. “We thought a fun, interesting experiment would be—could we make a consistent character?” says Shy Kids member Walter Woodman. “We think it was mostly successful.”

Generative models can also struggle with anatomical details like hands and faces. But in the video there is a scene showing a train car full of passengers, and the faces are near perfect. “It’s mind-blowing what it can do,” says Woodman. “Those faces on the train were all Sora.”

Has generative video’s problem with faces and hands been solved? Not quite. We still get glimpses of warped body parts. And text is still a problem (in another video, by the creative agency Native Foreign, we see a bike repair shop with the sign “Biycle Repaich”). But everything in “Air Head” is raw output from Sora. After editing together many different clips produced with the tool, Shy Kids did a bunch of post-processing to make the film look even better. They used visual effects tools to fix certain shots of the main character’s balloon face, for example.

Woodman also thinks that the music (which they wrote and performed) and the voice-over (which they also wrote and performed) help to lift the quality of the film even more. Mixing these human touches in with Sora’s output is what makes the film feel alive, says Woodman. “The technology is nothing without you,” he says. “It is a powerful tool, but you are the person driving it.”

[Update: Shy Kids have posted a behind-the-scenes video for Air Head on X. Come for the pro tips, stay for the Sora bloopers: “How do you maintain a character and look consistent even though Sora is a slot machine as to what you get back?” asks Woodman.]

“Abstract” by Paul Trillo

Paul Trillo, an artist and filmmaker, wanted to stretch what Sora could do with the look of a film. His video is a mash-up of retro-style footage with shots of a figure who morphs into a glitter ball and a breakdancing trash man. He says that everything you see is raw output from Sora: “No color correction or post FX.” Even the jump-cut edits in the first part of the film were produced using the generative model.

Trillo felt that the demos that OpenAI put out last month came across too much like clips from video games. “I wanted to see what other aesthetics were possible,” he says. The result is a video that looks like something shot with vintage 16-millimeter film. “It took a fair amount of experimenting, but I stumbled upon a series of prompts that helps make the video feel more organic or filmic,” he says.

“Beyond Our Reality” by Don Allen Stevenson III

Don Allen Stevenson III is a filmmaker and visual effects artist. He was one of the artists invited by OpenAI to try out DALL-E 2, its text-to-image model, a couple of years ago. Stevenson’s film is a NatGeo-style nature documentary that introduces us to a menagerie of imaginary animals, from the girafflamingo to the eel cat.

In many ways working with text-to-video is like working with text-to-image, says Stevenson. “You enter a text prompt and then you tweak your prompt a bunch of times,” he says. But there’s an added hurdle. When you’re trying out different prompts, Sora produces low-res video. When you hit on something you like, you can then increase the resolution. But going from low to high res involves another round of generation, and what you liked in the low-res version can be lost.

Sometimes the camera angle is different or the objects in the shot have moved, says Stevenson. Hallucination is still a feature of Sora, as it is in any generative model. With still images this might produce weird visual defects; with video those defects can appear across time as well, with weird jumps between frames.

Stevenson also had to figure out how to speak Sora’s language. It takes prompts very literally, he says. In one experiment he tried to create a shot that zoomed in on a helicopter. Sora produced a clip in which it mixed together a helicopter with a camera’s zoom lens. But Stevenson says that with a lot of creative prompting, Sora is easier to control than previous models.

Even so, he thinks that surprises are part of what makes the technology fun to use: “I like having less control. I like the chaos of it,” he says. There are many other video-making tools that give you control over editing and visual effects. For Stevenson, the point of a generative model like Sora is to come up with strange, unexpected material to work with in the first place.

The clips of the animals were all generated with Sora. Stevenson tried many different prompts until the tool produced something he liked. “I directed it, but it’s more like a nudge,” he says. He then went back and forth, trying out variations.

Stevenson pictured his fox crow having four legs, for example. But Sora gave it two, which worked even better. (It’s not perfect: sharp-eyed viewers will see that at one point in the video the fox crow switches from two legs to four, then back again.) Sora also produced several versions that he thought were too creepy to use.

When he had a collection of animals he really liked, he edited them together. Then he added captions and a voice-over on top. Stevenson could have created his made-up menagerie with existing tools. But it would have taken hours, even days, he says. With Sora the process was far quicker.

“I was trying to think of something that would look cool and experimented with a lot of different characters,” he says. “I have so many clips of random creatures.” Things really clicked when he saw what Sora did with the girafflamingo. “I started thinking: What’s the narrative around this creature? What does it eat, where does it live?” he says. He plans to put out a series of extended films following each of the fantasy animals in more detail.

Stevenson also hopes his fantastical animals will make a bigger point. “There’s going to be a lot of new types of content flooding feeds,” he says. “How are we going to teach people what’s real? In my opinion, one way is to tell stories that are clearly fantasy.”

Stevenson points out that his film could be the first time a lot of people see a video created by a generative model. He wants that first impression to make one thing very clear: This is not real.

It’s easy to tamper with watermarks from AI-generated text

Watermarks for AI-generated text are easy to remove and can be stolen and copied, rendering them useless, researchers have found. They say these kinds of attacks discredit watermarks and can fool people into trusting text they shouldn’t. 

Watermarking works by inserting hidden patterns in AI-generated text that allow computers to detect that the text comes from an AI system. Watermarks are a fairly new invention, but they have already become a popular solution for fighting AI-generated misinformation and plagiarism. For example, the European Union’s AI Act, which enters into force in May, will require developers to watermark AI-generated content. But the new research shows that the cutting edge of watermarking technology doesn’t live up to regulators’ requirements, says Robin Staab, a PhD student at ETH Zürich, who was part of the team that developed the attacks. The research has yet to be peer reviewed but will be presented at the International Conference on Learning Representations (ICLR) in May.

AI language models work by predicting the next likely word in a sentence, generating one word at a time on the basis of those predictions. Watermarking algorithms for text divide the language model’s vocabulary into words on a “green list” and a “red list,” and then make the AI model choose words from the green list. The more words in a sentence that are from the green list, the more likely it is that the text was generated by a computer. Humans tend to write sentences that include a more random mix of words. 
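To make that mechanism concrete, here is a minimal sketch of a green-list watermark. It follows the general scheme described above, but the choice to seed the vocabulary split on the previous token, the 50/50 green fraction, and the bias value are illustrative assumptions rather than the exact recipe of any deployed watermark.

```python
import hashlib

import numpy as np


def green_list(prev_token_id: int, vocab_size: int, fraction: float = 0.5) -> set:
    """Deterministically split the vocabulary, seeded on the previous token.

    (Seeding on the previous token is one common choice; real schemes vary.)
    """
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    ids = rng.permutation(vocab_size)
    return set(ids[: int(fraction * vocab_size)].tolist())


def watermark_logits(logits: np.ndarray, prev_token_id: int, bias: float = 2.0) -> np.ndarray:
    """Nudge generation toward 'green' words by boosting their logits."""
    boosted = logits.copy()
    for token_id in green_list(prev_token_id, len(logits)):
        boosted[token_id] += bias
    return boosted


def green_fraction(token_ids: list) -> float:
    """Detection: a high share of green words suggests the text is AI-generated."""
    hits = sum(
        1
        for prev, cur in zip(token_ids, token_ids[1:])
        if cur in green_list(prev, 50_000)  # assumes a 50,000-token vocabulary
    )
    return hits / max(len(token_ids) - 1, 1)
```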

The researchers tampered with five different watermarks that work in this way. They were able to reverse-engineer the watermarks by using an API to access the AI model with the watermark applied and prompting it many times, says Staab. The responses allow the attacker to “steal” the watermark by building an approximate model of the watermarking rules. They do this by analyzing the AI outputs and comparing them with normal text. 

Once they have an approximate idea of which words are watermarked, the researchers can execute two kinds of attacks. The first, called a spoofing attack, lets malicious actors use the information learned from stealing the watermark to produce text that can be passed off as watermarked. The second lets hackers scrub the watermark from AI-generated text, so it can be passed off as human-written.
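The stealing step can be sketched in a few lines: query the watermarked model many times, then flag words it overuses relative to ordinary text. This is a toy version of the idea, working on whole words rather than tokens; the actual attack builds a far more careful statistical model of the watermarking rules.

```python
from collections import Counter


def estimate_green_words(watermarked_texts, reference_texts, top_k=1000):
    """Guess the 'green list' by comparing word frequencies in watermarked
    output against normal human-written text."""
    wm = Counter(w for t in watermarked_texts for w in t.lower().split())
    ref = Counter(w for t in reference_texts for w in t.lower().split())
    wm_total = sum(wm.values()) or 1
    ref_total = sum(ref.values()) or 1
    # Words the watermarked model uses far more often than expected are
    # likely green (+1 smooths words absent from the reference corpus).
    overuse = {w: (wm[w] / wm_total) / ((ref[w] + 1) / ref_total) for w in wm}
    return {w for w, _ in Counter(overuse).most_common(top_k)}
```

With an estimate like this in hand, a spoofing attack deliberately favors the suspected green words, while a scrubbing attack paraphrases them away.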

The team had a roughly 80% success rate in spoofing watermarks, and an 85% success rate in stripping AI-generated text of its watermark. 

Researchers not affiliated with the ETH Zürich team, such as Soheil Feizi, an associate professor and director of the Reliable AI Lab at the University of Maryland, have also found watermarks to be unreliable and vulnerable to spoofing attacks. 

The findings from ETH Zürich confirm that these issues with watermarks persist and extend to the most advanced types of chatbots and large language models being used today, says Feizi. 

The research “underscores the importance of exercising caution when deploying such detection mechanisms on a large scale,” he says. 

Despite the findings, watermarks remain the most promising way to detect AI-generated content, says Nikola Jovanović, a PhD student at ETH Zürich who worked on the research. 

But more research is needed to make watermarks ready for deployment on a large scale, he adds. Until then, we should manage our expectations of how reliable and useful these tools are. “If it’s better than nothing, it is still useful,” he says.  

Update: This research will be presented at the International Conference on Learning Representations. The story has been updated to reflect that.

A conversation with OpenAI’s first artist in residence

Alex Reben’s work is often absurd, sometimes surreal: a mash-up of giant ears imagined by DALL-E and sculpted by hand out of marble; critical burns generated by ChatGPT that thumb their nose at AI art. But its message is relevant to everyone. Reben is interested in the roles humans play in a world filled with machines, and how those roles are changing.

“I kind of use humor and absurdity to deal with a lot of these issues,” says Reben. “Some artists may come at things head-on in a very serious manner, but I find if you’re a little absurd it makes the ideas more approachable, even if the story you’re trying to tell is very serious.”


Reben is OpenAI’s first artist in residence. Officially, the appointment started in January and lasts three months. But Reben’s relationship with the San Francisco–based AI firm seems casual: “It’s a little fuzzy, because I’m the first, and we’re figuring stuff out. I’m probably going to keep working with them.”

In fact, Reben has been working with OpenAI for years already. Five years ago, he was invited to try out an early version of GPT-3 before it was released to the public. “I got to play around with that quite a bit and made a few artworks,” he says. “They were quite interested in seeing how I could use their systems in different ways. And I was like, cool, I’d love to try something new, obviously. Back then I was mostly making stuff with my own models or using websites like Ganbreeder [a precursor of today’s generative image-making models].”

In 2008, Reben studied math and robotics at MIT’s Media Lab. There he helped create a cardboard robot called Boxie, which inspired the cute robot Baymax in the movie Big Hero 6. He is now director of technology and research at Stochastic Labs, a nonprofit incubator for artists and engineers in Berkeley, California. I spoke to Reben via Zoom about his work, the unresolved tension between art and technology, and the future of human creativity.

Our conversation has been edited for length and clarity.

You’re interested in ways that humans and machines interact. As an AI artist, how would you describe what you do with technology? Is it a tool, a collaborator?

Firstly, I don’t call myself an AI artist. AI is simply another technological tool. If something comes along after AI that interests me, I wouldn’t, like, say, “Oh, I’m only an AI artist.”

Okay. But what is it about these AI tools? Why have you spent your career playing around with this kind of technology?

My research at the Media Lab was all about social robotics, looking at how people and robots come together in different ways. One robot [Boxie] was also a filmmaker. It basically interviewed people, and we found that the robot was making people open up to it and tell it very deep stories. This was pre-Siri, or anything like that. These days people are familiar with the idea of talking to machines. So I’ve always been interested in how humanity and technology co-evolve over time. You know, we are who we are today because of technology.

A few cardboard BlabDroids displayed next to a plastic mask from a performative art piece, entitled Five Dollars Can Save Planet Earth.
COURTESY OF ALEXANDER REBEN

Right now, there’s a lot of pushback against the use of AI in art. There’s a lot of understandable unhappiness about technology that lets you just press a button and get an image. People are unhappy that these tools were even made and argue that the makers of these tools, like OpenAI, should maybe carry some more responsibility. But here you are, immersed in the art world, continuing to make fun, engaging art. I’m wondering what your experience of those kinds of conversations has been?

Yeah. So as I’m sure you know, being in the media, the negative voices are always louder. The people who are using these tools in positive ways aren’t quite as loud sometimes.

But, I mean, it’s also a very wide issue. People take a negative view for many different reasons. Some people worry about the data sets, some people worry about job replacement. Other people worry about, you know, disinformation and the world being flooded with media. And they’re all valid concerns.

When I talk about this, I go to the history of photography. What we’re seeing today is basically a parallel of what happened back then. There are no longer artists who paint products for a living—like, who paint cans of peaches for an advertisement in a magazine or on a billboard. But that used to be a job, right? Photography eliminated that swath of folks.

You know, you used the phrase—I wrote it down—“just press a button and get an image,” which also reminds me of photography. Anyone can push a button and get an image, but to be a fine-art photographer, it takes a lot of skill. Just because artwork is quick to make doesn’t necessarily mean it’s any worse than, like, someone sculpting something for 60 years out of marble. They’re different things.

AI is moving fast. We’ve moved past the equivalent of wet-plate photography using cyanide. But we’re certainly not in the Polaroid phase quite yet. We’re still coming to terms with what this means, both in a fine-art sense but also for jobs.

But, yeah, your question has so many facets. We could pick any one of them and go at it. There’s definitely a lot of valid concerns out there. But I also think looking at the history of technology, and how it’s actually empowered artists and people to make new things, is important as well.

There’s another line of argument that if you have a potentially infinite supply of AI-generated images, it devalues creativity. I’m curious about the balance you see in your work between what you do and what the technology does for you. How do you relate that balance to this question of value, and where we find value in art?

Sure, value in art—there’s an economic sense and there’s a critical sense, right? In an economic sense, you could tape a banana to a wall and sell it for 30,000 dollars. It’s just who’s willing to buy it or whatever.

In a critical sense, again, going back to photography, the world is flooded with images and there are still people making great photography out there. And there are people who set themselves apart by doing something that is different.

Reben’s exhibition “AI am I?” featuring The Plungers is on view at Sacramento’s Crocker Art Museum until the end of April.
COURTESY OF ALEXANDER REBEN

I play around with those ideas. A little bit like—you know, the plunger work was the first one. [The Plungers is an installation that Reben made by creating a physical version of an artwork invented by GPT-3.] I got GPT to describe an artwork that didn’t exist; then I made it. Which kind of flips the idea of authorship on its head but still required me to go through thousands of outputs to find one that was funny enough to make.

Back then GPT wasn’t a chatbot. I spent a good month coming up with the beginning bits of texts—like, wall labels next to art in museums—and getting GPT to complete them.

I also really like your ear sculpture, Ear we go again. It’s a sculpture described by GPT-3, visualized by DALL-E, and carved out of marble by a robot. It’s sort of like a waterfall, with one kind of software feeding the next.

When text-to-image came out, it made obvious sense to feed it the descriptions of artworks I’d been generating. It’s a chain, sort of back and forth, human to machine back to human. That ear, in particular: it starts with a description that’s fed into DALL-E, but then that image was turned into a 3D model by a human 3D artist.

And after that it was carved by robots. But the robots get only so far with the detail, so human sculptors have to come in and finish it by hand. I’ve made 10 or 15 permutations of this, playing with those back-and-forths, chaining technology together. And the final thing that happens now is that I will take a picture of the artwork and get GPT-4 to create the wall label for it. 

Yeah, that keeps coming up in your work, the different ways that humans and machines interact.

You know, I made some videos of the process of these things being made to show how many artisans were employed in making them. There are still huge industries where I can see AI increasing work for folks, people who will make stuff that AI comes up with.  

I’m struck by the serendipity that often comes with generative tools, making art out of something random. Do you see a connection between your work and found art or ready-mades, like Duchamp’s Fountain? I mean, you’re maybe not just coming across a urinal and thinking, “Oh, that’s cool.” But when you play around with these tools, at some point you must get something presented to you that you react to and think, “I can use that.”

For sure. Yeah, it actually reminds me a little bit more of street photography, which I used to do when I was in college in New York City, where you would just kind of roam around and wait for something to inspire you. Then you’d set yourself up to capture the image in the way that you wanted. It’s kind of like that for sure. There’s definitely a curatorial process to it. There’s a process of finding things, which I think is interesting.

We talked about photography. Photography changed the art that came after it. You know, you had movements where people wanted to try to get at a reality that wasn’t photographic reality—things like Impressionism, and Cubism or Picasso. Do you think we’ll see something similar happening because of AI?

I think so. Any new artistic tool definitely changes the field as people figure out not only how to use that tool but how to differentiate themselves from what that tool can do.

Talking of AI as a tool—do you think that art will always be something made by humans? That no matter how good the tech gets, it will always just be a tool? You know, the way you’ve strung together these different AIs—you could do that without being in the loop. You could just have some kind of curator AI at the end that chooses what it likes best. Would that ever be art?

I actually have a couple of works in which an AI creates an image, uses the image to create a new image, and just keeps going. But I think even in a super-automated process you can go back far enough to find some human somewhere who made a decision to do something. Like, maybe they chose what data set to use.

We might see hotel rooms filled with robot paintings. I mean, stuff we hardly even look at, that never even makes its way through human curation.

I guess the question is really how much human involvement is needed to make something art. Is there a threshold or, like, a percentage of involvement? It’s a good question.

Yeah, I guess it’s like, is it still art if there’s no one there to see it?

You know, what is and isn’t art is one of those questions that has been asked forever. I think more to the point is: What is good art versus bad art? And that’s very personal.

But I think humans are always going to be doing this stuff. We will still be painting in the far future, even when robots are making paintings.

AI could make better beer. Here’s how.

Crafting a good-tasting beer is a difficult task. Big breweries select hundreds of trained tasters from among their employees to test their new products. But running such sensory tasting panels is expensive, and perceptions of what tastes good can be highly subjective.  

What if artificial intelligence could help lighten the load? New AI models can accurately identify not only how highly consumers will rate a certain Belgian beer, but also what kinds of compounds brewers should be adding to make the beer taste better, according to research published in Nature Communications today.

These kinds of models could help food and drink manufacturers develop new products or tweak existing recipes to better suit the tastes of consumers, which could help save a lot of time and money that would have gone into running trials. 

To train their AI models, the researchers spent five years chemically analyzing 250 commercial beers, measuring each beer’s chemical properties and flavor compounds—which dictate how it’ll taste. 

The researchers then combined these detailed analyses with a trained tasting panel’s assessments of the beers—including hop, yeast, and malt flavors—and 180,000 reviews of the same beers taken from the popular online platform RateBeer, including scores for the beers’ taste, appearance, aroma, and overall quality.

This large data set, which links chemical data with sensory features, was used to train 10 machine-learning models to accurately predict a beer’s taste, smell, and mouthfeel and how likely a consumer was to rate it highly. 

To compare the models, the researchers split the data into a training set and a test set. Once a model was trained on the training set, they evaluated how well it predicted ratings for the unseen beers in the test set.
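As a rough illustration of that workflow—chemical measurements in, predicted ratings out, scored on a held-out test set—here is a hedged sketch using scikit-learn. The data here is synthetic and the model choice is arbitrary; the study trained ten different model types on real chemical analyses.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Stand-in data: 250 beers, 40 chemical features each (the real study
# measured specific flavor compounds); targets are consumer ratings.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 40))
y = 0.5 * X[:, 0] + rng.normal(size=250)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)
print("Held-out R^2:", r2_score(y_test, model.predict(X_test)))
```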

The researchers found that all the models were better than the trained panel of human experts at predicting the rating a beer had received from RateBeer.

Through these models, the researchers were able to pinpoint specific compounds that contribute to consumer appreciation of a beer: people were more likely to rate a beer highly if it contained these specific compounds. For example, the models predicted that adding lactic acid, which is present in tart-tasting sour beers, could improve other kinds of beers by making them taste fresher.

“We had the models analyze these beers and then asked them ‘How can we make these beers better?’” says Kevin Verstrepen, a professor at KU Leuven and director of the VIB-KU Leuven Center for Microbiology, who worked on the project. “Then we went in and actually made those changes to the beers by adding flavor compounds. And lo and behold—once we did blind tastings, the beers became better, and more generally appreciated.”

One exciting application of the research is that it could be used to make better alcohol-free beers—a major challenge for the beverage industry, he says. The researchers used the model’s predictions to add a mixture of compounds to a nonalcoholic beer that human tasters rated significantly higher in terms of body and sweetness than its previous incarnation.

This type of machine-learning approach could also be enormously useful in exploring food texture and nutrition and adapting ingredients to suit different populations, says Carolyn Ross, a professor of food science at Washington State University, who was not involved in the research. For example, older people tend to find complex combinations of textures or ingredients less appealing, she says. 

“There’s so much that we can explore there, especially when we’re looking at different populations and trying to come up with specific products for them,” she says.

Apple researchers explore dropping “Siri” phrase & listening with AI instead

Researchers from Apple are probing whether it’s possible to use artificial intelligence to detect when a user is speaking to a device like an iPhone, thereby eliminating the technical need for a trigger phrase like “Siri,” according to a paper published on Friday.

In a study, which was uploaded to arXiv and has not been peer-reviewed, researchers trained a large language model using both speech captured by smartphones and acoustic data from background noise to look for patterns that could indicate when a user wants help from the device. The model was built in part with a version of OpenAI’s GPT-2, “since it is relatively lightweight and can potentially run on devices such as smartphones,” the researchers wrote. The paper says the model was trained on more than 129 hours of audio plus additional text data, but it does not specify the source of the recordings that went into the training set. Six of the seven authors list their affiliation as Apple, and three of them work on the company’s Siri team, according to their LinkedIn profiles. (The seventh author did work related to the paper during an Apple internship.)
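The paper gives few implementation specifics, but the core idea—judging from both the acoustics and the words whether an utterance was meant for the device—can be sketched as a small classifier over two embeddings. Everything below (the dimensions, the architecture, even the framing as a binary classifier) is an assumption for illustration, not Apple’s design.

```python
import torch
import torch.nn as nn


class DeviceDirectedSpeechClassifier(nn.Module):
    """Toy model: fuse an acoustic embedding with a text embedding of the
    transcript and score whether the speech was directed at the device."""

    def __init__(self, audio_dim: int = 128, text_dim: int = 768):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(audio_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, audio_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Concatenate the two views and return a probability.
        return torch.sigmoid(self.head(torch.cat([audio_emb, text_emb], dim=-1)))


clf = DeviceDirectedSpeechClassifier()
audio = torch.randn(4, 128)  # a batch of hypothetical acoustic embeddings
text = torch.randn(4, 768)   # matching transcript embeddings
prob = clf(audio, text)      # higher means "probably talking to the device"
```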

The results were promising, according to the paper. The model was able to make more accurate predictions than audio-only or text-only models, and it improved further as the models grew larger. But beyond exploring the research question, it’s unclear whether Apple plans to eliminate the “Hey Siri” trigger phrase.

Neither Apple nor the paper’s researchers immediately returned requests for comment.

Currently, Siri functions by holding small amounts of audio and does not begin recording or preparing to answer user prompts until it hears the trigger phrase. Eliminating that “Hey Siri” prompt could increase concerns about our devices “always listening,” said Jen King, a privacy and data policy fellow at the Stanford Institute for Human-Centered Artificial Intelligence.

The way Apple handles audio data has previously come under scrutiny by privacy advocates. In 2019, reporting from The Guardian revealed that Apple’s quality control contractors regularly heard private audio collected from iPhones while they worked with Siri data, including sensitive conversations between doctors and patients. Two years later, Apple responded with policy changes, including storing more data on devices and allowing users to opt out of having their recordings used to improve Siri. A class action suit brought against the company in California in 2021 alleged that Siri turned on even when users had not activated it.

The “Hey Siri” prompt can serve an important purpose for users, according to King. The phrase provides a way to know when the device is listening, and getting rid of it might mean more convenience but less transparency from the device, King told MIT Technology Review. The research did not detail whether the trigger phrase would be replaced by any other signal that the AI assistant is engaged.

“I’m skeptical that a company should mandate that form of interaction,” King says.

The paper is one of a number of recent signals that Apple, which is perceived to be lagging behind other tech giants like Amazon, Google, and Facebook in the artificial intelligence race, is planning to incorporate more AI into its products. According to news first reported by VentureBeat, Apple is building a generative AI model called MM1 that can work with text and images, which would be the company’s answer to OpenAI’s ChatGPT and a host of other chatbots from leading tech giants. Meanwhile, Bloomberg reported that Apple is in talks with Google about using the company’s AI model Gemini in iPhones, and on Friday the Wall Street Journal reported that it had engaged in talks with Baidu about using that company’s AI products.

Google DeepMind’s new AI assistant helps elite soccer coaches get even better

Soccer teams are always looking to get an edge over their rivals. Whether it’s studying players’ susceptibility to injury or analyzing opponents’ tactics, top clubs look at reams of data to give themselves the best shot at winning.

They might want to add a new AI assistant developed by Google DeepMind to their arsenal. It can suggest tactics for soccer set-pieces that are even better than those created by professional club coaches. 

The system, called TacticAI, works by analyzing a dataset of 7,176 corner kicks taken by players for Liverpool FC, one of the biggest soccer clubs in the world. 

Corner kicks are awarded to an attacking team when the ball passes over the goal line after touching a player on the defending team. In a sport as free-flowing and unpredictable as soccer, corners—like free kicks and penalties—are rare instances in the game when teams can try out pre-planned plays.

TacticAI uses predictive and generative AI models to convert each corner kick scenario—such as a receiver successfully scoring a goal, or a rival defender intercepting the ball and returning it to their team—into a graph, and the data from each player into a node on the graph, before modeling the interactions between the nodes. The work was published in Nature Communications today.

Using this data, the model provides recommendations about where to position players during a corner to give them, for example, the best shot at scoring a goal, or the best combination of players to get up front. It can also try to predict the outcomes of a corner, including whether a shot will take place, or which player is most likely to touch the ball first.
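The graph framing is easy to picture in code. In this hedged sketch, each player becomes a node with position, velocity, and team features, and a crude message-passing step lets connected nodes exchange information; DeepMind’s actual model uses learned graph networks rather than the fixed averaging shown here, and the feature set below is invented for illustration.

```python
import numpy as np


def corner_kick_graph(players):
    """Each player is a node; connect every pair, since any player can
    influence any other during a corner."""
    feats = np.array([[p["x"], p["y"], p["vx"], p["vy"], p["team"]] for p in players])
    n = len(players)
    edges = [(i, j) for i in range(n) for j in range(n) if i != j]
    return feats, edges


def message_passing(feats, edges, rounds=2):
    """Mix each node's features with the average of its neighbors'.
    Real graph networks learn these update functions from data."""
    h = feats.astype(float).copy()
    for _ in range(rounds):
        agg = np.zeros_like(h)
        counts = np.zeros(len(h))
        for i, j in edges:
            agg[j] += h[i]
            counts[j] += 1
        h = 0.5 * h + 0.5 * (agg / np.maximum(counts[:, None], 1))
    return h  # per-player embeddings, usable to predict e.g. first ball contact


players = [
    {"x": 0.95, "y": 0.50, "vx": 0.0, "vy": 0.1, "team": 1},   # attacker
    {"x": 0.93, "y": 0.48, "vx": -0.1, "vy": 0.0, "team": 0},  # defender
]
feats, edges = corner_kick_graph(players)
embeddings = message_passing(feats, edges)
```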

The main benefit is that the AI assistant reduces the workload of the coaches, says Ondřej Hubáček, an analyst at the sports data firm Ematiq who specializes in predictive models, and who did not work on the project. “An AI system can go through the data quickly and point out errors a team is making—I think that’s the added value you can get from AI assistants,” he says. 

To assess TacticAI’s suggestions, Google DeepMind presented them to five football experts: three data scientists, one video analyst, and one coaching assistant, all of whom work at Liverpool FC. Not only did these experts struggle to distinguish TacticAI’s suggestions from real game scenarios, they also favored the system’s strategies over existing tactics 90% of the time.

These findings suggest that TacticAI’s strategies could be useful for human coaches in real-life games, says Petar Veličković, a staff research scientist at Google DeepMind who worked on the project. “Top clubs are always searching for an edge, and I think our results indicate that techniques like these are likely going to become a part of modern football going forward,” he says.

TacticAI’s powers of prediction aren’t limited to corner kicks, either—the same method could easily be applied to other set pieces, to general play throughout a match, or even to other sports entirely, such as American football, hockey, or basketball, says Veličković.

“As long as there’s a team-based sport where you believe that modeling relationships between players will be useful and you have a source of data, it’s applicable,” he says.

How AI taught Cassie the two-legged robot to run and jump

If you’ve watched Boston Dynamics’ slick videos of robots running, jumping, and doing parkour, you might have the impression that robots have learned to be amazingly agile. In fact, these robots are still coded by hand and would struggle to deal with obstacles they haven’t encountered before.

However, a new method of teaching robots to move could help them deal with new scenarios through trial and error—just as humans learn and adapt to unpredictable events.

Researchers used an AI technique called reinforcement learning to help a two-legged robot nicknamed Cassie run 400 meters over varying terrain and execute standing long jumps and high jumps, without being trained explicitly on each movement. Reinforcement learning works by rewarding or penalizing an AI as it tries to carry out an objective. In this case, the approach taught the robot to generalize and respond to new scenarios, instead of freezing as its predecessors might have done.

“We wanted to push the limits of robot agility,” says Zhongyu Li, a PhD student at the University of California, Berkeley, who worked on the project, which has not yet been peer-reviewed. “The high-level goal was to teach the robot to learn how to do all kinds of dynamic motions the way a human does.”

The team used a simulation to train Cassie, an approach that dramatically shortens the time it takes the robot to learn—from years to weeks—and enables it to perform those same skills in the real world without further fine-tuning.

First, they trained the neural network that controls Cassie to master a simple skill from scratch, such as jumping on the spot, walking forward, or running forward without toppling over. The network was taught by being encouraged to mimic motions it was shown, including motion capture data collected from a human and animations demonstrating the desired movement.

After the first stage was complete, the team presented the model with new commands encouraging the robot to perform tasks using its new movement skills. Once it became proficient at performing the new tasks in a simulated environment, they then diversified the tasks it had been trained on through a method called task randomization. 

This makes the robot much more prepared for unexpected scenarios. For example, the robot was able to maintain a steady running gait while being pulled sideways by a leash. “We allowed the robot to utilize the history of what it’s observed and adapt quickly to the real world,” says Li.
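A hedged sketch of those two training ingredients—a reward for tracking a reference motion, and a task-randomization sampler—looks something like the following. The reward shape, weights, and parameter ranges are invented for illustration; the project’s actual reward design is more elaborate.

```python
import numpy as np


def imitation_reward(robot_pose: np.ndarray, reference_pose: np.ndarray) -> float:
    """Stage one: reward the policy for tracking a reference motion, such as
    motion-capture frames or an animation of the desired movement."""
    return float(np.exp(-2.0 * np.linalg.norm(robot_pose - reference_pose)))


def randomized_task() -> dict:
    """Later stages: sample varied commands and disturbances ('task
    randomization') so the policy generalizes instead of memorizing."""
    rng = np.random.default_rng()
    return {
        "target_speed": rng.uniform(0.5, 4.0),       # meters per second
        "terrain_roughness": rng.uniform(0.0, 1.0),  # unitless scale
        "lateral_push": rng.uniform(-50.0, 50.0),    # newtons, like a tug on a leash
    }
```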

Cassie completed a 400-meter run in two minutes and 34 seconds, then jumped 1.4 meters in the long jump without needing additional training.

The researchers are now planning to study how this kind of technique could be used to train robots equipped with onboard cameras. That will be more challenging than completing actions blind, adds Alan Fern, a professor of computer science at Oregon State University who helped develop the Cassie robot but was not involved with this project.

“The next major step for the field is humanoid robots that do real work, plan out activities, and actually interact with the physical world in ways that are not just interactions between feet and the ground,” he says.