How three filmmakers created Sora’s latest stunning videos

In the last month, a handful of filmmakers have taken Sora for a test drive. The results, which OpenAI published this week, are amazing. The short films are a big jump up even from the cherry-picked demo videos that OpenAI used to tease its new generative model just six weeks ago. Here’s how three of the filmmakers did it.

“Air Head” by Shy Kids

Shy Kids is a pop band and filmmaking collective based in Toronto that describes its style as “punk-rock Pixar.” The group has experimented with generative video tech before. Last year it made a music video for one of its songs using an open-source tool called Stable Warpfusion. It’s cool, but low-res and glitchy. The film it made with Sora, called “Air Head,” could pass for real footage—if it didn’t feature a man with a balloon for a face.

One problem with most generative video tools is that it’s hard to maintain consistency across frames. When OpenAI asked Shy Kids to try out Sora, the band wanted to see how far they could push it. “We thought a fun, interesting experiment would be—could we make a consistent character?” says Shy Kids member Walter Woodman. “We think it was mostly successful.”

Generative models can also struggle with anatomical details like hands and faces. But in the video there is a scene showing a train car full of passengers, and the faces are near perfect. “It’s mind-blowing what it can do,” says Woodman. “Those faces on the train were all Sora.”

Has generative video’s problem with faces and hands been solved? Not quite. We still get glimpses of warped body parts. And text is still a problem (in another video, by the creative agency Native Foreign, we see a bike repair shop with the sign “Biycle Repaich”). But not everything in “Air Head” is raw output from Sora. After editing together many different clips produced with the tool, Shy Kids did a bunch of post-processing to make the film look even better. They used visual effects tools to fix certain shots of the main character’s balloon face, for example.

Woodman also thinks that the music (which they wrote and performed) and the voice-over (which they also wrote and performed) help to lift the quality of the film even more. Mixing these human touches in with Sora’s output is what makes the film feel alive, says Woodman. “The technology is nothing without you,” he says. “It is a powerful tool, but you are the person driving it.”

[Update: Shy Kids have posted a behind-the-scenes video for Air Head on X. Come for the pro tips, stay for the Sora bloopers: “How do you maintain a character and look consistent even though Sora is a slot machine as to what you get back?” asks Woodman.]

“Abstract” by Paul Trillo

Paul Trillo, an artist and filmmaker, wanted to stretch what Sora could do with the look of a film. His video is a mash-up of retro-style footage with shots of a figure who morphs into a glitter ball and a breakdancing trash man. He says that everything you see is raw output from Sora: “No color correction or post FX.” Even the jump-cut edits in the first part of the film were produced using the generative model.

Trillo felt that the demos that OpenAI put out last month came across too much like clips from video games. “I wanted to see what other aesthetics were possible,” he says. The result is a video that looks like something shot with vintage 16-millimeter film. “It took a fair amount of experimenting, but I stumbled upon a series of prompts that helps make the video feel more organic or filmic,” he says.

“Beyond Our Reality” by Don Allen Stevenson

Don Allen Stevenson III is a filmmaker and visual effects artist. He was one of the artists invited by OpenAI to try out DALL-E 2, its text-to-image model, a couple of years ago. Stevenson’s film is a NatGeo-style nature documentary that introduces us to a menagerie of imaginary animals, from the girafflamingo to the eel cat.

In many ways working with text-to-video is like working with text-to-image, says Stevenson. “You enter a text prompt and then you tweak your prompt a bunch of times,” he says. But there’s an added hurdle. When you’re trying out different prompts, Sora produces low-res video. When you hit on something you like, you can then increase the resolution. But going from low to high res involves another round of generation, and what you liked in the low-res version can be lost.

Sometimes the camera angle is different or the objects in the shot have moved, says Stevenson. Hallucination is still a feature of Sora, as it is in any generative model. With still images this might produce weird visual defects; with video those defects can appear across time as well, with weird jumps between frames.

Stevenson also had to figure out how to speak Sora’s language. It takes prompts very literally, he says. In one experiment he tried to create a shot that zoomed in on a helicopter. Sora produced a clip in which it mixed together a helicopter with a camera’s zoom lens. But Stevenson says that with a lot of creative prompting, Sora is easier to control than previous models.

Even so, he thinks that surprises are part of what makes the technology fun to use: “I like having less control. I like the chaos of it,” he says. There are many other video-making tools that give you control over editing and visual effects. For Stevenson, the point of a generative model like Sora is to come up with strange, unexpected material to work with in the first place.

The clips of the animals were all generated with Sora. Stevenson tried many different prompts until the tool produced something he liked. “I directed it, but it’s more like a nudge,” he says. He then went back and forth, trying out variations.

Stevenson pictured his fox crow having four legs, for example. But Sora gave it two, which worked even better. (It’s not perfect: sharp-eyed viewers will see that at one point in the video the fox crow switches from two legs to four, then back again.) Sora also produced several versions that he thought were too creepy to use.

When he had a collection of animals he really liked, he edited them together. Then he added captions and a voice-over on top. Stevenson could have created his made-up menagerie with existing tools. But it would have taken hours, even days, he says. With Sora the process was far quicker.

“I was trying to think of something that would look cool and experimented with a lot of different characters,” he says. “I have so many clips of random creatures.” Things really clicked when he saw what Sora did with the girafflamingo. “I started thinking: What’s the narrative around this creature? What does it eat, where does it live?” he says. He plans to put out a series of extended films following each of the fantasy animals in more detail.

Stevenson also hopes his fantastical animals will make a bigger point. “There’s going to be a lot of new types of content flooding feeds,” he says. “How are we going to teach people what’s real? In my opinion, one way is to tell stories that are clearly fantasy.”

Stevenson points out that his film could be the first time a lot of people see a video created by a generative model. He wants that first impression to make one thing very clear: This is not real.

It’s easy to tamper with watermarks from AI-generated text

Watermarks for AI-generated text are easy to remove and can be stolen and copied, rendering them useless, researchers have found. They say these kinds of attacks discredit watermarks and can fool people into trusting text they shouldn’t. 

Watermarking works by inserting hidden patterns in AI-generated text, which allow computers to detect that the text comes from an AI system. They’re a fairly new invention, but they have already become a popular solution for fighting AI-generated misinformation and plagiarism. For example, the European Union’s AI Act, which enters into force in May, will require developers to watermark AI-generated content. But the new research shows that the cutting edge of watermarking technology doesn’t live up to regulators’ requirements, says Robin Staab, a PhD student at ETH Zürich, who was part of the team that developed the attacks. The research is yet to be peer reviewed, but will be presented at the International Conference on Learning Representations in May.

AI language models work by predicting the next likely word in a sentence, generating one word at a time on the basis of those predictions. Watermarking algorithms for text divide the language model’s vocabulary into words on a “green list” and a “red list,” and then make the AI model choose words from the green list. The more words in a sentence that are from the green list, the more likely it is that the text was generated by a computer. Humans tend to write sentences that include a more random mix of words. 
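For a sense of how detection works in this kind of scheme, here is a minimal sketch in Python. The fixed pseudorandom green list, the tiny vocabulary, and the 0.5 baseline are simplifying assumptions for illustration; real watermarks usually derive the green list from the preceding tokens and apply a statistical test rather than a raw fraction.

```python
import hashlib

def green_fraction(text: str, vocab: set[str], seed: str = "demo") -> float:
    """Toy watermark detector: what fraction of known words land on the green list?

    Assumption: the green list here is a fixed pseudorandom half of the
    vocabulary; real schemes typically re-derive it from preceding tokens.
    """
    def is_green(word: str) -> bool:
        digest = hashlib.sha256((seed + word.lower()).encode()).hexdigest()
        return int(digest, 16) % 2 == 0  # roughly half the vocabulary is "green"

    words = [w for w in text.lower().split() if w in vocab]
    if not words:
        return 0.0
    return sum(is_green(w) for w in words) / len(words)

# Human-written text should hover near 0.5; watermarked output skews higher.
vocab = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}
print(green_fraction("the quick brown fox jumps over the lazy dog", vocab))
```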

The researchers tampered with five different watermarks that work in this way. They were able to reverse-engineer the watermarks by using an API to access the AI model with the watermark applied and prompting it many times, says Staab. The responses allow the attacker to “steal” the watermark by building an approximate model of the watermarking rules. They do this by analyzing the AI outputs and comparing them with normal text. 

Once they have an approximate idea of which words are watermarked, the researchers can execute two kinds of attacks. The first, called a spoofing attack, lets malicious actors use the information they learned from stealing the watermark to produce text that can be passed off as being watermarked. The second lets hackers scrub the watermark from AI-generated text, so it can be passed off as human-written. 
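In rough outline, the “stealing” step boils down to comparing word frequencies in the watermarked model’s output against ordinary text and flagging over-represented words as probably green. The sample lists and the ratio threshold below are hypothetical placeholders for illustration, not the exact procedure from the paper.

```python
from collections import Counter

def estimate_green_list(watermarked_samples: list[str],
                        reference_samples: list[str],
                        boost_threshold: float = 1.5) -> set[str]:
    """Guess which words are on the green list by comparing word frequencies.

    Words that show up noticeably more often in watermarked output than in
    ordinary reference text are assumed to be green. This is a simplified
    illustration of the attack idea, not the authors' exact method.
    """
    wm_counts = Counter(w for s in watermarked_samples for w in s.lower().split())
    ref_counts = Counter(w for s in reference_samples for w in s.lower().split())
    wm_total = max(sum(wm_counts.values()), 1)
    ref_total = max(sum(ref_counts.values()), 1)

    green = set()
    for word, count in wm_counts.items():
        wm_rate = count / wm_total
        ref_rate = (ref_counts.get(word, 0) + 1) / (ref_total + 1)  # smoothed
        if wm_rate / ref_rate >= boost_threshold:
            green.add(word)
    return green

# A spoofer can then draft text that leans on the guessed green words to fake
# a watermark; a scrubber can paraphrase those words away to remove it.
```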

The team had a roughly 80% success rate in spoofing watermarks, and an 85% success rate in stripping AI-generated text of its watermark. 

Researchers not affiliated with the ETH Zürich team, such as Soheil Feizi, an associate professor and director of the Reliable AI Lab at the University of Maryland, have also found watermarks to be unreliable and vulnerable to spoofing attacks. 

The findings from ETH Zürich confirm that these issues with watermarks persist and extend to the most advanced types of chatbots and large language models being used today, says Feizi. 

The research “underscores the importance of exercising caution when deploying such detection mechanisms on a large scale,” he says. 

Despite the findings, watermarks remain the most promising way to detect AI-generated content, says Nikola Jovanović, a PhD student at ETH Zürich who worked on the research. 

But more research is needed to make watermarks ready for deployment on a large scale, he adds. Until then, we should manage our expectations of how reliable and useful these tools are. “If it’s better than nothing, it is still useful,” he says.  

Update: This research will be presented at the International Conference on Learning Representations. The story has been updated to reflect that.

A conversation with OpenAI’s first artist in residence

Alex Reben’s work is often absurd, sometimes surreal: a mash-up of giant ears imagined by DALL-E and sculpted by hand out of marble; critical burns generated by ChatGPT that thumb the nose at AI art. But its message is relevant to everyone. Reben is interested in the roles humans play in a world filled with machines, and how those roles are changing.

“I kind of use humor and absurdity to deal with a lot of these issues,” says Reben. “Some artists may come at things head-on in a very serious manner, but I find if you’re a little absurd it makes the ideas more approachable, even if the story you’re trying to tell is very serious.”


Reben is OpenAI’s first artist in residence. Officially, the appointment started in January and lasts three months. But Reben’s relationship with the San Francisco–based AI firm seems casual: “It’s a little fuzzy, because I’m the first, and we’re figuring stuff out. I’m probably going to keep working with them.”

In fact, Reben has been working with OpenAI for years already. Five years ago, he was invited to try out an early version of GPT-3 before it was released to the public. “I got to play around with that quite a bit and made a few artworks,” he says. “They were quite interested in seeing how I could use their systems in different ways. And I was like, cool, I’d love to try something new, obviously. Back then I was mostly making stuff with my own models or using websites like Ganbreeder [a precursor of today’s generative image-making models].”

In 2008, Reben studied math and robotics at MIT’s Media Lab. There he helped create a cardboard robot called Boxie, which inspired the cute robot Baymax in the movie Big Hero 6. He is now director of technology and research at Stochastic Labs, a nonprofit incubator for artists and engineers in Berkeley, California. I spoke to Reben via Zoom about his work, the unresolved tension between art and technology, and the future of human creativity.

Our conversation has been edited for length and clarity.

You’re interested in ways that humans and machines interact. As an AI artist, how would you describe what you do with technology? Is it a tool, a collaborator?

Firstly, I don’t call myself an AI artist. AI is simply another technological tool. If something comes along after AI that interests me, I wouldn’t, like, say, “Oh, I’m only an AI artist.”

Okay. But what is it about these AI tools? Why have you spent your career playing around with this kind of technology?

My research at the Media Lab was all about social robotics, looking at how people and robots come together in different ways. One robot [Boxie] was also a filmmaker. It basically interviewed people, and we found that the robot was making people open up to it and tell it very deep stories. This was pre-Siri, or anything like that. These days people are familiar with the idea of talking to machines. So I’ve always been interested in how humanity and technology co-evolve over time. You know, we are who we are today because of technology.

A few cardboard BlabDroids displayed next to a plastic mask from a performative art piece, entitled Five Dollars Can Save Planet Earth.
COURTESY OF ALEXANDER REBEN

Right now, there’s a lot of pushback against the use of AI in art. There’s a lot of understandable unhappiness about technology that lets you just press a button and get an image. People are unhappy that these tools were even made and argue that the makers of these tools, like OpenAI, should maybe carry some more responsibility. But here you are, immersed in the art world, continuing to make fun, engaging art. I’m wondering what your experience of those kinds of conversations has been?

Yeah. So as I’m sure you know, being in the media, the negative voices are always louder. The people who are using these tools in positive ways aren’t quite as loud sometimes.

But, I mean, it’s also a very wide issue. People take a negative view for many different reasons. Some people worry about the data sets, some people worry about job replacement. Other people worry about, you know, disinformation and the world being flooded with media. And they’re all valid concerns.

When I talk about this, I go to the history of photography. What we’re seeing today is basically a parallel of what happened back then. There are no longer artists who paint products for a living—like, who paint cans of peaches for an advertisement in a magazine or on a billboard. But that used to be a job, right? Photography eliminated that swath of folks.

You know, you used the phrase—I wrote it down—“just press a button and get an image,” which also reminds me of photography. Anyone can push a button and get an image, but to be a fine-art photographer, it takes a lot of skill. Just because artwork is quick to make doesn’t necessarily mean it’s any worse than, like, someone sculpting something for 60 years out of marble. They’re different things.

AI is moving fast. We’ve moved past the equivalent of wet-plate photography using cyanide. But we’re certainly not in the Polaroid phase quite yet. We’re still coming to terms with what this means, both in a fine-art sense but also for jobs.

But, yeah, your question has so many facets. We could pick any one of them and go at it. There’s definitely a lot of valid concerns out there. But I also think looking at the history of technology, and how it’s actually empowered artists and people to make new things, is important as well.

There’s another line of argument that if you have a potentially infinite supply of AI-generated images, it devalues creativity. I’m curious about the balance you see in your work between what you do and what the technology does for you. How do you relate that balance to this question of value, and where we find value in art?

Sure, value in art—there’s an economic sense and there’s a critical sense, right? In an economic sense, you could tape a banana to a wall and sell it for 30,000 dollars. It’s just who’s willing to buy it or whatever.

In a critical sense, again, going back to photography, the world is flooded with images and there are still people making great photography out there. And there are people who set themselves apart by doing something that is different.

Reben’s exhibition “AI am I?” featuring The Plungers is on view at Sacramento’s Crocker Art Museum until the end of April.
COURTESY OF ALEXANDER REBEN

I play around with those ideas. A little bit like—you know, the plunger work was the first one. [The Plungers is an installation that Reben made by creating a physical version of an artwork invented by GPT-3.] I got GPT to describe an artwork that didn’t exist; then I made it. Which kind of flips the idea of authorship on its head but still required me to go through thousands of outputs to find one that was funny enough to make.

Back then GPT wasn’t a chatbot. I spent a good month coming up with the beginning bits of texts—like, wall labels next to art in museums—and getting GPT to complete them.

I also really like your ear sculpture, Ear we go again. It’s a sculpture described by GPT-3, visualized by DALL-E, and carved out of marble by a robot. It’s sort of like a waterfall, with one kind of software feeding the next.

When text-to-image came out, it made obvious sense to feed it the descriptions of artworks I’d been generating. It’s a chain, sort of back and forth, human to machine back to human. That ear, in particular: it starts with a description that’s fed into DALL-E, but then that image was turned into a 3D model by a human 3D artist.

And after that it was carved by robots. But the robots get only so far with the detail, so human sculptors have to come in and finish it by hand. I’ve made 10 or 15 permutations of this, playing with those back-and-forths, chaining technology together. And the final thing that happens now is that I will take a picture of the artwork and get GPT-4 to create the wall label for it. 

Yeah, that keeps coming up in your work, the different ways that humans and machines interact.

You know, I made some videos of the process of these things being made to show how many artisans were employed in making them. There are still huge industries where I can see AI increasing work for folks, people who will make stuff that AI comes up with.  

I’m struck by the serendipity that often comes with generative tools, making art out of something random. Do you see a connection between your work and found art or ready-mades, like Duchamp’s Fountain? I mean, you’re maybe not just coming across a urinal and thinking, “Oh, that’s cool.” But when you play around with these tools, at some point you must get something presented to you that you react to and think, “I can use that.”

For sure. Yeah, it actually reminds me a little bit more of street photography, which I used to do when I was in college in New York City, where you would just kind of roam around and wait for something to inspire you. Then you’d set yourself up to capture the image in the way that you wanted. It’s kind of like that for sure. There’s definitely a curatorial process to it. There’s a process of finding things, which I think is interesting.

We talked about photography. Photography changed the art that came after it. You know, you had movements where people wanted to try to get at a reality that wasn’t photographic reality—things like Impressionism, and Cubism or Picasso. Do you think we’ll see something similar happening because of AI?

I think so. Any new artistic tool definitely changes the field as people figure out not only how to use that tool but how to differentiate themselves from what that tool can do.

Talking of AI as a tool—do you think that art will always be something made by humans? That no matter how good the tech gets, it will always just be a tool? You know, the way you’ve strung together these different AIs—you could do that without being in the loop. You could just have some kind of curator AI at the end that chooses what it likes best. Would that ever be art?

I actually have a couple of works in which an AI creates an image, uses the image to create a new image, and just keeps going. But I think even in a super-automated process you can go back far enough to find some human somewhere who made a decision to do something. Like, maybe they chose what data set to use.

We might see hotel rooms filled with robot paintings. I mean, stuff we hardly even look at, that never even makes its way through human curation.

I guess the question is really how much human involvement is needed to make something art. Is there a threshold or, like, a percentage of involvement? It’s a good question.

Yeah, I guess it’s like, is it still art if there’s no one there to see it?

You know, what is and isn’t art is one of those questions that has been asked forever. I think more to the point is: What is good art versus bad art? And that’s very personal.

But I think humans are always going to be doing this stuff. We will still be painting in the far future, even when robots are making paintings.

AI could make better beer. Here’s how.

Crafting a good-tasting beer is a difficult task. Big breweries select hundreds of trained tasters from among their employees to test their new products. But running such sensory tasting panels is expensive, and perceptions of what tastes good can be highly subjective.  

What if artificial intelligence could help lighten the load? New AI models can accurately identify not only how highly consumers will rate a certain Belgian beer, but also what kinds of compounds brewers should be adding to make the beer taste better, according to research published in Nature Communications today.

These kinds of models could help food and drink manufacturers develop new products or tweak existing recipes to better suit the tastes of consumers, which could help save a lot of time and money that would have gone into running trials. 

To train their AI models, the researchers spent five years chemically analyzing 250 commercial beers, measuring each beer’s chemical properties and flavor compounds—which dictate how it’ll taste. 

The researchers then combined these detailed analyses with a trained tasting panel’s assessments of the beers—including hop, yeast, and malt flavors—and 180,000 reviews of the same beers taken from the popular online platform RateBeer, sampling scores for the beers’ taste, appearance, aroma, and overall quality.

This large data set, which links chemical data with sensory features, was used to train 10 machine-learning models to accurately predict a beer’s taste, smell, and mouthfeel and how likely a consumer was to rate it highly. 

To compare the models, they split the data into a training set and a test set. Once a model was trained on the data within the training set, they evaluated its ability to predict the test set.
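That evaluation follows the standard supervised-learning recipe: hold out some beers, fit on the rest, and score the held-out predictions. Below is a minimal sketch with scikit-learn, using synthetic stand-ins for the chemical measurements and the RateBeer scores rather than the study’s actual data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Placeholder data: rows are beers, columns are measured flavor compounds.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 20))            # e.g. 250 beers, 20 chemical features
y = X[:, 0] * 0.5 + rng.normal(size=250)  # stand-in for an average consumer score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)
print("held-out R^2:", r2_score(y_test, model.predict(X_test)))
```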

The researchers found that all the models were better than the trained panel of human experts at predicting the rating a beer had received from RateBeer.

Through these models, the researchers were able to pinpoint specific compounds that contribute to consumer appreciation of a beer: people were more likely to rate a beer highly if it contained these specific compounds. For example, the models predicted that adding lactic acid, which is present in tart-tasting sour beers, could improve other kinds of beers by making them taste fresher.

“We had the models analyze these beers and then asked them ‘How can we make these beers better?’” says Kevin Verstrepen, a professor at KU Leuven and director of the VIB-KU Leuven Center for Microbiology, who worked on the project. “Then we went in and actually made those changes to the beers by adding flavor compounds. And lo and behold—once we did blind tastings, the beers became better, and more generally appreciated.”

One exciting application of the research is that it could be used to make better alcohol-free beers—a major challenge for the beverage industry, he says. The researchers used the model’s predictions to add a mixture of compounds to a nonalcoholic beer that human tasters rated significantly higher in terms of body and sweetness than its previous incarnation.

This type of machine-learning approach could also be enormously useful in exploring food texture and nutrition and adapting ingredients to suit different populations, says Carolyn Ross, a professor of food science at Washington State University, who was not involved in the research. For example, older people tend to find complex combinations of textures or ingredients less appealing, she says. 

“There’s so much that we can explore there, especially when we’re looking at different populations and trying to come up with specific products for them,” she says.

Apple researchers explore dropping “Siri” phrase & listening with AI instead

Researchers from Apple are probing whether it’s possible to use artificial intelligence to detect when a user is speaking to a device like an iPhone, thereby eliminating the technical need for a trigger phrase like “Siri,” according to a paper published on Friday.

In a study, which was uploaded to arXiv and has not been peer-reviewed, researchers trained a large language model on both speech captured by smartphones and acoustic data from background noise, looking for patterns that could indicate when a user wants help from the device. The model was built in part with a version of OpenAI’s GPT-2, “since it is relatively lightweight and can potentially run on devices such as smartphones,” the researchers wrote. The paper describes over 129 hours of data and additional text data used to train the model, but does not specify the source of the recordings that went into the training set. Six of the seven authors list their affiliation as Apple, and three of them work on the company’s Siri team, according to their LinkedIn profiles. (The seventh author did work related to the paper during an Apple internship.)
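The general recipe described here (combining acoustic signals with text to decide whether speech is aimed at the device) can be sketched as a simple binary classifier. Everything below, from the hand-rolled features to the logistic regression standing in for the GPT-2-based model, is a hypothetical simplification, not Apple’s setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def acoustic_features(audio: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: crude summary statistics of the waveform."""
    return np.array([audio.mean(), audio.std(), np.abs(audio).max()])

def text_features(transcript: str) -> np.ndarray:
    """Hypothetical stand-in: word and character counts, not a language-model embedding."""
    return np.array([len(transcript.split()), len(transcript)])

def featurize(audio: np.ndarray, transcript: str) -> np.ndarray:
    return np.concatenate([acoustic_features(audio), text_features(transcript)])

# Toy examples: label 1 = speech directed at the device, 0 = background chatter.
rng = np.random.default_rng(1)
examples = [("set a timer for ten minutes", 1), ("they chatted about the weather", 0)] * 25
X = np.stack([featurize(rng.normal(size=16000), text) for text, _ in examples])
y = np.array([label for _, label in examples])

classifier = LogisticRegression(max_iter=1000).fit(X, y)
```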

The results were promising, according to the paper. The model was able to make more accurate predictions than audio-only or text-only models, and improved further as the size of the models grew larger. Beyond exploring the research question, it’s unclear if Apple plans to eliminate the “Hey Siri” trigger phrase.

Neither Apple nor the paper’s researchers immediately returned requests for comment.

Currently, Siri functions by holding small amounts of audio and does not begin recording or preparing to answer user prompts until it hears the trigger phrase. Eliminating that “Hey Siri” prompt could increase concerns about our devices “always listening”, said Jen King, a privacy and data policy fellow at the Stanford Institute for Human-Centered Artificial Intelligence. 

The way Apple handles audio data has previously come under scrutiny by privacy advocates. In 2019, reporting from The Guardian revealed that Apple’s quality control contractors regularly heard private audio collected from iPhones while they worked with Siri data, including sensitive conversations between doctors and patients. Two years later, Apple responded with policy changes, including storing more data on devices and letting users opt out of having their recordings used to improve Siri. A class action suit brought against the company in California in 2021 alleged that Siri turned on even when it wasn’t activated.  

The “Hey Siri” prompt can serve an important purpose for users, according to King. The phrase provides a way to know when the device is listening, and getting rid of it might mean more convenience but less transparency from the device, King told MIT Technology Review. The research did not detail whether the trigger phrase would be replaced by any other signal that the AI assistant is engaged. 

“I’m skeptical that a company should mandate that form of interaction,” King says.

The paper is one of a number of recent signals that Apple, which is perceived to be lagging behind other tech giants like Amazon, Google, and Facebook in the artificial intelligence race, is planning to incorporate more AI into its products. According to news first reported by VentureBeat, Apple is building a generative AI model called MM1 that can work with text and images, which would be the company’s answer to OpenAI’s ChatGPT and a host of other chatbots from leading tech giants. Meanwhile, Bloomberg reported that Apple is in talks with Google about using the company’s AI model Gemini in iPhones, and on Friday the Wall Street Journal reported that it had engaged in talks with Baidu about using that company’s AI products.

Google DeepMind’s new AI assistant helps elite soccer coaches get even better

Soccer teams are always looking to get an edge over their rivals. Whether it’s studying players’ susceptibility to injury or scrutinizing opponents’ tactics, top clubs pore over reams of data to give themselves the best shot at winning. 

They might want to add a new AI assistant developed by Google DeepMind to their arsenal. It can suggest tactics for soccer set-pieces that are even better than those created by professional club coaches. 

The system, called TacticAI, works by analyzing a dataset of 7,176 corner kicks taken by players for Liverpool FC, one of the biggest soccer clubs in the world. 

Corner kicks are awarded to an attacking team when the ball passes over the goal line after touching a player on the defending team. In a sport as free-flowing and unpredictable as soccer, corners—like free kicks and penalties—are rare instances in the game when teams can try out pre-planned plays.

TacticAI uses predictive and generative AI models to convert each corner kick scenario—such as a receiver successfully scoring a goal, or a rival defender intercepting the ball and returning it to their team—into a graph, and the data from each player into a node on the graph, before modeling the interactions between each node. The work was published in Nature Communications today.

Using this data, the model provides recommendations about where to position players during a corner to give them, for example, the best shot at scoring a goal, or the best combination of players to get up front. It can also try to predict the outcomes of a corner, including whether a shot will take place, or which player is most likely to touch the ball first.
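The data structure at the heart of that description is simple to sketch: one node per player, edges between every pair, and a round of message passing so each node’s features absorb context from its neighbors. The player features and the plain neighbor averaging below are invented placeholders, not the published architecture.

```python
import numpy as np

# One node per player: e.g. (x, y, velocity_x, velocity_y, is_attacker).
players = np.array([
    [0.90, 0.50, 0.0, 0.1, 1.0],  # attacking receiver near the goal
    [0.85, 0.48, 0.0, 0.0, 0.0],  # marking defender
    [0.70, 0.30, 0.2, 0.0, 1.0],  # attacker making a run
])

# Fully connected graph: every player can influence every other player.
n = len(players)
adjacency = np.ones((n, n)) - np.eye(n)

def message_pass(node_features: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """One step of neighbor averaging, a toy stand-in for a graph network layer."""
    degree = adj.sum(axis=1, keepdims=True)
    return adj @ node_features / degree

updated = message_pass(players, adjacency)
print(updated.shape)  # same (players, features) shape, now context-aware
```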

The main benefit is that the AI assistant reduces the workload of the coaches, says Ondřej Hubáček, an analyst at the sports data firm Ematiq who specializes in predictive models, and who did not work on the project. “An AI system can go through the data quickly and point out errors a team is making—I think that’s the added value you can get from AI assistants,” he says. 

To assess TacticAI’s suggestions, Google DeepMind presented them to five football experts: three data scientists, one video analyst, and one coaching assistant, all of whom work at Liverpool FC. Not only did these experts struggle to distinguish TacticAI’s suggestions from real game-play scenarios, they also favored the system’s strategies over existing tactics 90% of the time.

These findings suggest that TacticAI’s strategies could be useful for human coaches in real-life games, says Petar Veličković, a staff research scientist at Google DeepMind who worked on the project. “Top clubs are always searching for an edge, and I think our results indicate that techniques like these are likely going to become a part of modern football going forward,” he says.

TacticAI’s powers of prediction aren’t just limited to corner kicks either—the same method could be easily applied to other set pieces, general play throughout a match, or even other sports entirely, such as American football, hockey, or basketball, says Veličković.

“As long as there’s a team-based sport where you believe that modeling relationships between players will be useful and you have a source of data, it’s applicable,” he says.

How AI taught Cassie the two-legged robot to run and jump

If you’ve watched Boston Dynamics’ slick videos of robots running, jumping and doing parkour, you might have the impression robots have learned to be amazingly agile. In fact, these robots are still coded by hand, and would struggle to deal with new obstacles they haven’t encountered before.

However, a new method of teaching robots to move could help them deal with new scenarios through trial and error, just as humans learn and adapt to unpredictable events.  

Researchers used an AI technique called reinforcement learning to help a two-legged robot nicknamed Cassie to run 400 meters, over varying terrains, and execute standing long jumps and high jumps, without being trained explicitly on each movement. Reinforcement learning works by rewarding or penalizing an AI as it tries to carry out an objective. In this case, the approach taught the robot to generalize and respond in new scenarios, instead of freezing like its predecessors may have done. 
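To make the reward-and-penalty idea concrete, here is a toy reinforcement-learning loop: a stand-in “balance” task where staying near upright earns reward and the policy keeps whichever gain scores best. It is a deliberately crude illustration of the principle, not the Berkeley team’s training setup.

```python
import random

def reward(state: float, action: float) -> float:
    """Toy reward: encourage staying upright (state near 0) with small actions."""
    return -abs(state) - 0.1 * abs(action)

def step(state: float, action: float) -> float:
    """Toy dynamics: the action nudges the state while noise pushes it around."""
    return state + action + random.gauss(0.0, 0.05)

# A one-parameter "policy": action = -gain * state. Try a few gains and keep
# the one that earns the most reward, a crude stand-in for a policy update.
best_gain, best_return = None, float("-inf")
for gain in [0.1, 0.5, 1.0, 1.5]:
    total, state = 0.0, 1.0
    for _ in range(100):
        action = -gain * state
        total += reward(state, action)
        state = step(state, action)
    if total > best_return:
        best_gain, best_return = gain, total

print("best gain:", best_gain)
```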

“We wanted to push the limits of robot agility,” says Zhongyu Li, a PhD student at the University of California, Berkeley, who worked on the project, which has not yet been peer-reviewed. “The high-level goal was to teach the robot to learn how to do all kinds of dynamic motions the way a human does.”

The team used a simulation to train Cassie, an approach that dramatically shortens the learning process, from years to weeks, and enables the robot to perform those same skills in the real world without further fine-tuning.

Firstly, they trained the neural network that controlled Cassie to master a simple skill from scratch, such as jumping on the spot, walking forward, or running forward without toppling over. It was taught by being encouraged to mimic motions it was shown, which included motion capture data collected from a human and animations demonstrating the desired movement.

After the first stage was complete, the team presented the model with new commands encouraging the robot to perform tasks using its new movement skills. Once it became proficient at performing the new tasks in a simulated environment, they then diversified the tasks it had been trained on through a method called task randomization. 

This makes the robot much more prepared for unexpected scenarios. For example, the robot was able to maintain a steady running gait while being pulled sideways by a leash. “We allowed the robot to utilize the history of what it’s observed and adapt quickly to the real world,” says Li.

Cassie completed a 400-meter run in two minutes and 34 seconds, then jumped 1.4 meters in the long jump without needing additional training.

The researchers are now planning on studying how this kind of technique could be used to train robots equipped with on-board cameras. This will be more challenging than completing actions blind, adds Alan Fern, a professor of computer science at Oregon State University who helped to develop the Cassie robot but was not involved with this project.

“The next major step for the field is humanoid robots that do real work, plan out activities, and actually interact with the physical world in ways that are not just interactions between feet and the ground,” he says.

Africa’s push to regulate AI starts now        

In the Zanzibar archipelago of Tanzania, rural farmers are using an AI-assisted app called Nuru that works in their native language of Swahili to detect a devastating cassava disease before it spreads. In South Africa, computer scientists have built machine learning models to analyze the impact of racial segregation in housing. And in Nairobi, Kenya, AI classifies images from thousands of surveillance cameras perched on lampposts in the bustling city’s center. 

The projected benefit of AI adoption on Africa’s economy is tantalizing. Estimates suggest that four African countries alone—Nigeria, Ghana, Kenya, and South Africa—could rake in up to $136 billion worth of economic benefits by 2030 if businesses there begin using more AI tools.

Now, the African Union—made up of 55 member nations—is preparing an ambitious AI policy that envisions an Africa-centric path for the development and regulation of this emerging technology. But debates on when AI regulation is warranted and concerns about stifling innovation could pose a roadblock, while a lack of AI infrastructure could hold back the technology’s adoption.  

“We’re seeing a growth of AI in the continent;  it’s really important there be set rules in place to govern these technologies,” says Chinasa T. Okolo, a fellow in the Center for Technology Innovation at Brookings, whose research focuses on AI governance and policy development in Africa.

Some African countries have already begun to formulate their own legal and policy frameworks for AI. Seven have developed national AI policies and strategies, which are currently at different stages of implementation. 

On February 29, the African Union Development Agency published a policy draft that lays out a blueprint of AI regulations for African nations. The draft includes recommendations for industry-specific codes and practices, standards and certification bodies to assess and benchmark AI systems, regulatory sandboxes for safe testing of AI, and the establishment of national AI councils to oversee and monitor responsible deployment of AI. 

The heads of African governments are expected to eventually endorse the continental AI strategy, but not until February 2025, when they meet next at the AU’s annual summit in Addis Ababa, Ethiopia. Countries with no existing AI policies or regulations would then use this framework to develop their own national strategies, while those that already have will be encouraged to review and align their policies with the AU’s.

Elsewhere, major AI laws and policies are also taking shape. This week, the European Union passed the AI Act, set to become the world’s first comprehensive AI law. In October, the United States issued an executive order on AI. And the Chinese government is eyeing a sweeping AI law similar to the EU’s, while also setting rules that target specific AI products as they’re developed. 

If African countries don’t develop their own regulatory frameworks that protect citizens from the technology’s misuse, some experts worry that Africans will face social harms, including bias that could exacerbate inequalities. And if these countries don’t also find a way to harness AI’s benefits, others fear these economies could be left behind. 

“We want to be standard makers”

Some African researchers think it’s too early to be thinking about AI regulation. The industry is still nascent there due to the high cost of building data infrastructure, limited internet access, a lack of funding, and a dearth of powerful computers needed to train AI models. A lack of access to quality training data is also a problem. African data is largely concentrated in the hands of companies outside of Africa.

In February, just before the AU’s AI policy draft came out, Shikoh Gitau, a computer scientist who started the Nairobi-based AI research lab Qubit Hub, published a paper arguing that Africa should prioritize the development of an AI industry before trying to regulate the technology. 

“If we start by regulating, we’re not going to figure out the innovations and opportunities that exist for Africa,” says David Lemayian, a software engineer and one of the paper’s co-authors.  

Okolo, who consulted on the AU-AI draft policy, disagrees. Africa should be proactive in developing regulations, Okolo says. She suggests African countries reform existing laws such as policies on data privacy and digital governance to address AI. 

But Gitau is concerned that a hasty approach to regulating AI could hinder adoption of the technology. And she says it’s critical to build homegrown AI with applications tailored for Africans to harness the power of AI to improve economic growth. 

“Before we put regulations [in place], we need to do the hard work of understanding the full spectrum of the technology and invest in building the African AI ecosystem,” she says.

More than 50 countries and the EU have AI strategies in place, and more than 700 AI policy initiatives have been implemented since 2017, according to the Organisation for Economic Co-operation and Development’s AI Policy Observatory. But only five of those initiatives are from Africa and none of the OECD’s 38 member countries are African.

Africa’s voices and perspectives have largely been absent from global discussions on AI governance and regulation, says Melody Musoni, a policy and digital governance expert at ECDPM, an independent policy think tank in Brussels.   

“We must contribute our perspectives and own our regulatory frameworks,” says Musoni. “We want to be standard makers, not standard takers.” 

Nyalleng Moorosi, a specialist in ethics and fairness in machine learning who is based in Hlotse, Lesotho and works at the Distributed AI Research Institute, says that some African countries are already seeing labor exploitation by AI companies. This includes poor wages and lack of psychological support for data labelers, who are largely from low-income countries but working for big tech companies. She argues regulation is needed to prevent that, and to protect communities against misuse by both large corporations and authoritarian governments. 

In Libya, autonomous lethal weapons systems have already been used in fighting, and in Zimbabwe, a controversial, military-driven national facial-recognition scheme has raised concerns over the technology’s alleged use as a surveillance tool by the government. The draft AU-AI policy didn’t explicitly address the use of AI by African governments for national security interests, but it acknowledges that there could be perilous AI risks. 

Barbara Glover, program officer for an African Union group that works on policies for emerging technologies, points out that the policy draft recommends that African countries invest in digital and data infrastructure, and collaborate with the private sector to build investment funds to support AI startups and innovation hubs on the continent. 

Unlike the EU, the AU lacks the power to enforce sweeping policies and laws across its member states. Even if the draft AI strategy wins endorsement of parliamentarians at the AU’s assembly next February, African nations must then implement the continental strategy through national AI policies and laws.

Meanwhile, tools powered by machine learning will continue to be deployed, raising ethical questions and regulatory needs and posing a challenge for policymakers across the continent. 

Moorosi says Africa must develop a model for local AI regulation and governance which balances the localized risks and rewards. “If it works with people and works for people, then it has to be regulated,” she says.             

This self-driving startup is using generative AI to predict traffic

Self-driving company Waabi is using a generative AI model to help predict the movement of vehicles, it announced today.

The new system, called Copilot4D, was trained on troves of data from lidar sensors, which use light to sense how far away objects are. If you prompt the model with a situation, like a driver recklessly merging onto a highway at high speed, it predicts how the surrounding vehicles will move, then generates a lidar representation of the scene 5 to 10 seconds into the future (showing a pileup, perhaps). Today’s announcement is about the initial version of Copilot4D, but Waabi CEO Raquel Urtasun says a more advanced and interpretable version is deployed in Waabi’s testing fleet of autonomous trucks in Texas, where it helps the driving software decide how to react. 

While autonomous driving has long relied on machine learning to plan routes and detect objects, some companies and researchers are now betting that generative AI — models that take in data of their surroundings and generate predictions — will help bring autonomy to the next stage. Wayve, a Waabi competitor, released a comparable model last year that is trained on the video that its vehicles collect. 

Waabi’s model works in a similar way to image or video generators like OpenAI’s DALL-E and Sora. It takes point clouds of lidar data, which visualize a 3D map of the car’s surroundings, and breaks them into chunks, similar to how image generators break photos into pixels. Based on its training data, Copilot4D then predicts how all points of lidar data will move. Doing this continuously allows it to generate predictions 5-10 seconds into the future.
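A rough sketch of that pipeline: discretize each lidar point cloud into coarse occupied “chunks” (the analogue of pixels), then predict how those chunks move into the next frame. The voxel size, the toy frames, and the single-shift “prediction” below are placeholders for illustration, not Waabi’s model.

```python
from collections import Counter

import numpy as np

def voxelize(points: np.ndarray, cell: float = 1.0) -> set[tuple[int, int, int]]:
    """Discretize an (N, 3) lidar point cloud into occupied voxel indices."""
    return {tuple(int(i) for i in idx) for idx in np.floor(points / cell).astype(int)}

# Two consecutive (toy) lidar frames of a scene drifting along the x axis.
frame_t0 = np.array([[0.2, 0.1, 0.0], [1.3, 0.2, 0.0]])
frame_t1 = np.array([[1.2, 0.1, 0.0], [2.4, 0.2, 0.0]])

tokens_t0 = voxelize(frame_t0)
tokens_t1 = voxelize(frame_t1)

# Crude "next-token" statistics: tally how occupied voxels shift between frames,
# then reuse the most common shift to roll the scene one step into the future.
shifts = Counter()
for a in tokens_t0:
    for b in tokens_t1:
        shifts[tuple(bb - aa for aa, bb in zip(a, b))] += 1

dx, dy, dz = shifts.most_common(1)[0][0]
predicted_t2 = {(x + dx, y + dy, z + dz) for x, y, z in tokens_t1}
print(predicted_t2)  # expected occupied voxels one step further along
```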

A diptych view of the same image via camera and LiDAR.

Waabi is one of a handful of autonomous driving companies, including competitors Wayve and Ghost, that describe their approach as “AI-first.” To Urtasun, that means designing a system that learns from data, rather than one that must be taught reactions to specific situations. The cohort is betting their methods might require fewer hours of road-testing self-driving cars, a charged topic following an October 2023 accident where a Cruise robotaxi dragged a pedestrian in San Francisco. 

Waabi is different from its competitors in building a generative model for lidar, rather than cameras. 

“If you want to be a Level 4 player, lidar is a must,” says Urtasun, referring to the automation level where the car does not require the attention of a human to drive safely. Cameras do a good job of showing what the car is seeing, but they’re not as adept at measuring distances or understanding the geometry of the car’s surroundings, she says.

Though Waabi’s model can generate videos showing what a car will see through its lidar sensors, those videos will not be used as training data in the driving simulator the company uses to build and test its driving model. That’s to ensure any hallucinations from Copilot4D don’t get taught in the simulator.

The underlying technology is not new, says Bernard Adam Lange, a PhD student at Stanford who has built and researched similar models, but it’s the first time he’s seen a generative lidar model leave the confines of a research lab and be scaled up for commercial use. A model like this would generally help make the “brain” of any autonomous vehicle able to reason more quickly and accurately, he says.

“It is the scale that is transformative,” he says. “The hope is that these models can be utilized in downstream tasks” like detecting objects and predicting where people or things might move next.

Copilot4D can only estimate so far into the future, and motion prediction models in general degrade the farther they’re asked to project forward. Urtasun says that the model only needs to imagine what happens 5 to 10  seconds ahead for the majority of driving decisions, though the benchmark tests highlighted by Waabi are based on 3-second predictions. Chris Gerdes, co-director of Stanford’s Center for Automotive Research, says this metric will be key in determining how useful the model is at making decisions.

“If the 5-second predictions are solid but the 10-second predictions are just barely usable, there are a number of situations where this would not be sufficient on the road,” he says.

The new model resurfaces a question rippling through the world of generative AI: whether or not to make models open-source. Releasing Copilot4D would let academic researchers, who struggle with access to large data sets, peek under the hood at how it’s made, independently evaluate safety, and potentially advance the field. It would also do the same for Waabi’s competitors. Waabi has published a paper detailing the creation of the model but has not released the code, and Urtasun is unsure if they will. 

“We want academia to also have a say in the future of self-driving,” she says, adding that open-source models are more trusted. “But we also need to be a bit careful as we develop our technology so that we don’t unveil everything to our competitors.”

An AI that can play Goat Simulator is a step toward more useful machines

Fly, goat, fly! A new AI agent from Google DeepMind can play different games, including ones it has never seen before, such as Goat Simulator 3, a fun action game with exaggerated physics. Researchers were able to get it to follow text commands to play seven different games and move around in three different 3D research environments. It’s a step toward more generalized AI that can transfer skills across multiple environments.  

Google DeepMind has had huge success developing game-playing AI systems. Its system AlphaGo, which beat top professional player Lee Sedol at the game Go in 2016, was a major milestone that showed the power of deep learning. But unlike earlier game-playing AI systems, which mastered only one game or could only follow single goals or commands, this new agent is able to play a variety of different games, including Valheim and No Man’s Sky. It’s called SIMA, an acronym for “scalable, instructable, multiworld agent.”

In training AI systems, games are a good proxy for real-world tasks. “A general game-playing agent could, in principle, learn a lot more about how to navigate our world than anything in a single environment ever could,” says Michael Bernstein, an associate professor of computer science at Stanford University, who was not part of the research. 

“One could imagine one day rather than having superhuman agents which you play against, we could have agents like SIMA playing alongside you in games with you and with your friends,” says Tim Harley, a research engineer at Google DeepMind who was part of the team that developed the agent. 

The team trained SIMA on lots of examples of humans playing video games, both individually and collaboratively, alongside keyboard and mouse input and annotations of what the players did in the game, says Frederic Besse, a research engineer at Google DeepMind.  

Then they used an AI technique called imitation learning to teach the agent to play games as humans would. SIMA can follow 600 basic instructions, such as “Turn left,” “Climb the ladder,” and “Open the map,” each of which can be completed in about 10 seconds or less.
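Imitation learning of this kind is, at its core, supervised learning from (game state, instruction) pairs to the keyboard-and-mouse action a human took. Here is a minimal behavior-cloning sketch, with invented features and a generic classifier standing in for SIMA’s neural network.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical instruction and action vocabularies for the sketch.
instructions = {"turn left": 0, "climb the ladder": 1, "open the map": 2}
actions = {"mouse_left": 0, "key_w": 1, "key_m": 2}

rng = np.random.default_rng(0)

def featurize(game_state: np.ndarray, instruction: str) -> np.ndarray:
    """Concatenate game-state features with a one-hot encoding of the instruction."""
    one_hot = np.zeros(len(instructions))
    one_hot[instructions[instruction]] = 1.0
    return np.concatenate([game_state, one_hot])

# Toy demonstrations: the human's action depends mostly on the instruction.
X, y = [], []
for instruction, action in [("turn left", "mouse_left"),
                            ("climb the ladder", "key_w"),
                            ("open the map", "key_m")] * 30:
    X.append(featurize(rng.normal(size=8), instruction))
    y.append(actions[action])

policy = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
print(policy.predict([featurize(rng.normal(size=8), "open the map")]))
```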

The team found that a SIMA agent that was trained on many games was better than an agent that learned how to play just one. This is because it was able to take advantage of concepts shared between games to learn better skills and get better at carrying out instructions, says Besse. 

“This is again a really exciting key property, as we have an agent that can play games it has never seen before, essentially,” he says. 

Seeing this sort of knowledge transfer between games is a significant milestone for AI research, says Paulo Rauber, a lecturer in artificial intelligence at Queen Mary University of London. 

The basic idea of learning to execute instructions on the basis of examples provided by humans could lead to more powerful systems in the future, especially with bigger data sets, Rauber says. SIMA’s relatively limited data set is what is holding back its performance, he says. 

Although the number of game environments it’s been trained on is still small, SIMA is on the right track for scaling up, says Jim Fan, a senior research scientist at Nvidia who runs its  AI Agents Initiative. 

But the AI system is still not close to human level, says Harley. For example, in the game No Man’s Sky, the AI agent could do just 60% of the tasks humans could do. And when the researchers removed the ability for humans to give SIMA instructions, they found the agent performed much worse than before. 

Next, Besse says, the team is working on improving the agent’s performance. The researchers want to get it to work in as many environments as possible and learn new skills, and they want people to be able to chat with the agent and get a response. The team also wants SIMA to have more generalized skills, allowing it to quickly pick up games it has never seen before, much like a human. 

Humans “can generalize very well to unseen environments and unseen situations,” says Besse. “And we want our agents to be just the same.”  

SIMA inches us closer to a “ChatGPT moment” for autonomous agents, says Roy Fox, an assistant professor at the University of California, Irvine.  

But it is a long way away from actual autonomous AI. That would be “a whole different ball game,” he says.