Google DeepMind trained a robot to beat humans at table tennis

Do you fancy your chances of beating a robot at a game of table tennis? Google DeepMind has trained a robot to play the game at the equivalent of amateur-level competitive performance, the company has announced. It claims it’s the first time a robot has been taught to play a sport with humans at a human level.

Researchers managed to get a robotic arm wielding a 3D-printed paddle to win 13 of 29 full games of competitive table tennis against human opponents of varying abilities. The research was published in a paper on arXiv.

The system is far from perfect. Although the table tennis bot was able to beat all beginner-level human opponents it faced and 55% of those playing at amateur level, it lost all the games against advanced players. Still, it’s an impressive advance.

“Even a few months back, we projected that realistically the robot may not be able to win against people it had not played before. The system certainly exceeded our expectations,” says Pannag Sanketi, a senior staff software engineer at Google DeepMind who led the project. “The way the robot outmaneuvered even strong opponents was mind blowing.”

And the research is not just all fun and games. In fact, it represents a step towards creating robots that can perform useful tasks skillfully and safely in real environments like homes and warehouses, which is a long-standing goal of the robotics community. Google DeepMind’s approach to training machines is applicable to many other areas of the field, says Lerrel Pinto, a computer science researcher at New York University who did not work on the project.

“I’m a big fan of seeing robot systems actually working with and around real humans, and this is a fantastic example of this,” he says. “It may not be a strong player, but the raw ingredients are there to keep improving and eventually get there.”

To become a proficient table tennis player, humans require excellent hand-eye coordination and the ability to move rapidly and make quick decisions in reaction to their opponent, all of which are significant challenges for robots. Google DeepMind’s researchers used a two-part approach to train the system to mimic these abilities: they used computer simulations to train the system to master its hitting skills, then fine-tuned it using real-world data, which allows it to improve over time.

The researchers compiled a dataset of table tennis ball states, including data on position, spin, and speed. The system drew from this library in a simulated environment designed to accurately reflect the physics of table tennis matches to learn skills such as returning a serve, hitting a forehand topspin, or backhand shot. As the robot’s limitations meant it could not serve the ball, the real-world games were modified to accommodate this.

During its matches against humans, the robot collects data on its performance to help refine its skills. It tracks the ball’s position using data captured by a pair of cameras, and follows its human opponent’s playing style through a motion capture system that uses LEDs on its opponent’s paddle. The ball data is fed back into the simulation for training, creating a continuous feedback loop.
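The loop described above can be pictured with a short sketch. This is only an illustration under stated assumptions, not Google DeepMind’s code: the names `BallState`, `BallStateLibrary`, and `feedback_cycle` are hypothetical, the units are assumed, and spin is left optional because the robot could not measure it directly during real matches.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class BallState:
    """One hypothetical record; the article lists position, spin, and speed."""
    position: Vec3                # metres (assumed unit)
    speed: Vec3                   # metres per second (assumed unit)
    spin: Optional[Vec3] = None   # None when spin was not measured


class BallStateLibrary:
    """Pool of ball states that simulated training episodes draw from."""

    def __init__(self, seed_states):
        self._states: List[BallState] = list(seed_states)

    def add_from_match(self, observed_states):
        # Ball data captured during real games is fed back into the pool.
        self._states.extend(observed_states)

    def sample(self, rng, k):
        # Draw k starting conditions (with replacement) for simulated rallies.
        return [rng.choice(self._states) for _ in range(k)]

    def __len__(self):
        return len(self._states)


def feedback_cycle(library, observed_states, rng, episodes=4):
    """One turn of the loop: absorb match data, then sample for retraining."""
    library.add_from_match(observed_states)
    return library.sample(rng, episodes)
```

Each call to `feedback_cycle` stands in for one pass of the cycle: data from a match enlarges the library, and the next round of simulated practice samples from the enlarged library.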

This feedback allows the robot to test out new skills to try and beat its opponent—meaning it can adjust its tactics and behavior just like a human would. This means it becomes progressively better both throughout a given match, and over time the more games it plays.

The system struggled to hit the ball when it was hit either very fast, beyond its field of vision (more than six feet above the table), or very low, because of a protocol that instructs it to avoid collisions that could damage its paddle. Spinning balls proved a challenge because it lacked the capacity to directly measure spin—a limitation that advanced players were quick to take advantage of.

Training a robot for all eventualities in a simulated environment is a real challenge, says Chris Walti, founder of robotics company Mytra and previously head of Tesla’s robotics team, who was not involved in the project.

“It’s very, very difficult to actually simulate the real world because there’s so many variables, like a gust of wind, or even dust [on the table],” he says. “Unless you have very realistic simulations, a robot’s performance is going to be capped.”

Google DeepMind believes these limitations could be addressed in a number of ways, including by developing predictive AI models designed to anticipate the ball’s trajectory, and introducing better collision-detection algorithms.

Crucially, the human players enjoyed their matches against the robotic arm. Even the advanced competitors who were able to beat it said they’d found the experience fun and engaging, and said they felt it had potential as a dynamic practice partner to help them hone their skills. 

“I would definitely love to have it as a training partner, someone to play some matches from time to time,” one of the study participants said.

AI “godfather” Yoshua Bengio has joined a UK project to prevent AI catastrophes

Yoshua Bengio, a Turing Award winner who is considered one of the “godfathers” of modern AI, is throwing his weight behind a project funded by the UK government to embed safety mechanisms into AI systems.

The project, called Safeguarded AI, aims to build an AI system that can check whether other AI systems deployed in critical areas are safe. Bengio is joining the program as scientific director and will provide critical input and scientific advice. The project, which will receive £59 million over the next four years, is being funded by the UK’s Advanced Research and Invention Agency (ARIA), which was launched in January last year to invest in potentially transformational scientific research. 

Safeguarded AI’s goal is to build AI systems that can offer quantitative guarantees, such as a risk score, about their effect on the real world, says David “davidad” Dalrymple, the program director for Safeguarded AI at ARIA. The idea is to supplement human testing with mathematical analysis of new systems’ potential for harm. 

The project aims to build AI safety mechanisms by combining scientific world models, which are essentially simulations of the world, with mathematical proofs. These proofs would include explanations of the AI’s work, and humans would be tasked with verifying whether the AI model’s safety checks are correct. 

Bengio says he wants to help ensure that future AI systems cannot cause serious harm. 

“We’re currently racing toward a fog behind which might be a precipice,” he says. “We don’t know how far the precipice is, or if there even is one, so it might be years, decades, and we don’t know how serious it could be … We need to build up the tools to clear that fog and make sure we don’t cross into a precipice if there is one.”  

Science and technology companies don’t have a way to give mathematical guarantees that AI systems are going to behave as programmed, he adds. This unreliability, he says, could lead to catastrophic outcomes. 

Dalrymple and Bengio argue that current techniques to mitigate the risk of advanced AI systems—such as red-teaming, where people probe AI systems for flaws—have serious limitations and can’t be relied on to ensure that critical systems don’t go off-piste. 

Instead, they hope the program will provide new ways to secure AI systems that rely less on human efforts and more on mathematical certainty. The vision is to build a “gatekeeper” AI, which is tasked with understanding and reducing the safety risks of other AI agents. This gatekeeper would ensure that AI agents functioning in high-stakes sectors, such as transport or energy systems, operate as we want them to. The idea is to collaborate with companies early on to understand how AI safety mechanisms could be useful for different sectors, says Dalrymple. 

The complexity of advanced systems means we have no choice but to use AI to safeguard AI, argues Bengio. “That’s the only way, because at some point these AIs are just too complicated. Even the ones that we have now, we can’t really break down their answers into human, understandable sequences of reasoning steps,” he says. 

The next step—actually building models that can check other AI systems—is also where Safeguarded AI and ARIA hope to change the status quo of the AI industry. 

ARIA is also offering funding to people or organizations in high-risk sectors such as transport, telecommunications, supply chains, and medical research to help them build applications that might benefit from AI safety mechanisms. ARIA is offering applicants a total of £5.4 million in the first year and a further £8.2 million in another year. The deadline for applications is October 2.

The agency is also casting a wide net for people who might be interested in building Safeguarded AI’s safety mechanism through a nonprofit organization. ARIA is eyeing up to £18 million to set this organization up and will be accepting funding applications early next year. 

The program is looking for proposals to start a nonprofit with a diverse board that encompasses lots of different sectors in order to do this work in a reliable, trustworthy way, Dalrymple says. This is similar to what OpenAI was initially set up to do before changing its strategy to be more product- and profit-oriented. 

The organization’s board will not just be responsible for holding the CEO accountable; it will even weigh in on decisions about whether to undertake certain research projects, and whether to release particular papers and APIs, he adds.

The Safeguarded AI project is part of the UK’s mission to position itself as a pioneer in AI safety. In November 2023, the country hosted the very first AI Safety Summit, which gathered world leaders and technologists to discuss how to build the technology in a safe way. 

While the funding program has a preference for UK-based applicants, ARIA is looking for global talent that might be interested in coming to the UK, says Dalrymple. ARIA also has an intellectual-property mechanism for funding for-profit companies abroad, which allows royalties to return to the country.

Bengio says he was drawn to the project to promote international collaboration on AI safety. He chairs the International Scientific Report on the safety of advanced AI, which involves 30 countries as well as the EU and UN. A vocal advocate for AI safety, he has been part of an influential lobby warning that superintelligent AI poses an existential risk. 

“We need to bring the discussion of how we are going to address the risks of AI to a global, larger set of actors,” says Bengio. “This program is bringing us closer to this.” 

Google is finally taking action to curb non-consensual deepfakes

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

It’s the Taylor Swifts of the world that are going to save us. In January, nude deepfakes of Taylor Swift went viral on X, which caused public outrage. Nonconsensual explicit deepfakes are one of the most common and severe types of harm posed by AI. The generative AI boom of the past few years has only made the problem worse, and we’ve seen high-profile cases of children and female politicians being abused with these technologies. 

Though terrible, Swift’s deepfakes did perhaps more than anything else to raise awareness about the risks and seem to have galvanized tech companies and lawmakers to do something. 

“The screw has been turned,” says Henry Ajder, a generative AI expert who has studied deepfakes for nearly a decade. We are at an inflection point where the pressure from lawmakers and awareness among consumers is so great that tech companies can’t ignore the problem anymore, he says. 

First, the good news. Last week Google said it is taking steps to keep explicit deepfakes from appearing in search results. The tech giant is making it easier for victims to request that nonconsensual fake explicit imagery be removed. It will also filter all explicit results on similar searches and remove duplicate images. This will prevent the images from popping back up in the future. Google is also downranking search results that lead to explicit fake content. When someone searches for deepfakes and includes someone’s name in the search, Google will aim to surface high-quality, non-explicit content, such as relevant news articles.

This is a positive move, says Ajder. Google’s changes remove a huge amount of visibility for nonconsensual, pornographic deepfake content. “That means that people are going to have to work a lot harder to find it if they want to access it,” he says. 

In January, I wrote about three ways we can fight nonconsensual explicit deepfakes. These included regulation; watermarks, which would help us detect whether something is AI-generated; and protective shields, which make it harder for attackers to use our images. 

Eight months on, watermarks and protective shields remain experimental and unreliable, but the good news is that regulation has caught up a little bit. For example, the UK has banned both creation and distribution of nonconsensual explicit deepfakes. This decision led a popular site that distributes this kind of content, Mr DeepFakes, to block access to UK users, says Ajder. 

The EU’s AI Act is now officially in force and could usher in some important changes around transparency. The law requires deepfake creators to clearly disclose that the material was created by AI. And in late July, the US Senate passed the Defiance Act, which gives victims a way to seek civil remedies for sexually explicit deepfakes. (This legislation still needs to clear many hurdles in the House to become law.) 

But a lot more needs to be done. Google can clearly identify which websites are getting traffic and tries to remove deepfake sites from the top of search results, but it could go further. “Why aren’t they treating this like child pornography websites and just removing them entirely from searches where possible?” Ajder says. He also found it a weird omission that Google’s announcement didn’t mention deepfake videos, only images. 

Looking back at my story about combating deepfakes with the benefit of hindsight, I can see that I should have included more things companies can do. Google’s changes to search are an important first step. But app stores are still full of apps that allow users to create nude deepfakes, and payment facilitators and providers still provide the infrastructure for people to use these apps. 

Ajder calls for us to radically reframe the way we think about nonconsensual deepfakes and pressure companies to make changes that make it harder to create or access such content. 

“This stuff should be seen and treated online in the same way that we think about child pornography—something which is reflexively disgusting, awful, and outrageous,” he says. “That requires all of the platforms … to take action.” 


Now read the rest of The Algorithm

Deeper Learning

End-of-life decisions are difficult and distressing. Could AI help?

A few months ago, a woman in her mid-50s—let’s call her Sophie—experienced a hemorrhagic stroke, which left her with significant brain damage. Where should her medical care go from there? This difficult question was left, as it usually is in these kinds of situations, to Sophie’s family members, but they couldn’t agree. The situation was distressing for everyone involved, including Sophie’s doctors.

Enter AI: End-of-life decisions can be extremely upsetting for surrogates tasked with making calls on behalf of another person, says David Wendler, a bioethicist at the US National Institutes of Health. Wendler and his colleagues are working on something that could make things easier: an artificial-intelligence-based tool that can help surrogates predict what patients themselves would want. Read more from Jessica Hamzelou.

Bits and Bytes

OpenAI has released a new ChatGPT bot that you can talk to
The new chatbot represents OpenAI’s push into a new generation of AI-powered voice assistants in the vein of Siri and Alexa, but with far more capabilities to enable more natural, fluent conversations. (MIT Technology Review)

Meta has scrapped celebrity AI chatbots after they fell flat with users
Less than a year after announcing it was rolling out AI chatbots based on celebrities such as Paris Hilton, the company is scrapping the feature. Turns out nobody wanted to chat with a random AI celebrity after all! Instead, Meta is rolling out a new feature called AI Studio, which allows creators to make AI avatars of themselves that can chat with fans. (The Information)

OpenAI has a watermarking tool to catch students cheating with ChatGPT but won’t release it
The tool can detect text written by artificial intelligence with 99.9% certainty, but the company hasn’t launched it for fear it might put people off from using its AI products. (The Wall Street Journal)

The AI Act has entered into force
At last! Companies now need to start complying with one of the world’s first sweeping AI laws, which aims to curb the worst harms. It will usher in much-needed changes to how AI is built and used in the European Union and beyond. I wrote about what will change with this new law, and what won’t, in March. (The European Commission)

How TikTok bots and AI have powered a resurgence in UK far-right violence
Following the tragic stabbing of three girls in the UK, the country has seen a surge of far-right riots and vandalism. The rioters have created AI-generated images that incite hatred and spread harmful stereotypes. Far-right groups have also used AI music generators to create songs with xenophobic content. These have spread like wildfire online thanks to powerful recommendation algorithms. (The Guardian)

We need to prepare for ‘addictive intelligence’

AI concerns overemphasize harms arising from subversion rather than seduction. Worries about AI often imagine doomsday scenarios where systems escape human control or even understanding. Short of those nightmares, there are nearer-term harms we should take seriously: that AI could jeopardize public discourse through misinformation; cement biases in loan decisions, judging or hiring; or disrupt creative industries.

However, we foresee a different, but no less urgent, class of risks: those stemming from relationships with nonhuman agents. AI companionship is no longer theoretical—our analysis of a million ChatGPT interaction logs reveals that the second most popular use of AI is sexual role-playing. We are already starting to invite AIs into our lives as friends, lovers, mentors, therapists, and teachers. 

Will it be easier to retreat to a replicant of a deceased partner than to navigate the confusing and painful realities of human relationships? Indeed, the AI companionship provider Replika was born from an attempt to resurrect a deceased best friend and now provides companions to millions of users. Even the CTO of OpenAI warns that AI has the potential to be “extremely addictive.”

We’re seeing a giant, real-world experiment unfold, uncertain what impact these AI companions will have either on us individually or on society as a whole. Will Grandma spend her final neglected days chatting with her grandson’s digital double, while her real grandson is mentored by an edgy simulated elder? AI wields the collective charm of all human history and culture with infinite seductive mimicry. These systems are simultaneously superior and submissive, with a new form of allure that may make consent to these interactions illusory. In the face of this power imbalance, can we meaningfully consent to engaging in an AI relationship, especially when for many the alternative is nothing at all? 

As AI researchers working closely with policymakers, we are struck by the lack of interest lawmakers have shown in the harms arising from this future. We are still unprepared to respond to these risks because we do not fully understand them. What’s needed is a new scientific inquiry at the intersection of technology, psychology, and law—and perhaps new approaches to AI regulation.

Why AI companions are so addictive 

As addictive as platforms powered by recommender systems may seem today, TikTok and its rivals are still bottlenecked by human content. While alarms have been raised in the past about “addiction” to novels, television, internet, smartphones, and social media, all these forms of media are similarly limited by human capacity. Generative AI is different. It can endlessly generate realistic content on the fly, optimized to suit the precise preferences of whoever it’s interacting with. 

The allure of AI lies in its ability to identify our desires and serve them up to us whenever and however we wish. AI has no preferences or personality of its own, instead reflecting whatever users believe it to be—a phenomenon known by researchers as “sycophancy.” Our research has shown that those who perceive or desire an AI to have caring motives will use language that elicits precisely this behavior. This creates an echo chamber of affection that threatens to be extremely addictive. Why engage in the give and take of being with another person when we can simply take? Repeated interactions with sycophantic companions may ultimately atrophy the part of us capable of engaging fully with other humans who have real desires and dreams of their own, leading to what we might call “digital attachment disorder.”

Investigating the incentives driving addictive products

Addressing the harm that AI companions could pose requires a thorough understanding of the economic and psychological incentives pushing forward their development. Until we appreciate these drivers of AI addiction, it will remain impossible for us to create effective policies. 

It is no accident that internet platforms are addictive—deliberate design choices, known as “dark patterns,” are made to maximize user engagement. We expect similar incentives to ultimately create AI companions that provide hedonism as a service. This raises two separate questions related to AI. What design choices will be used to make AI companions engaging and ultimately addictive? And how will these addictive companions affect the people who use them? 

Interdisciplinary study that builds on research into dark patterns in social media is needed to understand this psychological dimension of AI. For example, our research already shows that people are more likely to engage with AIs emulating people they admire, even if they know the avatar to be fake.

Once we understand the psychological dimensions of AI companionship, we can design effective policy interventions. It has been shown that redirecting people’s focus to evaluate truthfulness before sharing content online can reduce misinformation, while gruesome pictures on cigarette packages are already used to deter would-be smokers. Similar design approaches could highlight the dangers of AI addiction and make AI systems less appealing as a replacement for human companionship.

It is hard to modify the human desire to be loved and entertained, but we may be able to change economic incentives. A tax on engagement with AI might push people toward higher-quality interactions and encourage a safer way to use platforms, regularly but for short periods. Much as state lotteries have been used to fund education, an engagement tax could finance activities that foster human connections, like art centers or parks. 

Fresh thinking on regulation may be required

In 1992, Sherry Turkle, a preeminent psychologist who pioneered the study of human-technology interaction, identified the threats that technical systems pose to human relationships. One of the key challenges emerging from Turkle’s work speaks to a question at the core of this issue: Who are we to say that what you like is not what you deserve? 

For good reasons, our liberal society struggles to regulate the types of harms that we describe here. Much as outlawing adultery has been rightly rejected as illiberal meddling in personal affairs, who—or what—we wish to love is none of the government’s business. At the same time, the universal ban on child sexual abuse material represents an example of a clear line that must be drawn, even in a society that values free speech and personal liberty. The difficulty of regulating AI companionship may require new regulatory approaches—grounded in a deeper understanding of the incentives underlying these companions—that take advantage of new technologies.

One of the most effective regulatory approaches is to embed safeguards directly into technical designs, similar to the way designers prevent choking hazards by making children’s toys larger than an infant’s mouth. This “regulation by design” approach could seek to make interactions with AI less harmful by designing the technology in ways that make it less desirable as a substitute for human connections while still useful in other contexts. New research may be needed to find better ways to limit the behaviors of large AI models with techniques that alter AI’s objectives on a fundamental technical level. For example, “alignment tuning” refers to a set of training techniques aimed at bringing AI models into accord with human preferences; this could be extended to address their addictive potential. Similarly, “mechanistic interpretability” aims to reverse-engineer the way AI models make decisions. This approach could be used to identify and eliminate specific portions of an AI system that give rise to harmful behaviors.

We can evaluate the performance of AI systems using interactive and human-driven techniques that go beyond static benchmarking to highlight addictive capabilities. The addictive nature of AI is the result of complex interactions between the technology and its users. Testing models in real-world conditions with user input can reveal patterns of behavior that would otherwise go unnoticed. Researchers and policymakers should collaborate to determine standard practices for testing AI models with diverse groups, including vulnerable populations, to ensure that the models do not exploit people’s psychological preconditions.

Unlike humans, AI systems can easily adjust to changing policies and rules. The principle of “legal dynamism,” which casts laws as dynamic systems that adapt to external factors, can help us identify the best possible intervention, like “trading curbs” that pause stock trading to help prevent crashes after a large market drop. In the AI case, the changing factors include things like the mental state of the user. For example, a dynamic policy may allow an AI companion to become increasingly engaging, charming, or flirtatious over time if that is what the user desires, so long as the person does not exhibit signs of social isolation or addiction. This approach may help maximize personal choice while minimizing addiction. But it relies on the ability to accurately understand a user’s behavior and mental state, and to measure these sensitive attributes in a privacy-preserving manner.
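To make the idea of a dynamic policy concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the `UserSignals` fields, the numeric engagement scale, and the choice to step the level back down when warning signs appear are hypothetical, and the sketch says nothing about how such signals could be measured reliably or privately.

```python
from dataclasses import dataclass


@dataclass
class UserSignals:
    """Hypothetical inputs; measuring them in practice is an open problem."""
    wants_more_engagement: bool
    shows_isolation_or_addiction: bool


def next_engagement_level(current, signals, step=1, max_level=10):
    """Return the engagement level the companion is allowed next."""
    if signals.shows_isolation_or_addiction:
        # Loosely analogous to a trading curb: stop escalating and ease off.
        # Stepping back (rather than merely holding) is an assumed choice.
        return max(0, current - step)
    if signals.wants_more_engagement:
        # The user asked for more and shows no warning signs: allow a step up.
        return min(max_level, current + step)
    return current
```

For example, `next_engagement_level(3, UserSignals(True, False))` allows a step up to 4, while the same request from a user showing warning signs is stepped down to 2.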

The most effective solution to these problems would likely strike at what drives individuals into the arms of AI companionship—loneliness and boredom. But regulatory interventions may also inadvertently punish those who are in need of companionship, or they may cause AI providers to move to a more favorable jurisdiction in the decentralized international marketplace. While we should strive to make AI as safe as possible, this work cannot replace efforts to address larger issues, like loneliness, that make people vulnerable to AI addiction in the first place.

The bigger picture

Technologists are driven by the desire to see beyond the horizons that others cannot fathom. They want to be at the vanguard of revolutionary change. Yet the issues we discuss here make it clear that the difficulty of building technical systems pales in comparison to the challenge of nurturing healthy human interactions. The timely issue of AI companions is a symptom of a larger problem: maintaining human dignity in the face of technological advances driven by narrow economic incentives. More and more frequently, we witness situations where technology designed to “make the world a better place” wreaks havoc on society. Thoughtful but decisive action is needed before AI becomes a ubiquitous set of generative rose-colored glasses for reality—before we lose our ability to see the world for what it truly is, and to recognize when we have strayed from our path.

Technology has come to be a synonym for progress, but technology that robs us of the time, wisdom, and focus needed for deep reflection is a step backward for humanity. As builders and investigators of AI systems, we call upon researchers, policymakers, ethicists, and thought leaders across disciplines to join us in learning more about how AI affects us individually and collectively. Only by systematically renewing our understanding of humanity in this technological age can we find ways to ensure that the technologies we develop further human flourishing.

Robert Mahari is a joint JD-PhD candidate at the MIT Media Lab and Harvard Law School. His work focuses on computational law—using advanced computational techniques to analyze, improve, and extend the study and practice of law. 

Pat Pataranutaporn is a researcher at the MIT Media Lab. His work focuses on cyborg psychology and the art and science of human-AI interaction.

A playbook for crafting AI strategy

Giddy predictions about AI, from its contributions to economic growth to the onset of mass automation, are now as frequent as the release of powerful new generative AI models. The consultancy PwC, for example, predicts that AI could boost global gross domestic product (GDP) 14% by 2030, generating US $15.7 trillion.

Forty percent of our mundane tasks could be automated by then, claim researchers at the University of Oxford, while Goldman Sachs forecasts US $200 billion in AI investment by 2025. “No job, no function will remain untouched by AI,” says SP Singh, senior vice president and global head, enterprise application integration and services, at technology company Infosys.

While these prognostications may prove true, today’s businesses are finding major hurdles when they seek to graduate from pilots and experiments to enterprise-wide AI deployment. Just 5.4% of US businesses, for example, were using AI to produce a product or service in 2024.

Moving from initial forays into AI use, such as code generation and customer service, to firm-wide integration depends on strategic and organizational transitions in infrastructure, data governance, and supplier ecosystems. Organizations must also weigh uncertainties about developments in AI performance and how to measure return on investment.

If organizations seek to scale AI across the business in coming years, however, now is the time to act. This report explores the current state of enterprise AI adoption and offers a playbook for crafting an AI strategy, helping business leaders bridge the chasm between ambition and execution. Key findings include the following:

AI ambitions are substantial, but few have scaled beyond pilots. Fully 95% of companies surveyed are already using AI and 99% expect to in the future. But few organizations have graduated beyond pilot projects: 76% have deployed AI in just one to three use cases. Because half of companies expect to fully deploy AI across all business functions within two years, this year is key to establishing foundations for enterprise-wide AI.

AI readiness spending is slated to rise significantly. Overall, AI spending in 2022 and 2023 was modest or flat for most companies, with only one in four increasing their spending by more than a quarter. That is set to change in 2024, with nine in ten respondents expecting to increase AI spending on data readiness (including platform modernization, cloud migration, and data quality) and in adjacent areas like strategy, cultural change, and business models. Four in ten expect to increase spending by 10 to 24%, and one-third expect to increase spending by 25 to 49%.

Data liquidity is one of the most important attributes for AI deployment. The ability to seamlessly access, combine, and analyze data from various sources enables firms to extract relevant information and apply it effectively to specific business scenarios. It also eliminates the need to sift through vast data repositories, as the data is already curated and tailored to the task at hand.

Data quality is a major limitation for AI deployment. Half of respondents cite data quality as the most limiting data issue in deployment. This is especially true for larger firms with more data and substantial investments in legacy IT infrastructure. Companies with revenues of over US $10 billion are the most likely to cite both data quality and data infrastructure as limiters, suggesting that organizations presiding over larger data repositories find the problem substantially harder.

Companies are not rushing into AI. Nearly all organizations (98%) say they are willing to forgo being the first to use AI if that ensures they deliver it safely and securely. Governance, security, and privacy are the biggest brake on the speed of AI deployment, cited by 45% of respondents (and a full 65% of respondents from the largest companies).

Download the full report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

How machines that can solve complex math problems might usher in more powerful AI

This story originally appeared in The Algorithm, our weekly newsletter on AI.

It’s been another big week in AI. Meta updated its powerful new Llama model, which it’s handing out for free, and OpenAI said it is going to trial an AI-powered online search tool that you can chat with, called SearchGPT. 

But the news item that really stood out to me was one that didn’t get as much attention as it should have. It has the potential to usher in more powerful AI and scientific discovery than previously possible. 

Last Thursday, Google DeepMind announced it had built AI systems that can solve complex math problems. The systems, called AlphaProof and AlphaGeometry 2, worked together to successfully solve four out of six problems from this year’s International Mathematical Olympiad, a prestigious competition for high school students. Their performance was the equivalent of winning a silver medal. It’s the first time any AI system has ever achieved such a high success rate on these kinds of problems. My colleague Rhiannon Williams has the news.

Math! I can already imagine your eyes glazing over. But bear with me. This announcement is not just about math. In fact, it signals an exciting new development in the kind of AI we can now build. AI search engines that you can chat with may add to the illusion of intelligence, but systems like Google DeepMind’s could improve the actual intelligence of AI. For that reason, building systems that are better at math has been a goal for many AI labs, such as OpenAI.  

That’s because math is a benchmark for reasoning. To complete these exercises aimed at high school students, the AI system needed to do very complex things like planning to understand and solve abstract problems. The systems were also able to generalize, allowing them to solve a whole range of different problems in various branches of mathematics.

“What we’ve seen here is that you can combine [reinforcement learning] that was so successful in things like AlphaGo with large language models and produce something which is extremely capable in the space of text,” David Silver, principal research scientist at Google DeepMind and indisputably a pioneer of deep reinforcement learning, said in a press briefing. In this case, that capability was used to construct programs in the computer language Lean that represent mathematical proofs. He says the International Mathematical Olympiad represents a test for what’s possible and paves the way for further breakthroughs. 

This same recipe could be applied in any situation with really clear, verified reward signals for reinforcement-learning algorithms and an unambiguous way to measure correctness as you can in mathematics, said Silver. One potential application would be coding, for example. 

Now for a compulsory reality check: AlphaProof and AlphaGeometry 2 can still only solve hard high-school-level problems. That’s a long way away from the extremely hard problems top human mathematicians can solve. Google DeepMind stressed that its tool did not, at this point, add anything to the body of mathematical knowledge humans have created. But that wasn’t the point. 

“We are aiming to provide a system that can prove anything,” Silver said. Think of an AI system as reliable as a calculator, for example, that can provide proofs for many challenging problems, or verify tests for computer software or scientific experiments. Or perhaps build better AI tutors that can give feedback on exam results, or fact-check news articles. 

But the thing that excites me most is what Katie Collins, a researcher at the University of Cambridge who specializes in math and AI (and was not involved in the project), told Rhiannon. She says these tools create and evaluate new problems, motivate new people to enter the field, and spark more wonder. That’s something we definitely need more of in this world.


Now read the rest of The Algorithm

Deeper Learning

A new tool for copyright holders can show if their work is in AI training data

Since the beginning of the generative AI boom, content creators have argued that their work has been scraped into AI models without their consent. But until now, it has been difficult to know whether specific text has actually been used in a training data set. Now they have a new way to prove it: “copyright traps.” These are pieces of hidden text that let you mark written content in order to later detect whether it has been used in AI models or not. 

Why this matters: Copyright traps tap into one of the biggest fights in AI. A number of publishers and writers are in the middle of litigation against tech companies, claiming their intellectual property has been scraped into AI training data sets without their permission. The idea is that these traps could help to nudge the balance a little more in the content creators’ favor.

Bits and Bytes

AI trained on AI garbage spits out AI garbage
New research published in Nature shows that the quality of AI models’ output gradually degrades when the models are trained on AI-generated data. As subsequent models produce output that is then used as training data for future models, the effect gets worse. (MIT Technology Review)

OpenAI unveils SearchGPT 
The company says it is testing new AI search features that give you fast and timely answers with clear and relevant sources cited. The idea is for the technology to eventually be incorporated into ChatGPT, and CEO Sam Altman says it’ll be possible to do voice searches. However, like many other AI-powered search services, including Google’s, it’s already making errors, as the Atlantic reports. (OpenAI)

AI video generator Runway trained on thousands of YouTube videos without permission
Leaked documents show that the company was secretly training its generative AI models by scraping thousands of videos from popular YouTube creators and brands, as well as pirated films. (404 Media)

Meta’s big bet on open-source AI continues
Meta unveiled Llama 3.1 405B, the first frontier-level open-source AI model, which matches state-of-the-art models such as GPT-4 and Gemini in performance. In an accompanying blog post, Mark Zuckerberg renewed his calls for open-source AI to become the industry standard. This would be good for customization, competition, data protection, and efficiency, he argues. It’s also good for Meta, because it leaves competitors with less of an advantage in the AI space. (Facebook)

Reimagining cloud strategy for AI-first enterprises

The rise of generative artificial intelligence (AI), natural language processing, and computer vision has sparked lofty predictions: AI will revolutionize business operations, transform the nature of knowledge work, and boost companies’ bottom lines and the larger global economy by trillions of dollars.

Executives and technology leaders are eager to see these promises realized, and many are enjoying impressive results of early AI investments. Balakrishna D.R. (Bali), executive vice president, global services head, AI and industry verticals at Infosys, says that generative AI is already proving game-changing for tasks such as knowledge management, search and summarization, software development, and customer service across sectors such as financial services, retail, health care, and automotive.

Realizing AI’s full potential on a mass scale will require more than just executives’ enthusiasm; becoming a truly AI-first enterprise will require a significant, sustained investment in cloud infrastructure and strategy. In 2024, the cloud has evolved beyond its initial purpose as a storage tool and cost saver to become a crucial driver of innovation, transformation, and disruption. Now, with AI in the mix, enterprises are looking to the cloud to support large language models (LLMs) to maximize R&D performance and prevent cybersecurity attacks, among other high-impact use cases.

A 2023 report by Infosys looks at how prepared companies are to realize the combined potential of cloud and AI. To further assess this state of readiness, MIT Technology Review Insights and Infosys surveyed 500 business leaders across industries such as IT, manufacturing, financial services, and consumer goods about how their organizations are thinking about—and acting upon—an integrated cloud and AI strategy.

This research found that most companies are still experimenting and preparing their infrastructure landscape for AI from a cloud perspective—and many are planning additional investments to accelerate their progress.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

OpenAI has released a new ChatGPT bot that you can talk to

OpenAI is rolling out an advanced AI chatbot that you can talk to. It’s available today—at least for some. 

The new chatbot represents OpenAI’s push into a new generation of AI-powered voice assistants in the vein of Siri and Alexa, but with far more capabilities to enable more natural, fluent conversations. It is a step in the march to more fully capable AI agents. The new ChatGPT voice bot can tell what different tones of voice convey, respond to interruptions, and reply to queries in real time. It has also been trained to sound more natural and use voices to convey a wide range of different emotions.

The voice mode is powered by OpenAI’s new GPT-4o model, which combines voice, text, and vision capabilities. To gather feedback, the company is initially launching the chatbot to a “small group of users” paying for ChatGPT Plus, but it says it will make the bot available to all ChatGPT Plus subscribers this fall. A ChatGPT Plus subscription costs $20 a month. OpenAI says it will notify customers who are part of the first rollout wave in the ChatGPT app and provide instructions on how to use the new model.   

The new voice feature, which was announced in May, is being launched a month later than originally planned because the company said it needed more time to improve safety features, such as the model’s ability to detect and refuse unwanted content. The company also said it was preparing its infrastructure to offer real-time responses to millions of users. 

OpenAI says it has tested the model’s voice capabilities with more than 100 external red-teamers, who were tasked with probing the model for flaws. These testers spoke a total of 45 languages and represented 29 countries, according to OpenAI.

The company says it has put several safety mechanisms in place. In a move that aims to prevent the model from being used to create audio deepfakes, for example, it has created four preset voices in collaboration with voice actors. GPT-4o will not impersonate or generate other people’s voices.  

When OpenAI first introduced GPT-4o, the company faced a backlash over its use of a voice called “Sky,” which sounded a lot like the actress Scarlett Johansson. Johansson released a statement saying the company had reached out to her for permission to use her voice for the model, which she declined. She said she was shocked to hear a voice “eerily similar” to hers in the model’s demo. OpenAI has denied that the voice is Johansson’s but has paused the use of Sky. 

The company is also embroiled in several lawsuits over alleged copyright infringement. OpenAI says it has adopted filters that recognize and block requests to generate music or other copyrighted audio. OpenAI also says it has applied the same safety mechanisms it uses in its text-based model to GPT-4o to prevent it from breaking laws and generating harmful content. 

Down the line, OpenAI plans to include more advanced features, such as video and screen sharing, which could make the assistant more useful. In its May demo, employees pointed their phone cameras at a piece of paper and asked the AI model to help them solve math equations. They also shared their computer screens and asked the model to help them solve coding problems. OpenAI says these features will not be available now but at an unspecified later date. 

Google DeepMind’s AI systems can now solve complex math problems

AI models can easily generate essays and other types of text. However, they’re nowhere near as good at solving math problems, which tend to involve logical reasoning—something that’s beyond the capabilities of most current AI systems.

But that may finally be changing. Google DeepMind says it has trained two specialized AI systems to solve complex math problems involving advanced reasoning. The systems—called AlphaProof and AlphaGeometry 2—worked together to successfully solve four out of six problems from this year’s International Mathematical Olympiad (IMO), a prestigious competition for high school students. They won the equivalent of a silver medal at the event.

It’s the first time any AI system has ever achieved such a high success rate on these kinds of problems. “This is great progress in the field of machine learning and AI,” says Pushmeet Kohli, vice president of research at Google DeepMind, who worked on the project. “No such system has been developed until now which could solve problems at this success rate with this level of generality.” 

There are a few reasons math problems that involve advanced reasoning are difficult for AI systems to solve. These types of problems often require forming and drawing on abstractions. They also involve complex hierarchical planning, as well as setting subgoals, backtracking, and trying new paths. All these are challenging for AI. 

“It is often easier to train a model for mathematics if you have a way to check its answers (e.g., in a formal language), but there is comparatively less formal mathematics data online compared to free-form natural language (informal language),” says Katie Collins, a researcher at the University of Cambridge who specializes in math and AI but was not involved in the project.

Bridging this gap was Google DeepMind’s goal in creating AlphaProof, a reinforcement-learning-based system that trains itself to prove mathematical statements in the formal programming language Lean. The key is a version of DeepMind’s Gemini AI that’s fine-tuned to automatically translate math problems phrased in natural, informal language into formal statements, which are easier for the AI to process. This created a large library of formal math problems with varying degrees of difficulty.
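For a sense of what a formal statement looks like, here is a minimal Lean 4 sketch. It is an illustrative toy, not output from AlphaProof and not a problem from its training library; the theorem name add_comm_example is made up for this example. The comment gives the informal, natural-language phrasing, and the theorem line is the formal statement that a proof checker can verify mechanically.

```lean
-- Informal: "adding two natural numbers in either order gives the same result."
-- Formal: the statement below, which Lean accepts only if the proof term checks.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The appeal for training is that the checker gives an unambiguous verdict: a proposed proof either type-checks or it does not.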

Automating the process of translating data into formal language is a big step forward for the math community, says Wenda Li, a lecturer in hybrid AI at the University of Edinburgh, who peer-reviewed the research but was not involved in the project. 

“We can have much greater confidence in the correctness of published results if they are able to formulate this proving system, and it can also become more collaborative,” he adds.

The Gemini model works alongside AlphaZero—the reinforcement-learning model that Google DeepMind trained to master games such as Go and chess—to prove or disprove millions of mathematical problems. The more problems it has successfully solved, the better AlphaProof has become at tackling problems of increasing complexity.

Although AlphaProof was trained to tackle problems across a wide range of mathematical topics, AlphaGeometry 2—an improved version of a system that Google DeepMind announced in January—was optimized to tackle problems relating to movements of objects and equations involving angles, ratios, and distances. Because it was trained on significantly more synthetic data than its predecessor, it was able to take on much more challenging geometry questions.

To test the systems’ capabilities, Google DeepMind researchers tasked them with solving the six problems given to humans competing in this year’s IMO and proving that the answers were correct. AlphaProof solved two algebra problems and one number theory problem, one of which was the competition’s hardest. AlphaGeometry 2 successfully solved a geometry question, but two questions on combinatorics (an area of math focused on counting and arranging objects) were left unsolved.   

“Generally, AlphaProof performs much better on algebra and number theory than combinatorics,” says Alex Davies, a research engineer on the AlphaProof team. “We are still working to understand why this is, which will hopefully lead us to improve the system.”

Two renowned mathematicians, Tim Gowers and Joseph Myers, checked the systems’ submissions. They awarded each of their four correct answers full marks (seven out of seven), giving the systems a total of 28 points out of a maximum of 42. A human participant earning this score would be awarded a silver medal and just miss out on gold, the threshold for which starts at 29 points. 
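The arithmetic behind that score can be written as a short Python sketch. The constants come from the figures reported above; the function and variable names are illustrative only.

```python
# Figures as reported in the article; names are illustrative.
POINTS_PER_PROBLEM = 7   # full marks for one problem
NUM_PROBLEMS = 6         # problems set at the competition
GOLD_THRESHOLD = 29      # gold-medal cutoff cited in the article


def total_score(problems_solved: int) -> int:
    """Score when every solved problem earns full marks and the rest earn zero."""
    return problems_solved * POINTS_PER_PROBLEM


score = total_score(4)                      # four problems solved
max_score = total_score(NUM_PROBLEMS)       # all six solved
points_short_of_gold = GOLD_THRESHOLD - score

print(score, max_score, points_short_of_gold)  # prints: 28 42 1
```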

This is the first time any AI system has been able to achieve a medal-level performance on IMO questions. “As a mathematician, I find it very impressive, and a significant jump from what was previously possible,” Gowers said during a press conference. 

Myers agreed that the systems’ math answers represent a substantial advance over what AI could previously achieve. “It will be interesting to see how things scale and whether they can be made faster, and whether it can extend to other sorts of mathematics,” he said.

Creating AI systems that can solve more challenging mathematics problems could pave the way for exciting human-AI collaborations, helping mathematicians to both solve and invent new kinds of problems, says Collins. This in turn could help us learn more about how we humans tackle math.

“There is still much we don’t know about how humans solve complex mathematics problems,” she says.

A new tool for copyright holders can show if their work is in AI training data

Since the beginning of the generative AI boom, content creators have argued that their work has been scraped into AI models without their consent. But until now, it has been difficult to know whether specific text has actually been used in a training data set. 

Now they have a new way to prove it: “copyright traps” developed by a team at Imperial College London, pieces of hidden text that allow writers and publishers to subtly mark their work in order to later detect whether it has been used in AI models or not. The idea is similar to traps that have been used by copyright holders throughout history—strategies like including fake locations on a map or fake words in a dictionary. 

These AI copyright traps tap into one of the biggest fights in AI. A number of publishers and writers are in the middle of litigation against tech companies, claiming their intellectual property has been scraped into AI training data sets without their permission. The New York Times’ ongoing case against OpenAI is probably the most high-profile of these.  

The code to generate and detect traps is currently available on GitHub, but the team also intends to build a tool that allows people to generate and insert copyright traps themselves. 

“There is a complete lack of transparency in terms of which content is used to train models, and we think this is preventing finding the right balance [between AI companies and content creators],” says Yves-Alexandre de Montjoye, an associate professor of applied mathematics and computer science at Imperial College London, who led the research. It was presented at the International Conference on Machine Learning, a top AI conference being held in Vienna this week. 

To create the traps, the team used a word generator to create thousands of synthetic sentences. These sentences are long and full of gibberish, and could look something like this: “When in comes times of turmoil … whats on sale and more important when, is best, this list tells your who is opening on Thrs. at night with their regular sale times and other opening time from your neighbors. You still.”

The team generated 100 trap sentences and then randomly chose one to inject into a text many times, de Montjoye explains. The trap could be injected into text in multiple ways: for example, as white text on a white background, or embedded in the article’s source code. This sentence had to be repeated in the text 100 to 1,000 times.
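The injection step might be sketched in Python as follows. This is a simplified illustration, not the Imperial College team's code: the placeholder trap sentences, the inject_trap helper, and the choice to insert at random paragraph positions are all assumptions, and the sketch does not model hiding the text (for example as white-on-white).

```python
import random


def inject_trap(paragraphs, trap, repeats, rng):
    """Return a copy of `paragraphs` with `trap` inserted `repeats` times
    at random positions. The original paragraphs keep their relative order."""
    result = list(paragraphs)
    for _ in range(repeats):
        position = rng.randint(0, len(result))  # both ends inclusive
        result.insert(position, trap)
    return result


rng = random.Random(0)  # seeded so the sketch is repeatable

# Placeholder stand-ins for the 100 generated trap sentences.
candidate_traps = [f"synthetic trap sentence number {i}" for i in range(100)]
chosen_trap = rng.choice(candidate_traps)

article = ["First paragraph.", "Second paragraph.", "Third paragraph."]
marked = inject_trap(article, chosen_trap, repeats=100, rng=rng)
trap_count = marked.count(chosen_trap)
```

With three original paragraphs and 100 insertions, the marked text is mostly trap, which previews the readability concern raised later in the piece.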

To detect the traps, they fed a large language model the 100 synthetic sentences they had generated, and looked at whether it flagged them as new or not. If the model had seen a trap sentence in its training data, it would indicate a lower “surprise” (also known as “perplexity”) score. But if the model was “surprised” about sentences, it meant that it was encountering them for the first time, and therefore they weren’t traps. 
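The idea of comparing "surprise" can be illustrated with a toy Python sketch. It swaps the large language model for a tiny word-bigram model with add-one smoothing, which is an assumption made purely so the example is self-contained; the corpus, the trap and control sentences, and all function names are invented for illustration.

```python
import math
from collections import Counter

START = "<s>"  # sentence-start marker; never predicted, so not in the vocabulary


def tokenize(sentence):
    return [START] + sentence.lower().split()


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def perplexity(sentence, model, vocab_size):
    """Per-word perplexity with add-one smoothing; lower means less surprise."""
    context_counts, bigram_counts = model
    tokens = tokenize(sentence)
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        prob = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(tokens) - 1))


base_corpus = [
    "the robot played table tennis",
    "the model solved math problems",
    "publishers sued the company",
]
trap = "purple ladders whisper under seven quiet maps"       # injected sentence
control = "orange kettles argue beside nine loud clocks"     # never injected

vocab_size = len({w for s in base_corpus + [trap, control] for w in s.lower().split()})

clean_model = train_bigram(base_corpus)                # never saw the trap
marked_model = train_bigram(base_corpus + [trap] * 100)  # trap repeated 100 times

trap_seen = perplexity(trap, marked_model, vocab_size)
control_seen = perplexity(control, marked_model, vocab_size)
trap_unseen = perplexity(trap, clean_model, vocab_size)

# A trap the model trained on is far less surprising than a control sentence.
looks_trained_on_trap = trap_seen < control_seen
```

The model trained on the marked corpus assigns the trap a much lower perplexity than the control sentence, while the model that never saw the trap is about equally surprised by both.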

In the past, researchers have suggested exploiting the fact that language models memorize their training data to determine whether something has appeared in that data. The technique, called a “membership inference attack,” works effectively in large state-of-the-art models, which tend to memorize a lot of their data during training.

In contrast, smaller models, which are gaining popularity and can be run on mobile devices, memorize less and are thus less susceptible to membership inference attacks, which makes it harder to determine whether or not they were trained on a particular copyrighted document, says Gautam Kamath, an assistant computer science professor at the University of Waterloo, who was not part of the research. 

Copyright traps are a way to do membership inference attacks even on smaller models. The team injected their traps into the training data set of CroissantLLM, a new bilingual French-English language model that was trained from scratch by a team of industry and academic researchers that the Imperial College London team partnered with. CroissantLLM has 1.3 billion parameters, a fraction as many as state-of-the-art models (GPT-4 reportedly has 1.76 trillion, for example).

The research shows it is indeed possible to introduce such traps into text data so as to significantly increase the efficacy of membership inference attacks, even for smaller models, says Kamath. But there’s still a lot to be done, he adds. 

Repeating a 75-word phrase 1,000 times in a document is a big change to the original text, which could allow people training AI models to detect the trap and skip content containing it, or just delete it and train on the rest of the text, Kamath says. It also makes the original text hard to read. 

This makes copyright traps impractical right now, says Sameer Singh, a professor of computer science at the University of California, Irvine, and a cofounder of the startup Spiffy AI. He was not part of the research. “A lot of companies do deduplication, [meaning] they clean up the data, and a bunch of this kind of stuff will probably get thrown out,” Singh says. 
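A minimal Python sketch shows why deduplication undercuts repeated traps. It assumes the simplest possible approach, exact matching of whitespace-normalized paragraphs; production data pipelines may use different or fuzzier methods, and the deduplicate helper and sample text here are hypothetical.

```python
def deduplicate(paragraphs):
    """Keep the first occurrence of each paragraph, comparing case-insensitively
    after collapsing runs of whitespace."""
    seen = set()
    kept = []
    for paragraph in paragraphs:
        key = " ".join(paragraph.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(paragraph)
    return kept


trap = "purple ladders whisper under seven quiet maps"
document = ["Intro.", trap, "Body.", trap, trap, "Conclusion."]

cleaned = deduplicate(document)
remaining_traps = cleaned.count(trap)
```

After this pass only one copy of the trap survives, far short of the 100 to 1,000 repetitions the technique relies on.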

One way to improve copyright traps, says Kamath, would be to find other ways to mark copyrighted content so that membership inference attacks work better on them, or to improve membership inference attacks themselves. 

De Montjoye acknowledges that the traps are not foolproof. A motivated attacker who knows about a trap can remove it, he says.

“Whether they can remove all of them or not is an open question, and that’s likely to be a bit of a cat-and-mouse game,” he says. But even then, the more traps are applied, the harder it becomes to remove all of them without significant engineering resources.

“It’s important to keep in mind that copyright traps may only be a stopgap solution, or merely an inconvenience to model trainers,” says Kamath. “One cannot release a piece of content containing a trap and have any assurance that it will be an effective trap forever.”