Google DeepMind’s AI systems can now solve complex math problems

AI models can easily generate essays and other types of text. However, they’re nowhere near as good at solving math problems, which tend to involve logical reasoning—something that’s beyond the capabilities of most current AI systems.

But that may finally be changing. Google DeepMind says it has trained two specialized AI systems to solve complex math problems involving advanced reasoning. The systems—called AlphaProof and AlphaGeometry 2—worked together to successfully solve four out of six problems from this year’s International Mathematical Olympiad (IMO), a prestigious competition for high school students. They won the equivalent of a silver medal at the event.

It’s the first time any AI system has ever achieved such a high success rate on these kinds of problems. “This is great progress in the field of machine learning and AI,” says Pushmeet Kohli, vice president of research at Google DeepMind, who worked on the project. “No such system has been developed until now which could solve problems at this success rate with this level of generality.” 

There are a few reasons math problems that involve advanced reasoning are difficult for AI systems to solve. These types of problems often require forming and drawing on abstractions. They also involve complex hierarchical planning, as well as setting subgoals, backtracking, and trying new paths. All these are challenging for AI. 

“It is often easier to train a model for mathematics if you have a way to check its answers (e.g., in a formal language), but there is comparatively less formal mathematics data online compared to free-form natural language (informal language),” says Katie Collins, a researcher at the University of Cambridge who specializes in math and AI but was not involved in the project. 

Bridging this gap was Google DeepMind’s goal in creating AlphaProof, a reinforcement-learning-based system that trains itself to prove mathematical statements in the formal programming language Lean. The key is a version of DeepMind’s Gemini AI that’s fine-tuned to automatically translate math problems phrased in natural, informal language into formal statements, which are easier for the AI to process. This created a large library of formal math problems with varying degrees of difficulty.
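
To make the informal-to-formal translation concrete, here is a minimal illustration in Lean 4. It is not output from AlphaProof or Gemini, and the statement and theorem name are chosen purely for illustration. The informal claim "adding two natural numbers gives the same result in either order" becomes a formal statement whose proof Lean can check mechanically:

```lean
-- Informal: for any natural numbers a and b, a + b equals b + a.
-- The name `add_in_either_order` is hypothetical; `Nat.add_comm` is the
-- core-library lemma that supplies the proof.
theorem add_in_either_order (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Because a proof like this either type-checks or fails, answers written in a formal language can be verified automatically, which is the property Collins describes as making training easier.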

Automating the process of translating data into formal language is a big step forward for the math community, says Wenda Li, a lecturer in hybrid AI at the University of Edinburgh, who peer-reviewed the research but was not involved in the project. 

“We can have much greater confidence in the correctness of published results if they are able to formulate this proving system, and it can also become more collaborative,” he adds.

The Gemini model works alongside AlphaZero—the reinforcement-learning model that Google DeepMind trained to master games such as Go and chess—to prove or disprove millions of mathematical problems. The more problems it has successfully solved, the better AlphaProof has become at tackling problems of increasing complexity.

Although AlphaProof was trained to tackle problems across a wide range of mathematical topics, AlphaGeometry 2—an improved version of a system that Google DeepMind announced in January—was optimized to tackle problems relating to movements of objects and equations involving angles, ratios, and distances. Because it was trained on significantly more synthetic data than its predecessor, it was able to take on much more challenging geometry questions.

To test the systems’ capabilities, Google DeepMind researchers tasked them with solving the six problems given to humans competing in this year’s IMO and proving that the answers were correct. AlphaProof solved two algebra problems and one number theory problem, one of which was the competition’s hardest. AlphaGeometry 2 successfully solved a geometry question, but two questions on combinatorics (an area of math focused on counting and arranging objects) were left unsolved.   

“Generally, AlphaProof performs much better on algebra and number theory than combinatorics,” says Alex Davies, a research engineer on the AlphaProof team. “We are still working to understand why this is, which will hopefully lead us to improve the system.”

Two renowned mathematicians, Tim Gowers and Joseph Myers, checked the systems’ submissions. They awarded each of their four correct answers full marks (seven out of seven), giving the systems a total of 28 points out of a maximum of 42. A human participant earning this score would be awarded a silver medal and just miss out on gold, the threshold for which starts at 29 points. 

This is the first time any AI system has been able to achieve a medal-level performance on IMO questions. “As a mathematician, I find it very impressive, and a significant jump from what was previously possible,” Gowers said during a press conference. 

Myers agreed that the systems’ math answers represent a substantial advance over what AI could previously achieve. “It will be interesting to see how things scale and whether they can be made faster, and whether it can extend to other sorts of mathematics,” he said.

Creating AI systems that can solve more challenging mathematics problems could pave the way for exciting human-AI collaborations, helping mathematicians to both solve and invent new kinds of problems, says Collins. This in turn could help us learn more about how we humans tackle math.

“There is still much we don’t know about how humans solve complex mathematics problems,” she says.

A new tool for copyright holders can show if their work is in AI training data

Since the beginning of the generative AI boom, content creators have argued that their work has been scraped into AI models without their consent. But until now, it has been difficult to know whether specific text has actually been used in a training data set. 

Now they have a new way to prove it: “copyright traps” developed by a team at Imperial College London, pieces of hidden text that allow writers and publishers to subtly mark their work in order to later detect whether it has been used in AI models or not. The idea is similar to traps that have been used by copyright holders throughout history—strategies like including fake locations on a map or fake words in a dictionary. 

These AI copyright traps tap into one of the biggest fights in AI. A number of publishers and writers are in the middle of litigation against tech companies, claiming their intellectual property has been scraped into AI training data sets without their permission. The New York Times’ ongoing case against OpenAI is probably the most high-profile of these.  

The code to generate and detect traps is currently available on GitHub, but the team also intends to build a tool that allows people to generate and insert copyright traps themselves. 

“There is a complete lack of transparency in terms of which content is used to train models, and we think this is preventing finding the right balance [between AI companies and content creators],” says Yves-Alexandre de Montjoye, an associate professor of applied mathematics and computer science at Imperial College London, who led the research. It was presented at the International Conference on Machine Learning, a top AI conference being held in Vienna this week. 

To create the traps, the team used a word generator to create thousands of synthetic sentences. These sentences are long and full of gibberish, and could look something like this: “When in comes times of turmoil … whats on sale and more important when, is best, this list tells your who is opening on Thrs. at night with their regular sale times and other opening time from your neighbors. You still.”

The team generated 100 trap sentences and then randomly chose one to inject into a text many times, de Montjoye explains. The trap could be injected into text in multiple ways—for example, as white text on a white background, or embedded in the article’s source code. This sentence had to be repeated in the text 100 to 1,000 times. 

To detect the traps, they fed a large language model the 100 synthetic sentences they had generated, and looked at whether it flagged them as new or not. If the model had seen a trap sentence in its training data, it would indicate a lower “surprise” (also known as “perplexity”) score. But if the model was “surprised” about sentences, it meant that it was encountering them for the first time, and therefore they weren’t traps. 
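
The detection idea can be sketched with a toy model. The example below is not the Imperial College team's code: it swaps the large language model for a tiny character-bigram model with add-one smoothing, and the training text, the control sentences, and the `VOCAB_SIZE` constant are all assumptions made for illustration. It shows only the core signal, namely that a sentence repeated many times in the training data ends up with a lower perplexity than sentences the model never saw.

```python
import math
from collections import Counter

VOCAB_SIZE = 128  # assumption: plain ASCII text

def train_bigram_model(text):
    """Count character bigrams and how often each character appears as context."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    return pair_counts, context_counts

def perplexity(model, sentence):
    """exp of the average negative log-probability per character transition."""
    pair_counts, context_counts = model
    pairs = list(zip(sentence, sentence[1:]))
    total_log_prob = 0.0
    for prev, nxt in pairs:
        # Add-one smoothing keeps unseen transitions from having zero probability.
        prob = (pair_counts[(prev, nxt)] + 1) / (context_counts[prev] + VOCAB_SIZE)
        total_log_prob += math.log(prob)
    return math.exp(-total_log_prob / len(pairs))

# Stand-in for ordinary training text (hypothetical).
ordinary_text = "the cat sat on the mat and the dog lay by the door while rain fell on the roof " * 20

# Candidate sentences: the first is the injected trap (adapted from the
# article's example), the others are controls that are never injected.
candidates = [
    "when in comes times of turmoil whats on sale and more important when is best",
    "jazzy foxes quickly vexed five bold wizards",
    "sphinx of black quartz judge my vow",
]
trap = candidates[0]

clean_model = train_bigram_model(ordinary_text)
# Inject the trap 100 times, the low end of the range the team describes.
trapped_model = train_bigram_model(ordinary_text + (trap + " ") * 100)

ppl_unseen = perplexity(clean_model, trap)
ppl_seen = perplexity(trapped_model, trap)
least_surprising = min(candidates, key=lambda s: perplexity(trapped_model, s))

print(f"trap perplexity, model never saw it: {ppl_unseen:.1f}")
print(f"trap perplexity, model trained on it: {ppl_seen:.1f}")
print(f"least surprising candidate: {least_surprising!r}")
```

In a real setting the perplexity would come from the language model under investigation, and a statistical test over many candidate sentences would replace the simple minimum used here.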

In the past, researchers have suggested exploiting the fact that language models memorize their training data to determine whether something has appeared in that data. The technique, called a “membership inference attack,” works effectively in large state-of-the-art models, which tend to memorize a lot of their data during training. 

In contrast, smaller models, which are gaining popularity and can be run on mobile devices, memorize less and are thus less susceptible to membership inference attacks, which makes it harder to determine whether or not they were trained on a particular copyrighted document, says Gautam Kamath, an assistant computer science professor at the University of Waterloo, who was not part of the research. 

Copyright traps are a way to do membership inference attacks even on smaller models. The team injected their traps into the training data set of CroissantLLM, a new bilingual French-English language model that was trained from scratch by a team of industry and academic researchers that the Imperial College London team partnered with. CroissantLLM has 1.3 billion parameters, a fraction as many as state-of-the-art models (GPT-4 reportedly has 1.76 trillion, for example).

The research shows it is indeed possible to introduce such traps into text data so as to significantly increase the efficacy of membership inference attacks, even for smaller models, says Kamath. But there’s still a lot to be done, he adds. 

Repeating a 75-word phrase 1,000 times in a document is a big change to the original text, which could allow people training AI models to detect the trap and skip content containing it, or just delete it and train on the rest of the text, Kamath says. It also makes the original text hard to read. 

This makes copyright traps impractical right now, says Sameer Singh, a professor of computer science at the University of California, Irvine, and a cofounder of the startup Spiffy AI. He was not part of the research. “A lot of companies do deduplication, [meaning] they clean up the data, and a bunch of this kind of stuff will probably get thrown out,” Singh says. 

One way to improve copyright traps, says Kamath, would be to find other ways to mark copyrighted content so that membership inference attacks work better on them, or to improve membership inference attacks themselves. 

De Montjoye acknowledges that the traps are not foolproof. A motivated attacker who knows about the traps can remove them, he says. 

“Whether they can remove all of them or not is an open question, and that’s likely to be a bit of a cat-and-mouse game,” he says. But even then, the more traps are applied, the harder it becomes to remove all of them without significant engineering resources.

“It’s important to keep in mind that copyright traps may only be a stopgap solution, or merely an inconvenience to model trainers,” says Kamath. “One can not release a piece of content containing a trap and have any assurance that it will be an effective trap forever.” 

AI trained on AI garbage spits out AI garbage

AI models work by training on huge swaths of data from the internet. But as AI is increasingly being used to pump out web pages filled with junk content, that process is in danger of being undermined.

New research published in Nature shows that the quality of the model’s output gradually degrades when AI trains on AI-generated data. As subsequent models produce output that is then used as training data for future models, the effect gets worse.  

Ilia Shumailov, a computer scientist from the University of Oxford, who led the study, likens the process to taking photos of photos. “If you take a picture and you scan it, and then you print it, and you repeat this process over time, basically the noise overwhelms the whole process,” he says. “You’re left with a dark square.” The equivalent of the dark square for AI is called “model collapse,” he says, meaning the model just produces incoherent garbage. 

This research may have serious implications for the largest AI models of today, because they use the internet as their database. GPT-3, for example, was trained in part on data from Common Crawl, an online repository of over 3 billion web pages. And the problem is likely to get worse as an increasing number of AI-generated junk websites start cluttering up the internet. 

Current AI models aren’t just going to collapse, says Shumailov, but there may still be substantive effects: The improvements will slow down, and performance might suffer. 

To determine the potential effect on performance, Shumailov and his colleagues fine-tuned a large language model (LLM) on a set of data from Wikipedia, then fine-tuned the new model on its own output over nine generations. The team measured how nonsensical the output was using a “perplexity score,” which measures an AI model’s confidence in its ability to predict the next part of a sequence; a higher score translates to a less accurate model. 
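
The article describes perplexity only in words. For reference, the standard textbook definition for a sequence of N tokens is given below. The symbols (x_i for the i-th token, p_θ for the probability the model assigns) are notation introduced here, not taken from the study:

```latex
\[
\mathrm{PPL}(x_1, \dots, x_N)
  = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N}
      \log p_\theta\left(x_i \mid x_1, \dots, x_{i-1}\right) \right)
\]
```

A model that assigns high probability to each next token scores close to 1, while a model that is frequently surprised by what comes next gets a large value.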

The models trained on other models’ outputs had higher perplexity scores. For example, for each generation, the team asked the model for the next sentence after the following input:

“some started before 1360—was typically accomplished by a master mason and a small team of itinerant masons, supplemented by local parish labourers, according to Poyntz Wright. But other authors reject this model, suggesting instead that leading architects designed the parish church towers based on early examples of Perpendicular.”

On the ninth and final generation, the model returned the following:

“architecture. In addition to being home to some of the world’s largest populations of black @-@ tailed jackrabbits, white @-@ tailed jackrabbits, blue @-@ tailed jackrabbits, red @-@ tailed jackrabbits, yellow @-.”

Shumailov explains what he thinks is going on using this analogy: Imagine you’re trying to find the least likely name of a student in school. You could go through every student name, but it would take too long. Instead, you look at 100 of the 1,000 student names. You get a pretty good estimate, but it’s probably not the correct answer. Now imagine that another person comes and makes an estimate based on your 100 names, but only selects 50. This second person’s estimate is going to be even further off.

“You can certainly imagine that the same happens with machine learning models,” he says. “So if the first model has seen half of the internet, then perhaps the second model is not going to ask for half of the internet, but actually scrape the latest 100,000 tweets, and fit the model on top of it.”
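
Shumailov's name analogy can be simulated in a few lines. This is a sketch under invented assumptions, not code from the study: the school, the split between common and rare names, and the fixed random seed are all hypothetical. It counts how many rare names survive each round of sampling from the previous sample.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical school of 1,000 students: five very common names cover 800
# students, and 200 students each have a name nobody else shares.
common_names = ["Alex", "Sam", "Jordan", "Taylor", "Morgan"]
rare_names = [f"RareName{i}" for i in range(200)]
students = rng.choices(common_names, k=800) + rare_names

def count_rare(names):
    """Number of distinct rare names present in a list of names."""
    return len(set(names) & set(rare_names))

first_sample = rng.sample(students, 100)       # you look at 100 of 1,000 names
second_sample = rng.sample(first_sample, 50)   # the next person keeps only 50

for label, names in [("all students", students),
                     ("first sample", first_sample),
                     ("second sample", second_sample)]:
    print(f"{label}: {len(names)} names, {count_rare(names)} distinct rare names")
```

Because each sample is drawn only from the one before it, a rare name that drops out can never come back, so the count of rare names can only stay the same or shrink from one round to the next.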

Additionally, the internet doesn’t hold an unlimited amount of data. To feed their appetite for more, future AI models may need to train on synthetic data—or data that has been produced by AI.   

“Foundation models really rely on the scale of data to perform well,” says Shayne Longpre, who studies how LLMs are trained at the MIT Media Lab, and who didn’t take part in this research. “And they’re looking to synthetic data under curated, controlled environments to be the solution to that. Because if they keep crawling more data on the web, there are going to be diminishing returns.”

Matthias Gerstgrasser, an AI researcher at Stanford who authored a different paper examining model collapse, says adding synthetic data to real-world data instead of replacing it doesn’t cause any major issues. But he adds: “One conclusion all the model collapse literature agrees on is that high-quality and diverse training data is important.”

Another effect of this degradation over time is that information that affects minority groups is heavily distorted in the model, as it tends to overfocus on samples that are more prevalent in the training data. 

In current models, this may affect underrepresented languages as they require more synthetic (AI-generated) data sets, says Robert Mahari, who studies computational law at the MIT Media Lab (he did not take part in the research).

One idea that might help avoid degradation is to make sure the model gives more weight to the original human-generated data. Another part of Shumailov’s study allowed future generations to sample 10% of the original data set, which mitigated some of the negative effects. 

That would require making a trail from the original human-generated data to further generations, known as data provenance.
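
One way to picture this is a training set in which every record carries a provenance tag and a fixed share comes from the original human-written data. The sketch below is an assumption about how such a mix could be assembled, not the procedure used in the study. The function name, the tag values, and the reading of "10%" as a share of each generation's training set are all hypothetical.

```python
import random

def build_generation_mix(human_data, synthetic_data, human_fraction=0.1,
                         size=100, seed=0):
    """Return `size` (text, provenance) records, with about `human_fraction`
    of them drawn from the original human-generated data."""
    rng = random.Random(seed)
    n_human = round(size * human_fraction)
    n_synthetic = size - n_human
    # Sample with replacement so either pool may be smaller than its quota.
    records = [(text, "human") for text in rng.choices(human_data, k=n_human)]
    records += [(text, "synthetic")
                for text in rng.choices(synthetic_data, k=n_synthetic)]
    rng.shuffle(records)
    return records

# Placeholder data standing in for real text.
human = [f"human sentence {i}" for i in range(50)]
synthetic = [f"model output {i}" for i in range(200)]

mix = build_generation_mix(human, synthetic, human_fraction=0.1, size=100)
n_human = sum(1 for _, source in mix if source == "human")
print(f"{n_human} of {len(mix)} records are tagged as human-generated")
```

The sketch takes the provenance tags for granted. Assigning them reliably to text gathered from the open internet is the hard part.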

But provenance requires some way to filter the internet into human-generated and AI-generated content, which hasn’t been cracked yet. Though a number of tools now exist that aim to determine whether text is AI-generated, they are often inaccurate.

“Unfortunately, we have more questions than answers,” says Shumailov. “But it’s clear that it’s important to know where your data comes from and how much you can trust it to capture a representative sample of the data you’re dealing with.”

Google’s new weather prediction system combines AI with traditional physics

Researchers from Google have built a new weather prediction model that combines machine learning with more conventional techniques, potentially yielding accurate forecasts at a fraction of the current cost. 

The model, called NeuralGCM and described in a paper in Nature today, bridges a divide that’s grown among weather prediction experts in the last several years. 

While new machine-learning techniques that predict weather by learning from years of past data are extremely fast and efficient, they can struggle with long-term predictions. General circulation models, on the other hand, which have dominated weather prediction for the last 50 years, use complex equations to model changes in the atmosphere and give accurate projections, but they are exceedingly slow and expensive to run. Experts are divided on which tool will be most reliable going forward. But the new model from Google instead attempts to combine the two. 

“It’s not sort of physics versus AI. It’s really physics and AI together,” says Stephan Hoyer, an AI researcher at Google Research and a coauthor of the paper. 

The system still uses a conventional model to work out some of the large atmospheric changes required to make a prediction. It then incorporates AI, which tends to do well where those larger models fall flat—typically for predictions on scales smaller than about 25 kilometers, like those dealing with cloud formations or regional microclimates (San Francisco’s fog, for example). “That’s where we inject AI very selectively to correct the errors that accumulate on small scales,” Hoyer says.

The result, the researchers say, is a model that can produce quality predictions faster with less computational power. They say NeuralGCM is as accurate as one-to-15-day forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF), which is a partner organization in the research. 

But the real promise of technology like this is not in better weather predictions for your local area, says Aaron Hill, an assistant professor at the School of Meteorology at the University of Oklahoma, who was not involved in this research. Instead, it’s in larger-scale climate events that are prohibitively expensive to model with conventional techniques. The possibilities could range from predicting tropical cyclones with more notice to modeling more complex climate changes that are years away. 

“It’s so computationally intensive to simulate the globe over and over again or for long periods of time,” Hill says. That means the best climate models are hamstrung by the high costs of computing power, which presents a real bottleneck to research. 

AI-based models are indeed more compact. Once trained, typically on 40 years of historical weather data from ECMWF, a machine-learning model like Google’s GraphCast can run on fewer than 5,500 lines of code, compared with the nearly 377,000 lines required for the model from the National Oceanic and Atmospheric Administration, according to the paper. 

NeuralGCM, according to Hill, seems to make a strong case that AI can be brought in for particular elements of weather modeling to make things faster, while still keeping the strengths of conventional systems.

“We don’t have to throw away all the knowledge that we’ve gained over the last 100 years about how the atmosphere works,” he says. “We can actually integrate that with the power of AI and machine learning as well.”

Hoyer says using the model to predict short-term weather has been useful for validating its predictions, but that the goal is indeed to be able to use it for longer-term modeling, particularly for extreme weather risk. 

NeuralGCM will be open source. While Hoyer says he looks forward to having climate scientists use it in their research, the model may also be of interest to more than just academics. Commodities traders and agricultural planners pay top dollar for high-resolution predictions, and the models used by insurance companies for products like flood or extreme weather insurance are struggling to account for the impact of climate change. 

While many of the AI skeptics in weather forecasting have been won over by recent developments, according to Hill, the fast pace is hard for the research community to keep up with. “It’s gangbusters,” he says—it seems as if a new model is released by Google, Nvidia, or Huawei every two months. That makes it difficult for researchers to actually sort out which of the new tools will be most useful and apply for research grants accordingly. 

“The appetite is there [for AI],” Hill says. “But I think a lot of us still are waiting to see what happens.”

Correction: This story was updated to clarify that Stephan Hoyer is a researcher at Google Research, not Google DeepMind.

AI companies promised to self-regulate one year ago. What’s changed?

One year ago, on July 21, 2023, seven leading AI companies—Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI—committed with the White House to a set of eight voluntary commitments on how to develop AI in a safe and trustworthy way.

These included promises to do things like improve the testing and transparency around AI systems, and share information on potential harms and risks. 

On the first anniversary of the voluntary commitments, MIT Technology Review asked the AI companies that signed the commitments for details on their work so far. Their replies show that the tech sector has made some welcome progress, with big caveats.

The voluntary commitments came at a time when generative AI mania was perhaps at its frothiest, with companies racing to launch their own models and make them bigger and better than their competitors’. At the same time, we started to see developments such as fights over copyright and deepfakes. A vocal lobby of influential tech players, such as Geoffrey Hinton, had also raised concerns that AI could pose an existential risk to humanity. Suddenly, everyone was talking about the urgent need to make AI safe, and regulators everywhere were under pressure to do something about it.

Until very recently, AI development has been a Wild West. Traditionally, the US has been loath to regulate its tech giants, instead relying on them to regulate themselves. The voluntary commitments are a good example of that: they were some of the first prescriptive rules for the AI sector in the US, but they remain voluntary and unenforceable. The White House has since issued an executive order, which expands on the commitments and also applies to other tech companies and government departments. 

“One year on, we see some good practices towards their own products, but [they’re] nowhere near where we need them to be in terms of good governance or protection of rights at large,” says Merve Hickok, the president and research director of the Center for AI and Digital Policy, who reviewed the companies’ replies as requested by MIT Technology Review. Many of these companies continue to push unsubstantiated claims about their products, such as saying that they can supersede human intelligence and capabilities, adds Hickok. 

One trend that emerged from the tech companies’ answers is that companies are doing more to pursue technical fixes such as red-teaming (in which humans probe AI models for flaws) and watermarks for AI-generated content. 

But it’s not clear what the commitments have changed and whether the companies would have implemented these measures anyway, says Rishi Bommasani, the society lead at the Stanford Center for Research on Foundation Models, who also reviewed the responses for MIT Technology Review.  

One year is a long time in AI. Since the voluntary commitments were signed, Inflection AI founder Mustafa Suleyman has left the company and joined Microsoft to lead the company’s AI efforts. Inflection declined to comment. 

“We’re grateful for the progress leading companies have made toward fulfilling their voluntary commitments in addition to what is required by the executive order,” says Robyn Patterson, a spokesperson for the White House. But, Patterson adds, the president continues to call on Congress to pass bipartisan legislation on AI. 

Without comprehensive federal legislation, the best the US can do right now is to demand that companies follow through on these voluntary commitments, says Brandie Nonnecke, the director of the CITRIS Policy Lab at UC Berkeley. 

But it’s worth bearing in mind that “these are still companies that are essentially writing the exam by which they are evaluated,” says Nonnecke. “So we have to think carefully about whether or not they’re … verifying themselves in a way that is truly rigorous.” 

Here’s our assessment of the progress AI companies have made in the past year.

Commitment 1

The companies commit to internal and external security testing of their AI systems before their release. This testing, which will be carried out in part by independent experts, guards against some of the most significant sources of AI risks, such as biosecurity and cybersecurity, as well as its broader societal effects.

All the companies (excluding Inflection, which chose not to comment) say they conduct red-teaming exercises that get both internal and external testers to probe their models for flaws and risks. OpenAI says it has a separate preparedness team that tests models for cybersecurity, chemical, biological, radiological, and nuclear threats and for situations where a sophisticated AI model can do or persuade a person to do things that might lead to harm.

Anthropic and OpenAI also say they conduct these tests with external experts before launching their new models. For example, for the launch of Anthropic’s latest model, Claude 3.5, the company conducted predeployment testing with experts at the UK’s AI Safety Institute. Anthropic has also allowed METR, a research nonprofit, to do an “initial exploration” of Claude 3.5’s capabilities for autonomy.

Google says it also conducts internal red-teaming to test the boundaries of its model, Gemini, around election-related content, societal risks, and national security concerns. Microsoft says it has worked with third-party evaluators at NewsGuard, an organization advancing journalistic integrity, to evaluate risks and mitigate the risk of abusive deepfakes in Microsoft’s text-to-image tool. In addition to red-teaming, Meta says, it evaluated its latest model, Llama 3, to understand its performance in a series of risk areas like weapons, cyberattacks, and child exploitation. 

But when it comes to testing, it’s not enough to just report that a company is taking actions, says Bommasani. For example, Amazon and Anthropic said they had worked with the nonprofit Thorn to combat risks to child safety posed by AI. Bommasani would have wanted to see more specifics about how the interventions that companies are implementing actually reduce those risks. 

“It should become clear to us that it’s not just that companies are doing things but those things are having the desired effect,” Bommasani says.  

RESULT: Good. The push for red-teaming and testing for a wide range of risks is a good and important one. However, Hickok would have liked to see independent researchers get broader access to companies’ models. 

Commitment 2

The companies commit to sharing information across the industry and with governments, civil society, and academia on managing AI risks. This includes best practices for safety, information on attempts to circumvent safeguards, and technical collaboration.

After they signed the commitments, Anthropic, Google, Microsoft, and OpenAI founded the Frontier Model Forum, a nonprofit that aims to facilitate discussions and actions on AI safety and responsibility. Amazon and Meta have also joined.  

Engaging with nonprofits that the AI companies funded themselves may not be in the spirit of the voluntary commitments, says Bommasani. But the Frontier Model Forum could be a way for these companies to cooperate with each other and pass on information about safety, which they normally could not do as competitors, he adds. 

“Even if they’re not going to be transparent to the public, one thing you might want is for them to at least collectively figure out mitigations to actually reduce risk,” says Bommasani. 

All of the seven signatories are also part of the Artificial Intelligence Safety Institute Consortium (AISIC), established by the National Institute of Standards and Technology (NIST), which develops guidelines and standards for AI policy and evaluation of AI performance. It is a large consortium consisting of a mix of public- and private-sector players. Google, Microsoft, and OpenAI also have representatives at the UN’s High-Level Advisory Body on Artificial Intelligence.

Many of the labs also highlighted their research collaborations with academics. For example, Google is part of MLCommons, where it worked with academics on a cross-industry AI Safety Benchmark. Google also says it actively contributes tools and resources, such as computing credit, to projects like the National Science Foundation’s National AI Research Resource pilot, which aims to democratize AI research in the US.

Many of the companies also contributed to guidance by the Partnership on AI, another nonprofit founded by Amazon, Facebook, Google, DeepMind, Microsoft, and IBM, on the deployment of foundation models. 

RESULT: More work is needed. More information sharing is a welcome step as the industry tries to collectively make AI systems safe and trustworthy. However, it’s unclear how much of the effort advertised will actually lead to meaningful changes and how much is window dressing. 

Commitment 3

The companies commit to investing in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights. These model weights are the most essential part of an AI system, and the companies agree that it is vital that the model weights be released only when intended and when security risks are considered.

Many of the companies have implemented new cybersecurity measures in the past year. For example, Microsoft has launched the Secure Future Initiative to address the growing scale of cyberattacks. The company says its model weights are encrypted to mitigate the potential risk of model theft, and it applies strong identity and access controls when deploying highly capable proprietary models. 

Google too has launched an AI Cyber Defense Initiative. In May OpenAI shared six new measures it is developing to complement its existing cybersecurity practices, such as extending cryptographic protection to AI hardware. It also has a Cybersecurity Grant Program, which gives researchers access to its models to build cyber defenses. 

Amazon mentioned that it has also taken measures against attacks specific to generative AI, such as data poisoning and prompt injection, in which someone uses prompts that direct the language model to ignore its previous directions and safety guardrails.

Just a couple of days after signing the commitments, Anthropic published details about its protections, which include common cybersecurity practices such as controlling who has access to the models and sensitive assets such as model weights, and inspecting and controlling the third-party supply chain. The company also works with independent assessors to evaluate whether the controls it has designed meet its cybersecurity needs.

RESULT: Good. All of the companies did say they had taken extra measures to protect their models, although it doesn’t seem there is much consensus on the best way to protect AI models. 

Commitment 4

The companies commit to facilitating third-party discovery and reporting of vulnerabilities in their AI systems. Some issues may persist even after an AI system is released and a robust reporting mechanism enables them to be found and fixed quickly. 

For this commitment, one of the most popular responses was to implement bug bounty programs, which reward people who find flaws in AI systems. Anthropic, Google, Microsoft, Meta, and OpenAI all have one for AI systems. Anthropic and Amazon also said they have forms on their websites where security researchers can submit vulnerability reports. 

It will likely take us years to figure out how to do third-party auditing well, says Brandie Nonnecke. “It’s not just a technical challenge. It’s a socio-technical challenge. And it just kind of takes years for us to figure out not only the technical standards of AI, but also socio-technical standards, and it’s messy and hard,” she says. 

Nonnecke says she worries that the first companies to implement third-party audits might set poor precedents for how to think about and address the socio-technical risks of AI. For example, audits might define, evaluate, and address some risks but overlook others.

RESULT: More work is needed. Bug bounties are great, but they’re nowhere near comprehensive enough. New laws, such as the EU’s AI Act, will require tech companies to conduct audits, and it would have been great to see tech companies share successful examples of such audits. 

Commitment 5

The companies commit to developing robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system. This action enables creativity with AI to flourish but reduces the dangers of fraud and deception.

Many of the companies have built watermarks for AI-generated content. For example, Google launched SynthID, a watermarking tool for image, audio, text, and video generated by Gemini. Meta has a tool called Stable Signature for images, and AudioSeal for AI-generated speech. Amazon now adds an invisible watermark to all images generated by its Titan Image Generator. OpenAI also uses watermarks in Voice Engine, its custom voice model, and has built an image-detection classifier for images generated by DALL-E 3. Anthropic was the only company that hadn’t built a watermarking tool, because watermarks are mainly used in images, which the company’s Claude model doesn’t support. 

All the companies excluding Inflection, Anthropic, and Meta are also part of the Coalition for Content Provenance and Authenticity (C2PA), an industry coalition that embeds information about when content was created, and whether it was created or edited by AI, into an image’s metadata. Microsoft and OpenAI automatically attach the C2PA’s provenance metadata to images generated with DALL-E 3 and videos generated with Sora. While Meta is not a member, it announced it is using the C2PA standard to identify AI-generated images on its platforms. 

The seven companies that signed the commitments have a “natural preference to more technical approaches to addressing risk,” says Bommasani, “and certainly watermarking in particular has this flavor.”

“The natural question is: Does [the technical fix] meaningfully make progress and address the underlying social concerns that motivate why we want to know whether content is machine generated or not?” he adds. 

RESULT: Good. This is an encouraging result overall. While watermarking remains experimental and unreliable, it’s good to see research around it and a commitment to the C2PA standard. It’s better than nothing, especially during a busy election year.

Commitment 6

The companies commit to publicly reporting their AI systems’ capabilities, limitations, and areas of appropriate and inappropriate use. This report will cover both security risks and societal risks, such as the effects on fairness and bias.

The White House’s commitments leave a lot of room for interpretation. For example, companies can technically meet this public reporting commitment with widely varying levels of transparency, as long as they do something in that general direction. 

The most common solutions tech companies offered here were so-called model cards. Each company calls them by a slightly different name, but in essence they act as a kind of product description for AI models. They can address anything from the model’s capabilities and limitations (including how it measures up against benchmarks on fairness and explainability) to veracity, robustness, governance, privacy, and security. Anthropic said it also tests models for potential safety issues that may arise later.

Microsoft has published an annual Responsible AI Transparency Report, which provides insight into how the company builds applications that use generative AI, makes decisions, and oversees the deployment of those applications. The company also says it gives clear notice on where and how AI is used within its products.

RESULT: More work is needed. One area of improvement for AI companies would be to increase transparency on their governance structures and on the financial relationships between companies, Hickok says. She would also have liked to see companies be more public about data provenance, model training processes, safety incidents, and energy use. 

Commitment 7

The companies commit to prioritizing research on the societal risks that AI systems can pose, including on avoiding harmful bias and discrimination, and protecting privacy. The track record of AI shows the insidiousness and prevalence of these dangers, and the companies commit to rolling out AI that mitigates them. 

Tech companies have been busy on the safety research front, and they have embedded their findings into products. Amazon has built guardrails for Amazon Bedrock that can detect hallucinations and can apply safety, privacy, and truthfulness protections. Anthropic says it employs a team of researchers dedicated to researching societal risks and privacy. In the past year, the company has pushed out research on deception, jailbreaking, strategies to mitigate discrimination, and emergent capabilities such as models’ ability to tamper with their own code or engage in persuasion. And OpenAI says it has trained its models to avoid producing hateful content and refuse to generate output on hateful or extremist content. It trained its GPT-4V to refuse many requests that require drawing from stereotypes to answer. Google DeepMind has also released research to evaluate dangerous capabilities, and the company has done a study on misuses of generative AI. 

All of them have poured a lot of money into this area of research. For example, Google has invested millions into creating a new AI Safety Fund to promote research in the field through the Frontier Model Forum. Microsoft says it has committed $20 million in compute credits to researching societal risks through the National AI Research Resource and started its own AI model research accelerator program for academics, called the Accelerating Foundation Models Research program. The company has also hired 24 research fellows focusing on AI and society. 

RESULT: Very good. This is an easy commitment to meet, as the signatories are some of the biggest and richest corporate AI research labs in the world. While more research into how to make AI systems safe is a welcome step, critics say that the focus on safety research takes attention and resources from AI research that focuses on more immediate harms, such as discrimination and bias. 

Commitment 8

The companies commit to develop and deploy advanced AI systems to help address society’s greatest challenges. From cancer prevention to mitigating climate change to so much in between, AI—if properly managed—can contribute enormously to the prosperity, equality, and security of all.

Since making this commitment, tech companies have tackled a diverse set of problems. For example, Pfizer used Claude to assess trends in cancer treatment research after gathering relevant data and scientific content, and Gilead, an American biopharmaceutical company, used generative AI from Amazon Web Services to do feasibility evaluations on clinical studies and analyze data sets. 

Google DeepMind has a particularly strong track record in pushing out AI tools that can help scientists. For example, AlphaFold 3 can predict the structure and interactions of all life’s molecules. AlphaGeometry can solve geometry problems at a level comparable with the world’s brightest high school mathematicians. And GraphCast is an AI model that is able to make medium-range weather forecasts. Meanwhile, Microsoft has used satellite imagery and AI to improve responses to wildfires in Maui and map climate-vulnerable populations, which helps researchers expose risks such as food insecurity, forced migration, and disease. 

OpenAI, meanwhile, has announced partnerships and funding for various research projects, such as one looking at how multimodal AI models can be used safely by educators and by scientists in laboratory settings. It has also offered credits to help researchers use its platforms during hackathons on clean energy development.

RESULT: Very good. Some of the work on using AI to boost scientific discovery or predict weather events is genuinely exciting. AI companies haven’t used AI to prevent cancer yet, but that’s a pretty high bar. 

Overall, there have been some positive changes in the way AI has been built, such as red-teaming practices, watermarks, and new ways for industry to share best practices. However, these are only a couple of neat technical solutions to the messy socio-technical problem that is AI harm, and a lot more work is needed. One year on, it is also odd to see the commitments talk about a very particular type of AI safety that focuses on hypothetical risks, such as bioweapons, and completely fail to mention consumer protection, nonconsensual deepfakes, data and copyright, and the environmental footprint of AI models. These seem like weird omissions today.

A short history of AI, and what it is (and isn’t)

This story originally appeared in The Algorithm, our weekly newsletter on AI.

It’s the simplest questions that are often the hardest to answer. That applies to AI, too. Even though it’s a technology being sold as a solution to the world’s problems, nobody seems to know what it really is. It’s a label that’s been slapped on technologies ranging from self-driving cars to facial recognition, chatbots to fancy Excel. But in general, when we talk about AI, we talk about technologies that make computers do things we think need intelligence when done by people. 

For months, my colleague Will Douglas Heaven has been on a quest to go deeper to understand why everybody seems to disagree on exactly what AI is, why nobody even knows, and why you’re right to care about it. He’s been talking to some of the biggest thinkers in the field, asking them, simply: What is AI? It’s a great piece that looks at the past and present of AI to see where it is going next. You can read it here.

Here’s a taste of what to expect: 

Artificial intelligence almost wasn’t called “artificial intelligence” at all. The computer scientist John McCarthy is credited with coming up with the term in 1955 when writing a funding application for a summer research program at Dartmouth College in New Hampshire. But more than one of McCarthy’s colleagues hated it. “The word ‘artificial’ makes you think there’s something kind of phony about this,” said one. Others preferred the terms “automata studies,” “complex information processing,” “engineering psychology,” “applied epistemology,” “neural cybernetics,” “non-numerical computing,” “neuraldynamics,” “advanced automatic programming,” and “hypothetical automata.” Not quite as cool and sexy as AI.

AI has several zealous fandoms. AI has acolytes, with a faith-like belief in the technology’s current power and inevitable future improvement. The buzzy popular narrative is shaped by a pantheon of big-name players, from Big Tech marketers in chief like Sundar Pichai and Satya Nadella to edgelords of industry like Elon Musk and Sam Altman to celebrity computer scientists like Geoffrey Hinton. As AI hype has ballooned, a vocal anti-hype lobby has risen in opposition, ready to smack down its ambitious, often wild claims. As a result, it can feel as if different camps are talking past one another, not always in good faith.

This sometimes seemingly ridiculous debate has huge consequences that affect us all. AI has a lot of big egos and vast sums of money at stake. But more than that, these disputes matter when industry leaders and opinionated scientists are summoned by heads of state and lawmakers to explain what this technology is and what it can do (and how scared we should be). They matter when this technology is being built into software we use every day, from search engines to word-processing apps to assistants on your phone. AI is not going away. But if we don’t know what we’re being sold, who’s the dupe?

For example, meet the TESCREALists. A clunky acronym (pronounced “tes-cree-all”) replaces an even clunkier list of labels: transhumanism, extropianism, singularitarianism, cosmism, rationalism, effective altruism, and longtermism. It was coined by Timnit Gebru, who founded the Distributed AI Research Institute and was Google’s former ethical AI co-lead, and Émile Torres, a philosopher and historian at Case Western Reserve University. Some anticipate human immortality; others predict humanity’s colonization of the stars. The common tenet is that an all-powerful technology is not only within reach but inevitable. TESCREALists believe that artificial general intelligence, or AGI, could not only fix the world’s problems but level up humanity. Gebru and Torres link several of these worldviews—with their common focus on “improving” humanity—to the racist eugenics movements of the 20th century.

Is AI math or magic? Either way, people have strong, almost religious beliefs in one or the other. “It’s offensive to some people to suggest that human intelligence could be re-created through these kinds of mechanisms,” Ellie Pavlick, who studies neural networks at Brown University, told Will. “People have strong-held beliefs about this issue—it almost feels religious. On the other hand, there’s people who have a little bit of a God complex. So it’s also offensive to them to suggest that they just can’t do it.”

Will’s piece really is the definitive look at this whole debate. No spoilers—there are no simple answers, but lots of fascinating characters and viewpoints. I’d recommend you read the whole thing here—and see if you can make your mind up about what AI really is.


Now read the rest of The Algorithm

Deeper Learning

AI can make you more creative—but it has limits

Generative AI models have made it simpler and quicker to produce everything from text passages and images to video clips and audio tracks. But while AI’s output can certainly seem creative, do these models actually boost human creativity?  

A new study looked at how people used OpenAI’s large language model GPT-4 to write short stories. The model was helpful—but only to an extent. The researchers found that while AI improved the output of less creative writers, it made little difference to the quality of the stories produced by writers who were already creative. The stories in which AI had played a part were also more similar to each other than those dreamed up entirely by humans. Read more from Rhiannon Williams.

Bits and Bytes

Robot-packed meals are coming to the frozen-food aisle
Found everywhere from airplanes to grocery stores, prepared meals are usually packed by hand. AI-powered robotics is changing that. (MIT Technology Review)

AI is poised to automate today’s most mundane manual warehouse task
Pallets are everywhere, but training robots to stack them with goods takes forever. Fixing that could be a tangible win for commercial AI-powered robots. (MIT Technology Review)

The Chinese government is going all-in on autonomous vehicles
The government is finally allowing Tesla to bring its Full Self-Driving feature to China. New government permits let companies test driverless cars on the road and allow cities to build smart road infrastructure that will tell these cars where to go. (MIT Technology Review)

The US and its allies took down a Russian AI bot farm on X
The US seized control of a sophisticated Russian operation that used AI to push propaganda through nearly a thousand covert accounts on the social network X. Western intelligence agencies traced the propaganda mill to an officer of the Russian FSB intelligence force and to a former senior editor at state-controlled publication RT, formerly called Russia Today. (The Washington Post)

AI investors are starting to wonder: Is this just a bubble?
After a massive investment in the language-model boom, the biggest beneficiary is Nvidia, which designs and sells the best chips for training and running modern AI models. Investors are now starting to ask what LLMs are actually going to be used for, and when they will start making them money. (New York magazine)

Goldman Sachs thinks AI is overhyped, wildly expensive, and unreliable
Meanwhile, the major investment bank published a research paper about the economic viability of generative AI. It notes that there is “little to show for” the huge amount of spending on generative AI infrastructure and questions “whether this large spend will ever pay off in terms of AI benefits and returns.” (404 Media)

The UK politician accused of being AI is actually a real person
A hilarious story about how Mark Matlock, a candidate for the far-right Reform UK party, was accused of being a fake candidate created with AI after he didn’t show up to campaign events. Matlock has assured the press he is a real person, and he wasn’t around because he had pneumonia. (The Verge)

Building supply chain resilience with AI

If the last five years have taught businesses with complex supply chains anything, it is that resilience is crucial. In the first three months of the covid-19 pandemic, for example, supply-chain leader Amazon grew its business 44%. Its investments in supply chain resilience allowed it to deliver when its competitors could not, increasing its market share and driving profits up 220%, says Sanjeev Maddila, worldwide head of supply chain solutions at Amazon Web Services (AWS). A resilient supply chain ensures that a company can meet its customers’ needs despite inevitable disruption.

Today, businesses of all sizes must deliver to their customers against a backdrop of supply chain disruptions, with technological changes, shifting labor pools, geopolitics, and climate change adding new complexity and risk at a global scale. To succeed, they need to build resilient supply chains: fully digital operations that prioritize customers and their needs while establishing a fast, reliable, and sustainable delivery network.

The Canadian fertilizer company Nutrien, for example, operates two dozen manufacturing and processing facilities spread across the globe and nearly 2,000 retail stores in the Americas and Australia. To collect underutilized data from its industrial operations, and gain greater visibility into its supply chain, the company relies on a combination of cloud technology and artificial intelligence/machine learning (AI/ML) capabilities.

“A digital supply chain connects us from grower to manufacturer, providing visibility throughout the value chain,” says Adam Lorenz, senior director for strategic fleet and indirect procurement at Nutrien. This visibility is critical when it comes to navigating the company’s supply chain challenges, which include seasonal demands, weather dependencies, manufacturing capabilities, and product availability. The company requires real-time visibility into its fleets, for example, to identify the location of assets, see where products are moving, and determine inventory requirements.

Currently, Nutrien can locate a fertilizer or nutrient tank in a grower’s field and determine what Nutrien products are in it. By achieving that “real-time visibility” into a tank’s location and a customer’s immediate needs, Lorenz says the company “can forecast where assets are from a fill-level perspective and plan accordingly.” In turn, Nutrien can respond immediately to emerging customer needs, increasing company revenue while enhancing customer satisfaction, improving inventory management, and optimizing supply chain operations.

“For us, it’s about starting with data creation and then adding a layer of AI on top to really drive recommendations,” says Lorenz. In addition to improving product visibility and asset utilization, Lorenz says that Nutrien plans to add AI capabilities to its collaboration platforms that will make it easier for less-tech-savvy customers to take advantage of self-service capabilities and automation that accelerates processes and improves compliance with complex policies.

To meet and exceed customer expectations with differentiated service, speed, and reliability, all companies need to similarly modernize their supply chain operations. The key to doing so—and to increasing organizational resilience and sustainability—will be applying AI/ML to their extensive operational data in the cloud.

Resilience as a business differentiator

Like Nutrien, a wide variety of organizations from across industries are discovering the competitive advantages of modernizing their supply chains. A pharmaceutical company that aggregates its supply chain data for greater end-to-end visibility, for example, can provide better product tracking for critically ill customers. A retail startup undergoing meteoric growth can host its workloads in the cloud to support sudden upticks in demand while minimizing operating costs. And a transportation company can achieve inbound supply chain savings by evaluating the total distance its fleet travels to reduce mileage costs and CO2 emissions.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

AI is poised to automate today’s most mundane manual warehouse task

Before almost any item reaches your door, it traverses the global supply chain on a pallet. More than 2 billion pallets are in circulation in the United States alone, and $400 billion worth of goods are exported on them annually. However, loading boxes onto these pallets is a task stuck in the past: Heavy loads and repetitive movements leave workers at high risk of injury, and in the rare instances when robots are used, they take months to program using handheld computers that have changed little since the 1980s.

Jacobi Robotics, a startup spun out of the labs of the University of California, Berkeley, says it can vastly speed up that process with AI command-and-control software. The researchers approached palletizing—one of the most common warehouse tasks—as primarily an issue of motion planning: How do you safely get a robotic arm to pick up boxes of different shapes and stack them efficiently on a pallet without getting stuck? And all that computation also has to be fast, because factory lines are producing more varieties of products than ever before—which means boxes of more shapes and sizes.

After much trial and error, Jacobi’s founders, including roboticist Ken Goldberg, say they’ve cracked it. Their software, built upon research from a paper they published in Science Robotics in 2020, is designed to work with the four leading makers of robotic palletizing arms. It uses deep learning to generate a “first draft” of how an arm might move an item onto the pallet. Then it uses more traditional robotics methods, like optimization, to check whether the movement can be done safely and without glitches. 
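The "draft, then check" idea described above can be illustrated with a toy sketch. This is not Jacobi's software: the proposer here is a hand-written arc standing in for a learned model, the check is a simple point-in-box collision test rather than an optimizer, and every name (`Box`, `propose_path`, `is_safe`, `plan`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned obstacle described by its minimum and maximum corners."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(l <= c <= h for l, c, h in zip(self.lo, p, self.hi))


def propose_path(start: Point, goal: Point, lift: float, steps: int = 20) -> List[Point]:
    """Stand-in for a learned model: a fast 'first draft' that arcs upward by `lift`."""
    path = []
    for i in range(steps + 1):
        t = i / steps
        x = start[0] + t * (goal[0] - start[0])
        y = start[1] + t * (goal[1] - start[1])
        # Parabolic bump that is zero at both ends and equals `lift` at the midpoint.
        z = start[2] + t * (goal[2] - start[2]) + lift * 4 * t * (1 - t)
        path.append((x, y, z))
    return path


def is_safe(path: Sequence[Point], obstacles: Sequence[Box]) -> bool:
    """Explicit check: reject a draft if any waypoint lies inside an obstacle."""
    return not any(ob.contains(p) for p in path for ob in obstacles)


def plan(start: Point, goal: Point, obstacles: Sequence[Box],
         lifts: Sequence[float] = (0.0, 0.25, 0.5, 1.0)) -> Optional[List[Point]]:
    """Try drafts in order and return the first one that passes the check."""
    for lift in lifts:
        draft = propose_path(start, goal, lift)
        if is_safe(draft, obstacles):
            return draft
    return None  # no draft passed; a real system would fall back to a slower planner


if __name__ == "__main__":
    crate = Box(lo=(0.4, -0.1, 0.0), hi=(0.6, 0.1, 0.3))
    path = plan((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), [crate])
    print("found path" if path else "no safe draft")
```

The point of the split is that the proposer can be fast and occasionally wrong, because nothing reaches the robot until the explicit check has accepted it.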

Jacobi aims to replace the legacy methods customers are currently using to train their bots. In the conventional approach, robots are programmed using tools called “teaching pendants,” and customers usually have to manually guide the robot to demonstrate how to pick up each individual box and place it on the pallet. The entire coding process can take months. Jacobi says its AI-driven solution promises to cut that time down to a day and can compute motions in less than a millisecond. The company says it plans to launch its product later this month.

Billions of dollars are being poured into AI-powered robotics, but most of the excitement is geared toward next-generation robots that promise to be capable of many different tasks, like the humanoid robot that has helped Figure raise $675 million from investors, including Microsoft and OpenAI, and reach a $2.6 billion valuation in February. Against this backdrop, using AI to train a better box-stacking robot might feel pretty basic.

Indeed, Jacobi’s seed funding round is trivial in comparison: $5 million led by Moxxie Ventures. But amid hype around promised robotics breakthroughs that could take years to materialize, palletizing might be the warehouse problem AI is best poised to solve in the short term. 

“We have a very pragmatic approach,” says Max Cao, Jacobi’s co-founder and CEO. “These tasks are within reach, and we can get a lot of adoption within a short time frame, versus some of the moonshots out there.”

Jacobi’s software product includes a virtual studio where customers can build replicas of their setups, capturing factors like which robot models they have, what types of boxes will come off the conveyor belt, and which direction the labels should face. A warehouse moving sporting goods, say, might use the program to figure out the best way to stack a mixed pallet of tennis balls, rackets, and apparel. Then Jacobi’s algorithms will automatically plan the many movements the robotic arm should take to stack the pallet, and the instructions will be transmitted to the robot.

The approach merges the benefits of fast computing provided by AI with the accuracy of more traditional robotics techniques, says Dmitry Berenson, a professor of robotics at the University of Michigan, who is not involved with the company.

“They’re doing something very reasonable here,” he says. A lot of modern robotics research is betting big on AI, hoping that deep learning can augment or replace more manual training by having the robot learn from past examples of a given motion or task. But by making sure the predictions generated by deep learning are checked against the results of more traditional methods, Jacobi is developing planning algorithms that will likely be less prone to error, Berenson says.

The planning speed that could result “is pushing this into a new category,” he adds. “You won’t even notice the time it takes to compute a motion. That’s really important in the industrial setting, where every pause means delays.”

AI can make you more creative—but it has limits

Generative AI models have made it simpler and quicker to produce everything from text passages and images to video clips and audio tracks. Texts and media that might have taken years for humans to create can now be generated in seconds.

But while AI’s output can certainly seem creative, do these models actually boost human creativity?  

That’s what two researchers set out to explore in new research published today in Science Advances, studying how people used OpenAI’s large language model GPT-4 to write short stories.

The model was helpful—but only to an extent. They found that while AI improved the output of less creative writers, it made little difference to the quality of the stories produced by writers who were already creative. The stories in which AI had played a part were also more similar to each other than those dreamed up entirely by humans. 

The research adds to the growing body of work investigating how generative AI affects human creativity, suggesting that although access to AI can offer a creative boost to an individual, it reduces creativity in the aggregate. 

To understand generative AI’s effect on humans’ creativity, we first need to determine how creativity is measured. This study used two metrics: novelty and usefulness. Novelty refers to a story’s originality, while usefulness in this context reflects the possibility that each resulting short story could be developed into a book or other publishable work. 

First, the authors recruited 293 people through the research platform Prolific to complete a task designed to measure their inherent creativity. Participants were instructed to provide 10 words that were as different from each other as possible.
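The article does not say how the word lists were scored. One plausible way to quantify "as different from each other as possible" is the average pairwise distance between word vectors, sketched below. The three-dimensional vectors are made-up toy values, not real embeddings, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]

# Toy vectors for illustration only; a real study would use trained word embeddings.
TOY_VECTORS: Dict[str, Vector] = {
    "cat": (1.0, 0.1, 0.0),
    "dog": (0.9, 0.2, 0.0),
    "kitten": (1.0, 0.0, 0.1),
    "volcano": (0.0, 1.0, 0.0),
    "algebra": (0.0, 0.0, 1.0),
}


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 minus cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def divergence_score(words: Sequence[str], vectors: Dict[str, Vector]) -> float:
    """Average cosine distance over every pair of words in the list."""
    pairs = list(combinations(words, 2))
    return sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    related = divergence_score(["cat", "dog", "kitten"], TOY_VECTORS)
    unrelated = divergence_score(["cat", "volcano", "algebra"], TOY_VECTORS)
    print(f"related words:   {related:.3f}")
    print(f"unrelated words: {unrelated:.3f}")
```

Under this kind of measure, a list of closely related words scores low and a list of unrelated words scores high, which is the property a divergent-thinking task needs.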

Next, the participants were asked to write an eight-sentence story for young adults on one of three topics: an adventure in the jungle, on open seas, or on a different planet. First, though, they were randomly sorted into three groups. The first group had to rely solely on their own ideas, while the second group was given the option to receive a single story idea from GPT-4. The third group could elect to receive up to five story ideas from the AI model.

Of the participants with the option of AI assistance, the vast majority—88.4%—took advantage of it. They were then asked to evaluate how creative they thought their stories were, before a separate group of 600 recruits reviewed their efforts. Each reviewer was shown six stories and asked to give feedback on the stylistic characteristics, novelty, and usefulness of the story.

The researchers found that the writers with the greatest level of access to the AI model were evaluated as showing the most creativity. Of these, the writers who had scored as less creative on the first test benefited the most. 

However, the stories produced by writers who were already creative didn’t get the same boost. “We see this leveling effect where the least creative writers get the biggest benefit,” says Anil Doshi, an assistant professor at the UCL School of Management in the UK, who coauthored the paper. “But we don’t see any kind of respective benefit to be gained from the people who are already inherently creative.”

The findings make sense, given that people who are already creative don’t really need to use AI to be creative, says Tuhin Chakrabarty, a computer science researcher at Columbia University, who specializes in AI and creativity but wasn’t involved in the study. 

There are some potential drawbacks to taking advantage of the model’s help, too. AI-generated stories across the board are similar in terms of semantics and content, Chakrabarty says, and AI-generated writing is full of telltale giveaways, such as very long, exposition-heavy sentences that contain lots of stereotypes.   

“These kinds of idiosyncrasies probably also reduce the overall creativity,” he says. “Good writing is all about showing, not telling. AI is always telling.”

Because stories generated by AI models can only draw from the data that those models have been trained on, those produced in the study were less distinctive than the ideas the human participants came up with entirely on their own. If the publishing industry were to embrace generative AI, the books we read could become more homogenous, because they would all be produced by models trained on the same corpus.

This is why it’s essential to study what AI models can and, crucially, can’t do well as we grapple with what the rapidly evolving technology means for society and the economy, says Oliver Hauser, a professor at the University of Exeter Business School, another coauthor of the study. “Just because technology can be transformative, it doesn’t mean it will be,” he says.

Robot-packed meals are coming to the frozen-food aisle

Advances in artificial intelligence are coming to your freezer, in the form of robot-assembled prepared meals. 

Chef Robotics, a San Francisco–based startup, has launched a system of AI-powered robotic arms that can be quickly programmed with a recipe to dole out accurate portions of everything from tikka masala to pesto tortellini. After experiments with leading brands, including Amy’s Kitchen, the company says its robots have proved their worth and are being rolled out at scale to more production facilities. They are also being offered to new customers in the US and Canada. 

You might think the meals that end up in the grocery store’s frozen aisle, at Starbucks, or on airplanes are robot-packed already, but that’s rarely the case. Workers are often much more flexible than robots and can handle production lines that frequently rotate recipes. Not only that, but certain ingredients, like rice or shredded cheese, are hard to portion out with robotic arms. That means the vast majority of meals from recognizable brands are still typically hand-packed. 

However, advancements from AI have changed the calculus, making robots more useful on production lines, says David Griego, senior director of engineering at Amy’s.

“Before Silicon Valley got involved, the industry was much more about ‘Okay, we’re gonna program—a robot is gonna do this and do this only,’” he says. For a brand with so many different meals, that wasn’t very helpful. But the robots Griego is now able to add to the production line can learn how scooping a portion of peas is different from scooping cauliflower, and they can improve their accuracy for next time. “It’s astounding just how they can adapt to all the different types of ingredients that we use,” he says. Meal-packing robots suddenly make much more financial sense. 

Rather than selling the machines outright, Chef uses a service model, where customers pay a yearly fee that covers maintenance and training. Amy’s currently uses eight systems (each with two robotic arms) spread across two of its plants. One of these systems can now do the work of two to four workers depending on which ingredients are being packed, Griego says. The robots also reduce waste, since they can pack more consistent portions than their human counterparts. One-arm systems typically cost less than $135,000 per year, according to Chef CEO Rajat Bhageria.

With these advantages in mind, Griego imagines the robots handling more and more of the meal assembly process. “I have a vision,” he says, “where the only thing people would do is run the systems.” They’d make sure the hoppers of ingredients and packaging materials were full, for example, and the robots would do the rest. 

Robot chefs have been getting more skilled in recent years thanks to AI, and some companies have promised that burger-flipping and nugget-frying robots can provide cost savings to restaurants. But much of this technology has seen little adoption in the restaurant industry so far, says Bhageria. That’s because fast-casual restaurants often only need one cook running the grill, and if a robot cannot fully replace that person because it still needs supervision, it makes little sense to use it. Packaged meal companies, however, have a larger source of labor costs that they want to bring down: plating and assembly.

“That’s going to be the highest bang for our buck for our customers,” Bhageria says. 

The notion that more flexible robots could mean broader adoption in new industries is no surprise, says Lerrel Pinto, who leads the General-Purpose Robotics and AI Lab at New York University and is not involved with Chef or Amy’s Kitchen. 

“A lot of robots deployed in the real world are used in a very repetitive way, where they’re supposed to do the same thing over and over again,” he says. Deep learning has caused a paradigm shift over the past few years, sparking the idea that more generally capable robots might be not only possible but necessary for more widespread adoption. If Chef’s robots can perform without frequent stops for repair or training, they could deliver material savings to food companies and shift how they use human labor, Pinto says: “In the next few years, we will probably see a lot more companies trying to actually deploy these types of learning-based robots in the real world.”

One new challenge the robots have created for Amy’s, Griego says, is maintaining the look of a hand-packed meal when it is assembled by a robot. The company’s cheese enchilada dish in particular was causing trouble: it’s finished with a hand-distributed sprinkling of cheddar on top, but Amy’s panel of examiners said the cheese on the robot-packed dish looked too machine-spread, sending Griego back to the drawing board.

“The first few tests went pretty well,” he says. After a couple of changes, the robots are ready to take over. Amy’s plans to bring them to more of its facilities and train them on a growing list of ingredients, meaning your frozen meals are increasingly likely to be packed by a robot.

Update: This story has been amended to include updated pricing information from Chef.