The next generation of neural networks could live in hardware

Networks programmed directly into computer chip hardware can identify images faster, and use much less energy, than the traditional neural networks that underpin most modern AI systems. That’s according to work presented at a leading machine learning conference in Vancouver last week.

Neural networks, from GPT-4 to Stable Diffusion, are built by wiring together perceptrons, which are highly simplified simulations of the neurons in our brains. In very large numbers, perceptrons are powerful, but they also consume enormous volumes of energy—so much that Microsoft has penned a deal that will reopen Three Mile Island to power its AI advancements.

Part of the trouble is that perceptrons are just software abstractions—running a perceptron network on a GPU requires translating that network into the language of hardware, which takes time and energy. Building a network directly from hardware components does away with a lot of those costs. One day, they could even be built directly into chips used in smartphones and other devices, dramatically reducing the need to send data to and from servers.

Felix Petersen, who did this work as a postdoctoral researcher at Stanford University, has a strategy for making that happen. He designed networks composed of logic gates, which are some of the basic building blocks of computer chips. Made up of a few transistors apiece, logic gates accept two bits—1s or 0s—as inputs and, according to a rule determined by their specific pattern of transistors, output a single bit. Just like perceptrons, logic gates can be chained up into networks. And running logic-gate networks is cheap, fast, and easy: in his talk at the Neural Information Processing Systems (NeurIPS) conference, Petersen said that they consume less energy than perceptron networks by a factor of hundreds of thousands.
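
To make the idea concrete, here is a toy illustration in Python (not Petersen’s code; the gates and wiring are invented for this sketch) of a few two-input gates chained into a tiny network that turns four input bits into one output bit.

```python
# Illustrative only: a tiny hand-wired logic-gate network.
# Each gate takes two bits and returns one bit, and gates are chained
# so the outputs of one layer feed the inputs of the next.

def AND(a, b):  return a & b
def OR(a, b):   return a | b
def XOR(a, b):  return a ^ b
def NAND(a, b): return 1 - (a & b)

def tiny_gate_network(x1, x2, x3, x4):
    # First layer: two gates, each reading a pair of input bits.
    h1 = XOR(x1, x2)
    h2 = NAND(x3, x4)
    # Second layer: combine the intermediate bits into one output bit.
    return OR(h1, h2)

print(tiny_gate_network(1, 0, 1, 1))  # -> 1
```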

Logic-gate networks don’t perform nearly as well as traditional neural networks on tasks like image labeling. But the approach’s speed and efficiency make it promising, according to Zhiru Zhang, a professor of electrical and computer engineering at Cornell University. “If we can close the gap, then this could potentially open up a lot of possibilities on this edge of machine learning,” he says.

Petersen didn’t go looking for ways to build energy-efficient AI networks. He came to logic gates through an interest in “differentiable relaxations,” or strategies for wrangling certain classes of mathematical problems into a form that calculus can solve. “It really started off as a mathematical and methodological curiosity,” he says.

Backpropagation, the training algorithm that made the deep-learning revolution possible, was an obvious use case for this approach. Because backpropagation runs on calculus, it can’t be used directly to train logic-gate networks. Logic gates only work with 0s and 1s, and calculus demands answers about all the fractions in between. Petersen devised a way to “relax” logic-gate networks enough for backpropagation by creating functions that work like logic gates on 0s and 1s but also give answers for intermediate values. He ran simulated networks with those gates through training and then converted the relaxed logic-gate network back into something that he could implement in computer hardware.
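
The relaxation trick can be sketched with the standard real-valued versions of a few gates, which reproduce the Boolean rules at 0 and 1 but give smooth answers in between. This is an illustration of the general technique, not necessarily the exact functions Petersen used.

```python
# Real-valued relaxations of Boolean gates: each function matches the
# gate's truth table when a and b are exactly 0 or 1, but is differentiable
# for values in between, which is what backpropagation needs.

def and_relaxed(a, b): return a * b
def or_relaxed(a, b):  return a + b - a * b
def not_relaxed(a):    return 1.0 - a
def xor_relaxed(a, b): return a + b - 2 * a * b

# At the corners the relaxations agree with the hard gates...
assert and_relaxed(1, 0) == 0 and xor_relaxed(1, 0) == 1
# ...and in between they give smooth, fractional answers.
print(and_relaxed(0.9, 0.7))  # 0.63, a "mostly true" AND
```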

One challenge with this approach is that training the relaxed networks is tough. Each node in the network could end up as any one of 16 different logic gates, and the 16 probabilities associated with each of those gates must be tracked and continually adjusted. That takes a huge amount of time and energy—during his NeurIPS talk, Petersen said that training his networks takes hundreds of times longer than training conventional neural networks on GPUs. At universities, which can’t afford to amass hundreds of thousands of GPUs, that amount of GPU time can be tough to swing—Petersen developed these networks, in collaboration with his colleagues, at Stanford University and the University of Konstanz. “It definitely makes the research tremendously hard,” he says.
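
A rough sketch of what tracking those per-node probabilities can look like, written in PyTorch-style Python purely as an illustration (this is not Petersen’s code, and only four of the 16 possible two-input gates are shown): each trainable node holds one weight per candidate gate, a softmax turns those weights into probabilities, and during training the node outputs the probability-weighted mix of the relaxed gates.

```python
import torch

# Four of the 16 possible two-input gates, in relaxed (differentiable) form.
RELAXED_GATES = [
    lambda a, b: a * b,              # AND
    lambda a, b: a + b - a * b,      # OR
    lambda a, b: a + b - 2 * a * b,  # XOR
    lambda a, b: 1 - a * b,          # NAND
]

class SoftGate(torch.nn.Module):
    """One trainable node: a learned probability distribution over gates."""
    def __init__(self, n_gates=len(RELAXED_GATES)):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(n_gates))  # one weight per gate

    def forward(self, a, b):
        probs = torch.softmax(self.logits, dim=0)                # probabilities over gates
        outputs = torch.stack([g(a, b) for g in RELAXED_GATES])  # each gate's relaxed output
        return torch.tensordot(probs, outputs, dims=1)           # probability-weighted mix

# Training adjusts the logits by backpropagation; afterward each node is
# "hardened" to its single most probable gate for the hardware version.
gate = SoftGate()
a = torch.tensor([1.0, 0.0, 0.5])
b = torch.tensor([1.0, 1.0, 0.5])
print(gate(a, b))                      # differentiable, trainable outputs
print(int(torch.argmax(gate.logits)))  # the gate this node would harden to
```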

Once the network has been trained, though, things get way, way cheaper. Petersen compared his logic-gate networks with a cohort of other ultra-efficient networks, such as binary neural networks, which use simplified perceptrons that can process only binary values. The logic-gate networks did just as well as these other efficient methods at classifying images in the CIFAR-10 data set, which includes 10 different categories of low-resolution pictures, from “frog” to “truck.” They achieved this with fewer than a tenth of the logic gates required by those other methods, and in less than a thousandth of the time. Petersen tested his networks using programmable computer chips called FPGAs, which can be used to emulate many different potential patterns of logic gates; implementing the networks in non-programmable ASIC chips would reduce costs even further, because programmable chips need to use more components in order to achieve their flexibility.

Farinaz Koushanfar, a professor of electrical and computer engineering at the University of California, San Diego, says she isn’t convinced that logic-gate networks will be able to perform when faced with more realistic problems. “It’s a cute idea, but I’m not sure how well it scales,” she says. She notes that the logic-gate networks can only be trained approximately, via the relaxation strategy, and approximations can fail. That hasn’t caused issues yet, but Koushanfar says that it could prove more problematic as the networks grow. 

Nevertheless, Petersen is ambitious. He plans to continue pushing the abilities of his logic-gate networks, and he hopes, eventually, to create what he calls a “hardware foundation model.” A powerful, general-purpose logic-gate network for vision could be mass-produced directly on computer chips, and those chips could be integrated into devices like personal phones and computers. That could reap enormous energy benefits, Petersen says. If those networks could effectively reconstruct photos and videos from low-resolution information, for example, then far less data would need to be sent between servers and personal devices. 

Petersen acknowledges that logic-gate networks will never compete with traditional neural networks on performance, but that isn’t his goal. Making something that works, and that is as efficient as possible, should be enough. “It won’t be the best model,” he says. “But it should be the cheapest.”

Accelerating AI innovation through application modernization

Business applications powered by AI are revolutionizing customer experiences, accelerating the speed of business, and driving employee productivity. In fact, according to research firm Frost & Sullivan’s 2024 Global State of AI report, 89% of organizations believe AI and machine learning will help them grow revenue, boost operational efficiency, and improve customer experience.

Take, for example, Vodafone. The telecommunications company is using a suite of Azure AI services, such as Azure OpenAI Service, to deliver real-time, hyper-personalized experiences across all of its customer touchpoints, including its digital chatbot TOBi. By leveraging AI to increase customer satisfaction, Vodafone has managed to resolve 70% of its first-stage inquiries through AI-powered digital channels, says Naga Surendran, senior director of product marketing for Azure Application Services at Microsoft. It has also boosted the productivity of support agents by providing them with access to AI capabilities that mirror those of Microsoft Copilot, an AI-powered productivity tool.

“The result is a 20-point increase in net promoter score,” he says. “These benefits are what’s driving AI infusion into every business process and application.”

Yet realizing measurable business value from AI-powered applications requires a new game plan. Legacy application architectures simply aren’t capable of meeting the high demands of AI-enhanced applications. Rather, the time is now for organizations to modernize their infrastructure, processes, and application architectures using cloud native technologies to stay competitive.

The time is now for modernization

Today’s organizations exist in an era of geopolitical shifts, growing competition, supply chain disruptions, and evolving consumer preferences. AI applications can help by supporting innovation, but only if they have the flexibility to scale when needed. Fortunately, by modernizing applications, organizations can achieve the agile development, scalability, and fast compute performance needed to support rapid innovation and accelerate the delivery of AI applications. David Harmon, director of software development for AMD, says companies “really want to make sure that they can migrate their current [environment] and take advantage of all the hardware changes as much as possible.” The result is not only a reduction in the overall development lifecycle of new applications but a speedy response to changing world circumstances.

Beyond building and deploying intelligent apps quickly, modernizing applications, data, and infrastructure can significantly improve customer experience. Consider, for example, Coles, an Australian supermarket that invested in modernization and is using data and AI to deliver dynamic e-commerce experiences to its customers both online and in-store. With Azure DevOps, Coles has shifted from monthly to weekly deployments of applications while, at the same time, reducing build times by hours. What’s more, by aggregating views of customers across multiple channels, Coles has been able to deliver more personalized customer experiences. In fact, according to a 2024 CMSWire Insights report, there is a significant rise in the use of AI across the digital customer experience toolset, with 55% of organizations now using it to some degree, and more beginning their journey.

But even the most carefully designed applications are vulnerable to cybersecurity attacks. If given the opportunity, bad actors can extract sensitive information from machine learning models or maliciously infuse AI systems with corrupt data. “AI applications are now interacting with your core organizational data,” says Surendran. “Having the right guardrails is important to make sure the data is secure and built on a platform that enables you to do that.” The good news is that modern cloud-based architectures can deliver robust security, data governance, and AI guardrails like content safety to protect AI applications from security threats and ensure compliance with industry standards.

The answer to AI innovation

New challenges, from demanding customers to ill-intentioned hackers, call for a new approach to modernizing applications. “You have to have the right underlying application architecture to be able to keep up with the market and bring applications faster to market,” says Surendran. “Not having that foundation can slow you down.”

Enter cloud native architecture. As organizations increasingly adopt AI to accelerate innovation and stay competitive, there is a growing urgency to rethink how applications are built and deployed in the cloud. By adopting cloud native architectures, Linux, and open source software, organizations can better facilitate AI adoption and create a flexible platform purpose-built for AI and optimized for the cloud. Harmon explains that open source software creates options: “And the overall open source ecosystem just thrives on that. It allows new technologies to come into play.”

Application modernization also ensures optimal performance, scale, and security for AI applications. That’s because modernization goes beyond just lifting and shifting application workloads to cloud virtual machines. Rather, a cloud native architecture is inherently designed to provide developers with the following features:

  • The flexibility to scale to meet evolving needs
  • Better access to the data needed to drive intelligent apps
  • Access to the right tools and services to build and deploy intelligent applications easily
  • Security embedded into an application to protect sensitive data

Together, these cloud capabilities ensure organizations derive the greatest value from their AI applications. “At the end of the day, everything is about performance and security,” says Harmon. Cloud is no exception.

What’s more, Surendran notes that “when you leverage a cloud platform for modernization, organizations can gain access to AI models faster and get to market faster with building AI-powered applications. These are the factors driving the modernization journey.”

Best practices in play

For all the benefits of application modernization, there are steps organizations must take to ensure both technological and operational success. They are:

Train employees for speed. As modern infrastructure accelerates the development and deployment of AI-powered applications, developers must be prepared to work faster and smarter than ever. For this reason, Surendran warns, “Employees must be skilled in modern application development practices to support the digital business needs.” This includes developing expertise in working with loosely coupled microservices to build scalable, flexible applications and AI integrations.

Start with an assessment. Large enterprises are likely to have “hundreds of applications, if not thousands,” says Surendran. As a result, organizations must take the time to evaluate their application landscape before embarking on a modernization journey. “Starting with an assessment is super important,” continues Surendran. “Understanding, taking inventory of the different applications, which team is using what, and what this application is driving from a business process perspective is critical.”

Focus on quick wins. Modernization is a huge, long-term transformation in how companies build, deliver, and support applications. Most businesses are still learning and developing the right strategy to support innovation. For this reason, Surendran recommends focusing on quick wins while also working on a larger application estate transformation. “You have to show a return on investment for your organization and business leaders,” he says. For example, modernize some apps quickly with re-platforming and then infuse them with AI capabilities.

Partner up. “Modernization can be daunting,” says Surendran. Selecting the right strategy, process, and platform to support innovation is only the first step. Organizations must also “bring on the right set of partners to help them go through change management and the execution of this complex project.”

Address all layers of security. Organizations must be unrelenting when it comes to protecting their data. According to Surendran, this means adopting a multi-layer approach to security that includes: security by design, in which products and services are developed from the get-go with security in mind; security by default, in which protections exist at every layer and interaction where data exists; and security by ongoing operations, which means using the right tools and dashboards to govern applications throughout their lifecycle.

A look to the future

Most organizations are already aware of the need for application modernization. But with the arrival of AI comes the startling revelation that modernization efforts must be done right, and that AI applications must be built and deployed for greater business impact. Adopting a cloud native architecture can help by serving as a platform for enhanced performance, scalability, security, and ongoing innovation. “As soon as you modernize your infrastructure with a cloud platform, you have access to these rapid innovations in AI models,” says Surendran. “It’s about being able to continuously innovate with AI.”

Read more about how to accelerate app and data estate readiness for AI innovation with Microsoft Azure and AMD. Explore Linux on Azure.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

AI is changing how we study bird migration

A small songbird soars above Ithaca, New York, on a September night. He is one of 4 billion birds, a great annual river of feathered migration across North America. Midair, he lets out what ornithologists call a nocturnal flight call to communicate with his flock. It’s the briefest of signals, barely 50 milliseconds long, emitted in the woods in the middle of the night. But humans have caught it nevertheless, with a microphone topped by a focusing funnel. Moments later, software called BirdVoxDetect, the result of a collaboration between New York University, the Cornell Lab of Ornithology, and École Centrale de Nantes, identifies the bird and classifies it to the species level.

Biologists like Cornell’s Andrew Farnsworth had long dreamed of snooping on birds this way. In a warming world increasingly full of human infrastructure that can be deadly to them, like glass skyscrapers and power lines, migratory birds are facing many existential threats. Scientists rely on a combination of methods to track the timing and location of their migrations, but each has shortcomings. Doppler radar, with the weather filtered out, can detect the total biomass of birds in the air, but it can’t break that total down by species. GPS tags on individual birds and careful observations by citizen-scientist birders help fill in that gap, but tagging birds at scale is an expensive and invasive proposition. And there’s another key problem: Most birds migrate at night, when it’s more difficult to identify them visually and while most birders are in bed. For over a century, acoustic monitoring has hovered tantalizingly out of reach as a method that would solve ornithologists’ woes.

In the late 1800s, scientists realized that migratory birds made species-specific nocturnal flight calls—“acoustic fingerprints.” When microphones became commercially available in the 1950s, scientists began recording birds at night. Farnsworth led some of this acoustic ecology research in the 1990s. But even then it was challenging to spot the short calls, some of which are at the edge of the frequency range humans can hear. Scientists ended up with thousands of tapes they had to scour in real time while looking at spectrograms that visualize audio. Though digital technology made recording easier, the “perpetual problem,” Farnsworth says, “was that it became increasingly easy to collect an enormous amount of audio data, but increasingly difficult to analyze even some of it.”

Then Farnsworth met Juan Pablo Bello, director of NYU’s Music and Audio Research Lab. Fresh off a project using machine learning to identify sources of urban noise pollution in New York City, Bello agreed to take on the problem of nocturnal flight calls. He put together a team including the French machine-listening expert Vincent Lostanlen, and in 2015, the BirdVox project was born to automate the process. “Everyone was like, ‘Eventually, when this nut is cracked, this is going to be a super-rich source of information,’” Farnsworth says. But in the beginning, Lostanlen recalls, “there was not even a hint that this was doable.” It seemed unimaginable that machine learning could approach the listening abilities of experts like Farnsworth.

“Andrew is our hero,” says Bello. “The whole thing that we want to imitate with computers is Andrew.”

They started by training BirdVoxDetect, a neural network, to ignore faults like low buzzes caused by rainwater damage to microphones. Then they trained the system to detect flight calls, which differ between (and even within) species and can easily be confused with the chirp of a car alarm or a spring peeper. The challenge, Lostanlen says, was similar to the one a smart speaker faces when listening for its unique “wake word,” except in this case the distance from the target noise to the microphone is far greater (which means much more background noise to compensate for). And, of course, the scientists couldn’t choose a unique sound like “Alexa” or “Hey Google” for their trigger. “For birds, we don’t really make that choice. Charles Darwin made that choice for us,” he jokes. Luckily, they had a lot of training data to work with—Farnsworth’s team had hand-annotated thousands of hours of recordings collected by the microphones in Ithaca.

With BirdVoxDetect trained to detect flight calls, another difficult task lay ahead: teaching it to classify the detected calls by species, which few expert birders can do by ear. To deal with uncertainty, and because there is no training data for every species, they decided on a hierarchical system. For example, for a given call, BirdVoxDetect might be able to identify the bird’s order and family, even if it’s not sure about the species—just as a birder might at least identify a call as that of a warbler, whether yellow-rumped or chestnut-sided. In training, the neural network was penalized less when it mixed up birds that were closer on the taxonomical tree.
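
One common way to encode that idea, sketched here as a generic illustration rather than BirdVoxDetect’s actual implementation (the species list and function names are hypothetical), is to weight each misclassification by how far apart the true and predicted labels sit on the taxonomic tree.

```python
import torch

# Hypothetical taxonomy: (order, family, species) for a few classes.
TAXONOMY = [
    ("Passeriformes", "Parulidae", "yellow-rumped warbler"),
    ("Passeriformes", "Parulidae", "chestnut-sided warbler"),
    ("Passeriformes", "Turdidae",  "Swainson's thrush"),
]

def taxonomic_distance(i, j):
    """0 = same species, 1 = same family, 2 = same order, 3 = different order."""
    for level in range(3):
        if TAXONOMY[i][:3 - level] == TAXONOMY[j][:3 - level]:
            return level
    return 3

# Cost matrix: confusing two warblers costs less than confusing
# a warbler with a thrush.
COST = torch.tensor([[taxonomic_distance(i, j) for j in range(len(TAXONOMY))]
                     for i in range(len(TAXONOMY))], dtype=torch.float32)

def hierarchical_loss(logits, target):
    """Expected taxonomic cost of the model's predicted distribution."""
    probs = torch.softmax(logits, dim=-1)
    return (probs * COST[target]).sum()

logits = torch.tensor([2.0, 0.5, -1.0])     # model leans toward class 0
print(hierarchical_loss(logits, target=1))  # small penalty: a sister warbler
print(hierarchical_loss(logits, target=2))  # larger penalty: the thrush is further away
```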

Last August, capping off eight years of research, the team published a paper detailing BirdVoxDetect’s machine-learning algorithms. They also released the software as a free, open-source product for ornithologists to use and adapt. In a test on a full season of migration recordings totaling 6,671 hours, the neural network detected 233,124 flight calls. In a 2022 study in the Journal of Applied Ecology, the team that tested BirdVoxDetect found acoustic data to be as effective as radar for estimating total biomass.

BirdVoxDetect works on a subset of North American migratory songbirds. But through “few-shot” learning, it can be trained to detect other, similar birds with just a few training examples. It’s like learning a language similar to one you already speak, Bello says. With cheap microphones, the system could be expanded to places around the world without birders or Doppler radar, even in vastly different recording conditions. “If you go to a bioacoustics conference and you talk to a number of people, they all have different use cases,” says Lostanlen. The next step for bioacoustics, he says, is to create a foundation model, like the ones scientists are working on for natural-language processing and image and video analysis, that would be reconfigurable for any species—even beyond birds. That way, scientists won’t have to build a new BirdVoxDetect for every animal they want to study.

The BirdVox project is now complete, but scientists are already building on its algorithms and approach. Benjamin Van Doren, a migration biologist at the University of Illinois Urbana-Champaign who worked on BirdVox, is using Nighthawk, a new user-friendly neural network based on both BirdVoxDetect and the popular birdsong ID app Merlin, to study birds migrating over Chicago and elsewhere in North and South America. And Dan Mennill, who runs a bioacoustics lab at the University of Windsor, says he’s excited to try Nighthawk on flight calls his team currently hand-annotates after they’re recorded by microphones on the Canadian side of the Great Lakes. One weakness of acoustic monitoring is that unlike radar, a single microphone can’t detect the altitude of a bird overhead or the direction in which it is moving. Mennill’s lab is experimenting with an array of eight microphones that can triangulate to solve that problem. Sifting through recordings has been slow. But with Nighthawk, the analysis will speed up dramatically.

With birds and other migratory animals under threat, Mennill says, BirdVoxDetect came at just the right time. Knowing exactly which birds are flying over in real time can help scientists keep tabs on how species are doing and where they’re going. That can inform practical conservation efforts like “Lights Out” initiatives that encourage skyscrapers to go dark at night to prevent bird collisions. “Bioacoustics is the future of migration research, and we’re really just getting to the stage where we have the right tools,” he says. “This ushers us into a new era.”

Christian Elliott is a science and environmental reporter based in Illinois.  

This is where the data to build AI comes from

AI is all about data. Reams and reams of data are needed to train algorithms to do what we want, and what goes into the AI models determines what comes out. But here’s the problem: AI developers and researchers don’t really know much about the sources of the data they are using. AI’s data collection practices are immature compared with the sophistication of AI model development. Massive data sets often lack clear information about what is in them and where it came from. 

The Data Provenance Initiative, a group of over 50 researchers from both academia and industry, wanted to fix that. They wanted to know, very simply: Where does the data to build AI come from? They audited nearly 4,000 public data sets spanning over 600 languages, 67 countries, and three decades. The data came from 800 unique sources and nearly 700 organizations. 

Their findings, shared exclusively with MIT Technology Review, show a worrying trend: AI’s data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies. 

In the early 2010s, data sets came from a variety of sources, says Shayne Longpre, a researcher at MIT who is part of the project. 

The data came not just from encyclopedias and the web, but also from sources such as parliamentary transcripts, earnings calls, and weather reports. Back then, AI data sets were specifically curated and collected from different sources to suit individual tasks, Longpre says.

Then transformers, the architecture underpinning language models, were invented in 2017, and the AI sector started seeing performance get better the bigger the models and data sets were. Today, most AI data sets are built by indiscriminately hoovering material from the internet. Since 2018, the web has been the dominant source for data sets used in all media, such as audio, images, and video, and a gap between scraped data and more curated data sets has emerged and widened.

“In foundation model development, nothing seems to matter more for the capabilities than the scale and heterogeneity of the data and the web,” says Longpre. The need for scale has also boosted the use of synthetic data massively.

The past few years have also seen the rise of multimodal generative AI models, which can generate videos and images. Like large language models, they need as much data as possible, and the best source for that has become YouTube. 

For video models, over 70% of the data for both speech and image data sets comes from one source.

This could be a boon for Alphabet, Google’s parent company, which owns YouTube. Whereas text is distributed across the web and controlled by many different websites and platforms, video data is extremely concentrated in one platform.

“It gives a huge concentration of power over a lot of the most important data on the web to one company,” says Longpre. 

And because Google is also developing its own AI models, its massive advantage also raises questions about how the company will make this data available for competitors, says Sarah Myers West, the co–executive director at the AI Now Institute.

“It’s important to think about data not as though it’s sort of this naturally occurring resource, but it’s something that is created through particular processes,” says Myers West.

“If the data sets on which most of the AI that we’re interacting with reflect the intentions and the design of big, profit-motivated corporations—that’s reshaping the infrastructures of our world in ways that reflect the interests of those big corporations,” she says.

This monoculture also raises questions about how accurately the human experience is portrayed in the data set and what kinds of models we are building, says Sara Hooker, the vice president of research at the technology company Cohere, who is also part of the Data Provenance Initiative.

People upload videos to YouTube with a particular audience in mind, and the way people act in those videos is often intended for very specific effect. “Does [the data] capture all the nuances of humanity and all the ways that we exist?” says Hooker. 

Hidden restrictions

AI companies don’t usually share what data they used to train their models. One reason is that they want to protect their competitive edge. The other is that because of the complicated and opaque way data sets are bundled, packaged, and distributed, they likely don’t even know where all the data came from.

They also probably don’t have complete information about any constraints on how that data is supposed to be used or shared. The researchers at the Data Provenance Initiative found that data sets often have restrictive licenses or terms attached to them, which should limit their use for commercial purposes, for example.

“This lack of consistency across the data lineage makes it very hard for developers to make the right choice about what data to use,” says Hooker.

It also makes it almost impossible to be completely certain you haven’t trained your model on copyrighted data, adds Longpre.

More recently, companies such as OpenAI and Google have struck exclusive data-sharing deals with publishers, major forums such as Reddit, and social media platforms on the web. But this becomes another way for them to concentrate their power.

“These exclusive contracts can partition the internet into various zones of who can get access to it and who can’t,” says Longpre.

The trend benefits the biggest AI players, who can afford such deals, at the expense of researchers, nonprofits, and smaller companies, who will struggle to get access. The largest companies also have the best resources for crawling data sets.

“This is a new wave of asymmetric access that we haven’t seen to this extent on the open web,” Longpre says.

The West vs. the rest

The data that is used to train AI models is also heavily skewed to the Western world. Over 90% of the data sets that the researchers analyzed came from Europe and North America, and fewer than 4% came from Africa. 

“These data sets are reflecting one part of our world and our culture, but completely omitting others,” says Hooker.

The dominance of the English language in training data is partly explained by the fact that the internet is still over 90% in English, and there are still a lot of places on Earth where there’s really poor internet connection or none at all, says Giada Pistilli, principal ethicist at Hugging Face, who was not part of the research team. But another reason is convenience, she adds: Putting together data sets in other languages and taking other cultures into account requires conscious intention and a lot of work. 

The Western focus of these data sets becomes particularly clear with multimodal models. When an AI model is prompted for the sights and sounds of a wedding, for example, it might only be able to represent Western weddings, because that’s all that it has been trained on, Hooker says. 

This reinforces biases and could lead to AI models that push a certain US-centric worldview, erasing other languages and cultures.

“We are using these models all over the world, and there’s a massive discrepancy between the world we’re seeing and what’s invisible to these models,” Hooker says. 

AI’s search for more energy is growing more urgent

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

If you drove by one of the 2,990 data centers in the United States, you’d probably think little more than “Huh, that’s a boring-looking building.” You might not even notice it at all. However, these facilities underpin our entire digital world, and they are responsible for tons of greenhouse-gas emissions. New research shows just how much those emissions have skyrocketed during the AI boom. 

Since 2018, carbon emissions from data centers in the US have tripled, according to new research led by a team at the Harvard T.H. Chan School of Public Health. That puts data centers slightly below domestic commercial airlines as a source of this pollution.

That leaves a big problem for the world’s leading AI companies, which are caught between pressure to meet their own sustainability goals and the relentless competition in AI that’s leading them to build bigger models requiring tons of energy. The trend toward ever more energy-intensive new AI models, including video generators like OpenAI’s Sora, will only send those numbers higher. 

A growing coalition of companies is looking toward nuclear energy as a way to power artificial intelligence. Meta announced on December 3 it was looking for nuclear partners, and Microsoft is working to restart the Three Mile Island nuclear plant by 2028. Amazon signed nuclear agreements in October. 

However, nuclear plants take ages to come online. And though public support has increased in recent years, and president-elect Donald Trump has signaled support, only a slight majority of Americans say they favor more nuclear plants to generate electricity. 

Though OpenAI CEO Sam Altman pitched the White House in September on an unprecedented effort to build more data centers, the AI industry is looking far beyond the United States. Countries in Southeast Asia, like Malaysia, Indonesia, Thailand, and Vietnam, are all courting AI companies, hoping to be their new data center hubs. 

In the meantime, AI companies will continue to use up power from their current sources, which are far from renewable. Since so many data centers are located in coal-producing regions, like Virginia, the “carbon intensity” of the energy they use is 48% higher than the national average. The researchers found that 95% of data centers in the US are built in places with sources of electricity that are dirtier than the national average. Read more about the new research here.


Deeper Learning

We saw a demo of the new AI system powering Anduril’s vision for war

We’re living through the first drone wars, but AI is poised to change the future of warfare even more drastically. I saw that firsthand during a visit to a test site in Southern California run by Anduril, the maker of AI-powered drones, autonomous submarines, and missiles. Anduril has built a way for the military to command much of its hardware—from drones to radars to unmanned fighter jets—from a single computer screen. 

Why it matters: Anduril, other companies in defense tech, and growing numbers of people within the Pentagon itself are increasingly adopting a new worldview: A future “great power” conflict—military jargon for a global war involving multiple countries—will not be won by the entity with the most advanced drones or firepower, or even the cheapest firepower. It will be won by whoever can sort through and share information the fastest. The Pentagon is betting lots of energy and money that AI—despite its flaws and risks—will be what puts the US and its allies ahead in that fight. Read more here.

Bits and Bytes

Bluesky has an impersonator problem 

The platform’s rise has brought with it a surge of crypto scammers, as my colleague Melissa Heikkilä experienced firsthand. (MIT Technology Review)

Tech’s elite make large donations to Trump ahead of his inauguration 

Leaders in Big Tech, who have been lambasted by Donald Trump, have made sizable donations to his inauguration committee. (The Washington Post)

Inside the premiere of the first commercially streaming AI-generated movies

The films, according to writer Jason Koebler, showed the telltale flaws of AI-generated video: dead eyes, vacant expressions, unnatural movements, and a reliance on voice-overs, since dialogue doesn’t work well. The company behind the films is confident viewers will stomach them anyway. (404 Media)

Meta asked California’s attorney general to stop OpenAI from becoming for-profit

Meta now joins Elon Musk in alleging that OpenAI has improperly enjoyed the benefits of nonprofit status while developing its technology. (Wall Street Journal)

How Silicon Valley is disrupting democracy

Two books explore the price we’ve paid for handing over unprecedented power to Big Tech—and explain why it’s imperative we start taking it back. (MIT Technology Review)

AI’s emissions are about to skyrocket even further

It’s no secret that the current AI boom is using up immense amounts of energy. Now we have a better idea of how much. 

A new paper, from a team at the Harvard T.H. Chan School of Public Health, examined 2,132 data centers operating in the United States (78% of all facilities in the country). These facilities—essentially buildings filled to the brim with rows of servers—are where AI models get trained, and they also get “pinged” every time we send a request through models like ChatGPT. They require huge amounts of energy both to power the servers and to keep them cool. 

Since 2018, carbon emissions from data centers in the US have tripled. For the 12 months ending August 2024, data centers were responsible for 105 million metric tons of CO2, accounting for 2.18% of national emissions (for comparison, domestic commercial airlines are responsible for about 131 million metric tons). About 4.59% of all the energy used in the US goes toward data centers, a figure that’s doubled since 2018.

It’s difficult to put a number on how much AI in particular, which has been booming since ChatGPT launched in November 2022, is responsible for this surge. That’s because data centers process lots of different types of data—in addition to training or pinging AI models, they do everything from hosting websites to storing your photos in the cloud. However, the researchers say, AI’s share is certainly growing rapidly as nearly every segment of the economy attempts to adopt the technology.

“It’s a pretty big surge,” says Eric Gimon, a senior fellow at the think tank Energy Innovation, who was not involved in the research. “There’s a lot of breathless analysis about how quickly this exponential growth could go. But it’s still early days for the business in terms of figuring out efficiencies, or different kinds of chips.”

Notably, the sources for all this power are particularly “dirty.” Since so many data centers are located in coal-producing regions, like Virginia, the “carbon intensity” of the energy they use is 48% higher than the national average. The paper, which was published on arXiv and has not yet been peer-reviewed, found that 95% of data centers in the US are built in places with sources of electricity that are dirtier than the national average. 

There are causes other than simply being located in coal country, says Falco Bargagli-Stoffi, an author of the paper. “Dirtier energy is available throughout the entire day,” he says, and plenty of data centers require that to maintain peak operation 24-7. “Renewable energy, like wind or solar, might not be as available.” Political or tax incentives, and local pushback, can also affect where data centers get built.  

One key shift in AI right now means that the field’s emissions are soon likely to skyrocket. AI models are rapidly moving from fairly simple text generators like ChatGPT toward highly complex image, video, and music generators. Until now, many of these “multimodal” models have been stuck in the research phase, but that’s changing. 

OpenAI released its video generation model Sora to the public on December 9, and its website has been so flooded with traffic from people eager to test it out that it is still not functioning properly. Competing models, like Veo from Google and Movie Gen from Meta, have still not been released publicly, but if those companies follow OpenAI’s lead as they have in the past, they might be soon. Music generation models from Suno and Udio are growing (despite lawsuits), and Nvidia released its own audio generator last month. Google is working on its Astra project, which will be a video-AI companion that can converse with you about your surroundings in real time. 

“As we scale up to images and video, the data sizes increase exponentially,” says Gianluca Guidi, a PhD student in artificial intelligence at the University of Pisa and IMT Lucca, who is the paper’s lead author. Combine that with wider adoption, he says, and emissions will soon jump.

One of the goals of the researchers was to build a more reliable way to get snapshots of just how much energy data centers are using. That’s been a more complicated task than you might expect, given that the data is dispersed across a number of sources and agencies. They’ve now built a portal that shows data center emissions across the country. The long-term goal of the data pipeline is to inform future regulatory efforts to curb emissions from data centers, which are predicted to grow enormously in the coming years. 

“There’s going to be increased pressure, between the environmental and sustainability-conscious community and Big Tech,” says Francesca Dominici, director of the Harvard Data Science Initiative and another coauthor. “But my prediction is that there is not going to be regulation. Not in the next four years.”

AI’s hype and antitrust problem is coming under scrutiny

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The AI sector is plagued by a lack of competition and a lot of deceit—or at least that’s one way to interpret the latest flurry of actions taken in Washington. 

Last Thursday, Senators Elizabeth Warren and Eric Schmitt introduced a bill aimed at stirring up more competition for Pentagon contracts awarded in AI and cloud computing. Amazon, Microsoft, Google, and Oracle currently dominate those contracts. “The way that the big get bigger in AI is by sucking up everyone else’s data and using it to train and expand their own systems,” Warren told the Washington Post.

The new bill would “require a competitive award process” for contracts, which would ban the use of “no-bid” awards by the Pentagon to companies for cloud services or AI foundation models. (The lawmakers’ move came a day after OpenAI announced that its technology would be deployed on the battlefield for the first time in a partnership with Anduril, completing a year-long reversal of its policy against working with the military.)

While Big Tech is hit with antitrust investigations—including the ongoing lawsuit against Google about its dominance in search, as well as a new investigation opened into Microsoft—regulators are also accusing AI companies of, well, just straight-up lying. 

On Tuesday, the Federal Trade Commission took action against the smart-camera company IntelliVision, saying that the company makes false claims about its facial recognition technology. IntelliVision has promoted its AI models, which are used in both home and commercial security camera systems, as operating without gender or racial bias and being trained on millions of images, two claims the FTC says are false. (The company couldn’t support the bias claim and the system was trained on only 100,000 images, the FTC says.)

A week earlier, the FTC made similar claims of deceit against the security giant Evolv, which sells AI-powered security scanning products to stadiums, K-12 schools, and hospitals. Evolv advertises its systems as offering better protection than simple metal detectors, saying they use AI to accurately screen for guns, knives, and other threats while ignoring harmless items. The FTC alleges that Evolv has inflated its accuracy claims, and that its systems failed in consequential cases, such as a 2022 incident when they failed to detect a seven-inch knife that was ultimately used to stab a student. 

Those add to the complaints the FTC made back in September against a number of AI companies, including one that sold a tool to generate fake product reviews and one selling “AI lawyer” services. 

The actions are somewhat tame. IntelliVision and Evolv have not actually been fined. The FTC has simply prohibited the companies from making claims that they can’t back up with evidence, and in the case of Evolv, it requires the company to allow certain customers to get out of contracts if they wish to.

However, they do represent an effort to hold the AI industry’s hype to account in the final months before the FTC’s chair, Lina Khan, is likely to be replaced when Donald Trump takes office. Trump has not named a pick for FTC chair, but he said on Thursday that Gail Slater, a tech policy advisor and a former aide to vice president–elect JD Vance, was picked to head the Department of Justice’s Antitrust Division. Trump has signaled that the agency under Slater will keep tech behemoths like Google, Amazon, and Microsoft in the crosshairs. 

“Big Tech has run wild for years, stifling competition in our most innovative sector and, as we all know, using its market power to crack down on the rights of so many Americans, as well as those of Little Tech!” Trump said in his announcement of the pick. “I was proud to fight these abuses in my First Term, and our Department of Justice’s antitrust team will continue that work under Gail’s leadership.”

That said, at least some of Trump’s frustrations with Big Tech are different—like his concerns that conservatives could be targets of censorship and bias. And that could send antitrust efforts in a distinctly new direction on his watch. 


Now read the rest of The Algorithm

Deeper Learning

The US Department of Defense is investing in deepfake detection

The Pentagon’s Defense Innovation Unit, a tech accelerator within the military, has awarded its first contract for deepfake detection. Hive AI will receive $2.4 million over two years to help detect AI-generated video, image, and audio content. 

Why it matters: As hyperrealistic deepfakes get cheaper and easier to produce, they hurt our ability to tell what’s real. The military’s investment in deepfake detection shows that the problem has national security implications as well. The open question is how accurate these detection tools are, and whether they can keep up with the unrelenting pace at which deepfake generation techniques are improving. Read more from Melissa Heikkilä.

Bits and Bytes

The owner of the LA Times plans to add an AI-powered “bias meter” to its news stories

Patrick Soon-Shiong is building a tool that will allow readers to “press a button and get both sides” of a story. But trying to create an AI model that can somehow provide an objective view of news events is controversial, given that models are biased both by their training data and by fine-tuning methods. (Yahoo)

Google DeepMind’s new AI model is the best yet at weather forecasting

It’s the second AI weather model that Google has launched in just the past few months. But this one’s different: It leaves out traditional physics models and relies on AI methods alone. (MIT Technology Review)

How the Ukraine-Russia war is reshaping the tech sector in Eastern Europe

Startups in Latvia and other nearby countries see the mobilization of Ukraine as a warning and an inspiration. They are now changing consumer products—from scooters to recreational drones—for use on the battlefield. (MIT Technology Review)

How Nvidia’s Jensen Huang is avoiding $8 billion in taxes

Jensen Huang runs Nvidia, the world’s top chipmaker and most valuable company. His wealth has soared during the AI boom, and he has taken advantage of a number of tax dodges “that will enable him to pass on much of his fortune tax free,” according to the New York Times. (The New York Times)

Meta is pursuing nuclear energy for its AI ambitions

Meta wants more of its AI training and development to be powered by nuclear energy, joining the ranks of Amazon and Microsoft. The news comes as many companies in Big Tech struggle to meet their sustainability goals amid the soaring energy demands from AI. (Meta)

Correction: A previous version of this article stated that Gail Slater was picked by Donald Trump to be the head of the FTC. Slater was in fact picked to lead the Department of Justice’s Antitrust Division. We apologize for the error.

We saw a demo of the new AI system powering Anduril’s vision for war

One afternoon in late November, I visited a weapons test site in the foothills east of San Clemente, California, operated by Anduril, a maker of AI-powered drones and missiles that recently announced a partnership with OpenAI. I went there to witness a new system it’s expanding today, which allows external parties to tap into its software and share data in order to speed up decision-making on the battlefield. If it works as planned over the course of a new three-year contract with the Pentagon, it could embed AI more deeply into the theater of war than ever before. 

Near the site’s command center, which looked out over desert scrub and sage, sat pieces of Anduril’s hardware suite that have helped the company earn its $14 billion valuation. There was Sentry, a security tower of cameras and sensors currently deployed at both US military bases and the US-Mexico border, along with advanced radars. Multiple drones, including an eerily quiet model called Ghost, sat ready to be deployed. What I was there to watch, though, was a different kind of weapon, displayed on two large television screens positioned at the test site’s command station.

I was here to examine the pitch being made by Anduril, other companies in defense tech, and growing numbers of people within the Pentagon itself: A future “great power” conflict—military jargon for a global war involving competition between multiple countries—will not be won by the entity with the most advanced drones or firepower, or even the cheapest firepower. It will be won by whoever can sort through and share information the fastest. And that will have to be done “at the edge” where threats arise, not necessarily at a command post in Washington. 

A desert drone test

“You’re going to need to really empower lower levels to make decisions, to understand what’s going on, and to fight,” Anduril CEO Brian Schimpf says. “That is a different paradigm than today.” Currently, information flows poorly among people on the battlefield and decision-makers higher up the chain. 

To show how the new tech will fix that, Anduril walked me through an exercise demonstrating how its system would take down an incoming drone threatening a base of the US military or its allies (the scenario at the center of Anduril’s new partnership with OpenAI). It began with a truck in the distance, driving toward the base. The AI-powered Sentry tower automatically recognized the object as a possible threat, highlighting it as a dot on one of the screens. Anduril’s software, called Lattice, sent a notification asking the human operator if he would like to send a Ghost drone to monitor. After a click of his mouse, the drone piloted itself autonomously toward the truck, as information on its location gathered by the Sentry was sent to the drone by the software.

The truck disappeared behind some hills, so the Sentry tower camera that was initially trained on it lost contact. But the surveillance drone had already identified it, so its location stayed visible on the screen. We watched as someone in the truck got out and launched a drone, which Lattice again labeled as a threat. It asked the operator if he’d like to send a second attack drone, which then piloted autonomously and locked onto the threatening drone. With one click, it could be instructed to fly into it fast enough to take it down. (We stopped short here, since Anduril isn’t allowed to actually take down drones at this test site.) The entire operation could have been managed by one person with a mouse and computer.

Anduril is building on these capabilities further by expanding Lattice Mesh, a software suite that allows other companies to tap into Anduril’s software and share data, the company announced today. More than 10 companies are now building their hardware into the system—everything from autonomous submarines to self-driving trucks—and Anduril has released a software development kit to help them do so. Military personnel operating hardware can then “publish” their own data to the network and “subscribe” to receive data feeds from other sensors in a secure environment. On December 3, the Pentagon’s Chief Digital and AI Office awarded a three-year contract to Anduril for Mesh. 
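
Publish/subscribe is a standard software pattern, and the generic Python sketch below is only a hypothetical illustration of how it decouples the systems producing data from the systems consuming it; it bears no relation to Anduril’s actual Lattice SDK.

```python
from collections import defaultdict
from typing import Callable

class MessageBus:
    """A toy publish/subscribe bus: producers publish to named topics,
    and every consumer subscribed to a topic receives each message."""
    def __init__(self):
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)

# A track published once can be consumed by any number of other systems
# (a drone controller, an operator display, a logger) without the
# publisher knowing who is listening.
bus = MessageBus()
bus.subscribe("tracks/radar", lambda msg: print("drone tasking sees:", msg))
bus.subscribe("tracks/radar", lambda msg: print("operator display sees:", msg))
bus.publish("tracks/radar", {"object_id": 42, "lat": 33.4, "lon": -117.6})
```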

Anduril’s offering will also join forces with Maven, a program operated by the defense data giant Palantir that fuses information from different sources, like satellites and geolocation data. It’s the project that led Google employees in 2018 to protest against working in warfare. Anduril and Palantir announced on December 6 that the military will be able to use the Maven and Lattice systems together. 

The military’s AI ambitions

The aim is to make Anduril’s software indispensable to decision-makers. It also represents a massive expansion of how the military is currently using AI. You might think the US Department of Defense, advanced as it is, would already have this level of hardware connectivity. We have some semblance of it in our daily lives, where phones, smart TVs, laptops, and other devices can talk to each other and share information. But for the most part, the Pentagon is behind.

“There’s so much information in this battle space, particularly with the growth of drones, cameras, and other types of remote sensors, where folks are just sopping up tons of information,” says Zak Kallenborn, a warfare analyst who works with the Center for Strategic and International Studies. Sorting through to find the most important information is a challenge. “There might be something in there, but there’s so much of it that we can’t just set a human down and to deal with it,” he says. 

Right now, humans also have to translate between systems made by different manufacturers. One soldier might have to manually rotate a camera to look around a base and see if there’s a drone threat, and then manually send information about that drone to another soldier operating the weapon to take it down. Those instructions might be shared via a low-tech messenger app—one on par with AOL Instant Messenger. That takes time. It’s a problem the Pentagon is attempting to solve through its Joint All-Domain Command and Control plan, among other initiatives.

“For a long time, we’ve known that our military systems don’t interoperate,” says Chris Brose, former staff director of the Senate Armed Services Committee and principal advisor to Senator John McCain, who now works as Anduril’s chief strategy officer. Much of his work has been convincing Congress and the Pentagon that a software problem is just as worthy of a slice of the defense budget as jets and aircraft carriers. (Anduril spent nearly $1.6 million on lobbying last year, according to data from Open Secrets, and has numerous ties with the incoming Trump administration: Anduril founder Palmer Luckey has been a longtime donor and supporter of Trump, and JD Vance spearheaded an investment in Anduril in 2017 when he worked at venture capital firm Revolution.) 

Defense hardware also suffers from a connectivity problem. Tom Keane, a senior vice president in Anduril’s connected warfare division, walked me through a simple example from the civilian world. If you receive a text message while your phone is off, you’ll see the message when you turn the phone back on. It’s preserved. “But this functionality, which we don’t even think about,” Keane says, “doesn’t really exist” in the design of many defense hardware systems. Data and communications can be easily lost in challenging military networks. Anduril says its system instead stores data locally. 

An AI data treasure trove

The push to build more AI-connected hardware systems in the military could spark one of the largest data collection projects the Pentagon has ever undertaken, and companies like Anduril and Palantir have big plans. 

“Exabytes of defense data, indispensable for AI training and inferencing, are currently evaporating,” Anduril said on December 6, when it announced it would be working with Palantir to compile data collected in Lattice, including highly sensitive classified information, to train AI models. Training on a broader collection of data collected by all these sensors will also hugely boost the model-building efforts that Anduril is now doing in a partnership with OpenAI, announced on December 4. Earlier this year, Palantir also offered its AI tools to help the Pentagon reimagine how it categorizes and manages classified data. When Anduril founder Palmer Luckey told me in an interview in October that “it’s not like there’s some wealth of information on classified topics and understanding of weapons systems” to train AI models on, he may have been foreshadowing what Anduril is now building. 

Even if some of this data from the military is already being collected, AI will suddenly make it much more useful. “What is new is that the Defense Department now has the capability to use the data in new ways,” Emelia Probasco, a senior fellow at the Center for Security and Emerging Technology at Georgetown University, wrote in an email. “More data and ability to process it could support great accuracy and precision as well as faster information processing.”

The sum of these developments might be that AI models are brought more directly into military decision-making. That idea has brought scrutiny, as when Israel was found last year to have been using advanced AI models to process intelligence data and generate lists of targets. Human Rights Watch wrote in a report that the tools “rely on faulty data and inexact approximations.”

“I think we are already on a path to integrating AI, including generative AI, into the realm of decision-making,” says Probasco, who authored a recent analysis of one such case. She examined a system built within the military in 2023 called Maven Smart System, which allows users to “access sensor data from diverse sources [and] apply computer vision algorithms to help soldiers identify and choose military targets.”

Probasco said that building an AI system to control an entire decision pipeline, possibly without human intervention, “isn’t happening” and that “there are explicit US policies that would prevent it.”

A spokesperson for Anduril said that the purpose of Mesh is not to make decisions. “The Mesh itself is not prescribing actions or making recommendations for battlefield decisions,” the spokesperson said. “Instead, the Mesh is surfacing time-sensitive information”—information that operators will consider as they make those decisions.

Bluesky has an impersonator problem 

Like many others, I recently fled the social media platform X for Bluesky. In the process, I started following many of the people I followed on X. On Thanksgiving, I was delighted to see a private message from a fellow AI reporter, Will Knight from Wired. Or at least that’s who I thought I was talking to. I became suspicious when the person claiming to be Knight mentioned being from Miami, when Knight is, in fact, from the UK. The account handle was almost identical to the real Will Knight’s handle, and the profile used his profile photo. 

Then more messages started to appear. Paris Marx, a prominent tech critic, slid into my DMs to ask how I was doing. “Things are going splendid over here,” he wrote. Then things got suspicious again. “How are your trades going?” fake-Marx asked me. This account was far more sophisticated than the fake Knight account; it had meticulously copied every single tweet and retweet from Marx’s real page over the past few weeks.

Both accounts were eventually deleted, but not before trying to get me to set up a crypto wallet and a “cloud mining pool” account. Knight and Marx confirmed to us that these accounts did not belong to them, and that they have been fighting off impersonator accounts for weeks.

They are not the only ones. The New York Times tech journalist Sheera Frenkel and Molly White, a researcher and cryptocurrency critic, have also been impersonated on Bluesky, most likely by scammers. This tracks with research from Alexios Mantzarlis, the director of the Security, Trust, and Safety Initiative at Cornell Tech, who manually went through the top 500 Bluesky users by follower count and found that of the 305 accounts belonging to a named person, at least 74 had been impersonated by at least one other account.

Bluesky has had to cater to a sudden influx of millions of new users in recent months as people leave X in protest of Elon Musk’s takeover. Its user base has more than doubled since September, from 10 million users to over 20 million. This wave of new users, and the scammers who inevitably follow them, means Bluesky is still playing catch-up, says White.

“These accounts block me as soon as they’re created, so I don’t initially see them,” Marx says. Both Marx and White describe a frustrating pattern: when one account is taken down, another pops up soon after. White says she has experienced a similar phenomenon on X and TikTok too.

A way to prove that people are who they say they are would help. Before Musk took the reins of the platform, employees at X, previously known as Twitter, verified users such as journalists and politicians and gave them a blue tick next to their handles, so people knew the accounts were authentic. After Musk took over, he scrapped the old verification system and offered blue ticks to all paying customers.

The ongoing crypto-impersonation scams have prompted calls for Bluesky to introduce something similar to Twitter’s original verification program. Some users, such as the investigative journalist Hunter Walker, have set up their own initiatives to verify journalists. However, users are currently limited in the ways they can verify themselves on the platform. By default, usernames on Bluesky end with the suffix bsky.social. The platform recommends that news organizations and high-profile people verify their identities by setting their own website domains as their usernames. For example, US senators have verified their accounts with the suffix senate.gov. But this technique isn’t foolproof. For one, it doesn’t actually verify a person’s identity, only their affiliation with a particular website.
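
To see why, it helps to look at what a domain handle actually proves. Under the AT Protocol, a handle ending in senate.gov resolves to an account by checking that the domain publishes a pointer to that account’s decentralized identifier (DID), either in a DNS TXT record or at a well-known HTTPS path. The Python sketch below shows roughly what that check looks like; it is a simplified illustration based on the protocol’s documented resolution methods, and the handle in the example is hypothetical.

    # Rough sketch of AT Protocol handle resolution: a domain handle is "verified"
    # only in the sense that the domain publishes the account's DID.
    import urllib.request
    import dns.resolver  # pip install dnspython

    def resolve_handle(handle: str) -> str | None:
        """Return the DID the domain publishes for this handle, or None."""
        # Method 1: DNS TXT record at _atproto.<handle>, e.g. "did=did:plc:abc123"
        try:
            for record in dns.resolver.resolve(f"_atproto.{handle}", "TXT"):
                value = b"".join(record.strings).decode()
                if value.startswith("did="):
                    return value[len("did="):]
        except Exception:
            pass
        # Method 2: plain-text DID served at https://<handle>/.well-known/atproto-did
        try:
            url = f"https://{handle}/.well-known/atproto-did"
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.read().decode().strip()
        except Exception:
            return None

    # Hypothetical handle: a match proves control of the domain, nothing more.
    print(resolve_handle("newsroom.example.com"))

Anyone who controls a domain can publish that record, which is why the check establishes affiliation with a website rather than a person’s verified identity.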

Bluesky did not respond to MIT Technology Review’s requests for comment, but the company’s safety team posted that the platform had updated its impersonation policy to be more aggressive and would remove impersonation and handle-squatting accounts. The company says it has also quadrupled its moderation team to take action on impersonation reports more quickly. But it seems to be struggling to keep up. “We still have a large backlog of moderation reports due to the influx of new users as we shared previously, though we are making progress,” the company continued. 

Bluesky’s decentralized nature makes kicking out impersonators a trickier problem to solve. Competitors such as X and Threads rely on centralized, in-house teams to moderate unwanted content and behavior, such as impersonation. But Bluesky is built on the AT Protocol, a decentralized, open-source technology that gives users more control over what they see and lets them build their own communities. Most people sign up to Bluesky Social, the main social network, whose community guidelines ban impersonation. However, Bluesky Social is just one of the services, or “clients,” that people can use, and other services have their own moderation practices and terms.

This structure means that, until now, Bluesky itself hasn’t needed an army of content moderators to weed out unwanted behavior, because it relies on this community-led approach, says Wayne Chang, the founder and CEO of SpruceID, a digital identity company. That might have to change.

“In order to make these apps work at all, you need some level of centralization,” says Chang. Despite community guidelines, it’s hard to stop people from creating impersonation accounts, and Bluesky is engaged in a cat-and-mouse game trying to take suspicious accounts down. 

Cracking down on impersonation is important because it poses a serious threat to Bluesky’s credibility, says Chang. “It’s a legitimate complaint as a Bluesky user that ‘Hey, all those scammers are basically harassing me.’ You want your brand to be tarnished? Or is there something we can do about this?” he says.

A fix for this is urgently needed, because attackers might abuse Bluesky’s open-source code to create spam and disinformation campaigns at a much larger scale, says Francesco Pierri, an assistant professor at Politecnico di Milano who has researched Bluesky. His team found that the platform has seen a rise in suspicious accounts since it was made open to the public earlier this year. 

Bluesky acknowledges that its current practices are not enough. In a post, the company said it has received feedback that users want more ways to confirm their identities beyond domain verification, and it is “exploring additional options to enhance account verification.” 

In a livestream at the end of November, Bluesky CEO Jay Graber said the platform was considering becoming a verification provider, but because of its decentralized approach it would also allow others to offer their own user verification services. “And [users] can choose to trust us—the Bluesky team’s verification—or they could do their own. Or other people could do their own,” Graber said. 

But at least Bluesky seems to “have some willingness to actually moderate content on the platform,” says White. “I would love to see something a little bit more proactive that didn’t require me to do all of this reporting,” she adds. 

As for Marx, “I just hope that no one truly falls for it and gets tricked into crypto scams,” he says. 

Google’s new Project Astra could be generative AI’s killer app

Google DeepMind has announced an impressive grab bag of new products and prototypes that may just let it seize back its lead in the race to turn generative artificial intelligence into a mass-market concern. 

Top billing goes to Gemini 2.0—the latest iteration of Google DeepMind’s family of multimodal large language models, now redesigned around the ability to control agents—and a new version of Project Astra, the experimental everything app that the company teased at Google I/O in May.

MIT Technology Review got to try out Astra in a closed-door live demo last week. It was a stunning experience, but there’s a gulf between polished promo and live demo.

Astra uses Gemini 2.0’s built-in agent framework to answer questions and carry out tasks via text, speech, image, and video, calling up existing Google apps like Search, Maps, and Lens when it needs to. “It’s merging together some of the most powerful information retrieval systems of our time,” says Bibo Xu, product manager for Astra.
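
Google hasn’t said how Astra is wired together internally, but the underlying pattern, a model that decides when to call an external tool and then folds the result back into its answer, can be sketched with the publicly available Gemini SDK. The snippet below is a minimal illustration under that assumption, not Astra’s actual architecture: lookup_price is a made-up stand-in for a real Search integration, and the model name is simply a placeholder for whichever Gemini 2.0 variant is available.

    # Minimal tool-calling sketch with the public google-generativeai SDK.
    # Not Astra itself: lookup_price() is a stand-in for a real Search integration.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")

    def lookup_price(item: str) -> str:
        """Hypothetical stand-in for a Search-backed price lookup."""
        return f"Typical retail price for {item}: around $15 to $25 a bottle"

    model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[lookup_price])
    chat = model.start_chat(enable_automatic_function_calling=True)
    response = chat.send_message(
        "Which wine would go well with a chicken curry, and what would a bottle cost?"
    )
    print(response.text)

When the model decides it needs a price, the SDK calls the Python function on its behalf and feeds the result back into the conversation, which is the same hand-off, at toy scale, that Astra appears to be making when it defers to Search, Maps, or Lens.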

Gemini 2.0 and Astra are joined by Mariner, a new agent built on top of Gemini that can browse the web for you; Jules, a new Gemini-powered coding assistant; and Gemini for Games, an experimental assistant that you can chat to and ask for tips as you play video games. 

(And let’s not forget that in the last week Google DeepMind also announced Veo, a new video generation model; Imagen 3, a new version of its image generation model; and Willow, a new kind of chip for quantum computers. Whew. Meanwhile, CEO Demis Hassabis was in Sweden yesterday receiving his Nobel Prize.)

Google DeepMind claims that Gemini 2.0 is twice as fast as the previous version, Gemini 1.5, and outperforms it on a number of standard benchmarks, including MMLU-Pro, a large set of multiple-choice questions designed to test the abilities of large language models across a range of subjects, from math and physics to health, psychology, and philosophy. 

But the margins between top-end models like Gemini 2.0 and those from rival labs like OpenAI and Anthropic are now slim. These days, advances in large language models are less about how good they are and more about what you can do with them. 

And that’s where agents come in. 

Hands on with Project Astra 

Last week I was taken through an unmarked door on an upper floor of a building in London’s King’s Cross district into a room with strong secret-project vibes. The word “ASTRA” was emblazoned in giant letters across one wall. Xu’s dog, Charlie, the project’s de facto mascot, roamed between desks where researchers and engineers were busy building a product that Google is betting its future on.

“The pitch to my mum is that we’re building an AI that has eyes, ears, and a voice. It can be anywhere with you, and it can help you with anything you’re doing,” says Greg Wayne, co-lead of the Astra team. “It’s not there yet, but that’s the kind of vision.”

The official term for what Xu, Wayne, and their colleagues are building is “universal assistant.” Exactly what that means in practice, they’re still figuring out. 

At one end of the Astra room were two stage sets that the team uses for demonstrations: a drinks bar and a mocked-up art gallery. Xu took me to the bar first. “A long time ago we hired a cocktail expert and we got them to instruct us to make cocktails,” said Praveen Srinivasan, another co-lead. “We recorded those conversations and used that to train our initial model.”

Xu opened a cookbook to a recipe for a chicken curry, pointed her phone at it, and woke up Astra. “Ni hao, Bibo!” said a female voice. 

“Oh! Why are you speaking to me in Mandarin?” Xu asked her phone. “Can you speak to me in English, please?”

“My apologies, Bibo. I was following a previous instruction to speak in Mandarin. I will now speak in English as you have requested.”

Astra remembers previous conversations, Xu told me. It also keeps track of the previous 10 minutes of video. (There’s a remarkable moment in the promo video that Google put out in May when Astra tells the person giving the demo where she had left her glasses, having spotted them on a desk a few seconds earlier. But I saw nothing like this in the live demo.)

Back to the cookbook. Moving her phone camera over the page for a few seconds, Xu asked Astra to read the recipe and tell her what spices were in it. “I recall the recipe mentioning a teaspoon of black peppercorns, a teaspoon of hot chili powder, and a cinnamon stick,” it replied.

“I think you’re missing a few,” said Xu. “Take another look.”

“You are correct—I apologize. I also see ground turmeric and curry leaves in the ingredients.” 

Seeing this tech in action, two things hit you straight away. First, it’s glitchy and often needs correcting. Second, those glitches can be corrected with just a few spoken words. You simply interrupt the voice, repeat your instructions, and move on. It feels more like coaching a child than butting heads with broken software.   

Next Xu pointed her phone at a row of wine bottles and asked Astra to pick the one that would go best with the chicken curry. It went for a rioja and explained why. Xu asked how much a bottle would cost. Astra said it would need to use Search to look prices up online. A few seconds later it came back with its answer. 

We moved to the art gallery, and Xu showed Astra a number of screens with famous paintings on them: the Mona Lisa, Munch’s The Scream, a Vermeer, a Seurat, and several others. “Ni hao, Bibo!” the voice said. 

“You’re speaking to me in Mandarin again,” Xu said. “Try to speak to me in English, please.” 

“My apologies, I seem to have misunderstood. Yes, I will respond in English.” (I should know better, but I could swear I heard the snark.)

It was my turn. Xu handed me her phone. 

I tried to trip Astra up, but it was having none of it. I asked it what famous art gallery we were in, but it refused to hazard a guess. I asked why it had identified the paintings as replicas and it started to apologize for its mistake (Astra apologizes a lot). I was compelled to interrupt: “No, no—you’re right, it’s not a mistake. You’re correct to identify paintings on screens as fake paintings.” I couldn’t help feeling a bit bad: I’d confused an app that exists only to please. 

When it works well, Astra is enthralling. The experience of striking up a conversation with your phone about whatever you’re pointing it at feels fresh and seamless. In a media briefing yesterday, Google DeepMind shared a video showing off other uses: reading an email on your phone’s screen to find a door code (and then reminding you of that code later), pointing a phone at a passing bus and asking where it goes, quizzing it about a public artwork as you walk past. This could be generative AI’s killer app. 

And yet there’s a long way to go before most people get their hands on tech like this. There’s no mention of a release date. Google DeepMind has also shared videos of Astra working on a pair of smart glasses, but that tech is even further down the company’s wish list.

Mixing it up

For now, researchers outside Google DeepMind are keeping a close eye on its progress. “The way that things are being combined is impressive,” says Maria Liakata, who works on large language models at Queen Mary University of London and the Alan Turing Institute. “It’s hard enough to do reasoning with language, but here you need to bring in images and more. That’s not trivial.”

Liakata is also impressed by Astra’s ability to recall things it has seen or heard. She works on what she calls long-range context, getting models to keep track of information that they have come across before. “This is exciting,” says Liakata. “Even doing it in a single modality is exciting.”

But she admits that a lot of her assessment is guesswork. “Multimodal reasoning is really cutting-edge,” she says. “But it’s very hard to know exactly where they’re at, because they haven’t said a lot about what is in the technology itself.”

For Bodhisattwa Majumder, a researcher who works on multimodal models and agents at the Allen Institute for AI, that’s a key concern. “We absolutely don’t know how Google is doing it,” he says. 

He notes that if Google were to be a little more open about what it is building, it would help consumers understand the limitations of the tech they could soon be holding in their hands. “They need to know how these systems work,” he says. “You want a user to be able to see what the system has learned about you, to correct mistakes, or to remove things you want to keep private.”

Liakata is also worried about the implications for privacy, pointing out that people could be monitored without their consent. “I think there are things I’m excited about and things that I’m concerned about,” she says. “There’s something about your phone becoming your eyes—there’s something unnerving about it.” 

“The impact these products will have on society is so big that it should be taken more seriously,” she says. “But it’s become a race between the companies. It’s problematic, especially since we don’t have any agreement on how to evaluate this technology.”

Google DeepMind says it takes a long, hard look at privacy, security, and safety for all its new products. Its tech will be tested by teams of trusted users for months before it hits the public. “Obviously, we’ve got to think about misuse. We’ve got to think about, you know, what happens when things go wrong,” says Dawn Bloxwich, director of responsible development and innovation at Google DeepMind. “There’s huge potential. The productivity gains are huge. But it is also risky.”

No team of testers can anticipate all the ways that people will use and misuse new technology. So what’s the plan for when the inevitable happens? Companies need to design products that can be recalled or switched off just in case, says Bloxwich: “If we need to make changes quickly or pull something back, then we can do that.”