What are AI agents? 

MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

When ChatGPT was first released, everyone in AI was talking about the new generation of AI assistants. But over the past year, that excitement has turned to a new target: AI agents. 

Agents featured prominently in Google’s annual I/O conference in May, when the company unveiled its new AI agent called Astra, which allows users to interact with it using audio and video. OpenAI’s new GPT-4o model has also been called an AI agent.  

And it’s not just hype, although there is definitely some of that too. Tech companies are plowing vast sums into creating AI agents, and their research efforts could usher in the kind of useful AI we have been dreaming about for decades. Many experts, including Sam Altman, say they are the next big thing.   

But what are they? And how can we use them? 

How are they defined? 

It is still early days for research into AI agents, and the field has not settled on a definition for them. But simply put, they are AI models and algorithms that can autonomously make decisions in a dynamic world, says Jim Fan, a senior research scientist at Nvidia who leads the company’s AI agents initiative. 

The grand vision for AI agents is a system that can execute a vast range of tasks, much like a human assistant. In the future, it could help you book your vacation, but it will also remember if you prefer swanky hotels, so it will only suggest hotels that have four stars or more and then go ahead and book the one you pick from the range of options it offers you. It will then also suggest flights that work best with your calendar, and plan the itinerary for your trip according to your preferences. It could make a list of things to pack based on that plan and the weather forecast. It might even send your itinerary to any friends it knows live in your destination and invite them along. In the workplace, it could analyze your to-do list and execute tasks from it, such as sending calendar invites, memos, or emails. 

One vision for agents is that they are multimodal, meaning they can process language, audio, and video. For example, in Google’s Astra demo, users could point a smartphone camera at things and ask the agent questions. The agent could respond to text, audio, and video inputs. 

These agents could also make processes smoother for businesses and public organizations, says David Barber, the director of the University College London Centre for Artificial Intelligence. For example, an AI agent might be able to function as a more sophisticated customer service bot. The current generation of language-model-based assistants can only generate the next likely word in a sentence. But an AI agent would have the ability to act on natural-language commands autonomously and process customer service tasks without supervision. For example, the agent would be able to analyze customer complaint emails and then know to check the customer’s reference number, access databases such as customer relationship management and delivery systems to see whether the complaint is legitimate, and process it according to the company’s policies, Barber says. 
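The sequence of steps Barber describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries stand in for a CRM and a delivery system, the `REF-` reference format and the refund policy are invented for the example, and a real agent would rely on a language model rather than a regular expression and fixed rules to interpret the email.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-ins for a CRM and a delivery system.
CRM = {"REF-1001": {"customer": "A. Rivera", "order_id": "ORD-77"}}
DELIVERIES = {"ORD-77": {"status": "lost_in_transit"}}


@dataclass
class Decision:
    reference: Optional[str]
    action: str


def extract_reference(email_text: str) -> Optional[str]:
    """Pull a reference number such as REF-1001 out of free text."""
    match = re.search(r"REF-\d+", email_text)
    return match.group(0) if match else None


def handle_complaint(email_text: str) -> Decision:
    """Check the reference, look up the order, then apply a simple policy."""
    reference = extract_reference(email_text)
    if reference is None or reference not in CRM:
        return Decision(reference, "escalate_to_human")
    order_id = CRM[reference]["order_id"]
    status = DELIVERIES.get(order_id, {}).get("status")
    # Hypothetical policy: refund only when the delivery system confirms a problem.
    if status in {"lost_in_transit", "damaged"}:
        return Decision(reference, "issue_refund")
    return Decision(reference, "escalate_to_human")


print(handle_complaint("My parcel never arrived. Reference: REF-1001"))
```

Anything the sketch cannot verify is handed to a person, which reflects the point made later in the piece that today's agents still need human involvement.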

Broadly speaking, there are two different categories of agents, says Fan: software agents and embodied agents. 

Software agents run on computers or mobile phones and use apps, much as in the travel agent example above. “Those agents are very useful for office work or sending emails or having this chain of events going on,” he says. 

Embodied agents are agents that are situated in a 3D world such as a video game, or in a robot. These kinds of agents might make video games more engaging by letting people play with nonplayer characters controlled by AI. These sorts of agents could also help build more useful robots that could help us with everyday tasks at home, such as folding laundry and cooking meals. 

Fan was part of a team that built an embodied AI agent called MineDojo in the popular computer game Minecraft. Using a vast trove of data collected from the internet, Fan’s AI agent was able to learn new skills and tasks that allowed it to freely explore the virtual 3D world and complete complex tasks such as encircling llamas with fences or scooping lava into a bucket. Video games are good proxies for the real world, because they require agents to understand physics, reasoning, and common sense. 

In a new paper, which has not yet been peer-reviewed, researchers at Princeton say that AI agents tend to have three different characteristics. AI systems are considered “agentic” if they can pursue difficult goals in complex environments without being instructed. They also qualify if they can be instructed in natural language and act autonomously without supervision. And finally, the term “agent” can also apply to systems that are able to use tools, such as web search or programming, or are capable of planning. 

Are they a new thing?

The term “AI agents” has been around for years and has meant different things at different times, says Chirag Shah, a computer science professor at the University of Washington. 

There have been two waves of agents, says Fan. The current wave is thanks to the language model boom and the rise of systems such as ChatGPT. 

The previous wave was in 2016, when Google DeepMind introduced AlphaGo, its AI system that can play—and win—the game Go. AlphaGo was able to make decisions and plan strategies. This relied on reinforcement learning, a technique that rewards AI algorithms for desirable behaviors. 

“But these agents were not general,” says Oriol Vinyals, vice president of research at Google DeepMind. They were created for very specific tasks—in this case, playing Go. The new generation of foundation-model-based AI makes agents more universal, as they can learn from the world humans interact with. 

“You feel much more that the model is interacting with the world and then giving back to you better answers or better assisted assistance or whatnot,” says Vinyals. 

What are the limitations? 

There are still many open questions that need to be answered. Kanjun Qiu, CEO and founder of the AI startup Imbue, which is working on agents that can reason and code, likens the state of agents to where self-driving cars were just over a decade ago. They can do stuff, but they’re unreliable and still not really autonomous. For example, a coding agent can generate code, but it sometimes gets it wrong, and it doesn’t know how to test the code it’s creating, says Qiu. So humans still need to be actively involved in the process. AI systems still can’t fully reason, which is a critical step in operating in a complex and ambiguous human world. 

“We’re nowhere close to having an agent that can just automate all of these chores for us,” says Fan. Current systems “hallucinate and they also don’t always follow instructions closely,” Fan says. “And that becomes annoying.”  

Another limitation is that after a while, AI agents lose track of what they are working on. AI systems are limited by their context windows, meaning the amount of data they can take into account at any given time. 
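A minimal sketch of why a limited context window makes a system lose track: it assumes the simplest possible policy, in which only the most recent items are kept and everything older is dropped. Real systems count tokens rather than whole words and may handle overflow differently; the function name and the sample task list are made up for illustration.

```python
def visible_context(tokens: list[str], window: int) -> list[str]:
    """Return only the most recent `window` tokens; older ones are dropped."""
    if window <= 0:
        return []
    return tokens[-window:]


history = "fix the login bug then update the tests then write the changelog".split()

# With a window of 5, the instruction about the login bug is no longer visible.
print(visible_context(history, 5))
```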

“ChatGPT can do coding, but it’s not able to do long-form content well. But for human developers, we look at an entire GitHub repository that has tens if not hundreds of lines of code, and we have no trouble navigating it,” says Fan. 

To tackle this problem, Google has increased its models’ capacity to process data, which allows users to have longer interactions with them in which they remember more about past interactions. The company said it is working on making its context windows infinite in the future.

For embodied agents such as robots, there are even more limitations. There is not enough training data to teach them, and researchers are only just starting to harness the power of foundation models in robotics. 

So amid all the hype and excitement, it’s worth bearing in mind that research into AI agents is still in its very early stages, and it will likely take years until we can experience their full potential. 

That sounds cool. Can I try an AI agent now? 

Sort of. You’ve most likely tried their early prototypes, such as OpenAI’s ChatGPT and GPT-4. “If you’re interacting with software that feels smart, that is kind of an agent,” says Qiu. 

Right now the best agents we have are systems with very narrow and specific use cases, such as coding assistants, customer service bots, or workflow automation software like Zapier, she says. But these are a far cry from a universal AI agent that can do complex tasks. 

“Today we have these computers and they’re really powerful, but we have to micromanage them,” says Qiu. 

OpenAI’s ChatGPT plug-ins, which allow people to create AI-powered assistants for web browsers, were an attempt at agents, says Qiu. But these systems are still clumsy, unreliable, and not capable of reasoning, she says. 

Despite that, these systems will one day change the way we interact with technology, Qiu believes, and it is a trend people need to pay attention to. 

“It’s not like, ‘Oh my God, all of a sudden we have AGI’ … but more like ‘Oh my God, my computer can do way more than it did five years ago,’” she says.

Why does AI hallucinate?

The World Health Organization’s new chatbot launched on April 2 with the best of intentions. 

A fresh-faced virtual avatar backed by GPT-3.5, SARAH (Smart AI Resource Assistant for Health) dispenses health tips in eight different languages, 24/7, about how to eat well, quit smoking, de-stress, and more, for millions around the world.

But like all chatbots, SARAH can flub its answers. It was quickly found to give out incorrect information. In one case, it came up with a list of fake names and addresses for nonexistent clinics in San Francisco. The World Health Organization warns on its website that SARAH may not always be accurate.

Here we go again. Chatbot fails are now a familiar meme. Meta’s short-lived scientific chatbot Galactica made up academic papers and generated wiki articles about the history of bears in space. In February, Air Canada was ordered to honor a refund policy invented by its customer service chatbot. Last year, a lawyer was fined for submitting court documents filled with fake judicial opinions and legal citations made up by ChatGPT. 

This tendency to make things up—known as hallucination—is one of the biggest obstacles holding chatbots back from more widespread adoption. Why do they do it? And why can’t we fix it?

Magic 8 Ball

To understand why large language models hallucinate, we need to look at how they work. The first thing to note is that making stuff up is exactly what these models are designed to do. When you ask a chatbot a question, it draws its response from the large language model that underpins it. But it’s not like looking up information in a database or using a search engine on the web. 

Peel open a large language model and you won’t see ready-made information waiting to be retrieved. Instead, you’ll find billions and billions of numbers. It uses these numbers to calculate its responses from scratch, producing new sequences of words on the fly. A lot of the text that a large language model generates looks as if it could have been copy-pasted from a database or a real web page. But as in most works of fiction, the resemblances are coincidental. A large language model is more like an infinite Magic 8 Ball than an encyclopedia. 

Large language models generate text by predicting the next word in a sequence. If a model sees “the cat sat,” it may guess “on.” That new sequence is fed back into the model, which may now guess “the.” Go around again and it may guess “mat”—and so on. That one trick is enough to generate almost any kind of text you can think of, from Amazon listings to haiku to fan fiction to computer code to magazine articles and so much more. As Andrej Karpathy, a computer scientist and cofounder of OpenAI, likes to put it: large language models learn to dream internet documents. 

Think of the billions of numbers inside a large language model as a vast spreadsheet that captures the statistical likelihood that certain words will appear alongside certain other words. The values in the spreadsheet get set when the model is trained, a process that adjusts those values over and over again until the model’s guesses mirror the linguistic patterns found across terabytes of text taken from the internet. 

To guess a word, the model simply runs its numbers. It calculates a score for each word in its vocabulary that reflects how likely that word is to come next in the sequence in play. The word with the best score wins. In short, large language models are statistical slot machines. Crank the handle and out pops a word. 
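The guess-and-feed-back loop can be sketched with a toy example. The scores below are invented, and a small lookup table keyed on the last two words stands in for the model; a real large language model calculates its scores from billions of numbers and takes the whole sequence into account. The sketch always picks the top-scoring word.

```python
# Hypothetical scores: for the last two words seen, how likely each next word is.
SCORES = {
    ("cat", "sat"): {"on": 0.7, "down": 0.3},
    ("sat", "on"): {"the": 0.8, "a": 0.2},
    ("on", "the"): {"mat": 0.6, "sofa": 0.4},
}


def generate(words: list[str], steps: int) -> list[str]:
    """Repeatedly append the best-scoring next word and feed the result back in."""
    words = list(words)
    for _ in range(steps):
        scores = SCORES.get(tuple(words[-2:]))
        if scores is None:  # the toy table has nothing for this context
            break
        words.append(max(scores, key=scores.get))
    return words


print(" ".join(generate(["the", "cat", "sat"], steps=3)))
```

Nothing in the loop looks anything up in a store of facts. Each word is chosen only because it scores well after the words that came before it.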

It’s all hallucination

The takeaway here? It’s all hallucination, but we only call it that when we notice it’s wrong. The problem is, large language models are so good at what they do that what they make up looks right most of the time. And that makes trusting them hard. 

Can we control what large language models generate so they produce text that’s guaranteed to be accurate? These models are far too complicated for their numbers to be tinkered with by hand. But some researchers believe that training them on even more text will continue to reduce their error rate. This is a trend we’ve seen as large language models have gotten bigger and better. 

Another approach involves asking models to check their work as they go, breaking responses down step by step. Known as chain-of-thought prompting, this has been shown to increase the accuracy of a chatbot’s output. It’s not possible yet, but future large language models may be able to fact-check the text they are producing and even rewind when they start to go off the rails.

But none of these techniques will stop hallucinations fully. As long as large language models are probabilistic, there is an element of chance in what they produce. Roll 100 dice and you’ll get a pattern. Roll them again and you’ll get another. Even if the dice are, like large language models, weighted to produce some patterns far more often than others, the results still won’t be identical every time. Even one error in 1,000—or 100,000—adds up to a lot of errors when you consider how many times a day this technology gets used. 
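Both points, the weighted dice and the way small error rates add up, can be sketched in Python. The die weights and the figure of one million uses a day are assumptions chosen for illustration, not numbers from any real system.

```python
import random


def roll_weighted(rng: random.Random, faces: list[int], weights: list[float], n: int) -> list[int]:
    """Roll a weighted die n times; some faces come up far more often than others."""
    return rng.choices(faces, weights=weights, k=n)


def expected_errors(uses: int, one_in: int) -> float:
    """Expected number of errors when one use in every `one_in` goes wrong."""
    return uses / one_in


rng = random.Random()
faces = [1, 2, 3, 4, 5, 6]
weights = [5, 1, 1, 1, 1, 1]  # hypothetical: face 1 is heavily favored

first = roll_weighted(rng, faces, weights, 100)
second = roll_weighted(rng, faces, weights, 100)
print(first.count(1), second.count(1))  # similar counts, but rarely the same sequence

# Assuming one million uses a day (a made-up figure):
print(expected_errors(1_000_000, 1_000))    # one error in 1,000
print(expected_errors(1_000_000, 100_000))  # one error in 100,000
```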

The more accurate these models become, the more we will let our guard down. Studies show that the better chatbots get, the more likely people are to miss an error when it happens.  

Perhaps the best fix for hallucination is to manage our expectations about what these tools are for. When the lawyer who used ChatGPT to generate fake documents was asked to explain himself, he sounded as surprised as anyone by what had happened. “I heard about this new site, which I falsely assumed was, like, a super search engine,” he told a judge. “I did not comprehend that ChatGPT could fabricate cases.” 

Here’s the defense tech at the center of US aid to Israel, Ukraine, and Taiwan

After weeks of drawn-out congressional debate over how much the United States should spend on conflicts abroad, President Joe Biden signed a $95.3 billion aid package into law on Wednesday.

The bill will send a significant quantity of supplies to Ukraine and Israel, while also supporting Taiwan with submarine technology to aid its defenses against China. It’s also sparked renewed calls for stronger crackdowns on Iranian-produced drones. 

Though much of the money will go toward replenishing fairly standard munitions and supplies, the spending bill provides a window into US strategies around four key defense technologies that continue to reshape how today’s major conflicts are being fought.

For a closer look at the military technology at the center of the aid package, I spoke with Andrew Metrick, a fellow with the defense program at the Center for a New American Security, a think tank.

Ukraine and the role of long-range missiles

Ukraine has long sought the Army Tactical Missile System (ATACMS), a long-range ballistic missile made by Lockheed Martin. It debuted during Operation Desert Storm in Iraq in 1991 and is 13 feet high, two feet wide, and over 3,600 pounds. It can use GPS to accurately hit targets 190 miles away. 

Last year, President Biden was apprehensive about sending such missiles to Ukraine, as US stockpiles of the weapons were relatively low. In October, the administration changed tack. The US sent shipments of ATACMS, a move celebrated by President Volodymyr Zelensky of Ukraine, but they came with restrictions: the missiles were older models with a shorter range, and Ukraine was instructed not to fire them into Russian territory, only Ukrainian territory. 

This week, just hours before the new aid package was signed, multiple news outlets reported that the US had secretly sent more powerful long-range ATACMS to Ukraine several weeks before. They were used on Tuesday, April 23, to target a Russian airfield in Crimea and Russian troops in Berdiansk, 50 miles southwest of Mariupol.

The long range of the weapons has proved essential for Ukraine, says Metrick. “It allows the Ukrainians to strike Russian targets at ranges for which they have very few other options,” he says. That means being able to hit locations like supply depots, command centers, and airfields behind Russia’s front lines in Ukraine. This capacity has grown more important as Ukraine’s troop numbers have waned, Metrick says.

Replenishing Israel’s Iron Dome

On April 13, Iran launched its first-ever direct attack on Israeli soil. In the attack, which Iran says was retaliation for Israel’s airstrike on its embassy in Syria, hundreds of missiles were lobbed into Israeli airspace. Many of them were neutralized by the web of cutting-edge missile launchers dispersed throughout Israel that can automatically detonate incoming strikes before they hit land. 

One of those systems is Israel’s Iron Dome, in which radar systems detect projectiles and then signal units to launch defensive missiles that detonate the target high in the sky before it strikes populated areas. Israel’s other system, called David’s Sling, works in a similar way but can identify rockets coming from a greater distance, upwards of 180 miles. 

Both systems are hugely costly to research and build, and the new US aid package allocates $15 billion to replenish their missile stockpile. The missiles can cost anywhere from $100,000 to $10 million each, and a system like Iron Dome might fire them daily during intense periods of conflict. 

The aid comes as funding for Israel has grown more contentious amid the dire conditions faced by displaced Palestinians in Gaza. While the spending bill worked its way through Congress, increasing numbers of Democrats sought to put conditions on the military aid to Israel, particularly after an Israeli air strike on April 1 killed seven aid workers from World Central Kitchen, an international food charity. The funding package does provide $9 billion in humanitarian assistance for the conflict, but the efforts to impose conditions for Israeli military aid failed. 

Taiwan and underwater defenses against China

A rising concern for the US defense community—and a subject of “wargaming” simulations that Metrick has carried out—is an amphibious invasion of Taiwan from China. The rising risk of that scenario has driven the US to build and deploy larger numbers of advanced submarines, Metrick says. A bigger fleet of these submarines would be more likely to keep attacks from China at bay, thereby protecting Taiwan.

The trouble is that the US shipbuilding effort, experts say, is too slow. It’s been hampered by budget cuts and labor shortages, but the new aid bill aims to jump-start it. It will provide $3.3 billion to do so, specifically for the production of Columbia-class submarines, which carry nuclear weapons, and Virginia-class submarines, which carry conventional weapons. 

Though these funds aim to support Taiwan by building up the US supply of submarines, the package also includes more direct support, like $2 billion to help it purchase weapons and defense equipment from the US. 

The US’s Iranian drone problem 

Shahed drones are used almost daily on the Russia-Ukraine battlefield, and Iran launched more than 100 against Israel earlier this month. Produced by Iran and resembling model planes, the drones are fast, cheap, and lightweight, capable of being launched from the back of a pickup truck. They’re used frequently for potent one-way attacks, where they detonate upon reaching their target. US experts say the technology is tipping the scales toward Russian and Iranian military groups and their allies. 

The difficulty in combating them is partly one of cost. Shooting down the drones, which can be bought for as little as $40,000, can cost millions in ammunition.

“Shooting down Shaheds with an expensive missile is not, in the long term, a winning proposition,” Metrick says. “That’s what the Iranians, I think, are banking on. They can wear people down.”

This week’s aid package renewed White House calls for stronger sanctions aimed at curbing production of the drones. The United Nations previously passed rules restricting any drone-related material from entering or leaving Iran, but those expired in October. The US now wants them reinstated. 

Even if that happens, it’s unlikely the rules would do much to contain the Shahed’s dominance. The components of the drones are not all that complex or hard to obtain to begin with, and experts also say that Iran has built a sprawling global supply chain to acquire the materials needed to manufacture them and has worked with Russia to build factories. 

“Sanctions regimes are pretty dang leaky,” Metrick says. “They [Iran] have friends all around the world.”

How virtual power plants are shaping tomorrow’s energy system

For more than a century, the prevalent image of power plants has been characterized by towering smokestacks, endless coal trains, and loud spinning turbines. But the plants powering our future will look radically different—in fact, many may not have a physical form at all. Welcome to the era of virtual power plants (VPPs).

The shift from conventional energy sources like coal and gas to variable renewable alternatives such as solar and wind means the decades-old way we operate the energy system is changing. 

Governments and private companies alike are now counting on VPPs’ potential to help keep costs down and stop the grid from becoming overburdened. 

Here’s what you need to know about VPPs—and why they could be the key to helping us bring more clean power and energy storage online.

What are virtual power plants and how do they work?

A virtual power plant is a system of distributed energy resources—like rooftop solar panels, electric vehicle chargers, and smart water heaters—that work together to balance energy supply and demand on a large scale. They are usually run by local utility companies that oversee this balancing act.

A VPP is a way of “stitching together” a portfolio of resources that can help the grid respond to high energy demand while reducing the energy system’s carbon footprint, says Rudy Shankar, director of Lehigh University’s Energy Systems Engineering.

The “virtual” nature of VPPs comes from their lack of a central physical facility, like a traditional coal or gas plant. By generating electricity and balancing the energy load, the aggregated batteries and solar panels provide many of the functions of conventional power plants.

They also have unique advantages.

Kevin Brehm, a manager at Rocky Mountain Institute who focuses on carbon-free electricity, says comparing VPPs to traditional plants is a “helpful analogy,” but VPPs “do certain things differently and therefore can provide services that traditional power plants can’t.”

One significant difference is VPPs’ ability to shape consumers’ energy use in real time. Unlike conventional power plants, VPPs can communicate with distributed energy resources and allow grid operators to control the demand from end users.

For example, smart thermostats linked to air conditioning units can adjust home temperatures and manage how much electricity the units consume. On hot summer days these thermostats can pre-cool homes before peak hours, when air conditioning usage surges. Staggering cooling times can help prevent abrupt demand hikes that might overwhelm the grid and cause outages. Similarly, electric vehicle chargers can adapt to the grid’s requirements by either supplying or utilizing electricity. 
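The effect of staggering can be sketched with a toy calculation. Every number here is an assumption made for illustration: six homes, each air conditioner drawing 3 kilowatts for a two-hour cooling run, with start times given as hours of the day.

```python
def peak_load_kw(start_hours: list[int], run_hours: int, kw_per_home: float) -> float:
    """Add up the load hour by hour and return the highest hourly total."""
    load: dict[int, float] = {}
    for start in start_hours:
        for hour in range(start, start + run_hours):
            load[hour] = load.get(hour, 0) + kw_per_home
    return max(load.values()) if load else 0


# Hypothetical schedules for six homes.
all_at_once = [17, 17, 17, 17, 17, 17]  # everyone starts cooling at 5 p.m.
staggered = [13, 13, 15, 15, 17, 17]    # some homes pre-cool earlier in the afternoon

print(peak_load_kw(all_at_once, run_hours=2, kw_per_home=3))  # 18
print(peak_load_kw(staggered, run_hours=2, kw_per_home=3))    # 6
```

The homes use the same total energy in both schedules. Only the height of the peak changes, and the peak is what threatens to overwhelm the grid.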

These distributed energy sources connect to the grid through communication technologies like Wi-Fi, Bluetooth, and cellular services. In aggregate, adding VPPs can increase overall system resilience. By coordinating hundreds of thousands of devices, VPPs have a meaningful impact on the grid—they shape demand, supply power, and keep the electricity flowing reliably.

How popular are VPPs now?

Until recently, VPPs were mostly used to control consumer energy use. But because solar and battery technology has evolved, utilities can now use them to supply electricity back to the grid when needed.

In the United States, the Department of Energy estimates VPP capacity at around 30 to 60 gigawatts. This represents about 4% to 8% of peak electricity demand nationwide, a minor fraction within the overall system. However, some states and utility companies are moving quickly to add more VPPs to their grids.
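As a quick check on those figures, both ends of the range imply the same nationwide peak demand. The calculation below uses only the numbers quoted above; the function name is made up for the example.

```python
def implied_peak_gw(capacity_gw: float, share_percent: float) -> float:
    """Peak demand implied by a capacity that makes up a given percentage of it."""
    return capacity_gw * 100 / share_percent


print(implied_peak_gw(30, 4))  # 750.0
print(implied_peak_gw(60, 8))  # 750.0
```

In other words, the estimate assumes a national peak of roughly 750 gigawatts.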

Green Mountain Power, Vermont’s largest utility company, made headlines last year when it expanded its subsidized home battery program. Customers have the option to lease a Tesla home battery at a discounted rate or purchase their own, receiving assistance of up to $10,500, if they agree to share stored energy with the utility as required. The Vermont Public Utility Commission, which approved the program, said it can also provide emergency power during outages.

In Massachusetts, three utility companies (National Grid, Eversource, and Cape Light Compact) have implemented a VPP program that pays customers in exchange for utility control of their home batteries.

Meanwhile, in Colorado efforts are underway to launch the state’s first VPP system. The Colorado Public Utilities Commission is urging Xcel Energy, its largest utility company, to develop a fully operational VPP pilot by this summer.

Why are VPPs important for the clean energy transition?

Grid operators must meet the annual or daily “peak load,” the moment of highest electricity demand. To do that, they often resort to using gas “peaker” plants, which remain dormant most of the year and can be switched on in times of high demand. VPPs will reduce the grids’ reliance on these plants.

The Department of Energy currently aims to expand national VPP capacity to 80 to 160 GW by 2030. That’s roughly equivalent to 80 to 160 fossil fuel plants that need not be built, says Brehm.

Many utilities say VPPs can lower energy bills for consumers in addition to reducing emissions. Research suggests that leveraging distributed sources during peak demand is up to 60% more cost effective than relying on gas plants.

Another significant, if less tangible, advantage of VPPs is that they encourage people to be more involved in the energy system. Usually, customers merely receive electricity. Within a VPP system, they both consume power and contribute it back to the grid. This dual role can improve their understanding of the grid and get them more invested in the transition to clean energy.

What’s next for VPPs?

The capacity of distributed energy sources is expanding rapidly, according to the Department of Energy, owing to the widespread adoption of electric vehicles, charging stations, and smart home devices. Connecting these to VPP systems enhances the grid’s ability to balance electricity demand and supply in real time. Better AI can also help VPPs become more adept at coordinating diverse assets, says Shankar.

Regulators are also coming on board. The National Association of Regulatory Utility Commissioners has started holding panels and workshops to educate its members about VPPs and how to implement them in their states. The California Energy Commission is set to fund research exploring the benefits of integrating VPPs into its grid system. This kind of interest from regulators is new but promising, says Brehm.

Still, hurdles remain. Enrolling in a VPP can be confusing for consumers because the process varies among states and companies. Simplifying it for people will help utility companies make the most of distributed energy resources such as EVs and heat pumps. Standardizing the deployment of VPPs can also speed up their growth nationally by making it easier to replicate successful projects across regions.

“It really comes down to policy,” says Brehm. “The technology is in place. We are continuing to learn about how to best implement these solutions and how to interface with consumers.”

A controversial US surveillance program is up for renewal. Critics are speaking out.

This article is from The Technocrat, MIT Technology Review’s weekly tech policy newsletter about power, politics, and Silicon Valley.

For the past week my social feeds have been filled with a pretty important tech policy debate that I want to key you in on: the renewal of a controversial program of American surveillance.

The program, outlined in Section 702 of the Foreign Intelligence Surveillance Act (FISA), was created in 2008. It was designed to expand the power of US agencies to collect electronic “foreign intelligence information,” whether about spies, terrorists, or cybercriminals abroad, and to do so without a warrant. 

Tech companies, in other words, are compelled to hand over communications records like phone calls, texts, and emails to US intelligence agencies including the FBI, CIA, and NSA. A lot of data about Americans who communicate with people internationally gets swept up in these searches. Critics say that is unconstitutional.

Despite a history of abuses by intelligence agencies, Section 702 was successfully renewed in both 2012 and 2017. The program, which has to be periodically renewed by Congress, is set to expire again at the end of December. But a broad group that transcends parties is calling for reforming the program, out of concern about the vast surveillance it enables. Here is what you need to know.

What do the critics of Section 702 say?

Of particular concern is that while the program intends to target people who aren’t Americans, a lot of data from US citizens gets swept up if they communicate with anyone abroad—and, again, this is without a warrant. The 2022 annual report on the program revealed that intelligence agencies ran searches on an estimated 3.4 million “US persons” during the previous year; that’s an unusually high number for the program, though the FBI attributed it to an uptick in investigations of Russia-based cybercrime that targeted US infrastructure. Critics have raised alarms about the ways the FBI has used the program to surveil Americans including Black Lives Matter activists and a member of Congress.  

In a letter to Senate Majority Leader Chuck Schumer this week, over 25 civil society organizations, including the American Civil Liberties Union (ACLU), the Center for Democracy & Technology, and the Freedom of the Press Foundation, said they “strongly oppose even a short-term reauthorization of Section 702.”

Wikimedia, the foundation that runs Wikipedia, also opposes the program in its current form, saying it leaves international open-source projects vulnerable to surveillance. “Wikimedia projects are edited and governed by nearly 300,000 volunteers around the world who share free knowledge and serve billions of readers globally. Under Section 702, every interaction on these projects is currently subject to surveillance by the NSA,” says a spokesperson for the Wikimedia Foundation. “Research shows that online surveillance has a ‘chilling effect’ on Wikipedia users, who will engage in self-censorship to avoid the threat of governmental reprisals for accurately documenting or accessing certain kinds of information.”

And what about the proponents?

The main supporters of the program’s reauthorization are the intelligence agencies themselves, which say it enables them to gather critical information about foreign adversaries and online criminal activities like ransomware and cyberattacks. 

In defense of the provision, FBI director Christopher Wray has also pointed to procedural changes at the bureau in recent years that have reduced the number of Americans being surveilled from 3.4 million in 2021 to 200,000 in 2022. 

The Biden administration has also broadly pushed for the reauthorization of Section 702 without reform.  

“Section 702 is a necessary instrument within the intelligence community, leveraging the United States’ global telecommunication footprint through legal and court-approved means,” says Sabine Neschke, a senior policy analyst at the Bipartisan Policy Center. “Ultimately, Congress must strike a balance between ensuring national security and safeguarding individual rights.”

What would reform look like?

The proposal to reform the program, called the Government Surveillance Reform Act, was announced last week and focuses on narrowing the government’s authority to collect information on US citizens.

It would require warrants to collect Americans’ location data and web browsing or search records under the program, as well as documentation that the queries were “reasonably likely to retrieve foreign intelligence information.” In a hearing before the House Committee on Homeland Security on Wednesday, Wray said that a warrant requirement would be a “significant blow” to the program, calling it a “de facto ban.”

Senator Ron Wyden, who cosponsored the reform bill and sits on the Senate Select Committee on Intelligence, has said he won’t vote to renew the program unless some of its powers are curbed. “Congress must have a real debate about reforming warrantless government surveillance of Americans,” Wyden said in a statement to MIT Technology Review. “Therefore, the administration and congressional leaders should listen to the overwhelming bipartisan coalition that supports adopting common-sense protections for Americans’ privacy and extending key national security authorities at the same time.”

The reform bill does not, as some civil society groups had hoped, limit the government’s powers for surveillance of people outside of the US. 

While it’s not yet clear whether these reforms will pass, intelligence agencies have never faced such a broad, bipartisan coalition of opponents. As for what happens next, we’ll have to wait and see. 

What else I’m reading

  • Here’s a great story from the New Yorker about how facial recognition searches can lead police to ignore other pieces of an investigation. 
  • I loved this excerpt of Broken Code, a new book from reporter Jeff Horwitz, who broke the Facebook Files revealed by whistleblower Frances Haugen. It’s a nice insidery look at the company’s AI strategy. 
  • Meta says that age verification requirements, such as those being proposed by child online safety bills, should be up to app stores like Apple’s and Google’s. It’s an interesting stance that the company says would help take the burden off individual websites to comply with the new regulations. 

What I learned this week

Some researchers and technologists have been calling for new and more precise language around artificial intelligence. This week, Google DeepMind released a paper outlining different levels of artificial general intelligence, often referred to as AGI, as my colleague Will Douglas Heaven reports.

“The team outlines five ascending levels of AGI: emerging (which in their view includes cutting-edge chatbots like ChatGPT and Bard), competent, expert, virtuoso, and superhuman (performing a wide range of tasks better than all humans, including tasks humans cannot do at all, such as decoding other people’s thoughts, predicting future events, and talking to animals),” Will writes. “They note that no level beyond emerging AGI has been achieved.” We’ll certainly be hearing more about what words we should use when referring to AI in the future.