Five things you need to know about AI right now

Last month I gave a talk at SXSW London called “Five things you need to know about AI”—my personal picks for the five most important ideas in AI right now. 

I aimed the talk at a general audience, and it serves as a quick tour of how I’m thinking about AI in 2025. I’m sharing it here in case you’re interested. I think the talk has something for everyone. There’s some fun stuff in there. I even make jokes!

The video is now available (thank you, SXSW London). Below is a quick look at my top five. Let me know if you would have picked different ones!

1. Generative AI is now so good it’s scary.

Maybe you think that’s obvious. But I am constantly having to check my assumptions about how fast this technology is progressing—and it’s my job to keep up. 

A few months ago, my colleague—and your regular Algorithm writer—James O’Donnell shared 10 music tracks with the MIT Technology Review editorial team and challenged us to pick which ones had been produced using generative AI and which had been made by people. Pretty much everybody did worse than chance.

What’s happening with music is happening across media, from code to robotics to protein synthesis to video. Just look at what people are doing with new video-generation tools like Google DeepMind’s Veo 3. And this technology is being put into everything.

My point here? Whether you think AI is the best thing to happen to us or the worst, do not underestimate it. It’s good, and it’s getting better.

2. Hallucination is a feature, not a bug.

Let’s not forget the fails. When AI makes up stuff, we call it hallucination. Think of customer service bots offering nonexistent refunds, lawyers submitting briefs filled with nonexistent cases, or RFK Jr.’s government department publishing a report that cites nonexistent academic papers. 

You’ll hear a lot of talk that makes hallucination sound like it’s a problem we need to fix. The more accurate way to think about hallucination is that this is exactly what generative AI does—what it’s meant to do—all the time. Generative models are trained to make things up.

What’s remarkable is not that they make up nonsense, but that the nonsense they make up so often matches reality. Why does this matter? First, we need to be aware of what this technology can and can’t do. But also: Don’t hold out for a future version that doesn’t hallucinate.

3. AI is power hungry and getting hungrier.

You’ve probably heard that AI is power hungry. But a lot of that reputation comes from the amount of electricity it takes to train these giant models, though giant models only get trained every so often.

What’s changed is that these models are now being used by hundreds of millions of people every day. And while using a model takes far less energy than training one, the energy costs ramp up massively with those kinds of user numbers. 

ChatGPT, for example, has 400 million weekly users. That makes it the fifth-most-visited website in the world, just after Instagram and ahead of X. Other chatbots are catching up. 

So it’s no surprise that tech companies are racing to build new data centers in the desert and revamp power grids.

The truth is we’ve been in the dark about exactly how much energy it takes to fuel this boom because none of the major companies building this technology have shared much information about it. 

That’s starting to change, however. Several of my colleagues spent months working with researchers to crunch the numbers for some open source versions of this tech. (Do check out what they found.)

4. Nobody knows exactly how large language models work.

Sure, we know how to build them. We know how to make them work really well—see no. 1 on this list.

But how they do what they do is still an unsolved mystery. It’s like these things have arrived from outer space and scientists are poking and prodding them from the outside to figure out what they really are.

It’s incredible to think that never before has a mass-market technology used by billions of people been so little understood.

Why does that matter? Well, until we understand them better we won’t know exactly what they can and can’t do. We won’t know how to control their behavior. We won’t fully understand hallucinations.

5. AGI doesn’t mean anything.

Not long ago, talk of AGI was fringe, and mainstream researchers were embarrassed to bring it up. But as AI has got better and far more lucrative, serious people are happy to insist they’re about to create it. Whatever it is.

AGI—or artificial general intelligence—has come to mean something like: AI that can match the performance of humans on a wide range of cognitive tasks.

But what does that mean? How do we measure performance? Which humans? How wide a range of tasks? And performance on cognitive tasks is just another way of saying intelligence—so the definition is circular anyway.

Essentially, when people refer to AGI they now tend to just mean AI, but better than what we have today.

There’s this absolute faith in the progress of AI. It’s gotten better in the past, so it will continue to get better. But there is zero evidence that this will actually play out. 

So where does that leave us? We are building machines that are getting very good at mimicking some of the things people do, but the technology still has serious flaws. And we’re only just figuring out how it actually works.

Here’s how I think about AI: We have built machines with humanlike behavior, but we haven’t shrugged off the habit of imagining a humanlike mind behind them. This leads to exaggerated assumptions about what AI can do and plays into the wider culture wars between techno-optimists and techno-skeptics.

It’s right to be amazed by this technology. It’s also right to be skeptical of many of the things said about it. It’s still very early days, and it’s all up for grabs.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

AI’s giants want to take over the classroom

School’s out and it’s high summer, but a bunch of teachers are plotting how they’re going to use AI this upcoming school year. God help them. 

On July 8, OpenAI, Microsoft, and Anthropic announced a $23 million partnership with one of the largest teachers’ unions in the United States to bring more AI into K–12 classrooms. Called the National Academy for AI Instruction, the initiative will train teachers at a New York City headquarters on how to use AI both for teaching and for tasks like planning lessons and writing reports, starting this fall

The companies could face an uphill battle. Right now, most of the public perceives AI’s use in the classroom as nothing short of ruinous—a surefire way to dampen critical thinking and hasten the decline of our collective attention span (a viral story from New York magazine, for example, described how easy it now is to coast through college thanks to constant access to ChatGPT). 

Amid that onslaught, AI companies insist that AI promises more individualized learning, faster and more creative lesson planning, and quicker grading. The companies sponsoring this initiative are, of course, not doing it out of the goodness of their hearts.

No—as they hunt for profits, their goal is to make users out of teachers and students. Anthropic is pitching its AI models to universities, and OpenAI offers free courses for teachers. In an initial training session for teachers by the new National Academy for AI Instruction, representatives from Microsoft showed teachers how to use the company’s AI tools for lesson planning and emails, according to the New York Times

It’s early days, but what does the evidence actually say about whether AI is helping or hurting students? There’s at least some data to support the case made by tech companies: A recent survey of 1,500 teens conducted by Harvard’s Graduate School of Education showed that kids are using AI to brainstorm and answer questions they’re afraid to ask in the classroom. Studies examining settings ranging from math classes in Nigeria to colleges physics courses at Harvard have suggested that AI tutors can lead students to become more engaged. 

And yet there’s more to the story. The same Harvard survey revealed that kids are also frequently using AI for cheating and shortcuts. And an oft-cited paper from Microsoft found that relying on AI can reduce critical thinking. Not to mention the fact that “hallucinations” of incorrect information are an inevitable part of how large language models work.

There’s a lack of clear evidence that AI can be a net benefit for students, and it’s hard to trust that the AI companies funding this initiative will give honest advice on when not to use AI in the classroom.

Despite the fanfare around the academy’s launch, and the fact the first teacher training is scheduled to take place in just a few months, OpenAI and Anthropic told me they couldn’t share any specifics. 

It’s not as if teachers themselves aren’t already grappling with how to approach AI. One such teacher, Christopher Harris, who leads a library system covering 22 rural school districts in New York, has created a curriculum aimed at AI literacy. Topics range from privacy when using smart speakers (a lesson for second graders) to misinformation and deepfakes (instruction for high schoolers). I asked him what he’d like to see in the curriculum used by the new National Academy for AI Instruction.

“The real outcome should be teachers that are confident enough in their understanding of how AI works and how it can be used as a tool that they can teach students about the technology as well,” he says. The thing to avoid would be overfocusing on tools and pre-built prompts that teachers are instructed to use without knowing how they work. 

But all this will be for naught without an adjustment to how schools evaluate students in the age of AI, Harris says: “The bigger issue will be shifting the fundamental approaches to how we assign and assess student work in the face of AI cheating.”

The new initiative is led by the American Federation of Teachers, which represents 1.8 million members, as well as the United Federation of Teachers, which represents 200,000 members in New York. If they win over these groups, the tech companies will have significant influence over how millions of teachers learn about AI. But some educators are resisting the use of AI entirely, including several hundred who signed an open letter last week.

Helen Choi is one of them. “I think it is incumbent upon educators to scrutinize the tools that they use in the classroom to look past hype,” says Choi, an associate professor at the University of Southern California, where she teaches writing. “Until we know that something is useful, safe, and ethical, we have a duty to resist mass adoption of tools like large language models that are not designed by educators with education in mind.”

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

How scientists are trying to use AI to unlock the human mind 

Today’s AI landscape is defined by the ways in which neural networks are unlike human brains. A toddler learns how to communicate effectively with only a thousand calories a day and regular conversation; meanwhile, tech companies are reopening nuclear power plants, polluting marginalized communities, and pirating terabytes of books in order to train and run their LLMs.

But neural networks are, after all, neural—they’re inspired by brains. Despite their vastly different appetites for energy and data, large language models and human brains do share a good deal in common. They’re both made up of millions of subcomponents: biological neurons in the case of the brain, simulated “neurons” in the case of networks. They’re the only two things on Earth that can fluently and flexibly produce language. And scientists barely understand how either of them works.

I can testify to those similarities: I came to journalism, and to AI, by way of six years of neuroscience graduate school. It’s a common view among neuroscientists that building brainlike neural networks is one of the most promising paths for the field, and that attitude has started to spread to psychology. Last week, the prestigious journal Nature published a pair of studies showcasing the use of neural networks for predicting how humans and other animals behave in psychological experiments. Both studies propose that these trained networks could help scientists advance their understanding of the human mind. But predicting a behavior and explaining how it came about are two very different things.

In one of the studies, researchers transformed a large language model into what they refer to as a “foundation model of human cognition.” Out of the box, large language models aren’t great at mimicking human behavior—they behave logically in settings where humans abandon reason, such as casinos. So the researchers fine-tuned Llama 3.1, one of Meta’s open-source LLMs, on data from a range of 160 psychology experiments, which involved tasks like choosing from a set of “slot machines” to get the maximum payout or remembering sequences of letters. They called the resulting model Centaur.

Compared with conventional psychological models, which use simple math equations, Centaur did a far better job of predicting behavior. Accurate predictions of how humans respond in psychology experiments are valuable in and of themselves: For example, scientists could use Centaur to pilot their experiments on a computer before recruiting, and paying, human participants. In their paper, however, the researchers propose that Centaur could be more than just a prediction machine. By interrogating the mechanisms that allow Centaur to effectively replicate human behavior, they argue, scientists could develop new theories about the inner workings of the mind.

But some psychologists doubt whether Centaur can tell us much about the mind at all. Sure, it’s better than conventional psychological models at predicting how humans behave—but it also has a billion times more parameters. And just because a model behaves like a human on the outside doesn’t mean that it functions like one on the inside. Olivia Guest, an assistant professor of computational cognitive science at Radboud University in the Netherlands, compares Centaur to a calculator, which can effectively predict the response a math whiz will give when asked to add two numbers. “I don’t know what you would learn about human addition by studying a calculator,” she says.

Even if Centaur does capture something important about human psychology, scientists may struggle to extract any insight from the model’s millions of neurons. Though AI researchers are working hard to figure out how large language models work, they’ve barely managed to crack open the black box. Understanding an enormous neural-network model of the human mind may not prove much easier than understanding the thing itself.

One alternative approach is to go small. The second of the two Nature studies focuses on minuscule neural networks—some containing only a single neuron—that nevertheless can predict behavior in mice, rats, monkeys, and even humans. Because the networks are so small, it’s possible to track the activity of each individual neuron and use that data to figure out how the network is producing its behavioral predictions. And while there’s no guarantee that these models function like the brains they were trained to mimic, they can, at the very least, generate testable hypotheses about human and animal cognition.

There’s a cost to comprehensibility. Unlike Centaur, which was trained to mimic human behavior in dozens of different tasks, each tiny network can only predict behavior in one specific task. One network, for example, is specialized for making predictions about how people choose among different slot machines. “If the behavior is really complex, you need a large network,” says Marcelo Mattar, an assistant professor of psychology and neural science at New York University who led the tiny-network study and also contributed to Centaur. “The compromise, of course, is that now understanding it is very, very difficult.”

This trade-off between prediction and understanding is a key feature of neural-network-driven science. (I also happen to be writing a book about it.) Studies like Mattar’s are making some progress toward closing that gap—as tiny as his networks are, they can predict behavior more accurately than traditional psychological models. So is the research into LLM interpretability happening at places like Anthropic. For now, however, our understanding of complex systems—from humans to climate systems to proteins—is lagging farther and farther behind our ability to make predictions about them.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

What comes next for AI copyright lawsuits?

Last week, the technology companies Anthropic and Meta each won landmark victories in two separate court cases that examined whether or not the firms had violated copyright when they trained their large language models on copyrighted books without permission. The rulings are the first we’ve seen to come out of copyright cases of this kind. This is a big deal!

The use of copyrighted works to train models is at the heart of a bitter battle between tech companies and content creators. That battle is playing out in technical arguments about what does and doesn’t count as fair use of a copyrighted work. But it is ultimately about carving out a space in which human and machine creativity can continue to coexist.

There are dozens of similar copyright lawsuits working through the courts right now, with cases filed against all the top players—not only Anthropic and Meta but Google, OpenAI, Microsoft, and more. On the other side, plaintiffs range from individual artists and authors to large companies like Getty and the New York Times.

The outcomes of these cases are set to have an enormous impact on the future of AI. In effect, they will decide whether or not model makers can continue ordering up a free lunch. If not, they will need to start paying for such training data via new kinds of licensing deals—or find new ways to train their models. Those prospects could upend the industry.

And that’s why last week’s wins for the technology companies matter. So: Cases closed? Not quite. If you drill into the details, the rulings are less cut-and-dried than they seem at first. Let’s take a closer look.

In both cases, a group of authors (the Anthropic suit was a class action; 13 plaintiffs sued Meta, including high-profile names such as Sarah Silverman and Ta-Nehisi Coates) set out to prove that a technology company had violated their copyright by using their books to train large language models. And in both cases, the companies argued that this training process counted as fair use, a legal provision that permits the use of copyrighted works for certain purposes.  

There the similarities end. Ruling in Anthropic’s favor, senior district judge William Alsup argued on June 23 that the firm’s use of the books was legal because what it did with them was transformative, meaning that it did not replace the original works but made something new from them. “The technology at issue was among the most transformative many of us will see in our lifetimes,” Alsup wrote in his judgment.

In Meta’s case, district judge Vince Chhabria made a different argument. He also sided with the technology company, but he focused his ruling instead on the issue of whether or not Meta had harmed the market for the authors’ work. Chhabria said that he thought Alsup had brushed aside the importance of market harm. “The key question in virtually any case where a defendant has copied someone’s original work without permission is whether allowing people to engage in that sort of conduct would substantially diminish the market for the original,” he wrote on June 25.

Same outcome; two very different rulings. And it’s not clear exactly what that means for the other cases. On the one hand, it bolsters at least two versions of the fair-use argument. On the other, there’s some disagreement over how fair use should be decided.

But there are even bigger things to note. Chhabria was very clear in his judgment that Meta won not because it was in the right, but because the plaintiffs failed to make a strong enough argument. “In the grand scheme of things, the consequences of this ruling are limited,” he wrote. “This is not a class action, so the ruling only affects the rights of these 13 authors—not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful.” That reads a lot like an invitation for anyone else out there with a grievance to come and have another go.   

And neither company is yet home free. Anthropic and Meta both face wholly separate allegations that not only did they train their models on copyrighted books, but the way they obtained those books was illegal because they downloaded them from pirated databases. Anthropic now faces another trial over these piracy claims. Meta has been ordered to begin a discussion with its accusers over how to handle the issue.

So where does that leave us? As the first rulings to come out of cases of this type, last week’s judgments will no doubt carry enormous weight. But they are also the first rulings of many. Arguments on both sides of the dispute are far from exhausted.

“These cases are a Rorschach test in that either side of the debate will see what they want to see out of the respective orders,” says Amir Ghavi, a lawyer at Paul Hastings who represents a range of technology companies in ongoing copyright lawsuits. He also points out that the first cases of this type were filed more than two years ago: “Factoring in likely appeals and the other 40+ pending cases, there is still a long way to go before the issue is settled by the courts.”

“I’m disappointed at these rulings,” says Tyler Chou, founder and CEO of Tyler Chou Law for Creators, a firm that represents some of the biggest names on YouTube. “I think plaintiffs were out-gunned and didn’t have the time or resources to bring the experts and data that the judges needed to see.”

But Chou thinks this is just the first round of many. Like Ghavi, she thinks these decisions will go to appeal. And after that we’ll see cases start to wind up in which technology companies have met their match: “Expect the next wave of plaintiffs—publishers, music labels, news organizations—to arrive with deep pockets,” she says. “That will be the real test of fair use in the AI era.”

But even when the dust has settled in the courtrooms—what then? The problem won’t have been solved. That’s because the core grievance of creatives, whether individuals or institutions, is not really that their copyright has been violated—copyright is just the legal hammer they have to hand. Their real complaint is that their livelihoods and business models are at risk of being undermined. And beyond that: when AI slop devalues creative effort, will people’s motivations for putting work out into the world start to fall away?

In that sense, these legal battles are set to shape all our futures. There’s still no good solution on the table for this wider problem. Everything is still to play for.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

This story has been edited to add comments from Tyler Chou.

What does it mean for an algorithm to be “fair”?

Back in February, I flew to Amsterdam to report on a high-stakes experiment the city had recently conducted: a pilot program for what it called Smart Check, which was its attempt to create an effective, fair, and unbiased predictive algorithm to try to detect welfare fraud. But the city fell short of its lofty goals—and, with our partners at Lighthouse Reports and the Dutch newspaper Trouw, we tried to get to the bottom of why. You can read about it in our deep dive published last week.

For an American reporter, it’s been an interesting time to write a story on “responsible AI” in a progressive European city—just as ethical considerations in AI deployments appear to be disappearing in the United States, at least at the national level. 

For example, a few weeks before my trip, the Trump administration rescinded Biden’s executive order on AI safety and DOGE began turning to AI to decide which federal programs to cut. Then, more recently, House Republicans passed a 10-year moratorium on US states’ ability to regulate AI (though it has yet to be passed by the Senate). 

What all this points to is a new reality in the United States where responsible AI is no longer a priority (if it ever genuinely was). 

But this has also made me think more deeply about the stakes of deploying AI in situations that directly affect human lives, and about what success would even look like. 

When Amsterdam’s welfare department began developing the algorithm that became Smart Check, the municipality followed virtually every recommendation in the responsible-AI playbook: consulting external experts, running bias tests, implementing technical safeguards, and seeking stakeholder feedback. City officials hoped the resulting algorithm could avoid causing the worst types of harm inflicted by discriminatory AI over nearly a decade. 

After talking to a large number of people involved in the project and others who would potentially be affected by it, as well as some experts who did not work on it, it’s hard not to wonder if the city could ever have succeeded in its goals when neither “fairness” nor even “bias” has a universally agreed-upon definition. The city was treating these issues as technical ones that could be answered by reweighting numbers and figures—rather than political and philosophical questions that society as a whole has to grapple with.

On the afternoon that I arrived in Amsterdam, I sat down with Anke van der Vliet, a longtime advocate for welfare beneficiaries who served on what’s called the Participation Council, a 15-member citizen body that represents benefits recipients and their advocates.

The city had consulted the council during Smart Check’s development, but van der Vliet was blunt in sharing the committee’s criticisms of the plans. Its members simply didn’t want the program. They had well-placed fears of discrimination and disproportionate impact, given that fraud is found in only 3% of applications.

To the city’s credit, it did respond to some of their concerns and make changes in the algorithm’s design—like removing from consideration factors, such as age, whose inclusion could have had a discriminatory impact. But the city ignored the Participation Council’s main feedback: its recommendation to stop development altogether. 

Van der Vliet and other welfare advocates I met on my trip, like representatives from the Amsterdam Welfare Union, described what they see as a number of challenges faced by the city’s some 35,000 benefits recipients: the indignities of having to constantly re-prove the need for benefits, the increases in cost of living that benefits payments do not reflect, and the general feeling of distrust between recipients and the government. 

City welfare officials themselves recognize the flaws of the system, which “is held together by rubber bands and staples,” as Harry Bodaar, a senior policy advisor to the city who focuses on welfare fraud enforcement, told us. “And if you’re at the bottom of that system, you’re the first to fall through the cracks.”

So the Participation Council didn’t want Smart Check at all, even as Bodaar and others working in the department hoped that it could fix the system. It’s a classic example of a “wicked problem,” a social or cultural issue with no one clear answer and many potential consequences. 

After the story was published, I heard from Suresh Venkatasubramanian, a former tech advisor to the White House Office of Science and Technology Policy who co-wrote Biden’s AI Bill of Rights (now rescinded by Trump). “We need participation early on from communities,” he said, but he added that it also matters what officials do with the feedback—and whether there is “a willingness to reframe the intervention based on what people actually want.” 

Had the city started with a different question—what people actually want—perhaps it might have developed a different algorithm entirely. As the Dutch digital rights advocate Hans De Zwart put it to us, “We are being seduced by technological solutions for the wrong problems … why doesn’t the municipality build an algorithm that searches for people who do not apply for social assistance but are entitled to it?” 

These are the kinds of fundamental questions AI developers will need to consider, or they run the risk of repeating (or ignoring) the same mistakes over and over again.

Venkatasubramanian told me he found the story to be “affirming” in highlighting the need for “those in charge of governing these systems”  to “ask hard questions … starting with whether they should be used at all.”

But he also called the story “humbling”: “Even with good intentions, and a desire to benefit from all the research on responsible AI, it’s still possible to build systems that are fundamentally flawed, for reasons that go well beyond the details of the system constructions.” 

To better understand this debate, read our full story here. And if you want more detail on how we ran our own bias tests after the city gave us unprecedented access to the Smart Check algorithm, check out the methodology over at Lighthouse. (For any Dutch speakers out there, here’s the companion story in Trouw.) Thanks to the Pulitzer Center for supporting our reporting. 

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The Pentagon is gutting the team that tests AI and weapons systems

The Trump administration’s chainsaw approach to federal spending lives on, even as Elon Musk turns on the president. On May 28, Secretary of Defense Pete Hegseth announced he’d be gutting a key office at the Department of Defense responsible for testing and evaluating the safety of weapons and AI systems.

As part of a string of moves aimed at “reducing bloated bureaucracy and wasteful spending in favor of increased lethality,” Hegseth cut the size of the Office of the Director of Operational Test and Evaluation in half. The group was established in the 1980s—following orders from Congress—after criticisms that the Pentagon was fielding weapons and systems that didn’t perform as safely or effectively as advertised. Hegseth is reducing the agency’s staff to about 45, down from 94, and firing and replacing its director. He gave the office just seven days to implement the changes.

It is a significant overhaul of a department that in 40 years has never before been placed so squarely on the chopping block. Here’s how today’s defense tech companies, which have fostered close connections to the Trump administration, stand to gain, and why safety testing might suffer as a result. 

The Operational Test and Evaluation office is “the last gate before a technology gets to the field,” says Missy Cummings, a former fighter pilot for the US Navy who is now a professor of engineering and computer science at George Mason University. Though the military can do small experiments with new systems without running it by the office, it has to test anything that gets fielded at scale.

“In a bipartisan way—up until now—everybody has seen it’s working to help reduce waste, fraud, and abuse,” she says. That’s because it provides an independent check on companies’ and contractors’ claims about how well their technology works. It also aims to expose the systems to more rigorous safety testing.

The gutting comes at a particularly pivotal time for AI and military adoption: The Pentagon is experimenting with putting AI into everything, mainstream companies like OpenAI are now more comfortable working with the military, and defense giants like Anduril are winning big contracts to launch AI systems (last Thursday, Anduril announced a whopping $2.5 billion funding round, doubling its valuation to over $30 billion). 

Hegseth claims his cuts will “make testing and fielding weapons more efficient,” saving $300 million. But Cummings is concerned that they are paving a way to faster adoption while increasing the chances that new systems won’t be as safe or effective as promised. “The firings in DOTE send a clear message that all perceived obstacles for companies favored by Trump are going to be removed,” she says.

Anduril and Anthropic, which have launched AI applications for military use, did not respond to my questions about whether they pushed for or approve of the cuts. A representative for OpenAI said that the company was not involved in lobbying for the restructuring. 

“The cuts make me nervous,” says Mark Cancian, a senior advisor at the Center for Strategic and International Studies who previously worked at the Pentagon in collaboration with the testing office. “It’s not that we’ll go from effective to ineffective, but you might not catch some of the problems that would surface in combat without this testing step.”

It’s hard to say precisely how the cuts will affect the office’s ability to test systems, and Cancian admits that those responsible for getting new technologies out onto the battlefield sometimes complain that it can really slow down adoption. But still, he says, the office frequently uncovers errors that weren’t previously caught.

It’s an especially important step, Cancian says, whenever the military is adopting a new type of technology like generative AI. Systems that might perform well in a lab setting almost always encounter new challenges in more realistic scenarios, and the Operational Test and Evaluation group is where that rubber meets the road.

So what to make of all this? It’s true that the military was experimenting with artificial intelligence long before the current AI boom, particularly with computer vision for drone feeds, and defense tech companies have been winning big contracts for this push across multiple presidential administrations. But this era is different. The Pentagon is announcing ambitious pilots specifically for large language models, a relatively nascent technology that by its very nature produces hallucinations and errors, and it appears eager to put much-hyped AI into everything. The key independent group dedicated to evaluating the accuracy of these new and complex systems now only has half the staff to do it. I’m not sure that’s a win for anyone.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Inside the effort to tally AI’s energy appetite

After working on it for months, my colleague Casey Crownhart and I finally saw our story on AI’s energy and emissions burden go live last week. 

The initial goal sounded simple: Calculate how much energy is used each time we interact with a chatbot, and then tally that up to understand why everyone from leaders of AI companies to officials at the White House wants to harness unprecedented levels of electricity to power AI and reshape our energy grids in the process. 

It was, of course, not so simple. After speaking with dozens of researchers, we realized that the common understanding of AI’s energy appetite is full of holes. I encourage you to read the full story, which has some incredible graphics to help you understand everything from the energy used in a single query right up to what AI will require just three years from now (enough electricity to power 22% of US households, it turns out). But here are three takeaways I have after the project. 

AI is in its infancy

We focused on measuring the energy requirements that go into using a chatbot, generating an image, and creating a video with AI. But these three uses are relatively small-scale compared with where AI is headed next. 

Lots of AI companies are building reasoning models, which “think” for longer and use more energy. They’re building hardware devices, perhaps like the one Jony Ive has been working on (which OpenAI just acquired for $6.5 billion), that have AI constantly humming along in the background of our conversations. They’re designing agents and digital clones of us to act on our behalf. All these trends point to a more energy-intensive future (which, again, helps explain why OpenAI and others are spending such inconceivable amounts of money on energy). 

But the fact that AI is in its infancy raises another point. The models, chips, and cooling methods behind this AI revolution could all grow more efficient over time, as my colleague Will Douglas Heaven explains. This future isn’t predetermined.

AI video is on another level

When we tested the energy demands of various models, we found the energy required to produce even a low-quality, five-second video to be pretty shocking: It was 42,000 times more than the amount needed for a chatbot answer a question about a recipe, and enough to power a microwave for over an hour. If there’s one type of AI whose energy appetite should worry you, it’s this one. 

Soon after we published, Google debuted the latest iteration of its Veo model. People quickly created compilations of the most impressive clips (this one being the most shocking to me). Something we point out in the story is that Google (as well as OpenAI, which has its own video generator, Sora) denied our request for specific numbers on the energy their AI models use. Nonetheless, our reporting suggests it’s very likely that high-definition video models like Veo and Sora are much larger, and much more energy-demanding, than the models we tested. 

I think the key to whether the use of AI video will produce indefensible clouds of emissions in the near future will be how it’s used, and how it’s priced. The example I linked shows a bunch of TikTok-style content, and I predict that if creating AI video is cheap enough, social video sites will be inundated with this type of content. 

There are more important questions than your own individual footprint

We expected that a lot of readers would understandably think about this story in terms of their own individual footprint, wondering whether their AI usage is contributing to the climate crisis. Don’t panic: It’s likely that asking a chatbot for help with a travel plan does not meaningfully increase your carbon footprint. Video generation might. But after reporting on this for months, I think there are more important questions.

Consider, for example, the water being drained from aquifers in Nevada, the country’s driest state, to power data centers that are drawn to the area by tax incentives and easy permitting processes, as detailed in an incredible story by James Temple. Or look at how Meta’s largest data center project, in Louisiana, is relying on natural gas despite industry promises to use clean energy, per a story by David Rotman. Or the fact that nuclear energy is not the silver bullet that AI companies often make it out to be. 

There are global forces shaping how much energy AI companies are able to access and what types of sources will provide it. There is also very little transparency from leading AI companies on their current and future energy demands, even while they’re asking for public support for these plans. Pondering your individual footprint can be a good thing to do, provided you remember that it’s not so much your footprint as these other factors that are keeping climate researchers and energy experts we spoke to up at night.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

How AI is introducing errors into courtrooms

It’s been quite a couple weeks for stories about AI in the courtroom. You might have heard about the deceased victim of a road rage incident whose family created an AI avatar of him to show as an impact statement (possibly the first time this has been done in the US). But there’s a bigger, far more consequential controversy brewing, legal experts say. AI hallucinations are cropping up more and more in legal filings. And it’s starting to infuriate judges. Just consider these three cases, each of which gives a glimpse into what we can expect to see more of as lawyers embrace AI.

A few weeks ago, a California judge, Michael Wilner, became intrigued by a set of arguments some lawyers made in a filing. He went to learn more about those arguments by following the articles they cited. But the articles didn’t exist. He asked the lawyers’ firm for more details, and they responded with a new brief that contained even more mistakes than the first. Wilner ordered the attorneys to give sworn testimonies explaining the mistakes, in which he learned that one of them, from the elite firm Ellis George, used Google Gemini as well as law-specific AI models to help write the document, which generated false information. As detailed in a filing on May 6, the judge fined the firm $31,000. 

Last week, another California-based judge caught another hallucination in a court filing, this time submitted by the AI company Anthropic in the lawsuit that record labels have brought against it over copyright issues. One of Anthropic’s lawyers had asked the company’s AI model Claude to create a citation for a legal article, but Claude included the wrong title and author. Anthropic’s attorney admitted that the mistake was not caught by anyone reviewing the document. 

Lastly, and perhaps most concerning, is a case unfolding in Israel. After police arrested an individual on charges of money laundering, Israeli prosecutors submitted a request asking a judge for permission to keep the individual’s phone as evidence. But they cited laws that don’t exist, prompting the defendant’s attorney to accuse them of including AI hallucinations in their request. The prosecutors, according to Israeli news outlets, admitted that this was the case, receiving a scolding from the judge. 

Taken together, these cases point to a serious problem. Courts rely on documents that are accurate and backed up with citations—two traits that AI models, despite being adopted by lawyers eager to save time, often fail miserably to deliver. 

Those mistakes are getting caught (for now), but it’s not a stretch to imagine that at some point soon, a judge’s decision will be influenced by something that’s totally made up by AI, and no one will catch it. 

I spoke with Maura Grossman, who teaches at the School of Computer Science at the University of Waterloo as well as Osgoode Hall Law School, and has been a vocal early critic of the problems that generative AI poses for courts. She wrote about the problem back in 2023, when the first cases of hallucinations started appearing. She said she thought courts’ existing rules requiring lawyers to vet what they submit to the courts, combined with the bad publicity those cases attracted, would put a stop to the problem. That hasn’t panned out.

Hallucinations “don’t seem to have slowed down,” she says. “If anything, they’ve sped up.” And these aren’t one-off cases with obscure local firms, she says. These are big-time lawyers making significant, embarrassing mistakes with AI. She worries that such mistakes are also cropping up more in documents not written by lawyers themselves, like expert reports (in December, a Stanford professor and expert on AI admitted to including AI-generated mistakes in his testimony).  

I told Grossman that I find all this a little surprising. Attorneys, more than most, are obsessed with diction. They choose their words with precision. Why are so many getting caught making these mistakes?

“Lawyers fall in two camps,” she says. “The first are scared to death and don’t want to use it at all.” But then there are the early adopters. These are lawyers tight on time or without a cadre of other lawyers to help with a brief. They’re eager for technology that can help them write documents under tight deadlines. And their checks on the AI’s work aren’t always thorough. 

The fact that high-powered lawyers, whose very profession it is to scrutinize language, keep getting caught making mistakes introduced by AI says something about how most of us treat the technology right now. We’re told repeatedly that AI makes mistakes, but language models also feel a bit like magic. We put in a complicated question and receive what sounds like a thoughtful, intelligent reply. Over time, AI models develop a veneer of authority. We trust them.

“We assume that because these large language models are so fluent, it also means that they’re accurate,” Grossman says. “We all sort of slip into that trusting mode because it sounds authoritative.” Attorneys are used to checking the work of junior attorneys and interns but for some reason, Grossman says, don’t apply this skepticism to AI.

We’ve known about this problem ever since ChatGPT launched nearly three years ago, but the recommended solution has not evolved much since then: Don’t trust everything you read, and vet what an AI model tells you. As AI models get thrust into so many different tools we use, I increasingly find this to be an unsatisfying counter to one of AI’s most foundational flaws.

Hallucinations are inherent to the way that large language models work. Despite that, companies are selling generative AI tools made for lawyers that claim to be reliably accurate. “Feel confident your research is accurate and complete,” reads the website for Westlaw Precision, and the website for CoCounsel promises its AI is “backed by authoritative content.” That didn’t stop their client, Ellis George, from being fined $31,000.

Increasingly, I have sympathy for people who trust AI more than they should. We are, after all, living in a time when the people building this technology are telling us that AI is so powerful it should be treated like nuclear weapons. Models have learned from nearly every word humanity has ever written down and are infiltrating our online life. If people shouldn’t trust everything AI models say, they probably deserve to be reminded of that a little more often by the companies building them. 

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Police tech can sidestep facial recognition bans now

Six months ago I attended the largest gathering of chiefs of police in the US to see how they’re using AI. I found some big developments, like officers getting AI to write their police reports. Today, I published a new story that shows just how far AI for police has developed since then. 

It’s about a new method police departments and federal agencies have found to track people: an AI tool that uses attributes like body size, gender, hair color and style, clothing, and accessories instead of faces. It offers a way around laws curbing the use of facial recognition, which are on the rise. 

Advocates from the ACLU, after learning of the tool through MIT Technology Review, said it was the first instance they’d seen of such a tracking system used at scale in the US, and they say it has a high potential for abuse by federal agencies. They say the prospect that AI will enable more powerful surveillance is especially alarming at a time when the Trump administration is pushing for more monitoring of protesters, immigrants, and students. 

I hope you read the full story for the details, and to watch a demo video of how the system works. But first, let’s talk for a moment about what this tells us about the development of police tech and what rules, if any, these departments are subject to in the age of AI.

As I pointed out in my story six months ago, police departments in the US have extraordinary independence. There are more than 18,000 departments in the country, and they generally have lots of discretion over what technology they spend their budgets on. In recent years, that technology has increasingly become AI-centric. 

Companies like Flock and Axon sell suites of sensors—cameras, license plate readers, gunshot detectors, drones—and then offer AI tools to make sense of that ocean of data (at last year’s conference I saw schmoozing between countless AI-for-police startups and the chiefs they sell to on the expo floor). Departments say these technologies save time, ease officer shortages, and help cut down on response times. 

Those sound like fine goals, but this pace of adoption raises an obvious question: Who makes the rules here? When does the use of AI cross over from efficiency into surveillance, and what type of transparency is owed to the public?

In some cases, AI-powered police tech is already driving a wedge between departments and the communities they serve. When the police in Chula Vista, California, were the first in the country to get special waivers from the Federal Aviation Administration to fly their drones farther than normal, they said the drones would be deployed to solve crimes and get people help sooner in emergencies. They’ve had some successes

But the department has also been sued by a local media outlet alleging it has reneged on its promise to make drone footage public, and residents have said the drones buzzing overhead feel like an invasion of privacy. An investigation found that these drones were deployed more often in poor neighborhoods, and for minor issues like loud music. 

Jay Stanley, a senior policy analyst at the ACLU, says there’s no overarching federal law that governs how local police departments adopt technologies like the tracking software I wrote about. Departments usually have the leeway to try it first, and see how their communities react after the fact. (Veritone, which makes the tool I wrote about, said they couldn’t name or connect me with departments using it so the details of how it’s being deployed by police are not yet clear). 

Sometimes communities take a firm stand; local laws against police use of facial recognition have been passed around the country. But departments—or the police tech companies they buy from—can find workarounds. Stanley says the new tracking software I wrote about poses lots of the same issues as facial recognition while escaping scrutiny because it doesn’t technically use biometric data.

“The community should be very skeptical of this kind of tech and, at a minimum, ask a lot of questions,” he says. He laid out a road map of what police departments should do before they adopt AI technologies: have hearings with the public, get community permission, and make promises about how the systems will and will not be used. He added that the companies making this tech should also allow it to be tested by independent parties. 

“This is all coming down the pike,” he says—and so quickly that policymakers and the public have little time to keep up. He adds, “Are these powers we want the police—the authorities that serve us—to have, and if so, under what conditions?”

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Why the humanoid workforce is running late

On Thursday I watched Daniela Rus, one of the world’s top experts on AI-powered robots, address a packed room at a Boston robotics expo. Rus spent a portion of her talk busting the notion that giant fleets of humanoids are already making themselves useful in manufacturing and warehouses around the world. 

That might come as a surprise. For years AI has made it faster to train robots, and investors have responded feverishly. Figure AI, a startup that aims to build general-purpose humanoid robots for both homes and industry, is looking at a $1.5 billion funding round (more on Figure shortly), and there are commercial experiments with humanoids at Amazon and auto manufacturers. Bank of America predicts wider adoption of these robots around the corner, with a billion humanoids at work by 2050.

But Rus and many others I spoke with at the expo suggest that this hype just doesn’t add up.

Humanoids “are mostly not intelligent,” she said. Rus showed a video of herself speaking to an advanced humanoid that smoothly followed her instruction to pick up a watering can and water a nearby plant. It was impressive. But when she asked it to “water” her friend, the robot did not consider that humans don’t need watering like plants and moved to douse the person. “These robots lack common sense,” she said. 

I also spoke with Pras Velagapudi, the chief technology officer of Agility Robotics, who detailed physical limitations the company has to overcome too. To be strong, a humanoid needs a lot of power and a big battery. The stronger you make it and the heavier it is, the less time it can run without charging, and the more you need to worry about safety. A robot like this is also complex to manufacture.

Some impressive humanoid demos don’t overcome these core constraints as much as they display other impressive features: nimble robotic hands, for instance, or the ability to converse with people via a large language model. But these capabilities don’t necessarily translate well to the jobs that humanoids are supposed to be taking over (it’s more useful to program a long list of detailed instructions for a robot to follow than to speak to it, for example). 

This is not to say fleets of humanoids won’t ever join our workplaces, but rather that the adoption of the technology will likely be drawn out, industry specific, and slow. It’s related to what I wrote about last week: To people who consider AI a “normal” technology, rather than a utopian or dystopian one, this all makes sense. The technology that succeeds in an isolated lab setting will appear very different from the one that gets commercially adopted at scale. 

All of this sets the scene for what happened with one of the biggest names in robotics last week. Figure AI has raised a tremendous amount of investment for its humanoids, and founder Brett Adcock claimed on X in March that the company was the “most sought-after private stock in the secondary market.” Its most publicized work is with BMW, and Adcock has shown videos of Figure’s robots working to move parts for the automaker, saying that the partnership took just 12 months to launch. Adcock and Figure have generally not responded to media requests and don’t make the rounds at typical robot trade shows. 

In April, Fortune published an article quoting a spokesperson from BMW, alleging that the pair’s partnership involves fewer robots at a smaller scale than Figure has implied. On April 25, Adcock posted on LinkedIn that “Figure’s litigation counsel will aggressively pursue all available legal remedies—including, but not limited to, defamation claims—to correct the publication’s blatant misstatements.” The author of the Fortune article did not respond to my request for comment, and a representative for Adcock and Figure declined to say what parts of the article were inaccurate. The representative pointed me to Adcock’s statement, which lacks details. 

The specifics of Figure aside, I think this conflict is quite indicative of the tech moment we’re in. A frenzied venture capital market—buoyed by messages like the statement from Nvidia CEO Jensen Huang that “physical AI” is the future—is betting that humanoids will create the largest market for robotics the field has ever seen, and that someday they will essentially be capable of most physical work. 

But achieving that means passing countless hurdles. We’ll need safety regulations for humans working alongside humanoids that don’t even exist yet. Deploying such robots successfully in one industry, like automotive, may not lead to success in others. We’ll have to hope that AI will solve lots of problems along the way. These are all tll things that roboticists have reason to be skeptical about. 

Roboticists, from what I’ve seen, are normally a patient bunch. The first Roomba launched more than a decade after its conception, and it took more than 50 years to go from the first robotic arm ever to the millionth in production. Venture capitalists, on the other hand, are not known for such patience. 

Perhaps that’s why Bank of America’s new prediction of widespread humanoid adoption was met with enthusiasm by investors but enormous skepticism by roboticists. Aaron Prather, a director at the robotics standards organization ASTM, said on Thursday that the projections were “wildly off-base.” 

As we’ve covered before, humanoid hype is a cycle: One slick video raises the expectations of investors, which then incentivizes competitors to make even slicker videos. This makes it quite hard for anyone—a tech journalist, say—to peel back the curtain and find out how much impact humanoids are poised to have on the workforce. But I’ll do my darndest.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.