How scientists are trying to use AI to unlock the human mind 

Today’s AI landscape is defined by the ways in which neural networks are unlike human brains. A toddler learns how to communicate effectively with only a thousand calories a day and regular conversation; meanwhile, tech companies are reopening nuclear power plants, polluting marginalized communities, and pirating terabytes of books in order to train and run their LLMs.

But neural networks are, after all, neural—they’re inspired by brains. Despite their vastly different appetites for energy and data, large language models and human brains do share a good deal in common. They’re both made up of millions of subcomponents: biological neurons in the case of the brain, simulated “neurons” in the case of networks. They’re the only two things on Earth that can fluently and flexibly produce language. And scientists barely understand how either of them works.

I can testify to those similarities: I came to journalism, and to AI, by way of six years of neuroscience graduate school. It’s a common view among neuroscientists that building brainlike neural networks is one of the most promising paths for the field, and that attitude has started to spread to psychology. Last week, the prestigious journal Nature published a pair of studies showcasing the use of neural networks for predicting how humans and other animals behave in psychological experiments. Both studies propose that these trained networks could help scientists advance their understanding of the human mind. But predicting a behavior and explaining how it came about are two very different things.

In one of the studies, researchers transformed a large language model into what they refer to as a “foundation model of human cognition.” Out of the box, large language models aren’t great at mimicking human behavior—they behave logically in settings where humans abandon reason, such as casinos. So the researchers fine-tuned Llama 3.1, one of Meta’s open-source LLMs, on data from 160 psychology experiments, which involved tasks like choosing among a set of “slot machines” to maximize payouts or remembering sequences of letters. They called the resulting model Centaur.
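
For a sense of what that kind of fine-tuning looks like in practice, here is a minimal sketch under my own illustrative assumptions, not the Centaur team’s actual pipeline: each experimental trial is written out as text, and an open model is trained with low-rank adapters to complete the text with the choice the participant really made. The model name, data format, and hyperparameters below are all placeholders.

```python
# A minimal sketch of behavior-prediction fine-tuning, not the Centaur team's
# actual pipeline. The dataset format, model choice, and hyperparameters are
# illustrative assumptions.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-3.1-8B"  # any open causal LM would do for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical trials from a two-armed "slot machine" task, serialized as text
# the model must complete with the participant's actual choice.
trials = [
    "Machine A paid 7 points last round; machine B paid 2. You chose A.",
    "Machine A paid 0 points last round; machine B paid 9. You chose B.",
]
dataset = Dataset.from_dict({"text": trials}).map(
    lambda row: tokenizer(row["text"], truncation=True), remove_columns=["text"]
)

# Low-rank adapters keep the fine-tune far cheaper than full training.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="behavior-model-sketch",
                           per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```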

Compared with conventional psychological models, which use simple math equations, Centaur did a far better job of predicting behavior. Accurate predictions of how humans respond in psychology experiments are valuable in and of themselves: For example, scientists could use Centaur to pilot their experiments on a computer before recruiting, and paying, human participants. In their paper, however, the researchers propose that Centaur could be more than just a prediction machine. By interrogating the mechanisms that allow Centaur to effectively replicate human behavior, they argue, scientists could develop new theories about the inner workings of the mind.

But some psychologists doubt whether Centaur can tell us much about the mind at all. Sure, it’s better than conventional psychological models at predicting how humans behave—but it also has a billion times more parameters. And just because a model behaves like a human on the outside doesn’t mean that it functions like one on the inside. Olivia Guest, an assistant professor of computational cognitive science at Radboud University in the Netherlands, compares Centaur to a calculator, which can effectively predict the response a math whiz will give when asked to add two numbers. “I don’t know what you would learn about human addition by studying a calculator,” she says.

Even if Centaur does capture something important about human psychology, scientists may struggle to extract any insight from the model’s millions of neurons. Though AI researchers are working hard to figure out how large language models work, they’ve barely managed to crack open the black box. Understanding an enormous neural-network model of the human mind may not prove much easier than understanding the thing itself.

One alternative approach is to go small. The second of the two Nature studies focuses on minuscule neural networks—some containing only a single neuron—that nevertheless can predict behavior in mice, rats, monkeys, and even humans. Because the networks are so small, it’s possible to track the activity of each individual neuron and use that data to figure out how the network is producing its behavioral predictions. And while there’s no guarantee that these models function like the brains they were trained to mimic, they can, at the very least, generate testable hypotheses about human and animal cognition.
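
To see why such tiny models are legible in a way Centaur isn’t, consider a hypothetical one-unit predictor for a two-armed “slot machine” task; this is a sketch in the spirit of the approach, not the study’s actual architecture. Every parameter maps onto a recognizable psychological idea, which is what makes a model like this easy to interrogate.

```python
# A hypothetical one-unit "network" for a two-armed bandit task, meant only to
# illustrate why tiny models are interpretable; it is not the architecture from
# the Nature study.
import numpy as np

class OneUnitChoicePredictor:
    def __init__(self, decay=0.8, reward_weight=1.5, stickiness=0.3):
        # Each parameter maps onto a psychological idea: how quickly old rewards
        # fade (decay), how strongly reward drives preference (reward_weight),
        # and the tendency to simply repeat the last choice (stickiness).
        self.decay, self.reward_weight, self.stickiness = decay, reward_weight, stickiness
        self.h = 0.0  # the single hidden unit: preference for machine 1 over machine 0

    def update_and_predict(self, last_choice, last_reward):
        signed_choice = 2 * last_choice - 1  # machine 0 -> -1, machine 1 -> +1
        self.h = (self.decay * self.h
                  + self.reward_weight * signed_choice * last_reward
                  + self.stickiness * signed_choice)
        return 1.0 / (1.0 + np.exp(-self.h))  # P(next choice is machine 1)

predictor = OneUnitChoicePredictor()
for choice, reward in [(1, 1), (1, 0), (0, 1), (0, 1)]:  # (choice, reward) history
    p = predictor.update_and_predict(choice, reward)
    print(f"after machine {choice} paid {reward}: P(machine 1 next) = {p:.2f}")
```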

There’s a cost to comprehensibility. Unlike Centaur, which was trained to mimic human behavior in dozens of different tasks, each tiny network can only predict behavior in one specific task. One network, for example, is specialized for making predictions about how people choose among different slot machines. “If the behavior is really complex, you need a large network,” says Marcelo Mattar, an assistant professor of psychology and neural science at New York University who led the tiny-network study and also contributed to Centaur. “The compromise, of course, is that now understanding it is very, very difficult.”

This trade-off between prediction and understanding is a key feature of neural-network-driven science. (I also happen to be writing a book about it.) Studies like Mattar’s are making some progress toward closing that gap—as tiny as his networks are, they can predict behavior more accurately than traditional psychological models. So is the research into LLM interpretability happening at places like Anthropic. For now, however, our understanding of complex systems—from humans to climate systems to proteins—is lagging farther and farther behind our ability to make predictions about them.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

What comes next for AI copyright lawsuits?

Last week, the technology companies Anthropic and Meta each won landmark victories in two separate court cases that examined whether or not the firms had violated copyright when they trained their large language models on copyrighted books without permission. The rulings are the first we’ve seen to come out of copyright cases of this kind. This is a big deal!

The use of copyrighted works to train models is at the heart of a bitter battle between tech companies and content creators. That battle is playing out in technical arguments about what does and doesn’t count as fair use of a copyrighted work. But it is ultimately about carving out a space in which human and machine creativity can continue to coexist.

There are dozens of similar copyright lawsuits working through the courts right now, with cases filed against all the top players—not only Anthropic and Meta but Google, OpenAI, Microsoft, and more. On the other side, plaintiffs range from individual artists and authors to large companies like Getty and the New York Times.

The outcomes of these cases are set to have an enormous impact on the future of AI. In effect, they will decide whether or not model makers can continue ordering up a free lunch. If not, they will need to start paying for such training data via new kinds of licensing deals—or find new ways to train their models. Those prospects could upend the industry.

And that’s why last week’s wins for the technology companies matter. So: Cases closed? Not quite. If you drill into the details, the rulings are less cut-and-dried than they seem at first. Let’s take a closer look.

In both cases, a group of authors (the Anthropic suit was a class action; 13 plaintiffs sued Meta, including high-profile names such as Sarah Silverman and Ta-Nehisi Coates) set out to prove that a technology company had violated their copyright by using their books to train large language models. And in both cases, the companies argued that this training process counted as fair use, a legal provision that permits the use of copyrighted works for certain purposes.  

There the similarities end. Ruling in Anthropic’s favor, senior district judge William Alsup argued on June 23 that the firm’s use of the books was legal because what it did with them was transformative, meaning that it did not replace the original works but made something new from them. “The technology at issue was among the most transformative many of us will see in our lifetimes,” Alsup wrote in his judgment.

In Meta’s case, district judge Vince Chhabria made a different argument. He also sided with the technology company, but he focused his ruling instead on the issue of whether or not Meta had harmed the market for the authors’ work. Chhabria said that he thought Alsup had brushed aside the importance of market harm. “The key question in virtually any case where a defendant has copied someone’s original work without permission is whether allowing people to engage in that sort of conduct would substantially diminish the market for the original,” he wrote on June 25.

Same outcome; two very different rulings. And it’s not clear exactly what that means for the other cases. On the one hand, it bolsters at least two versions of the fair-use argument. On the other, there’s some disagreement over how fair use should be decided.

But there are even bigger things to note. Chhabria was very clear in his judgment that Meta won not because it was in the right, but because the plaintiffs failed to make a strong enough argument. “In the grand scheme of things, the consequences of this ruling are limited,” he wrote. “This is not a class action, so the ruling only affects the rights of these 13 authors—not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful.” That reads a lot like an invitation for anyone else out there with a grievance to come and have another go.   

And neither company is yet home free. Anthropic and Meta both face wholly separate allegations that they not only trained their models on copyrighted books but also obtained those books illegally, by downloading them from databases of pirated books. Anthropic now faces another trial over these piracy claims. Meta has been ordered to begin a discussion with its accusers over how to handle the issue.

So where does that leave us? As the first rulings to come out of cases of this type, last week’s judgments will no doubt carry enormous weight. But they are also the first rulings of many. Arguments on both sides of the dispute are far from exhausted.

“These cases are a Rorschach test in that either side of the debate will see what they want to see out of the respective orders,” says Amir Ghavi, a lawyer at Paul Hastings who represents a range of technology companies in ongoing copyright lawsuits. He also points out that the first cases of this type were filed more than two years ago: “Factoring in likely appeals and the other 40+ pending cases, there is still a long way to go before the issue is settled by the courts.”

“I’m disappointed at these rulings,” says Tyler Chou, founder and CEO of Tyler Chou Law for Creators, a firm that represents some of the biggest names on YouTube. “I think plaintiffs were out-gunned and didn’t have the time or resources to bring the experts and data that the judges needed to see.”

But Chou thinks this is just the first round of many. Like Ghavi, she thinks these decisions will go to appeal. And after that, she expects a wave of cases in which technology companies have met their match: “Expect the next wave of plaintiffs—publishers, music labels, news organizations—to arrive with deep pockets,” she says. “That will be the real test of fair use in the AI era.”

But even when the dust has settled in the courtrooms—what then? The problem won’t have been solved. That’s because the core grievance of creatives, whether individuals or institutions, is not really that their copyright has been violated—copyright is just the legal hammer they have to hand. Their real complaint is that their livelihoods and business models are at risk of being undermined. And beyond that: when AI slop devalues creative effort, will people’s motivations for putting work out into the world start to fall away?

In that sense, these legal battles are set to shape all our futures. There’s still no good solution on the table for this wider problem. Everything is still to play for.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

This story has been edited to add comments from Tyler Chou.

What does it mean for an algorithm to be “fair”?

Back in February, I flew to Amsterdam to report on a high-stakes experiment the city had recently conducted: a pilot program for what it called Smart Check, which was its attempt to create an effective, fair, and unbiased predictive algorithm to try to detect welfare fraud. But the city fell short of its lofty goals—and, with our partners at Lighthouse Reports and the Dutch newspaper Trouw, we tried to get to the bottom of why. You can read about it in our deep dive published last week.

For an American reporter, it’s been an interesting time to write a story on “responsible AI” in a progressive European city—just as ethical considerations in AI deployments appear to be disappearing in the United States, at least at the national level. 

For example, a few weeks before my trip, the Trump administration rescinded Biden’s executive order on AI safety and DOGE began turning to AI to decide which federal programs to cut. Then, more recently, House Republicans passed a 10-year moratorium on US states’ ability to regulate AI (though it has yet to be passed by the Senate). 

What all this points to is a new reality in the United States where responsible AI is no longer a priority (if it ever genuinely was). 

But this has also made me think more deeply about the stakes of deploying AI in situations that directly affect human lives, and about what success would even look like. 

When Amsterdam’s welfare department began developing the algorithm that became Smart Check, the municipality followed virtually every recommendation in the responsible-AI playbook: consulting external experts, running bias tests, implementing technical safeguards, and seeking stakeholder feedback. City officials hoped the resulting algorithm could avoid the worst types of harm that discriminatory AI systems had inflicted over nearly a decade.

After talking to many people involved in the project, others who stood to be affected by it, and outside experts who did not work on it, I found it hard not to wonder whether the city could ever have succeeded in its goals when neither “fairness” nor even “bias” has a universally agreed-upon definition. The city treated these issues as technical ones that could be answered by reweighting numbers and figures, rather than as political and philosophical questions that society as a whole has to grapple with.
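
To make that concrete, here is a toy example, with invented numbers that have nothing to do with Amsterdam’s data, of how two standard statistical definitions of fairness can disagree about the very same set of flagging decisions.

```python
# A toy illustration (invented numbers, nothing to do with Amsterdam's data) of
# how the same flagging decisions can satisfy one common fairness criterion
# while violating another.
import numpy as np

# 1 = actual fraud, 0 = no fraud; "flagged" is the algorithm's decision to investigate.
actual = {"group_A": np.array([1, 0, 0, 0, 0, 0]),
          "group_B": np.array([1, 1, 0, 0, 0, 0])}
flagged = {"group_A": np.array([1, 1, 0, 0, 0, 0]),
           "group_B": np.array([1, 0, 1, 0, 0, 0])}

for group in ("group_A", "group_B"):
    y, f = actual[group], flagged[group]
    flag_rate = f.mean()                    # "demographic parity" compares these
    true_positive_rate = f[y == 1].mean()   # "equal opportunity" compares these
    print(f"{group}: flag rate = {flag_rate:.2f}, "
          f"true-positive rate = {true_positive_rate:.2f}")

# Both groups are flagged at the same rate (parity holds), yet group B's fraud
# cases are caught only half as often (equal opportunity is violated). When base
# rates differ, a non-trivial algorithm generally cannot satisfy both criteria
# at once, so choosing which to prioritize is a value judgment, not a calculation.
```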

On the afternoon that I arrived in Amsterdam, I sat down with Anke van der Vliet, a longtime advocate for welfare beneficiaries who served on what’s called the Participation Council, a 15-member citizen body that represents benefits recipients and their advocates.

The city had consulted the council during Smart Check’s development, but van der Vliet was blunt in sharing the committee’s criticisms of the plans. Its members simply didn’t want the program. They had well-placed fears of discrimination and disproportionate impact, given that fraud is found in only 3% of applications.

To the city’s credit, it did respond to some of their concerns and make changes in the algorithm’s design—like removing from consideration factors, such as age, whose inclusion could have had a discriminatory impact. But the city ignored the Participation Council’s main feedback: its recommendation to stop development altogether. 

Van der Vliet and other welfare advocates I met on my trip, like representatives from the Amsterdam Welfare Union, described what they see as a number of challenges faced by the city’s some 35,000 benefits recipients: the indignities of having to constantly re-prove the need for benefits, the increases in cost of living that benefits payments do not reflect, and the general feeling of distrust between recipients and the government. 

City welfare officials themselves recognize the flaws of the system, which “is held together by rubber bands and staples,” as Harry Bodaar, a senior policy advisor to the city who focuses on welfare fraud enforcement, told us. “And if you’re at the bottom of that system, you’re the first to fall through the cracks.”

So the Participation Council didn’t want Smart Check at all, even as Bodaar and others working in the department hoped that it could fix the system. It’s a classic example of a “wicked problem,” a social or cultural issue with no one clear answer and many potential consequences. 

After the story was published, I heard from Suresh Venkatasubramanian, a former tech advisor to the White House Office of Science and Technology Policy who co-wrote Biden’s AI Bill of Rights (now rescinded by Trump). “We need participation early on from communities,” he said, but he added that it also matters what officials do with the feedback—and whether there is “a willingness to reframe the intervention based on what people actually want.” 

Had the city started with a different question—what people actually want—perhaps it might have developed a different algorithm entirely. As the Dutch digital rights advocate Hans De Zwart put it to us, “We are being seduced by technological solutions for the wrong problems … why doesn’t the municipality build an algorithm that searches for people who do not apply for social assistance but are entitled to it?” 

These are the kinds of fundamental questions AI developers will need to consider, or they run the risk of repeating (or ignoring) the same mistakes over and over again.

Venkatasubramanian told me he found the story to be “affirming” in highlighting the need for “those in charge of governing these systems”  to “ask hard questions … starting with whether they should be used at all.”

But he also called the story “humbling”: “Even with good intentions, and a desire to benefit from all the research on responsible AI, it’s still possible to build systems that are fundamentally flawed, for reasons that go well beyond the details of the system constructions.” 

To better understand this debate, read our full story here. And if you want more detail on how we ran our own bias tests after the city gave us unprecedented access to the Smart Check algorithm, check out the methodology over at Lighthouse. (For any Dutch speakers out there, here’s the companion story in Trouw.) Thanks to the Pulitzer Center for supporting our reporting. 

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The Pentagon is gutting the team that tests AI and weapons systems

The Trump administration’s chainsaw approach to federal spending lives on, even as Elon Musk turns on the president. On May 28, Secretary of Defense Pete Hegseth announced he’d be gutting a key office at the Department of Defense responsible for testing and evaluating the safety of weapons and AI systems.

As part of a string of moves aimed at “reducing bloated bureaucracy and wasteful spending in favor of increased lethality,” Hegseth cut the size of the Office of the Director of Operational Test and Evaluation in half. The group was established in the 1980s—following orders from Congress—after criticisms that the Pentagon was fielding weapons and systems that didn’t perform as safely or effectively as advertised. Hegseth is reducing the agency’s staff to about 45, down from 94, and firing and replacing its director. He gave the office just seven days to implement the changes.

It is a significant overhaul of a department that in 40 years has never before been placed so squarely on the chopping block. Here’s how today’s defense tech companies, which have fostered close connections to the Trump administration, stand to gain, and why safety testing might suffer as a result. 

The Operational Test and Evaluation office is “the last gate before a technology gets to the field,” says Missy Cummings, a former fighter pilot for the US Navy who is now a professor of engineering and computer science at George Mason University. Though the military can do small experiments with new systems without running them by the office, it has to test anything that gets fielded at scale.

“In a bipartisan way—up until now—everybody has seen it’s working to help reduce waste, fraud, and abuse,” she says. That’s because it provides an independent check on companies’ and contractors’ claims about how well their technology works. It also aims to expose the systems to more rigorous safety testing.

The gutting comes at a particularly pivotal time for military adoption of AI: The Pentagon is experimenting with putting AI into everything, mainstream companies like OpenAI are now more comfortable working with the military, and defense giants like Anduril are winning big contracts to launch AI systems (last Thursday, Anduril announced a whopping $2.5 billion funding round, doubling its valuation to over $30 billion).

Hegseth claims his cuts will “make testing and fielding weapons more efficient,” saving $300 million. But Cummings is concerned that they are paving the way for faster adoption while increasing the chances that new systems won’t be as safe or effective as promised. “The firings in DOTE send a clear message that all perceived obstacles for companies favored by Trump are going to be removed,” she says.

Anduril and Anthropic, which have launched AI applications for military use, did not respond to my questions about whether they pushed for or approve of the cuts. A representative for OpenAI said that the company was not involved in lobbying for the restructuring. 

“The cuts make me nervous,” says Mark Cancian, a senior advisor at the Center for Strategic and International Studies who previously worked at the Pentagon in collaboration with the testing office. “It’s not that we’ll go from effective to ineffective, but you might not catch some of the problems that would surface in combat without this testing step.”

It’s hard to say precisely how the cuts will affect the office’s ability to test systems, and Cancian admits that those responsible for getting new technologies out onto the battlefield sometimes complain that it can really slow down adoption. But still, he says, the office frequently uncovers errors that weren’t previously caught.

It’s an especially important step, Cancian says, whenever the military is adopting a new type of technology like generative AI. Systems that might perform well in a lab setting almost always encounter new challenges in more realistic scenarios, and the Operational Test and Evaluation group is where that rubber meets the road.

So what to make of all this? It’s true that the military was experimenting with artificial intelligence long before the current AI boom, particularly with computer vision for drone feeds, and defense tech companies have been winning big contracts for this push across multiple presidential administrations. But this era is different. The Pentagon is announcing ambitious pilots specifically for large language models, a relatively nascent technology that by its very nature produces hallucinations and errors, and it appears eager to put much-hyped AI into everything. The key independent group dedicated to evaluating the accuracy of these new and complex systems now only has half the staff to do it. I’m not sure that’s a win for anyone.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Inside the effort to tally AI’s energy appetite

After working on it for months, my colleague Casey Crownhart and I finally saw our story on AI’s energy and emissions burden go live last week. 

The initial goal sounded simple: Calculate how much energy is used each time we interact with a chatbot, and then tally that up to understand why everyone from leaders of AI companies to officials at the White House wants to harness unprecedented levels of electricity to power AI and reshape our energy grids in the process. 

It was, of course, not so simple. After speaking with dozens of researchers, we realized that the common understanding of AI’s energy appetite is full of holes. I encourage you to read the full story, which has some incredible graphics to help you understand everything from the energy used in a single query right up to what AI will require just three years from now (enough electricity to power 22% of US households, it turns out). But here are three takeaways I have after the project. 

AI is in its infancy

We focused on measuring the energy requirements that go into using a chatbot, generating an image, and creating a video with AI. But these three uses are relatively small-scale compared with where AI is headed next. 

Lots of AI companies are building reasoning models, which “think” for longer and use more energy. They’re building hardware devices, perhaps like the one Jony Ive has been working on (which OpenAI just acquired for $6.5 billion), that have AI constantly humming along in the background of our conversations. They’re designing agents and digital clones of us to act on our behalf. All these trends point to a more energy-intensive future (which, again, helps explain why OpenAI and others are spending such inconceivable amounts of money on energy). 

But the fact that AI is in its infancy raises another point. The models, chips, and cooling methods behind this AI revolution could all grow more efficient over time, as my colleague Will Douglas Heaven explains. This future isn’t predetermined.

AI video is on another level

When we tested the energy demands of various models, we found the energy required to produce even a low-quality, five-second video to be pretty shocking: It was 42,000 times more than the amount needed for a chatbot to answer a question about a recipe, and enough to power a microwave for over an hour. If there’s one type of AI whose energy appetite should worry you, it’s this one.
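
For a rough sense of how those comparisons hang together, here is the back-of-envelope arithmetic. The microwave’s wattage is my assumption; the per-answer figure is simply what the story’s ratio implies.

```python
# Back-of-envelope arithmetic behind the comparison. The microwave wattage is an
# assumption on my part; the per-answer figure is simply what the ratio implies.
MICROWAVE_WATTS = 1_000        # a typical countertop microwave (assumption)
MICROWAVE_HOURS = 1.0          # "over an hour"
VIDEO_TO_CHAT_RATIO = 42_000   # five-second video vs. one chatbot answer

video_energy_joules = MICROWAVE_WATTS * MICROWAVE_HOURS * 3_600
chat_energy_joules = video_energy_joules / VIDEO_TO_CHAT_RATIO

print(f"Five-second AI video: ~{video_energy_joules / 1e6:.1f} MJ")
print(f"One chatbot answer:   ~{chat_energy_joules:.0f} J "
      f"(about {chat_energy_joules / 3_600:.3f} Wh)")
```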

Soon after we published, Google debuted the latest iteration of its Veo model. People quickly created compilations of the most impressive clips (this one being the most shocking to me). Something we point out in the story is that Google (as well as OpenAI, which has its own video generator, Sora) denied our request for specific numbers on the energy their AI models use. Nonetheless, our reporting suggests it’s very likely that high-definition video models like Veo and Sora are much larger, and much more energy-demanding, than the models we tested. 

I think the key to whether the use of AI video will produce indefensible clouds of emissions in the near future will be how it’s used, and how it’s priced. The example I linked shows a bunch of TikTok-style content, and I predict that if creating AI video is cheap enough, social video sites will be inundated with this type of content. 

There are more important questions than your own individual footprint

We expected that a lot of readers would understandably think about this story in terms of their own individual footprint, wondering whether their AI usage is contributing to the climate crisis. Don’t panic: It’s likely that asking a chatbot for help with a travel plan does not meaningfully increase your carbon footprint. Video generation might. But after reporting on this for months, I think there are more important questions.

Consider, for example, the water being drained from aquifers in Nevada, the country’s driest state, to power data centers that are drawn to the area by tax incentives and easy permitting processes, as detailed in an incredible story by James Temple. Or look at how Meta’s largest data center project, in Louisiana, is relying on natural gas despite industry promises to use clean energy, per a story by David Rotman. Or the fact that nuclear energy is not the silver bullet that AI companies often make it out to be. 

There are global forces shaping how much energy AI companies are able to access and what types of sources will provide it. There is also very little transparency from leading AI companies on their current and future energy demands, even while they’re asking for public support for these plans. Pondering your individual footprint can be a good thing to do, provided you remember that it’s not so much your footprint as these other factors that are keeping climate researchers and energy experts we spoke to up at night.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

How AI is introducing errors into courtrooms

It’s been quite a couple weeks for stories about AI in the courtroom. You might have heard about the deceased victim of a road rage incident whose family created an AI avatar of him to show as an impact statement (possibly the first time this has been done in the US). But there’s a bigger, far more consequential controversy brewing, legal experts say. AI hallucinations are cropping up more and more in legal filings. And it’s starting to infuriate judges. Just consider these three cases, each of which gives a glimpse into what we can expect to see more of as lawyers embrace AI.

A few weeks ago, a California judge, Michael Wilner, became intrigued by a set of arguments some lawyers made in a filing. He went to learn more about those arguments by following the articles they cited. But the articles didn’t exist. He asked the lawyers’ firm for more details, and they responded with a new brief that contained even more mistakes than the first. Wilner ordered the attorneys to give sworn testimony explaining the mistakes, and learned that one of them, from the elite firm Ellis George, had used Google Gemini as well as law-specific AI models to help write the document, which generated false information. As detailed in a filing on May 6, the judge fined the firm $31,000.

Last week, another California-based judge caught another hallucination in a court filing, this time submitted by the AI company Anthropic in the lawsuit that record labels have brought against it over copyright issues. One of Anthropic’s lawyers had asked the company’s AI model Claude to create a citation for a legal article, but Claude included the wrong title and author. Anthropic’s attorney admitted that the mistake was not caught by anyone reviewing the document. 

Lastly, and perhaps most concerning, is a case unfolding in Israel. After police arrested an individual on charges of money laundering, Israeli prosecutors submitted a request asking a judge for permission to keep the individual’s phone as evidence. But they cited laws that don’t exist, prompting the defendant’s attorney to accuse them of including AI hallucinations in their request. The prosecutors, according to Israeli news outlets, admitted that this was the case, receiving a scolding from the judge. 

Taken together, these cases point to a serious problem. Courts rely on documents that are accurate and backed up with citations—two traits that AI models, despite being adopted by lawyers eager to save time, often fail miserably to deliver. 

Those mistakes are getting caught (for now), but it’s not a stretch to imagine that at some point soon, a judge’s decision will be influenced by something that’s totally made up by AI, and no one will catch it. 

I spoke with Maura Grossman, who teaches at the School of Computer Science at the University of Waterloo as well as Osgoode Hall Law School, and has been a vocal early critic of the problems that generative AI poses for courts. She wrote about the problem back in 2023, when the first cases of hallucinations started appearing. She said she thought courts’ existing rules requiring lawyers to vet what they submit to the courts, combined with the bad publicity those cases attracted, would put a stop to the problem. That hasn’t panned out.

Hallucinations “don’t seem to have slowed down,” she says. “If anything, they’ve sped up.” And these aren’t one-off cases with obscure local firms, she says. These are big-time lawyers making significant, embarrassing mistakes with AI. She worries that such mistakes are also cropping up more in documents not written by lawyers themselves, like expert reports (in December, a Stanford professor and expert on AI admitted to including AI-generated mistakes in his testimony).  

I told Grossman that I find all this a little surprising. Attorneys, more than most, are obsessed with diction. They choose their words with precision. Why are so many getting caught making these mistakes?

“Lawyers fall in two camps,” she says. “The first are scared to death and don’t want to use it at all.” But then there are the early adopters. These are lawyers tight on time or without a cadre of other lawyers to help with a brief. They’re eager for technology that can help them write documents under tight deadlines. And their checks on the AI’s work aren’t always thorough. 

The fact that high-powered lawyers, whose very profession it is to scrutinize language, keep getting caught making mistakes introduced by AI says something about how most of us treat the technology right now. We’re told repeatedly that AI makes mistakes, but language models also feel a bit like magic. We put in a complicated question and receive what sounds like a thoughtful, intelligent reply. Over time, AI models develop a veneer of authority. We trust them.

“We assume that because these large language models are so fluent, it also means that they’re accurate,” Grossman says. “We all sort of slip into that trusting mode because it sounds authoritative.” Attorneys are used to checking the work of junior attorneys and interns, but for some reason, Grossman says, they don’t apply this skepticism to AI.

We’ve known about this problem ever since ChatGPT launched nearly three years ago, but the recommended solution has not evolved much since then: Don’t trust everything you read, and vet what an AI model tells you. As AI models get thrust into so many different tools we use, I increasingly find this to be an unsatisfying counter to one of AI’s most foundational flaws.

Hallucinations are inherent to the way that large language models work. Despite that, companies are selling generative AI tools made for lawyers that claim to be reliably accurate. “Feel confident your research is accurate and complete,” reads the website for Westlaw Precision, and the website for CoCounsel promises its AI is “backed by authoritative content.” That didn’t stop their client, Ellis George, from being fined $31,000.

Increasingly, I have sympathy for people who trust AI more than they should. We are, after all, living in a time when the people building this technology are telling us that AI is so powerful it should be treated like nuclear weapons. Models have learned from nearly every word humanity has ever written down and are infiltrating our online life. If people shouldn’t trust everything AI models say, they probably deserve to be reminded of that a little more often by the companies building them. 

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Police tech can sidestep facial recognition bans now

Six months ago I attended the largest gathering of chiefs of police in the US to see how they’re using AI. I found some big developments, like officers getting AI to write their police reports. Today, I published a new story that shows just how far AI for police has developed since then. 

It’s about a new method police departments and federal agencies have found to track people: an AI tool that uses attributes like body size, gender, hair color and style, clothing, and accessories instead of faces. It offers a way around laws curbing the use of facial recognition, which are on the rise. 

Advocates from the ACLU, after learning of the tool through MIT Technology Review, said it was the first instance they’d seen of such a tracking system used at scale in the US, and they say it has a high potential for abuse by federal agencies. They say the prospect that AI will enable more powerful surveillance is especially alarming at a time when the Trump administration is pushing for more monitoring of protesters, immigrants, and students. 

I hope you read the full story for the details, and to watch a demo video of how the system works. But first, let’s talk for a moment about what this tells us about the development of police tech and what rules, if any, these departments are subject to in the age of AI.

As I pointed out in my story six months ago, police departments in the US have extraordinary independence. There are more than 18,000 departments in the country, and they generally have lots of discretion over what technology they spend their budgets on. In recent years, that technology has increasingly become AI-centric. 

Companies like Flock and Axon sell suites of sensors—cameras, license plate readers, gunshot detectors, drones—and then offer AI tools to make sense of that ocean of data (at last year’s conference I saw schmoozing between countless AI-for-police startups and the chiefs they sell to on the expo floor). Departments say these technologies save time, ease officer shortages, and help cut down on response times. 

Those sound like fine goals, but this pace of adoption raises an obvious question: Who makes the rules here? When does the use of AI cross over from efficiency into surveillance, and what type of transparency is owed to the public?

In some cases, AI-powered police tech is already driving a wedge between departments and the communities they serve. When the police in Chula Vista, California, were the first in the country to get special waivers from the Federal Aviation Administration to fly their drones farther than normal, they said the drones would be deployed to solve crimes and get people help sooner in emergencies. They’ve had some successes.

But the department has also been sued by a local media outlet alleging it has reneged on its promise to make drone footage public, and residents have said the drones buzzing overhead feel like an invasion of privacy. An investigation found that these drones were deployed more often in poor neighborhoods, and for minor issues like loud music. 

Jay Stanley, a senior policy analyst at the ACLU, says there’s no overarching federal law that governs how local police departments adopt technologies like the tracking software I wrote about. Departments usually have the leeway to try it first and see how their communities react after the fact. (Veritone, which makes the tool I wrote about, said it couldn’t name or connect me with departments using it, so the details of how police are deploying it are not yet clear.)

Sometimes communities take a firm stand; local laws against police use of facial recognition have been passed around the country. But departments—or the police tech companies they buy from—can find workarounds. Stanley says the new tracking software I wrote about poses lots of the same issues as facial recognition while escaping scrutiny because it doesn’t technically use biometric data.

“The community should be very skeptical of this kind of tech and, at a minimum, ask a lot of questions,” he says. He laid out a road map of what police departments should do before they adopt AI technologies: have hearings with the public, get community permission, and make promises about how the systems will and will not be used. He added that the companies making this tech should also allow it to be tested by independent parties. 

“This is all coming down the pike,” he says—and so quickly that policymakers and the public have little time to keep up. He adds, “Are these powers we want the police—the authorities that serve us—to have, and if so, under what conditions?”

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Why the humanoid workforce is running late

On Thursday I watched Daniela Rus, one of the world’s top experts on AI-powered robots, address a packed room at a Boston robotics expo. Rus spent a portion of her talk busting the notion that giant fleets of humanoids are already making themselves useful in manufacturing and warehouses around the world. 

That might come as a surprise. For years AI has made it faster to train robots, and investors have responded feverishly. Figure AI, a startup that aims to build general-purpose humanoid robots for both homes and industry, is looking at a $1.5 billion funding round (more on Figure shortly), and there are commercial experiments with humanoids at Amazon and auto manufacturers. Bank of America predicts that wide adoption of these robots is just around the corner, with a billion humanoids at work by 2050.

But Rus and many others I spoke with at the expo suggest that this hype just doesn’t add up.

Humanoids “are mostly not intelligent,” she said. Rus showed a video of herself speaking to an advanced humanoid that smoothly followed her instruction to pick up a watering can and water a nearby plant. It was impressive. But when she asked it to “water” her friend, the robot did not consider that humans don’t need watering like plants and moved to douse the person. “These robots lack common sense,” she said. 

I also spoke with Pras Velagapudi, the chief technology officer of Agility Robotics, who detailed physical limitations the company has to overcome too. To be strong, a humanoid needs a lot of power and a big battery. The stronger you make it and the heavier it is, the less time it can run without charging, and the more you need to worry about safety. A robot like this is also complex to manufacture.

Some impressive humanoid demos don’t overcome these core constraints as much as they display other impressive features: nimble robotic hands, for instance, or the ability to converse with people via a large language model. But these capabilities don’t necessarily translate well to the jobs that humanoids are supposed to be taking over (it’s more useful to program a long list of detailed instructions for a robot to follow than to speak to it, for example). 

This is not to say fleets of humanoids won’t ever join our workplaces, but rather that the adoption of the technology will likely be drawn out, industry specific, and slow. It’s related to what I wrote about last week: To people who consider AI a “normal” technology, rather than a utopian or dystopian one, this all makes sense. The technology that succeeds in an isolated lab setting will appear very different from the one that gets commercially adopted at scale. 

All of this sets the scene for what happened with one of the biggest names in robotics last week. Figure AI has raised a tremendous amount of investment for its humanoids, and founder Brett Adcock claimed on X in March that the company was the “most sought-after private stock in the secondary market.” Its most publicized work is with BMW, and Adcock has shown videos of Figure’s robots working to move parts for the automaker, saying that the partnership took just 12 months to launch. Adcock and Figure have generally not responded to media requests and don’t make the rounds at typical robot trade shows. 

In April, Fortune published an article quoting a spokesperson from BMW, alleging that the pair’s partnership involves fewer robots at a smaller scale than Figure has implied. On April 25, Adcock posted on LinkedIn that “Figure’s litigation counsel will aggressively pursue all available legal remedies—including, but not limited to, defamation claims—to correct the publication’s blatant misstatements.” The author of the Fortune article did not respond to my request for comment, and a representative for Adcock and Figure declined to say what parts of the article were inaccurate. The representative pointed me to Adcock’s statement, which lacks details. 

The specifics of Figure aside, I think this conflict is quite indicative of the tech moment we’re in. A frenzied venture capital market—buoyed by messages like the statement from Nvidia CEO Jensen Huang that “physical AI” is the future—is betting that humanoids will create the largest market for robotics the field has ever seen, and that someday they will essentially be capable of most physical work. 

But achieving that means passing countless hurdles. We’ll need safety regulations for humans working alongside humanoids that don’t even exist yet. Deploying such robots successfully in one industry, like automotive, may not lead to success in others. We’ll have to hope that AI will solve lots of problems along the way. These are all things that roboticists have reason to be skeptical about.

Roboticists, from what I’ve seen, are normally a patient bunch. The first Roomba launched more than a decade after its conception, and it took more than 50 years to go from the first robotic arm ever to the millionth in production. Venture capitalists, on the other hand, are not known for such patience. 

Perhaps that’s why Bank of America’s new prediction of widespread humanoid adoption was met with enthusiasm by investors but enormous skepticism by roboticists. Aaron Prather, a director at the robotics standards organization ASTM, said on Thursday that the projections were “wildly off-base.” 

As we’ve covered before, humanoid hype is a cycle: One slick video raises the expectations of investors, which then incentivizes competitors to make even slicker videos. This makes it quite hard for anyone—a tech journalist, say—to peel back the curtain and find out how much impact humanoids are poised to have on the workforce. But I’ll do my darndest.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Here’s why we need to start thinking of AI as “normal”

Right now, despite its ubiquity, AI is seen as anything but a normal technology. There is talk of AI systems that will soon merit the term “superintelligence,” and the former CEO of Google recently suggested we control AI models the way we control uranium and other nuclear weapons materials. Anthropic is dedicating time and money to study AI “welfare,” including what rights AI models may be entitled to. Meanwhile, such models are moving into disciplines that feel distinctly human, from making music to providing therapy.

No wonder that anyone pondering AI’s future tends to fall into either a utopian or a dystopian camp. While OpenAI’s Sam Altman muses that AI’s impact will feel more like the Renaissance than the Industrial Revolution, over half of Americans are more concerned than excited about AI’s future. (That half includes a few friends of mine, who at a party recently speculated whether AI-resistant communities might emerge—modern-day Mennonites, carving out spaces where AI is limited by choice, not necessity.) 

So against this backdrop, a recent essay by two AI researchers at Princeton felt quite provocative. Arvind Narayanan, who directs the university’s Center for Information Technology Policy, and doctoral candidate Sayash Kapoor wrote a 40-page plea for everyone to calm down and think of AI as a normal technology. This runs counter to the “common tendency to treat it akin to a separate species, a highly autonomous, potentially superintelligent entity.”

Instead, according to the researchers, AI is a general-purpose technology whose application might be better compared to the drawn-out adoption of electricity or the internet than to nuclear weapons—though they concede this is in some ways a flawed analogy.

The core point, Kapoor says, is that we need to start differentiating between the rapid development of AI methods—the flashy and impressive displays of what AI can do in the lab—and what comes from the actual applications of AI, which in historical examples of other technologies lag behind by decades. 

“Much of the discussion of AI’s societal impacts ignores this process of adoption,” Kapoor told me, “and expects societal impacts to occur at the speed of technological development.” In other words, the adoption of useful artificial intelligence, in his view, will be less of a tsunami and more of a trickle.

In the essay, the pair make some other bracing arguments: terms like “superintelligence” are so incoherent and speculative that we shouldn’t use them; AI won’t automate everything but will birth a category of human labor that monitors, verifies, and supervises AI; and we should focus more on AI’s likelihood to worsen current problems in society than the possibility of it creating new ones.

“AI supercharges capitalism,” Narayanan says. It has the capacity to either help or hurt inequality, labor markets, the free press, and democratic backsliding, depending on how it’s deployed, he says. 

There’s one alarming deployment of AI that the authors leave out, though: the use of AI by militaries. That, of course, is picking up rapidly, raising alarms that life and death decisions are increasingly being aided by AI. The authors exclude that use from their essay because it’s hard to analyze without access to classified information, but they say their research on the subject is forthcoming. 

One of the biggest implications of treating AI as “normal” is that it would upend the position that both the Biden administration and now the Trump White House have taken: Building the best AI is a national security priority, and the federal government should take a range of actions—limiting what chips can be exported to China, dedicating more energy to data centers—to make that happen. In their paper, the two authors refer to US-China “AI arms race” rhetoric as “shrill.”

“The arms race framing verges on absurd,” Narayanan says. The knowledge it takes to build powerful AI models spreads quickly, and researchers around the world are already doing the work, he says; “it is not feasible to keep secrets at that scale.”

So what policies do the authors propose? Rather than planning around sci-fi fears, Kapoor talks about “strengthening democratic institutions, increasing technical expertise in government, improving AI literacy, and incentivizing defenders to adopt AI.” 

In contrast to policies aimed at controlling AI superintelligence or winning the arms race, these recommendations sound totally boring. And that’s kind of the point.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Phase two of military AI has arrived

Last week, I spoke with two US Marines who spent much of last year deployed in the Pacific, conducting training exercises from South Korea to the Philippines. Both were responsible for analyzing surveillance to warn their superiors about possible threats to the unit. But this deployment was unique: For the first time, they were using generative AI to scour intelligence, through a chatbot interface similar to ChatGPT. 

As I wrote in my new story, this experiment is the latest evidence of the Pentagon’s push to use generative AI—tools that can engage in humanlike conversation—throughout its ranks, for tasks including surveillance. Consider this phase two of the US military’s AI push, where phase one began back in 2017 with older types of AI, like computer vision to analyze drone imagery. Though this newest phase began under the Biden administration, there’s fresh urgency as Elon Musk’s DOGE and Secretary of Defense Pete Hegseth push loudly for AI-fueled efficiency. 

As I also write in my story, this push raises alarms from some AI safety experts about whether large language models are fit to analyze subtle pieces of intelligence in situations with high geopolitical stakes. It also accelerates the US toward a world where AI is not just analyzing military data but suggesting actions—for example, generating lists of targets. Proponents say this promises greater accuracy and fewer civilian deaths, but many human rights groups argue the opposite. 

With that in mind, here are three open questions to keep your eye on as the US military, and others around the world, bring generative AI to more parts of the so-called “kill chain.”

What are the limits of “human in the loop”?

Talk to as many defense-tech companies as I have and you’ll hear one phrase repeated quite often: “human in the loop.” It means that the AI is responsible for particular tasks, and humans are there to check its work. It’s meant to be a safeguard against the most dismal scenarios—AI wrongfully ordering a deadly strike, for example—but also against more trivial mishaps. Implicit in this idea is an admission that AI will make mistakes, and a promise that humans will catch them.

But the complexity of AI systems, which pull from thousands of pieces of data, makes that a herculean task for humans, says Heidy Khlaaf, who is chief AI scientist at the AI Now Institute, a research organization, and previously led safety audits for AI-powered systems.

“‘Human in the loop’ is not always a meaningful mitigation,” she says. When an AI model relies on thousands of data points to draw conclusions, “it wouldn’t really be possible for a human to sift through that amount of information to determine if the AI output was erroneous.” As AI systems rely on more and more data, this problem scales up. 

Is AI making it easier or harder to know what should be classified?

In the Cold War era of US military intelligence, information was captured through covert means, written up into reports by experts in Washington, and then stamped “Top Secret,” with access restricted to those with proper clearances. The age of big data, and now the advent of generative AI to analyze that data, is upending the old paradigm in lots of ways.

One specific problem is called classification by compilation. Imagine that hundreds of unclassified documents all contain separate details of a military system. Someone who managed to piece those together could reveal important information that on its own would be classified. For years, it was reasonable to assume that no human could connect the dots, but this is exactly the sort of thing that large language models excel at. 

With the mountain of data growing each day, and then AI constantly creating new analyses, “I don’t think anyone’s come up with great answers for what the appropriate classification of all these products should be,” says Chris Mouton, a senior engineer for RAND, who recently tested how well suited generative AI is for intelligence and analysis. Underclassifying is a US security concern, but lawmakers have also criticized the Pentagon for overclassifying information. 

The defense giant Palantir is positioning itself to help, by offering its AI tools to determine whether a piece of data should be classified or not. It’s also working with Microsoft on AI models that would train on classified data. 

How high up the decision chain should AI go?

Zooming out for a moment, it’s worth noting that the US military’s adoption of AI has in many ways followed consumer patterns. Back in 2017, when apps on our phones were getting good at recognizing our friends in photos, the Pentagon launched its own computer vision effort, called Project Maven, to analyze drone footage and identify targets.

Now, as large language models enter our work and personal lives through interfaces such as ChatGPT, the Pentagon is tapping some of these models to analyze surveillance. 

So what’s next? For consumers, it’s agentic AI, or models that can not just converse with you and analyze information but go out onto the internet and perform actions on your behalf. It’s also personalized AI, or models that learn from your private data to be more helpful. 

All signs point to the prospect that military AI models will follow this trajectory as well. A report published in March from Georgetown’s Center for Security and Emerging Technology found a surge in military adoption of AI to assist in decision-making. “Military commanders are interested in AI’s potential to improve decision-making, especially at the operational level of war,” the authors wrote.

In October, the Biden administration released its national security memorandum on AI, which provided some safeguards for these scenarios. This memo hasn’t been formally repealed by the Trump administration, but President Trump has indicated that the race for competitive AI in the US needs more innovation and less oversight. Regardless, it’s clear that AI is quickly moving up the chain not just to handle administrative grunt work, but to assist in the most high-stakes, time-sensitive decisions. 

I’ll be following these three questions closely. If you have information on how the Pentagon might be handling these questions, please reach out via Signal at jamesodonnell.22. 

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.