Here’s why we need to start thinking of AI as “normal”

Right now, despite its ubiquity, AI is seen as anything but a normal technology. There is talk of AI systems that will soon merit the term “superintelligence,” and the former CEO of Google recently suggested we control AI models the way we control uranium and other nuclear weapons materials. Anthropic is dedicating time and money to study AI “welfare,” including what rights AI models may be entitled to. Meanwhile, such models are moving into disciplines that feel distinctly human, from making music to providing therapy.

No wonder that anyone pondering AI’s future tends to fall into either a utopian or a dystopian camp. While OpenAI’s Sam Altman muses that AI’s impact will feel more like the Renaissance than the Industrial Revolution, over half of Americans are more concerned than excited about AI’s future. (That half includes a few friends of mine, who at a party recently speculated about whether AI-resistant communities might emerge—modern-day Mennonites, carving out spaces where AI is limited by choice, not necessity.)

So against this backdrop, a recent essay by two AI researchers at Princeton felt quite provocative. Arvind Narayanan, who directs the university’s Center for Information Technology Policy, and doctoral candidate Sayash Kapoor wrote a 40-page plea for everyone to calm down and think of AI as a normal technology. This runs counter to the “common tendency to treat it akin to a separate species, a highly autonomous, potentially superintelligent entity.”

Instead, according to the researchers, AI is a general-purpose technology whose application might be better compared to the drawn-out adoption of electricity or the internet than to nuclear weapons—though they concede this is in some ways a flawed analogy.

The core point, Kapoor says, is that we need to start differentiating between the rapid development of AI methods—the flashy and impressive displays of what AI can do in the lab—and the actual applications of AI, which, judging by the history of other technologies, lag behind by decades.

“Much of the discussion of AI’s societal impacts ignores this process of adoption,” Kapoor told me, “and expects societal impacts to occur at the speed of technological development.” In other words, the adoption of useful artificial intelligence, in his view, will be less of a tsunami and more of a trickle.

In the essay, the pair make some other bracing arguments: terms like “superintelligence” are so incoherent and speculative that we shouldn’t use them; AI won’t automate everything but will birth a category of human labor that monitors, verifies, and supervises AI; and we should focus more on the likelihood that AI will worsen current problems in society than on the possibility of it creating new ones.

“AI supercharges capitalism,” Narayanan says. It has the capacity to either help or hurt inequality, labor markets, the free press, and democratic backsliding, depending on how it’s deployed, he says. 

There’s one alarming deployment of AI that the authors leave out, though: the use of AI by militaries. That, of course, is picking up rapidly, raising alarms that AI is increasingly being used to aid life-and-death decisions. The authors exclude that use from their essay because it’s hard to analyze without access to classified information, but they say their research on the subject is forthcoming.

One of the biggest implications of treating AI as “normal” is that it would upend the position that both the Biden administration and now the Trump White House have taken: Building the best AI is a national security priority, and the federal government should take a range of actions—limiting what chips can be exported to China, dedicating more energy to data centers—to make that happen. In their paper, the two authors refer to US-China “AI arms race” rhetoric as “shrill.”

“The arms race framing verges on absurd,” Narayanan says. The knowledge it takes to build powerful AI models spreads quickly and is already being undertaken by researchers around the world, he says, and “it is not feasible to keep secrets at that scale.” 

So what policies do the authors propose? Rather than planning around sci-fi fears, Kapoor talks about “strengthening democratic institutions, increasing technical expertise in government, improving AI literacy, and incentivizing defenders to adopt AI.” 

Compared with policies aimed at controlling AI superintelligence or winning the arms race, these recommendations sound totally boring. And that’s kind of the point.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The AI Hype Index: AI agent cyberattacks, racing robots, and musical models

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry.

AI agents are the AI industry’s hypiest new product—intelligent assistants capable of completing tasks without human supervision. But while they can be theoretically useful—Simular AI’s S2 agent, for example, intelligently switches between models depending on what it’s been told to do—they could also be weaponized to execute cyberattacks. Elsewhere, OpenAI is reported to be throwing its hat into the social media arena, and AI models are getting more adept at making music. Oh, and if the results of the first half-marathon pitting humans against humanoid robots are anything to go by, we won’t have to worry about the robot uprising any time soon.

Seeing AI as a collaborator, not a creator

The reason you are reading this letter from me today is that I was bored 30 years ago. 

I was bored and curious about the world and so I wound up spending a lot of time in the university computer lab, screwing around on Usenet and the early World Wide Web, looking for interesting things to read. Soon enough I wasn’t content to just read stuff on the internet—I wanted to make it. So I learned HTML and made a basic web page, and then a better web page, and then a whole website full of web things. And then I just kept going from there. That amateurish collection of web pages led to a journalism internship with the online arm of a magazine that paid little attention to what we geeks were doing on the web. And that led to my first real journalism job, and then another, and, well, eventually this journalism job. 

But none of that would have been possible if I hadn’t been bored and curious. And more to the point: curious about tech. 

The university computer lab may seem at first like an unlikely center for creativity. We tend to think of creativity as happening more in the artist’s studio or writers’ workshop. But throughout history, very often our greatest creative leaps—and I would argue that the web and its descendants represent one such leap—have been due to advances in technology. 

There are the big easy examples, like photography or the printing press, but it’s also true of all sorts of creative inventions that we often take for granted. Oil paints. Theaters. Musical scores. Electric synthesizers! Almost anywhere you look in the arts, perhaps outside of pure vocalization, technology has played a role.  

But the key to artistic achievement has never been the technology itself. It has been the way artists have applied it to express our humanity. Think of the way we talk about the arts. We often compliment them with words that refer to our humanity, like soul, heart, and life; we often criticize them with descriptors such as sterile, clinical, or lifeless. (And sure, you can love a sterile piece of art, but typically that’s because the artist has leaned into sterility to make a point about humanity!)

All of which is to say I think that AI can be, will be, and already is a tool for creative expression, but that true art will always be something steered by human creativity, not machines. 

I could be wrong. I hope not. 

This issue, which was entirely produced by human beings using computers, explores creativity and the tension between the artist and technology. You can see it on our cover, illustrated by Tom Humberstone, and read about it in stories from James O’Donnell, Will Douglas Heaven, Rebecca Ackermann, Michelle Kim, Bryan Gardiner, and Allison Arieff.

Yet of course, creativity is about more than just the arts. All of human advancement stems from creativity, because creativity is how we solve problems. So it was important to us to bring you accounts of that as well. You’ll find those in stories from Carrie Klein, Carly Kay, Matthew Ponsford, and Robin George Andrews. (If you’ve ever wanted to know how we might nuke an asteroid, this is the issue for you!)  

We’re also trying to get a little more creative ourselves. Over the next few issues, you’ll notice some changes coming to this magazine with the addition of some new regular items (see Caiwei Chen’s “3 Things” for one such example). Among those changes, we are planning to solicit and publish more regular reader feedback and answer questions you may have about technology. We invite you to get creative and email us: newsroom@technologyreview.com.

As always, thanks for reading.

AI is pushing the limits of the physical world

Architecture often assumes a binary between built projects and theoretical ones. What physics allows in actual buildings, after all, is vastly different from what architects can imagine and design (often referred to as “paper architecture”). That imagination has long been supported and enabled by design technology, but the latest advancements in artificial intelligence have prompted a surge in the theoretical. 

Karl Daubmann, College of Architecture and Design at Lawrence Technological University
“Very often the new synthetic image that comes from a tool like Midjourney or Stable Diffusion feels new,” says Daubmann, “infused by each of the multiple tools but rarely completely derived from them.”

“Transductions: Artificial Intelligence in Architectural Experimentation,” a recent exhibition at the Pratt Institute in Brooklyn, brought together works from over 30 practitioners exploring the experimental, generative, and collaborative potential of artificial intelligence to open up new areas of architectural inquiry—something they’ve been working on for a decade or more, since long before AI became mainstream. Architects and exhibition co-curators Jason Vigneri-Beane, Olivia Vien, Stephen Slaughter, and Hart Marlow explain that the works in “Transductions” emerged out of feedback loops among architectural discourses, techniques, formats, and media that range from imagery, text, and animation to mixed-reality media and fabrication. The aim isn’t to present projects that are going to break ground anytime soon; architects already know how to build things with the tools they have. Instead, the show attempts to capture this very early stage in architecture’s exploratory engagement with AI.

Technology has long enabled architecture to push the limits of form and function. As early as 1963, Sketchpad, one of the first computer-aided design programs, allowed architects and designers to move and change objects on screen. Rapidly, traditional hand drawing gave way to an ever-expanding suite of programs—Revit, SketchUp, and many other modeling and BIM tools—that helped create floor plans and sections, track buildings’ energy usage, enhance sustainable construction, and aid in following building codes, to name just a few uses.

The architects exhibiting in “Transductions” view newly evolving forms of AI “like a new tool rather than a profession-ending development,” says Vigneri-Beane, despite what some of his peers fear about the technology. He adds, “I do appreciate that it’s a somewhat unnerving thing for people, [but] I feel a familiarity with the rhetoric.”

After all, he says, AI doesn’t just do the job. “To get something interesting and worth saving in AI, an enormous amount of time is required,” he says. “My architectural vocabulary has gotten much more precise and my visual sense has gotten an incredible workout, exercising all these muscles which have atrophied a little bit.”

Vien agrees: “I think these are extremely powerful tools for an architect and designer. Do I think it’s the entire future of architecture? No, but I think it’s a tool and a medium that can expand the long history of mediums and media that architects can use not just to represent their work but as a generator of ideas.”

Andrew Kudless, Hines College of Architecture and Design
This image, part of the Urban Resolution series, shows how the Stable Diffusion AI model “is unable to focus on constructing a realistic image and instead duplicates features that are prominent in the local latent space,” Kudless says.

Jason Vigneri-Beane, Pratt Institute
“These images are from a larger series on cyborg ecologies that have to do with co-creating with machines to imagine [other] machines,” says Vigneri-Beane. “I might refer to these as cryptomegafauna—infrastructural robots operating at an architectural scale.”

Martin Summers, University of Kentucky College of Design
“Most AI is racing to emulate reality,” says Summers. “I prefer to revel in the hallucinations and misinterpretations like glitches and the sublogic they reveal present in a mediated reality.”

Jason Lee, Pratt Institute
Lee typically uses AI “to generate iterations or high-resolution sketches,” he says. “I am also using it to experiment with how much realism one can incorporate with more abstract representation methods.”

Olivia Vien, Pratt Institute
For the series Imprinting Grounds, Vien created images digitally and fed them into Midjourney. “It riffs on the ideas of damask textile patterns in a more digital realm,” she says.

Robert Lee Brackett III, Pratt Institute
“While new software raises concerns about the absence of traditional tools like hand drawing and modeling, I view these technologies as collaborators rather than replacements,” Brackett says.

A Google Gemini model now has a “dial” to adjust how much it reasons

Google DeepMind’s latest update to a top Gemini AI model includes a dial to control how much the system “thinks” through a response. The new feature is ostensibly designed to save money for developers, but it also concedes a problem: Reasoning models, the tech world’s new obsession, are prone to overthinking, burning money and energy in the process.

Since 2019, there have been a couple of tried-and-true ways to make an AI model more powerful. One was to make it bigger by using more training data, and the other was to give it better feedback on what constitutes a good answer. But toward the end of last year, Google DeepMind and other AI companies turned to a third method: reasoning.

“We’ve been really pushing on ‘thinking,’” says Jack Rae, a principal research scientist at DeepMind. Such models, which are built to work through problems logically and spend more time arriving at an answer, rose to prominence earlier this year with the launch of the DeepSeek R1 model. They’re attractive to AI companies because they can make an existing model better by training it to approach a problem pragmatically. That way, the companies can avoid having to build a new model from scratch. 

When the AI model dedicates more time (and energy) to a query, it costs more to run. Leaderboards of reasoning models show that one task can cost upwards of $200 to complete. The promise is that this extra time and money help reasoning models do better at handling challenging tasks, like analyzing code or gathering information from lots of documents. 

“The more you can iterate over certain hypotheses and thoughts,” says Google DeepMind chief technical officer Koray Kavukcuoglu, the more “it’s going to find the right thing.”

This isn’t true in all cases, though. “The model overthinks,” says Tulsee Doshi, who leads the product team at Gemini, referring specifically to Gemini 2.5 Flash, the model released today that includes a slider for developers to dial back how much it thinks. “For simple prompts, the model does think more than it needs to.”

When a model spends longer than necessary on a problem, it makes the model expensive to run for developers and worsens AI’s environmental footprint.

Nathan Habib, an engineer at Hugging Face who has studied the proliferation of such reasoning models, says overthinking is abundant. In the rush to show off smarter AI, companies are reaching for reasoning models like hammers even where there’s no nail in sight, Habib says. Indeed, when OpenAI announced a new model in February, it said it would be the company’s last nonreasoning model. 

The performance gain is “undeniable” for certain tasks, Habib says, but not for many others where people normally use AI. Even when reasoning is used for the right problem, things can go awry. Habib showed me an example of a leading reasoning model that was asked to work through an organic chemistry problem. It started out okay, but halfway through its reasoning process the model’s responses started resembling a meltdown: It sputtered “Wait, but …” hundreds of times. It ended up taking far longer than a nonreasoning model would have spent on the same task. Kate Olszewska, who works on evaluating Gemini models at DeepMind, says Google’s models can also get stuck in loops.

Google’s new “reasoning” dial is one attempt to solve that problem. For now, it’s built not for the consumer version of Gemini but for developers who are making apps. Developers can set a budget for how much computing power the model should spend on a certain problem, the idea being to turn down the dial if the task shouldn’t involve much reasoning at all. Outputs from the model are about six times more expensive to generate when reasoning is turned on.
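For developers, that dial is just another request parameter. Here is a minimal sketch of what turning it down for a simple prompt might look like, assuming the google-genai Python SDK, where the budget is exposed as a token count called thinking_budget; exact names and limits may differ, so treat this as an illustration rather than documentation.

```python
# A minimal sketch of dialing reasoning down for a simple task.
# Assumes the google-genai SDK and that the budget is exposed as
# ThinkingConfig.thinking_budget (a token count); check current docs.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Rewrite this sentence in plain English: ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=0  # simple prompt, so spend nothing on "thinking"
        )
    ),
)
print(response.text)
```

Turning the budget up for a harder task trades money and energy for a potentially better answer, which is exactly the trade-off the rest of this story is about.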

Another reason for this flexibility is that it’s not yet clear when more reasoning will be required to get a better answer.

“It’s really hard to draw a boundary on, like, what’s the perfect task right now for thinking?” Rae says. 

Obvious tasks include coding (developers might paste hundreds of lines of code into the model and then ask for help), or generating expert-level research reports. The dial would be turned way up for these, and developers might find the expense worth it. But more testing and feedback from developers will be needed to find out when medium or low settings are good enough.

Habib says the amount of investment in reasoning models is a sign that the old paradigm for how to make models better is changing. “Scaling laws are being replaced,” he says. 

Instead, companies are betting that the best responses will come from longer thinking times rather than bigger models. It’s been clear for several years that AI companies are spending more money on inferencing—when models are actually “pinged” to generate an answer for something—than on training, and this spending will accelerate as reasoning models take off. Inferencing is also responsible for a growing share of emissions.

(While on the subject of models that “reason” or “think”: an AI model cannot perform these acts in the way we normally use such words when talking about humans. I asked Rae why the company uses anthropomorphic language like this. “It’s allowed us to have a simple name,” he says, “and people have an intuitive sense of what it should mean.” Kavukcuoglu says that Google is not trying to mimic any particular human cognitive process in its models.)

Even if reasoning models continue to dominate, Google DeepMind isn’t the only game in town. When the results from DeepSeek began circulating in December and January, they triggered a nearly $1 trillion dip in the stock market because they suggested that powerful reasoning models could be had for cheap. The model is referred to as “open weight”—in other words, its internal settings, called weights, are made publicly available, allowing developers to run it on their own rather than paying to access proprietary models from Google or OpenAI. (The term “open source” is reserved for models that disclose the data they were trained on.)

So why use proprietary models from Google when open ones like DeepSeek are performing so well? Kavukcuoglu says that coding, math, and finance are cases where “there’s high expectation from the model to be very accurate, to be very precise, and to be able to understand really complex situations,” and he expects models that deliver on that, open or not, to win out. In DeepMind’s view, this reasoning will be the foundation of future AI models that act on your behalf and solve problems for you.

“Reasoning is the key capability that builds up intelligence,” he says. “The moment the model starts thinking, the agency of the model has started.”

This story was updated to clarify the problem of “overthinking.”

Adapting for AI’s reasoning era

Anyone who crammed for exams in college knows that an impressive ability to regurgitate information is not synonymous with critical thinking.

The large language models (LLMs) first publicly released in 2022 were impressive but limited—like talented students who excel at multiple-choice exams but stumble when asked to defend their logic. Today’s advanced reasoning models are more akin to seasoned graduate students who can navigate ambiguity and backtrack when necessary, carefully working through problems with a methodical approach.

As AI systems that learn by mimicking the mechanisms of the human brain continue to advance, we’re witnessing an evolution in models from rote regurgitation to genuine reasoning. This capability marks a new chapter in the evolution of AI—and what enterprises can gain from it. But in order to tap into this enormous potential, organizations will need to ensure they have the right infrastructure and computational resources to support the advancing technology.

The reasoning revolution

“Reasoning models are qualitatively different than earlier LLMs,” says Prabhat Ram, partner AI/HPC architect at Microsoft, noting that these models can explore different hypotheses, assess if answers are consistently correct, and adjust their approach accordingly. “They essentially create an internal representation of a decision tree based on the training data they’ve been exposed to, and explore which solution might be the best.”

This adaptive approach to problem-solving isn’t without trade-offs. Earlier LLMs delivered outputs in milliseconds based on statistical pattern-matching and probabilistic analysis. This was—and still is—efficient for many applications, but it doesn’t allow the AI sufficient time to thoroughly evaluate multiple solution paths.

In newer models, extended computation time during inference—seconds, minutes, or even longer—allows the AI to employ more sophisticated internal reinforcement learning. This opens the door for multi-step problem-solving and more nuanced decision-making.

To illustrate future use cases for reasoning-capable AI, Ram offers the example of a NASA rover sent to explore the surface of Mars. “Decisions need to be made at every moment around which path to take, what to explore, and there has to be a risk-reward trade-off. The AI has to be able to assess, ‘Am I about to jump off a cliff? Or, if I study this rock and I have a limited amount of time and budget, is this really the one that’s scientifically more worthwhile?’” Making these assessments successfully could result in groundbreaking scientific discoveries at previously unthinkable speed and scale.

Reasoning capabilities are also a milestone in the proliferation of agentic AI systems: autonomous applications that perform tasks on behalf of users, such as scheduling appointments or booking travel itineraries. “Whether you’re asking AI to make a reservation, provide a literature summary, fold a towel, or pick up a piece of rock, it needs to first be able to understand the environment—what we call perception—comprehend the instructions and then move into a planning and decision-making phase,” Ram explains.

Enterprise applications of reasoning-capable AI systems

The enterprise applications for reasoning-capable AI are far-reaching. In health care, reasoning AI systems could analyze patient data, medical literature, and treatment protocols to support diagnostic or treatment decisions. In scientific research, reasoning models could formulate hypotheses, design experimental protocols, and interpret complex results—potentially accelerating discoveries across fields from materials science to pharmaceuticals. In financial analysis, reasoning AI could help evaluate investment opportunities or market expansion strategies, as well as develop risk profiles or economic forecasts.

Armed with these insights, their own experience, and emotional intelligence, human doctors, researchers, and financial analysts could make more informed decisions, faster. But before setting these systems loose in the wild, safeguards and governance frameworks will need to be ironclad, particularly in high-stakes contexts like health care or autonomous vehicles.

“For a self-driving car, there are real-time decisions that need to be made vis-a-vis whether it turns the steering wheel to the left or the right, whether it hits the gas pedal or the brake—you absolutely do not want to hit a pedestrian or get into an accident,” says Ram. “Being able to reason through situations and make an ‘optimal’ decision is something that reasoning models will have to do going forward.”

The infrastructure underpinning AI reasoning

To operate optimally, reasoning models require significantly more computational resources for inference. This creates distinct scaling challenges. Specifically, because the inference durations of reasoning models can vary widely—from just a few seconds to many minutes—load balancing across these diverse tasks is difficult.

Overcoming these hurdles requires tight collaboration between infrastructure providers and hardware manufacturers, says Ram, speaking of Microsoft’s collaboration with NVIDIA, which brings its accelerated computing platform to Microsoft products, including Azure AI.

“When we think about Azure, and when we think about deploying systems for AI training and inference, we really have to think about the entire system as a whole,” Ram explains. “What are you going to do differently in the data center? What are you going to do about multiple data centers? How are you going to connect them?” These considerations extend into reliability challenges at all scales: from memory errors at the silicon level, to transmission errors within and across servers, thermal anomalies, and even data center-level issues like power fluctuations—all of which require sophisticated monitoring and rapid response systems.

By creating a holistic system architecture designed to handle fluctuating AI demands, Microsoft and NVIDIA’s collaboration allows companies to harness the power of reasoning models without needing to manage the underlying complexity. In addition to performance benefits, these types of collaborations allow companies to keep pace with a tech landscape evolving at breakneck speed. “Velocity is a unique challenge in this space,” says Ram. “Every three months, there is a new foundation model. The hardware is also evolving very fast—in the last four years, we’ve deployed each generation of NVIDIA GPUs and now NVIDIA GB200NVL72. Leading the field really does require a very close collaboration between Microsoft and NVIDIA to share roadmaps, timelines, and designs on the hardware engineering side, qualifications and validation suites, issues that arise in production, and so on.”

Advancements in AI infrastructure designed specifically for reasoning and agentic models are critical for bringing reasoning-capable AI to a broader range of organizations. Without robust, accessible infrastructure, the benefits of reasoning models will remain relegated to companies with massive computing resources.

Looking ahead, the evolution of reasoning-capable AI systems and the infrastructure that supports them promises even greater gains. For Ram, the frontier extends beyond enterprise applications to scientific discovery and breakthroughs that propel humanity forward: “The day when these agentic systems can power scientific research and propose new hypotheses that can lead to a Nobel Prize, I think that’s the day when we can say that this evolution is complete.”

To learn more, please read Microsoft and NVIDIA accelerate AI development and performance, watch the NVIDIA GTC AI Conference sessions on demand, and explore the topic areas of Azure AI solutions and Azure AI infrastructure.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

This content was researched, designed, and written entirely by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Phase two of military AI has arrived

Last week, I spoke with two US Marines who spent much of last year deployed in the Pacific, conducting training exercises from South Korea to the Philippines. Both were responsible for analyzing surveillance to warn their superiors about possible threats to the unit. But this deployment was unique: For the first time, they were using generative AI to scour intelligence, through a chatbot interface similar to ChatGPT. 

As I wrote in my new story, this experiment is the latest evidence of the Pentagon’s push to use generative AI—tools that can engage in humanlike conversation—throughout its ranks, for tasks including surveillance. Consider this phase two of the US military’s AI push, where phase one began back in 2017 with older types of AI, like computer vision to analyze drone imagery. Though this newest phase began under the Biden administration, there’s fresh urgency as Elon Musk’s DOGE and Secretary of Defense Pete Hegseth push loudly for AI-fueled efficiency. 

As I also write in my story, this push raises alarms from some AI safety experts about whether large language models are fit to analyze subtle pieces of intelligence in situations with high geopolitical stakes. It also accelerates the US toward a world where AI is not just analyzing military data but suggesting actions—for example, generating lists of targets. Proponents say this promises greater accuracy and fewer civilian deaths, but many human rights groups argue the opposite. 

With that in mind, here are three open questions to keep your eye on as the US military, and others around the world, bring generative AI to more parts of the so-called “kill chain.”

What are the limits of “human in the loop”?

Talk to as many defense-tech companies as I have and you’ll hear one phrase repeated quite often: “human in the loop.” It means that the AI is responsible for particular tasks, and humans are there to check its work. It’s meant to be a safeguard against the most dismal scenarios—AI wrongfully ordering a deadly strike, for example—but also against more trivial mishaps. Implicit in this idea is an admission that AI will make mistakes, and a promise that humans will catch them.

But the complexity of AI systems, which pull from thousands of pieces of data, makes that a herculean task for humans, says Heidy Khlaaf, who is chief AI scientist at the AI Now Institute, a research organization, and previously led safety audits for AI-powered systems.

“‘Human in the loop’ is not always a meaningful mitigation,” she says. When an AI model relies on thousands of data points to draw conclusions, “it wouldn’t really be possible for a human to sift through that amount of information to determine if the AI output was erroneous.” As AI systems rely on more and more data, this problem scales up. 

Is AI making it easier or harder to know what should be classified?

In the Cold War era of US military intelligence, information was captured through covert means, written up into reports by experts in Washington, and then stamped “Top Secret,” with access restricted to those with proper clearances. The age of big data, and now the advent of generative AI to analyze that data, is upending the old paradigm in lots of ways.

One specific problem is called classification by compilation. Imagine that hundreds of unclassified documents all contain separate details of a military system. Someone who managed to piece those together could reveal important information that on its own would be classified. For years, it was reasonable to assume that no human could connect the dots, but this is exactly the sort of thing that large language models excel at. 

With the mountain of data growing each day, and then AI constantly creating new analyses, “I don’t think anyone’s come up with great answers for what the appropriate classification of all these products should be,” says Chris Mouton, a senior engineer for RAND, who recently tested how well suited generative AI is for intelligence and analysis. Underclassifying is a US security concern, but lawmakers have also criticized the Pentagon for overclassifying information. 

The defense giant Palantir is positioning itself to help, by offering its AI tools to determine whether a piece of data should be classified or not. It’s also working with Microsoft on AI models that would train on classified data. 

How high up the decision chain should AI go?

Zooming out for a moment, it’s worth noting that the US military’s adoption of AI has in many ways followed consumer patterns. Back in 2017, when apps on our phones were getting good at recognizing our friends in photos, the Pentagon launched its own computer vision effort, called Project Maven, to analyze drone footage and identify targets.

Now, as large language models enter our work and personal lives through interfaces such as ChatGPT, the Pentagon is tapping some of these models to analyze surveillance. 

So what’s next? For consumers, it’s agentic AI, or models that can not just converse with you and analyze information but go out onto the internet and perform actions on your behalf. It’s also personalized AI, or models that learn from your private data to be more helpful. 

All signs point to the prospect that military AI models will follow this trajectory as well. A report published in March from Georgetown’s Center for Security and Emerging Technology found a surge in military adoption of AI to assist in decision-making. “Military commanders are interested in AI’s potential to improve decision-making, especially at the operational level of war,” the authors wrote.

In October, the Biden administration released its national security memorandum on AI, which provided some safeguards for these scenarios. This memo hasn’t been formally repealed by the Trump administration, but President Trump has indicated that the race for competitive AI in the US needs more innovation and less oversight. Regardless, it’s clear that AI is quickly moving up the chain not just to handle administrative grunt work, but to assist in the most high-stakes, time-sensitive decisions. 

I’ll be following these three questions closely. If you have information on how the Pentagon might be handling these questions, please reach out via Signal at jamesodonnell.22. 

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

A small US city experiments with AI to find out what residents want

Bowling Green, Kentucky, is home to 75,000 residents who recently wrapped up an experiment in using AI for democracy: Can an online polling platform, powered by machine learning, capture what residents want to see happen in their city?

When Doug Gorman, elected leader of the county that includes Bowling Green, took office in 2023, the city was the fastest-growing in the state and projected to double in size by 2050, but it lacked a plan for how that growth would unfold. Gorman had a meeting with Sam Ford, a local consultant who had worked with the polling platform Pol.is, which uses machine learning to gather opinions from large groups of people.

They “needed a vision” for the anticipated growth, Ford says. The two convened a group of volunteers with experience in eight areas: economic development, talent, housing, public health, quality of life, tourism, storytelling, and infrastructure. They built a plan to use Pol.is to help write a 25-year plan for the city. The platform is just one of several new technologies used in Europe and increasingly in the US to help make sure that local governance is informed by public opinion.

After a month of advertising, the Pol.is portal launched in February. Residents could go to the website and anonymously submit an idea (in less than 140 characters) for what the 25-year plan should include. They could also vote on whether they agreed or disagreed with other ideas. The tool could be translated into a participant’s preferred language, and human moderators worked to make sure the traffic was coming from the Bowling Green area. 

Over the month that it was live, 7,890 residents participated, and 2,000 people submitted their own ideas. An AI-powered tool from Google Jigsaw then analyzed the data to find what people agreed and disagreed on. 

Experts on democracy technologies who were not involved in the project say this level of participation—about 10% of the city’s residents—was impressive.

“That is a lot,” says Archon Fung, director of the Ash Center for Innovation and Democratic Governance at the Harvard Kennedy School. A local election might see a 25% turnout, he says, and that requires nothing more than filling out a ballot. 

“Here, it’s a more demanding kind of participation, right? You’re actually voting on or considering some substantive things, and 2,000 people are contributing ideas,” he says. “So I think that’s a lot of people who are engaged.”

The plans that received the most attention in the Bowling Green experiment were hyperlocal. The ideas with the broadest support were increasing the number of local health-care specialists so residents wouldn’t have to travel to nearby Nashville for medical care, enticing more restaurants and grocery stores to open on the city’s north side, and preserving historic buildings. 

More contentious ideas included approving recreational marijuana, adding sexual orientation and gender identity to the city’s nondiscrimination clause, and providing more options for private education. Out of 3,940 unique ideas, 2,370 received more than 80% agreement, including initiatives like investing in stormwater infrastructure and expanding local opportunities for children and adults with autism.  
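Figures like “more than 80% agreement” come from straightforward tallies of the votes. The sketch below shows how per-idea agreement could be computed from raw agree/disagree/pass records; the data layout is an assumption for illustration, not the actual Pol.is or Jigsaw pipeline, which also clusters participants by how they vote.

```python
# Minimal sketch: tally per-idea agreement from raw votes.
# Assumes each vote is (participant_id, idea_id, value) with value in
# {"agree", "disagree", "pass"}; illustrative only, not the real data model.
from collections import defaultdict

votes = [
    ("p1", "idea-42", "agree"),
    ("p2", "idea-42", "agree"),
    ("p3", "idea-42", "disagree"),
    ("p1", "idea-7", "pass"),
]

tallies = defaultdict(lambda: {"agree": 0, "disagree": 0, "pass": 0})
for _, idea_id, value in votes:
    tallies[idea_id][value] += 1

for idea_id, counts in tallies.items():
    decisive = counts["agree"] + counts["disagree"]  # "pass" votes are ignored
    rate = counts["agree"] / decisive if decisive else 0.0
    print(f"{idea_id}: {rate:.0%} agreement across {decisive} decisive votes")
```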

The volunteers running the experiment were not completely hands-off. Submitted ideas were screened according to a moderation policy, and redundant ideas were not posted. Ford says that 51% of ideas were published, and 31% were deemed redundant. About 6% of ideas were not posted because they were either completely off-topic or contained a personal attack.

But some researchers who study the technologies that can make democracy more effective question whether soliciting input in this manner is a reliable way to understand what a community wants.

One problem is self-selection—for example, certain kinds of people tend to show up to in-person forums like town halls. Research shows that seniors, homeowners, and people with high levels of education are the most likely to attend, Fung says. It’s possible that similar dynamics are at play among the residents of Bowling Green who decided to participate in the project.

“Self-selection is not an adequate way to represent the opinions of a public,” says James Fishkin, a political scientist at Stanford who’s known for developing a process he calls deliberative polling, in which a representative sample of a population’s residents is brought together for a weekend, paid about $300 each for their participation, and asked to deliberate in small groups. Other methods, used by some European governments, rely on jury-style groups of residents to make public policy decisions.

What’s clear to everyone who studies the effectiveness of these tools is that they promise to move a city in a more democratic direction, but we won’t know if Bowling Green’s experiment worked until residents see what the city does with the ideas that they raised.

“You can’t make policy based on a tweet,” says Beth Simone Noveck, who directs a lab that studies democracy and technology at Northeastern University. As she points out, residents were voting on 140-character ideas, and those now need to be formed into real policies. 

“What comes next,” she says, “is the conversation between the city and residents to develop a short proposal into something that can actually be implemented.” For residents to trust that their voice actually matters, the city must be clear on why it’s implementing some ideas and not others. 

For now, the organizers have made the results public, and they will make recommendations to the Warren County leadership later this year. 

How AI is interacting with our creative human processes

In 2021, 20 years after the death of her older sister, Vauhini Vara was still unable to tell the story of her loss. “I wondered,” she writes in Searches, her new collection of essays on AI technology, “if Sam Altman’s machine could do it for me.” So she tried ChatGPT. But as it expanded on Vara’s prompts in sentences ranging from the stilted to the unsettling to the sublime, the thing she’d enlisted as a tool stopped seeming so mechanical. 

“Once upon a time, she taught me to exist,” the AI model wrote of the young woman Vara had idolized. Vara, a journalist and novelist, called the resulting essay “Ghosts,” and in her opinion, the best lines didn’t come from her: “I found myself irresistibly attracted to GPT-3—to the way it offered, without judgment, to deliver words to a writer who has found herself at a loss for them … as I tried to write more honestly, the AI seemed to be doing the same.”

The rapid proliferation of AI in our lives introduces new challenges around authorship, authenticity, and ethics in work and art. But it also offers a particularly human problem in narrative: How can we make sense of these machines, not just use them? And how do the words we choose and stories we tell about technology affect the role we allow it to take on (or even take over) in our creative lives? Both Vara’s book and The Uncanny Muse, a collection of essays on the history of art and automation by the music critic David Hajdu, explore how humans have historically and personally wrestled with the ways in which machines relate to our own bodies, brains, and creativity. At the same time, The Mind Electric, a new book by a neurologist, Pria Anand, reminds us that our own inner workings may not be so easy to replicate.

Searches is a strange artifact. Part memoir, part critical analysis, and part AI-assisted creative experimentation, Vara’s essays trace her time as a tech reporter and then novelist in the San Francisco Bay Area alongside the history of the industry she watched grow up. Tech was always close enough to touch: One college friend was an early Google employee, and when Vara started reporting on Facebook (now Meta), she and Mark Zuckerberg became “friends” on his platform. In 2007, she published a scoop that the company was planning to introduce ad targeting based on users’ personal information—the first shot fired in the long, gnarly data war to come. In her essay “Stealing Great Ideas,” she talks about turning down a job reporting on Apple to go to graduate school for fiction. There, she wrote a novel about a tech founder, which was later published as The Immortal King Rao. Vara points out that in some ways at the time, her art was “inextricable from the resources [she] used to create it”—products like Google Docs, a MacBook, an iPhone. But these pre-AI resources were tools, plain and simple. What came next was different.

Interspersed with Vara’s essays are chapters of back-and-forths between the author and ChatGPT about the book itself, where the bot serves as editor at Vara’s prompting. ChatGPT obligingly summarizes and critiques her writing in a corporate-shaded tone that’s now familiar to any knowledge worker. “If there’s a place for disagreement,” it offers about the first few chapters on tech companies, “it might be in the balance of these narratives. Some might argue that the benefits—such as job creation, innovation in various sectors like AI and logistics, and contributions to the global economy—can outweigh the negatives.”

Searches: Selfhood in the Digital Age
Vauhini Vara
PANTHEON, 2025

Vara notices that ChatGPT writes “we” and “our” in these responses, pulling it into the human story, not the tech one: “Earlier you mentioned ‘our access to information’ and ‘our collective experiences and understandings.’” When she asks what the rhetorical purpose of that choice is, ChatGPT responds with a numbered list of benefits including “inclusivity and solidarity” and “neutrality and objectivity.” It adds that “using the first-person plural helps to frame the discussion in terms of shared human experiences and collective challenges.” Does the bot believe it’s human? Or at least, do the humans who made it want other humans to believe it does? “Can corporations use these [rhetorical] tools in their products too, to subtly make people identify with, and not in opposition to, them?” Vara asks. ChatGPT replies, “Absolutely.”

Vara has concerns about the words she’s used as well. In “Thank You for Your Important Work,” she worries about the impact of “Ghosts,” which went viral after it was first published. Had her writing helped corporations hide the reality of AI behind a velvet curtain? She’d meant to offer a nuanced “provocation,” exploring how uncanny generative AI can be. But instead, she’d produced something beautiful enough to resonate as an ad for its creative potential. Even Vara herself felt fooled. She particularly loved one passage the bot wrote, about Vara and her sister as kids holding hands on a long drive. But she couldn’t imagine either of them being so sentimental. What Vara had elicited from the machine, she realized, was “wish fulfillment,” not a haunting. 

The machine wasn’t the only thing crouching behind that too-good-to-be-true curtain. The GPT models and others are trained through human labor, in sometimes exploitative conditions. And much of the training data was the creative work of human writers before her. “I’d conjured artificial language about grief through the extraction of real human beings’ language about grief,” she writes. The creative ghosts in the model were made of code, yes, but also, ultimately, made of people. Maybe Vara’s essay helped cover up that truth too.

In the book’s final essay, Vara offers a mirror image of those AI call-and-response exchanges as an antidote. After sending out an anonymous survey to women of various ages, she presents the replies to each question, one after the other. “Describe something that doesn’t exist,” she prompts, and the women respond: “God.” “God.” “God.” “Perfection.” “My job. (Lost it.)” Real people contradict each other, joke, yell, mourn, and reminisce. Instead of a single authoritative voice—an editor, or a company’s limited style guide—Vara gives us the full gasping crowd of human creativity. “What’s it like to be alive?” Vara asks the group. “It depends,” one woman answers.

David Hajdu, now music editor at The Nation and previously a music critic for The New Republic, goes back much further than the early years of Facebook to tell the history of how humans have made and used machines to express ourselves. Player pianos, microphones, synthesizers, and electrical instruments were all assistive technologies that faced skepticism before acceptance and, sometimes, elevation in music and popular culture. They even influenced the kind of art people were able to and wanted to make. Electrical amplification, for instance, allowed singers to use a wider vocal range and still reach an audience. The synthesizer introduced a new lexicon of sound to rock music. “What’s so bad about being mechanical, anyway?” Hajdu asks in The Uncanny Muse. And “what’s so great about being human?” 

The Uncanny Muse: Music, Art, and Machines from Automata to AI
David Hajdu
W.W. NORTON & COMPANY, 2025

But Hajdu is also interested in how intertwined the history of man and machine can be, and how often we’ve used one as a metaphor for the other. Descartes saw the body as empty machinery for consciousness, he reminds us. Hobbes wrote that “life is but a motion of limbs.” Freud described the mind as a steam engine. Andy Warhol told an interviewer that “everybody should be a machine.” And when computers entered the scene, humans used them as metaphors for themselves too. “Where the machine model had once helped us understand the human body … a new category of machines led us to imagine the brain (how we think, what we know, even how we feel or how we think about what we feel) in terms of the computer,” Hajdu writes. 

But what is lost with these one-to-one mappings? What happens when we imagine that the complexity of the brain—an organ we do not even come close to fully understanding—can be replicated in 1s and 0s? Maybe what happens is we get a world full of chatbots and agents, computer-generated artworks and AI DJs, that companies claim are singular creative voices rather than remixes of a million human inputs. And perhaps we also get projects like the painfully named Painting Fool—an AI that paints, developed by Simon Colton, a scholar at Queen Mary University of London. He told Hajdu that he wanted to “demonstrate the potential of a computer program to be taken seriously as a creative artist in its own right.” What Colton means is not just a machine that makes art but one that expresses its own worldview: “Art that communicates what it’s like to be a machine.”

Hajdu seems to be curious and optimistic about this line of inquiry. “Machines of many kinds have been communicating things for ages, playing invaluable roles in our communication through art,” he says. “Growing in intelligence, machines may still have more to communicate, if we let them.” But the question that The Uncanny Muse raises at the end is: Why should we art-making humans be so quick to hand over the paint to the paintbrush? Why do we care how the paintbrush sees the world? Are we truly finished telling our own stories ourselves?

Pria Anand might say no. In The Mind Electric, she writes: “Narrative is universally, spectacularly human; it is as unconscious as breathing, as essential as sleep, as comforting as familiarity. It has the capacity to bind us, but also to other, to lay bare, but also obscure.” The electricity in The Mind Electric belongs entirely to the human brain—no metaphor necessary. Instead, the book explores a number of neurological afflictions and the stories patients and doctors tell to better understand them. “The truth of our bodies and minds is as strange as fiction,” Anand writes—and the language she uses throughout the book is as evocative as that in any novel. 

The Mind Electric: A Neurologist on the Strangeness and Wonder of Our Brains
Pria Anand
WASHINGTON SQUARE PRESS, 2025

In personal and deeply researched vignettes in the tradition of Oliver Sacks, Anand shows that any comparison between brains and machines will inevitably fall flat. She tells of patients who see clear images when they’re functionally blind, invent entire backstories when they’ve lost a memory, break along seams that few can find, and—yes—see and hear ghosts. In fact, Anand cites one study of 375 college students in which researchers found that nearly three-quarters “had heard a voice that no one else could hear.” These were not diagnosed schizophrenics or sufferers of brain tumors—just people listening to their own uncanny muses. Many heard their name, others heard God, and some could make out the voice of a loved one who’d passed on. Anand suggests that writers throughout history have harnessed organic exchanges with these internal apparitions to make art. “I see myself taking the breath of these voices in my sails,” Virginia Woolf wrote of her own experiences with ghostly sounds. “I am a porous vessel afloat on sensation.” The mind in The Mind Electric is vast, mysterious, and populated. The narratives people construct to traverse it are just as full of wonder. 

Humans are not going to stop using technology to help us create anytime soon—and there’s no reason we should. Machines make for wonderful tools, as they always have. But when we turn the tools themselves into artists and storytellers, brains and bodies, magicians and ghosts, we bypass truth for wish fulfillment. Maybe what’s worse, we rob ourselves of the opportunity to contribute our own voices to the lively and loud chorus of human experience. And we keep others from the human pleasure of hearing them too. 

Rebecca Ackermann is a writer, designer, and artist based in San Francisco.

Generative AI is learning to spy for the US military

For much of last year, about 2,500 US service members from the 15th Marine Expeditionary Unit sailed aboard three ships throughout the Pacific, conducting training exercises in the waters off South Korea, the Philippines, India, and Indonesia. At the same time, onboard the ships, an experiment was unfolding: The Marines in the unit responsible for sorting through foreign intelligence and making their superiors aware of possible local threats were for the first time using generative AI to do it, testing a leading AI tool the Pentagon has been funding.

Two officers tell us that they used the new system to help scour thousands of pieces of open-source intelligence—nonclassified articles, reports, images, videos—collected in the various countries where they operated, and that it did so far faster than was possible with the old method of analyzing them manually. Captain Kristin Enzenauer, for instance, says she used large language models to translate and summarize foreign news sources, while Captain Will Lowdon used AI to help write the daily and weekly intelligence reports he provided to his commanders. 

“We still need to validate the sources,” says Lowdon. But the unit’s commanders encouraged the use of large language models, he says, “because they provide a lot more efficiency during a dynamic situation.”

The generative AI tools they used were built by the defense-tech company Vannevar Labs, which in November was granted a production contract worth up to $99 million by the Pentagon’s startup-oriented Defense Innovation Unit with the goal of bringing its intelligence tech to more military units. The company, founded in 2019 by veterans of the CIA and US intelligence community, joins the likes of Palantir, Anduril, and Scale AI as a major beneficiary of the US military’s embrace of artificial intelligence—not only for physical technologies like drones and autonomous vehicles but also for software that is revolutionizing how the Pentagon collects, manages, and interprets data for warfare and surveillance. 

Though the US military has been developing computer vision models and similar AI tools, like those used in Project Maven, since 2017, the use of generative AI—tools, like those built by Vannevar Labs, that can engage in humanlike conversation—represents a newer frontier.

The company applies existing large language models, including some from OpenAI and Microsoft, and some bespoke ones of its own to troves of open-source intelligence the company has been collecting since 2021. The scale at which this data is collected is hard to comprehend (and a large part of what sets Vannevar’s products apart): terabytes of data in 80 different languages are hoovered every day in 180 countries. The company says it is able to analyze social media profiles and breach firewalls in countries like China to get hard-to-access information; it also uses nonclassified data that is difficult to get online (gathered by human operatives on the ground), as well as reports from physical sensors that covertly monitor radio waves to detect illegal shipping activities. 

Vannevar then builds AI models to translate information, detect threats, and analyze political sentiment, with the results delivered through a chatbot interface that’s not unlike ChatGPT. The aim is to provide customers with critical information on topics as varied as international fentanyl supply chains and China’s efforts to secure rare earth minerals in the Philippines. 
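At its simplest, this kind of stance or sentiment analysis can be done by prompting a general-purpose model to label a piece of text. The sketch below is purely illustrative; the openai client, model name, and label set are assumptions, not Vannevar Labs’ actual models or pipeline.

```python
# Minimal sketch: ask an LLM to label the stance of a news excerpt.
# The client, model name, and label set are illustrative assumptions,
# not Vannevar Labs' actual system.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

excerpt = "Local outlets described the joint exercise as a provocation ..."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Classify the article's stance toward the visiting unit "
                    "as exactly one of: hostile, neutral, friendly."},
        {"role": "user", "content": excerpt},
    ],
    temperature=0,
)
print(response.choices[0].message.content)  # e.g. "hostile"
```

Getting a label back is the easy part; as the rest of this story makes clear, the hard part is knowing whether to trust it.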

“Our real focus as a company,” says Scott Philips, Vannevar Labs’ chief technology officer, is to “collect data, make sense of that data, and help the US make good decisions.” 

That approach is particularly appealing to the US intelligence apparatus because for years the world has been awash in more data than human analysts can possibly interpret—a problem that contributed to the 2003 founding of Palantir, a company with a market value of over $200 billion and known for its powerful and controversial tools, including a database that helps Immigration and Customs Enforcement search for and track information on undocumented immigrants.

In 2019, Vannevar saw an opportunity to use large language models, which were then new on the scene, as a novel solution to the data conundrum. The technology could enable AI not just to collect data but to actually talk through an analysis with someone interactively.

Vannevar’s tools proved useful for the deployment in the Pacific, and Enzenauer and Lowdon say that while they were instructed to always double-check the AI’s work, they didn’t find inaccuracies to be a significant issue. Enzenauer regularly used the tool to track any foreign news reports in which the unit’s exercises were mentioned and to perform sentiment analysis, detecting the emotions and opinions expressed in text. Judging whether a foreign news article reflects a threatening or friendly opinion toward the unit is a task that on previous deployments she had to do manually.

“It was mostly by hand—researching, translating, coding, and analyzing the data,” she says. “It was definitely way more time-consuming than it was when using the AI.” 

Still, Enzenauer and Lowdon say there were hiccups, some of which would affect most digital tools: The ships had spotty internet connections much of the time, limiting how quickly the AI model could synthesize foreign intelligence, especially if it involved photos or video. 

With this first test completed, the unit’s commanding officer, Colonel Sean Dynan, said on a call with reporters in February that heavier use of generative AI was coming; this experiment was “the tip of the iceberg.” 

This is indeed the direction that the entire US military is barreling toward at full speed. In December, the Pentagon said it will spend $100 million in the next two years on pilots specifically for generative AI applications. In addition to Vannevar, it’s also turning to Microsoft and Palantir, which are working together on AI models that would make use of classified data. (The US is of course not alone in this approach; notably, Israel has been using AI to sort through information and even generate lists of targets in its war in Gaza, a practice that has been widely criticized.)

Perhaps unsurprisingly, plenty of people outside the Pentagon are warning about the potential risks of this plan, including Heidy Khlaaf, who is chief AI scientist at the AI Now Institute, a research organization, and has expertise in leading safety audits for AI-powered systems. She says this rush to incorporate generative AI into military decision-making ignores more foundational flaws of the technology: “We’re already aware of how LLMs are highly inaccurate, especially in the context of safety-critical applications that require precision.” 

Khlaaf adds that even if humans are “double-checking” the work of AI, there’s little reason to think they’re capable of catching every mistake. “‘Human-in-the-loop’ is not always a meaningful mitigation,” she says. When an AI model relies on thousands of data points to come to conclusions, “it wouldn’t really be possible for a human to sift through that amount of information to determine if the AI output was erroneous.”

One particular use case that concerns her is sentiment analysis, which she argues is “a highly subjective metric that even humans would struggle to appropriately assess based on media alone.” 

If AI perceives hostility toward US forces where a human analyst would not—or if the system misses hostility that is really there—the military could make a misinformed decision or escalate a situation unnecessarily.

Sentiment analysis is indeed a task that AI has not perfected. Philips, the Vannevar CTO, says the company has built models specifically to judge whether an article is pro-US or not, but MIT Technology Review was not able to evaluate them. 

Chris Mouton, a senior engineer for RAND, recently tested how well-suited generative AI is for the task. He evaluated leading models, including OpenAI’s GPT-4 and an older version of GPT fine-tuned to do such intelligence work, on how accurately they flagged foreign content as propaganda compared with human experts. “It’s hard,” he says, noting that AI struggled to identify more subtle types of propaganda. But he adds that the models could still be useful in lots of other analysis tasks. 

Another limitation of Vannevar’s approach, Khlaaf says, is that the usefulness of open-source intelligence is debatable. Mouton says that open-source data can be “pretty extraordinary,” but Khlaaf points out that unlike classified intel gathered through reconnaissance or wiretaps, it is exposed to the open internet—making it far more susceptible to misinformation campaigns, bot networks, and deliberate manipulation, as the US Army has warned.

For Mouton, the biggest open question now is whether these generative AI technologies will be simply one investigatory tool among many that analysts use—or whether they’ll produce the subjective analysis that’s relied upon and trusted in decision-making. “This is the central debate,” he says. 

What everyone agrees is that AI models are accessible—you can just ask them a question about complex pieces of intelligence, and they’ll respond in plain language. But it’s still in dispute what imperfections will be acceptable in the name of efficiency. 

Update: This story was updated to include additional context from Heidy Khlaaf.