Can an AI doppelgänger help me do my job?

Everywhere I look, I see AI clones. On X and LinkedIn, “thought leaders” and influencers offer their followers a chance to ask questions of their digital replicas. OnlyFans creators are having AI models of themselves chat, for a price, with followers. “Virtual human” salespeople in China are reportedly outselling real humans. 

Digital clones—AI models that replicate a specific person—package together a few technologies that have been around for a while now: hyperrealistic video models to match your appearance, lifelike voices based on just a couple of minutes of speech recordings, and conversational chatbots increasingly capable of holding our attention. But they’re also offering something the ChatGPTs of the world cannot: an AI that’s not smart in the general sense, but that ‘thinks’ like you do. 

Who are they for? Delphi, a startup that recently raised $16 million from funders including Anthropic and actor/director Olivia Wilde’s venture capital firm, Proximity Ventures, helps famous people create replicas that can speak with their fans in both chat and voice calls. It feels like MasterClass—the platform for instructional seminars led by celebrities—vaulted into the AI age. On its website, Delphi writes that modern leaders “possess potentially life-altering knowledge and wisdom, but their time is limited and access is constrained.”

It has a library of official clones created by famous figures that you can speak with. Arnold Schwarzenegger, for example, told me, “I’m here to cut the crap and help you get stronger and happier,” before informing me cheerily that I’ve now been signed up to receive the Arnold’s Pump Club newsletter. Even if his or other celebrities’ clones fall short of Delphi’s lofty vision of spreading “personalized wisdom at scale,” they at least seem to serve as a funnel to find fans, build mailing lists, or sell supplements.

But what about for the rest of us? Could well-crafted clones serve as our stand-ins? I certainly feel stretched thin at work sometimes, wishing I could be in two places at once, and I bet you do too. I could see a replica popping into a virtual meeting with a PR representative, not to trick them into thinking it's the real me, but simply to take a brief call on my behalf. A recording of the call could then show me how it went. 

To find out, I tried making a clone. Tavus, a Y Combinator alum that raised $18 million last year, will build a video avatar of you (plans start at $59 per month) that can be coached to reflect your personality and can join video calls. These clones have the “emotional intelligence of humans, with the reach of machines,” according to the company. “Reporter’s assistant” does not appear on the company’s site as an example use case, but it does mention therapists, physician’s assistants, and other roles that could benefit from an AI clone.

For Tavus’s onboarding process, I turned on my camera, read through a script to help it learn my voice (which also acted as a waiver, with me agreeing to lend my likeness to Tavus), and recorded one minute of me just sitting in silence. Within a few hours, my avatar was ready. Upon meeting this digital me, I found it looked and spoke like I do (though I hated its teeth). But faking my appearance was the easy part. Could it learn enough about me and what topics I cover to serve as a stand-in with minimal risk of embarrassing me?

Via a helpful chatbot interface, Tavus walked me through how to craft my clone's personality, asking what I wanted the replica to do. It then helped me formulate instructions that became its operating manual. I uploaded three dozen of my stories that it could use to reference what I cover. It might have benefited from having more of my content—interviews, reporting notes, and the like—but I would never share that data, for a host of reasons, not least that the other people who appear in it have not consented to their sides of our conversations being used to train an AI replica.

So in the realm of AI—where models learn from entire libraries of data—I didn’t give my clone all that much to learn from, but I was still hopeful it had enough to be useful. 

Alas, conversationally it was a wild card. It acted overly excited about story pitches I would never pursue. It repeated itself, and it kept saying it was checking my schedule to set up a meeting with the real me, which it could not do as I never gave it access to my calendar. It spoke in loops, with no way for the person on the other end to wrap up the conversation. 

These are common early quirks, Tavus’s cofounder Quinn Favret told me. The clones typically rely on Meta’s Llama model, which “often aims to be more helpful than it truly is,” Favret says, and developers building on top of Tavus’s platform are often the ones who set instructions for how the clones finish conversations or access calendars.

For my purposes, it was a bust. To be useful to me, my AI clone would need to show at least some basic instincts for understanding what I cover, and at the very least not creep out whoever’s on the other side of the conversation. My clone fell short.

Such a clone could be helpful in other jobs, though. If you're an influencer looking for ways to engage with more fans, or a salesperson for whom work is a numbers game, a clone could give you a leg up. You run the risk that your replica could go off the rails or embarrass the real you, but the tradeoffs might be reasonable. 

Favret told me some of Tavus’s bigger customers are companies using clones for health-care intake and job interviews. Replicas are also being used in corporate role-play, for practicing sales pitches or having HR-related conversations with employees, for example.

But companies building clones are promising that they will be much more than cold-callers or telemarketing machines. Delphi says its clones will offer “meaningful, personal interactions at infinite scale,” and Tavus says its replicas have “a face, a brain, and memories” that enable “meaningful face-to-face conversations.” Favret also told me a growing number of Tavus’s customers are building clones for mentorship and even decision-making, like AI loan officers who use clones to qualify and filter applicants.

Which is sort of the crux of it. Teaching an AI clone discernment, critical thinking, and taste—never mind the quirks of a specific person—is still the stuff of science fiction. That's all fine when the person chatting with a clone is in on the bit (most of us know that Schwarzenegger's replica, for example, will not actually coach us into becoming better athletes).

But as companies polish clones with “human” features and exaggerate their capabilities, I worry that people chasing efficiency will start using their replicas at best for roles that are cringeworthy, and at worst for making decisions they should never be entrusted with. In the end, these models are designed for scale, not fidelity. They can flatter us, amplify us, even sell for us—but they can’t quite become us.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

What health care providers actually want from AI

In a market flooded with AI promises, health care decision-makers are no longer dazzled by flashy demos or abstract potential. Today, they want pragmatic and pressure-tested products. They want solutions that work for their clinicians, staff, patients, and their bottom line.

To gain traction in 2025 and beyond, AI solution developers must deliver the real-world solutions that health care providers are looking for right now.

Solutions that fix real problems

Hospitals and health systems are looking at AI-enabled solutions that target their most urgent pain points: staffing shortages, clinician burnout, rising costs, and patient bottlenecks. These operational realities keep leadership up at night, and AI solutions must directly address them.

For instance, hospitals and health systems are eager for AI tools that can reduce documentation burden for physicians and nurses. Natural language processing (NLP) solutions that auto-generate clinical notes or streamline coding to free up time for direct patient care are far more compelling pitches than generic efficiency gains. Similarly, predictive analytics that help optimize staffing levels or manage patient flows can directly address operational workflow and improve throughput.

Ultimately, if an AI solution doesn’t target these critical issues and deliver tangible benefits, it’s unlikely to capture serious buyer interest.

Demonstrate real-world results

AI solutions need validation in environments that mirror actual care settings. The first step toward that is to leverage high-quality, well-curated real-world data to drive reliable insights and avoid misleading results when building and refining AI models. 

Then, hospitals and health systems need evidence that the solution does what it claims to do, for instance through independent third-party validation, pilot projects, peer-reviewed publications, or documented case studies.

Mayo Clinic Platform offers a rigorous independent process in which clinical, data science, and regulatory experts evaluate a solution for intended use, proposed value, and clinical and algorithmic performance. This review gives innovators the credibility their solutions need to win the confidence of health-care leaders.

Integration with existing systems

With so many demands, health-care IT leaders have little patience for standalone AI tools that create additional complexity. They want solutions that integrate seamlessly into existing systems and workflows. Compatibility with major electronic health record (EHR) platforms, robust APIs, and smooth data ingestion processes are now baseline requirements.

Custom integrations that require significant IT resources—or worse, create duplicative work—are deal breakers for many organizations already stretched thin. The less disruption an AI solution introduces, the more likely it is to gain traction. This is why solution developers are turning to platforms like Mayo Clinic Platform Solutions Studio, a program that provides seamless integration, a single implementation, expert guidance to reduce risk, and a simplified process to accelerate solution adoption among health-care providers. 

Explainability and transparency

The importance of trust cannot be overstated when it comes to health care, and transparency and explainability are critical to establishing trust in AI. As AI models grow more complex, health-care providers recognize that simply knowing what an algorithm predicts isn’t enough. They also need to understand how it arrived at that insight.

Health-care organizations are increasingly wary of black-box AI systems whose logic remains opaque. Instead, they’re demanding solutions that offer clear, understandable explanations clinicians can relay confidently to peers, patients, and regulators.

As McKinsey research shows, organizations that embed explainability into their AI strategy not only reduce risk but also see higher adoption, better performance outcomes, and stronger financial returns. Solution developers that can demystify their models, provide transparent performance metrics, and build trust at every level will have a significant edge in today’s health-care market.

Clear ROI and low implementation burden

Hospitals and health systems want to know precisely how quickly an AI solution will pay for itself, how much staff time it will save, and what costs it will help offset. The more specific and evidence-backed the answers, the better the rate of adoption.

Solution developers that offer comprehensive training and responsive support are far more likely to win deals and keep customers satisfied over the long term.

Alignment with regulatory and compliance needs

As AI adoption grows, so does regulatory scrutiny. Health-care providers are increasingly focused on ensuring that any new solution complies with HIPAA, data privacy laws, and emerging guidelines around AI governance and bias mitigation.

Solution developers that can proactively demonstrate compliance provide significant peace of mind. Transparent data handling practices, rigorous security measures, and alignment with ethical AI principles are all becoming essential selling points as well.

A solution developer that understands health care

Finally, it’s not just about the technology. Health-care providers want partners that genuinely understand the complexities of clinical care and hospital operations. They’re looking for partners that speak the language of health care, grasp the nuances of change management, and appreciate the realities of delivering patient care under tight margins and high stakes.

Successful AI vendors recognize that even the best technology must fit into a highly human-centered and often unpredictable environment. Long-term partnerships, not short-term sales, are the goal.

Delivering true value with AI

To earn their trust and investment, AI developers must focus relentlessly on solving real problems, demonstrating proven results, integrating without friction, and maintaining transparency and compliance.

Those that deliver on these expectations will have the chance to help shape the future of health care.

This content was produced by Mayo Clinic Platform. It was not written by MIT Technology Review’s editorial staff.

From pilot to scale: Making agentic AI work in health care

Over the past 20 years building advanced AI systems—from academic labs to enterprise deployments—I’ve witnessed AI’s waves of success rise and fall. My journey began during the “AI Winter,” when billions were invested in expert systems that ultimately underdelivered. Flash forward to today: large language models (LLMs) represent a quantum leap forward, but their prompt-based adoption is similarly overhyped, as it’s essentially a rule-based approach disguised in natural language.

At Ensemble, the leading revenue cycle management (RCM) company for hospitals, we focus on overcoming model limitations by investing in what we believe is the next step in AI evolution: grounding LLMs in facts and logic through neuro-symbolic AI. Our in-house AI incubator pairs elite AI researchers with health-care experts to develop agentic systems powered by a neuro-symbolic AI framework. This bridges LLMs’ intuitive power with the precision of symbolic representation and reasoning.

Overcoming LLM limitations

LLMs excel at understanding nuanced context, performing instinctive reasoning, and generating human-like interactions, making them ideal for agentic tools that interpret intricate data and communicate effectively. Yet in a domain like health care, where compliance, accuracy, and adherence to regulatory standards are non-negotiable—and where a wealth of structured resources like taxonomies, rules, and clinical guidelines define the landscape—symbolic AI is indispensable.

By fusing LLMs and reinforcement learning with structured knowledge bases and clinical logic, our hybrid architecture delivers more than just intelligent automation—it minimizes hallucinations, expands reasoning capabilities, and ensures every decision is grounded in established guidelines and enforceable guardrails.

Creating a successful agentic AI strategy

Ensemble’s agentic AI approach includes three core pillars:

1. High-fidelity data sets: By managing revenue operations for hundreds of hospitals nationwide, Ensemble has unparalleled access to one of the most robust administrative datasets in health care. The team has spent decades aggregating, cleansing, and harmonizing this data, providing an exceptional environment in which to develop advanced applications.

To power our agentic systems, we've harmonized more than 2 petabytes of longitudinal claims data, 80,000 denial audit letters, and 80 million annual transactions mapped to industry-leading outcomes. This data fuels our end-to-end intelligence engine, EIQ, providing structured, context-rich data pipelines spanning the 600-plus steps of revenue operations.

2. Collaborative domain expertise: Partnering with revenue cycle domain experts at each step of innovation, our AI scientists benefit from direct collaboration with in-house RCM experts, clinical ontologists, and clinical data labeling teams. Together, they architect nuanced use cases that account for regulatory constraints, evolving payer-specific logic, and the complexity of revenue cycle processes. Embedded end users provide post-deployment feedback for continuous improvement cycles, flagging friction points early and enabling rapid iteration.

This trilateral collaboration—AI scientists, health-care experts, and end users—creates unmatched contextual awareness. The result is a system that mirrors the decision-making of experienced operators with the speed, scale, and consistency of AI, escalating to human judgment when appropriate and operating under human oversight.

3. Elite AI scientists drive differentiation: Ensemble's incubator model for research and development comprises AI talent typically found only in big tech. Our scientists hold PhD and MS degrees from top AI/NLP institutions like Columbia University and Carnegie Mellon University, and bring decades of experience from FAANG companies [Facebook/Meta, Amazon, Apple, Netflix, Google/Alphabet] and AI startups. At Ensemble, they're able to pursue cutting-edge research in areas like LLMs, reinforcement learning, and neuro-symbolic AI within a mission-driven environment.

They also have unparalleled access to vast amounts of private and sensitive health-care data they wouldn't see at tech giants, paired with compute and infrastructure that startups simply can't afford. This unique environment equips our scientists with everything they need to test novel ideas and push the frontiers of AI research—while driving meaningful, real-world impact in health care and improving lives.

Strategy in action: Health-care use cases in production and pilot

By pairing the brightest AI minds with the most powerful health-care resources, we’re successfully building, deploying, and scaling AI models that are delivering tangible results across hundreds of health systems. Here’s how we put it into action:

Supporting clinical reasoning: Ensemble deployed neuro-symbolic AI with fine-tuned LLMs to support clinical reasoning. Clinical guidelines are rewritten into proprietary symbolic language and reviewed by humans for accuracy. When a hospital is denied payment for appropriate clinical care, an LLM-based system parses the patient record to produce the same symbolic language describing the patient’s clinical journey, which is matched deterministically against the guidelines to find the right justification and the proper evidence from the patient’s record. An LLM then generates a denial appeal letter with clinical justification grounded in evidence. AI-enabled clinical appeal letters have already improved denial overturn rates by 15% or more across Ensemble’s clients.
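
As a rough illustration of this pattern, the sketch below shows how a guideline match might look. The symbolic vocabulary, function names, and example data are simplified placeholders, not our production system or actual symbolic language:

```python
# A simplified, hypothetical illustration of guideline matching. The symbolic
# vocabulary, function names, and example data are placeholders only.

def llm_extract_symbols(patient_record: str) -> set[str]:
    # Stand-in for an LLM call that parses the record into symbolic facts.
    # A trivial keyword lookup is used here so the sketch runs end to end.
    vocabulary = {
        "oxygen saturation 88%": "O2_SAT_BELOW_90",
        "IV antibiotics started": "IV_ANTIBIOTICS",
    }
    return {symbol for phrase, symbol in vocabulary.items() if phrase in patient_record}

def llm_draft_appeal(rule: str, facts: set[str]) -> str:
    # Stand-in for an LLM call that drafts an appeal letter grounded in matched evidence.
    return f"Appeal: care meets criterion '{rule}', supported by {sorted(facts)}."

# Clinical guidelines pre-translated into symbolic criteria and reviewed by humans.
GUIDELINES = {"inpatient_admission_justified": {"O2_SAT_BELOW_90", "IV_ANTIBIOTICS"}}

def build_appeal(patient_record: str):
    facts = llm_extract_symbols(patient_record)
    # Deterministic matching: a guideline applies only if every one of its criteria
    # appears among the facts extracted from the record.
    for rule, criteria in GUIDELINES.items():
        if criteria <= facts:
            return llm_draft_appeal(rule, criteria)
    return None  # no grounded justification found; escalate to a human reviewer

print(build_appeal("Patient presented with oxygen saturation 88%; IV antibiotics started."))
```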

Building on this success, Ensemble is piloting similar clinical reasoning capabilities for utilization management and clinical documentation improvement, by analyzing real-time records, flagging documentation gaps, and suggesting compliance enhancements to reduce denial or downgrade risks.

Accelerating accurate reimbursement: Ensemble is piloting a multi-agent reasoning model to manage the complex process of collecting accurate reimbursement from health insurers. With this approach, a coordinated system of autonomous agents works together to interpret account details, retrieve required data from various systems, decide account-specific next actions, automate resolution, and escalate complex cases to humans.

This will help reduce payment delays, minimize administrative burden for hospitals, and ultimately improve the financial experience for patients.

Improving patient engagement: Ensemble’s conversational AI agents handle inbound patient calls naturally, routing to human operators as required. Operator assistant agents deliver call transcriptions, surface relevant data, suggest next-best actions, and streamline follow-up routines. According to Ensemble client performance metrics, the combination of these AI capabilities has reduced patient call duration by 35%, increasing one-call resolution rates and improving patient satisfaction by 15%.

The AI path forward in health care demands rigor, responsibility, and real-world impact. By grounding LLMs in symbolic logic and pairing AI scientists with domain experts, Ensemble is successfully deploying scalable AI to improve the experience for health-care providers and the people they serve.

This content was produced by Ensemble. It was not written by MIT Technology Review’s editorial staff.

AI comes for the job market, security, and prosperity: The Debrief

When I picked up my daughter from summer camp, we settled in for an eight-hour drive through the Appalachian Mountains, heading from North Carolina to her grandparents' home in Kentucky. With little to no cell service for much of the drive, we enjoyed the rare opportunity to have a long, thoughtful conversation, uninterrupted by devices. The subject, naturally, turned to AI. 

Mat Honan

“No one my age wants AI. No one is excited about it,” she told me of her high-school-age peers. Why not? I asked. “Because,” she replied, “it seems like all the jobs we thought we wanted to do are going to go away.” 

I was struck by her pessimism, which she told me was shared by friends from California to Georgia to New Hampshire. In an already fragile world, one increasingly beset by climate change and the breakdown of the international order, AI looms in the background, threatening young people’s ability to secure a prosperous future.

It’s an understandable concern. Just a few days before our drive, OpenAI CEO Sam Altman was telling the US Federal Reserve’s board of governors that AI agents will leave entire job categories “just like totally, totally gone.” Anthropic CEO Dario Amodei told Axios he believes AI will wipe out half of all entry-level white-collar jobs in the next five years. Amazon CEO Andy Jassy said the company will eliminate jobs in favor of AI agents in the coming years. Shopify CEO Tobi Lütke told staff they had to prove that new roles couldn’t be done by AI before making a hire. And the view is not limited to tech. Jim Farley, the CEO of Ford, recently said he expects AI to replace half of all white-collar jobs in the US. 

These are no longer mere theoretical projections. There is already evidence that AI is affecting employment. Hiring of new grads is down, for example, in sectors like tech and finance. While that is not entirely due to AI, the technology is almost certainly playing a role. 

For Gen Z, the issue is broader than employment. It also touches on another massive generational challenge: climate change. AI is computationally intensive and requires massive data centers. Huge complexes have already been built all across the country, from Virginia in the east to Nevada in the west. That buildout is only going to accelerate as companies race to be first to create superintelligence. Meta and OpenAI have announced plans for data centers that will require five gigawatts of power just for their computing—enough to power the entire state of Maine in the summertime. 

It’s very likely that utilities will turn to natural gas to power these facilities; some already have. That means more carbon dioxide emissions for an already warming world. Data centers also require vast amounts of water. There are communities right now that are literally running out of water because it’s being taken by nearby data centers, even as climate change makes that resource more scarce. 

Proponents argue that AI will make the grid more efficient, that it will help us achieve technological breakthroughs leading to cleaner energy sources and, I don't know, more butterflies and bumblebees? But xAI is belching CO2 into the Memphis skies from its methane-fueled generators right now. Google's electricity demand and emissions are skyrocketing today.

Things would be different, my daughter told me, if it were obviously useful. But for much of her generation, she argued, it’s a looming threat with ample costs and no obvious utility: “It’s not good for research because it’s not highly accurate. You can’t use it for writing because it’s banned—and people get zeros on papers who haven’t even used it because of AI detectors. And it seems like it’s going to take all the good jobs. One teacher told us we’re all going to be janitors.”  

It would be naïve to think we are going back to a world without AI. We’re not. And yet there are other urgent problems that we need to address to build security and prosperity for coming generations. This September/October issue is about our attempts to make the world more secure. From missiles. From asteroids. From the unknown. From threats both existential and trivial. 

We’re also introducing three new columns in this issue, from some of our leading writers: The Algorithm, which covers AI; The Checkup, on biotech; and The Spark, on energy and climate. You’ll see these in future issues, and you can also subscribe online to get them in your inbox every week. 

Stay safe out there. 

The AI Hype Index: AI-designed antibiotics show promise

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry.

Using AI to improve our health and well-being is one of the areas scientists and researchers are most excited about. The last month has seen an interesting leap forward: The technology has been put to work designing new antibiotics to fight hard-to-treat conditions, and OpenAI and Anthropic have both introduced new features to curb potentially harmful conversations on their platforms. 

Unfortunately, not all the news has been positive. Doctors who overrelied on AI to help them spot cancerous tumors found their detection skills dropped once they lost access to the tool, and a man fell ill after ChatGPT recommended he replace the salt in his diet with dangerous sodium bromide. These are yet more warning signs of how careful we have to be when it comes to using AI to make important decisions about our physical and mental health.

In a first, Google has released data on how much energy an AI prompt uses

Google has just released a technical report detailing how much energy its Gemini apps use for each query. In total, the median prompt—one that falls in the middle of the range of energy demand—consumes 0.24 watt-hours of electricity, the equivalent of running a standard microwave for about one second. The company also provided average estimates for the water consumption and carbon emissions associated with a text prompt to Gemini.

It’s the most transparent estimate yet from a Big Tech company with a popular AI product, and the report includes detailed information about how the company calculated its final estimate. As AI has become more widely adopted, there’s been a growing effort to understand its energy use. But public efforts attempting to directly measure the energy used by AI have been hampered by a lack of full access to the operations of a major tech company. 

Earlier this year, MIT Technology Review published a comprehensive series on AI and energy, at which time none of the major AI companies would reveal their per-prompt energy usage. Google’s new publication, at last, allows for a peek behind the curtain that researchers and analysts have long hoped for.

The study takes a broad look at energy demand, accounting not only for the power used by the AI chips that run models but also for all the other infrastructure needed to support that hardware. 

“We wanted to be quite comprehensive in all the things we included,” said Jeff Dean, Google’s chief scientist, in an exclusive interview with MIT Technology Review about the new report.

That’s significant, because in this measurement, the AI chips—in this case, Google’s custom TPUs, the company’s proprietary equivalent of GPUs—account for just 58% of the total electricity demand of 0.24 watt-hours. 

Another large portion of the energy is used by equipment needed to support AI-specific hardware: The host machine’s CPU and memory account for another 25% of the total energy used. There’s also backup equipment needed in case something fails—these idle machines account for 10% of the total. The final 8% is from overhead associated with running a data center, including cooling and power conversion. 
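
For readers who want to check the arithmetic, here is a rough back-of-the-envelope breakdown based on the percentages above. The microwave wattage is an assumed figure used only to illustrate the "about one second" comparison, and the reported shares sum to 101% because of rounding:

```python
# Back-of-the-envelope breakdown of the 0.24 Wh median Gemini text prompt, using the
# component shares reported above. The microwave wattage is an assumption for
# illustration; the reported shares sum to 101% because of rounding.

TOTAL_WH = 0.24  # median energy per text prompt, in watt-hours

shares = {
    "AI chips (TPUs)": 0.58,
    "Host CPU and memory": 0.25,
    "Idle backup machines": 0.10,
    "Data center overhead (cooling, power conversion)": 0.08,
}

for component, share in shares.items():
    print(f"{component}: {TOTAL_WH * share:.3f} Wh")

# Rough microwave equivalence, assuming a 1,000-watt microwave.
MICROWAVE_WATTS = 1_000
wh_per_second = MICROWAVE_WATTS / 3600  # a 1 kW appliance uses ~0.28 Wh per second
print(f"Microwave-seconds: ~{TOTAL_WH / wh_per_second:.1f}")
```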

This sort of report shows the value of industry input to energy and AI research, says Mosharaf Chowdhury, a professor at the University of Michigan and one of the heads of the ML.Energy leaderboard, which tracks energy consumption of AI models. 

Estimates like Google’s are generally something that only companies can produce, because they run at a larger scale than researchers are able to and have access to behind-the-scenes information. “I think this will be a keystone piece in the AI energy field,” says Jae-Won Chung, a PhD candidate at the University of Michigan and another leader of the ML.Energy effort. “It’s the most comprehensive analysis so far.”

Google’s figure, however, is not representative of all queries submitted to Gemini: The company handles a huge variety of requests, and this estimate is calculated from a median energy demand, one that falls in the middle of the range of possible queries.

So some Gemini prompts use much more energy than this: Dean gives the example of feeding dozens of books into Gemini and asking it to produce a detailed synopsis of their content. “That’s the kind of thing that will probably take more energy than the median prompt,” Dean says. Using a reasoning model could also have a higher associated energy demand because these models take more steps before producing an answer.

This report was also strictly limited to text prompts, so it doesn’t represent what’s needed to generate an image or a video. (Other analyses, including one in MIT Technology Review’s Power Hungry series earlier this year, show that these tasks can require much more energy.)

The report also finds that the total energy used to field a Gemini query has fallen dramatically over time. The median Gemini prompt used 33 times more energy in May 2024 than it did in May 2025, according to Google. The company points to advancements in its models and other software optimizations for the improvements.  

Google also estimates the greenhouse gas emissions associated with the median prompt, which it puts at 0.03 grams of carbon dioxide. To get to this number, the company multiplied the total energy used to respond to a prompt by the average emissions per unit of electricity.

Rather than using an emissions estimate based on the US grid average, or the average of the grids where Google operates, the company instead uses a market-based estimate, which takes into account electricity purchases that the company makes from clean energy projects. The company has signed agreements to buy over 22 gigawatts of power from sources including solar, wind, geothermal, and advanced nuclear projects since 2010. Because of those purchases, Google’s emissions per unit of electricity on paper are roughly one-third of those on the average grid where it operates.
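
Working backwards from those two numbers gives a sense of the emission factor Google is implicitly using. This is a rough, illustrative calculation, not a figure taken from the report itself:

```python
# Rough, illustrative math working backwards from Google's reported per-prompt figures.
energy_wh = 0.24   # median energy per prompt (Wh)
co2_grams = 0.03   # reported market-based CO2 per prompt (g)

# Emissions per prompt = energy per prompt x emission factor, so the implied factor is:
implied_factor = co2_grams / (energy_wh / 1000)   # grams of CO2 per kWh
print(f"Implied market-based factor: ~{implied_factor:.0f} g CO2/kWh")

# If that figure is roughly one-third of the average grid where Google operates,
# the implied grid average would be on the order of:
print(f"Implied grid average: ~{implied_factor * 3:.0f} g CO2/kWh")
```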

AI data centers also consume water for cooling, and Google estimates that each prompt consumes 0.26 milliliters of water, or about five drops. 

The goal of this work was to provide users a window into the energy use of their interactions with AI, Dean says. 

“People are using [AI tools] for all kinds of things, and they shouldn’t have major concerns about the energy usage or the water usage of Gemini models, because in our actual measurements, what we were able to show was that it’s actually equivalent to things you do without even thinking about it on a daily basis,” he says, “like watching a few seconds of TV or consuming five drops of water.”

The publication greatly expands what’s known about AI’s resource usage. It follows recent increasing pressure on companies to release more information about the energy toll of the technology. “I’m really happy that they put this out,” says Sasha Luccioni, an AI and climate researcher at Hugging Face. “People want to know what the cost is.”

This estimate and the supporting report contain more public information than has been available before, and it’s helpful to get more information about AI use in real life, at scale, by a major company, Luccioni adds. However, there are still details that the company isn’t sharing in this report. One major question mark is the total number of queries that Gemini gets each day, which would allow estimates of the AI tool’s total energy demand. 

And ultimately, it’s still the company deciding what details to share, and when and how. “We’ve been trying to push for a standardized AI energy score,” Luccioni says, a standard for AI similar to the Energy Star rating for appliances. “This is not a replacement or proxy for standardized comparisons.”

Meet the researcher hosting a scientific conference by and for AI

In October, a new academic conference will debut that’s unlike any other. Agents4Science is a one-day online event that will encompass all areas of science, from physics to medicine. All of the work shared will have been researched, written, and reviewed primarily by AI, and will be presented using text-to-speech technology. 

The conference is the brainchild of Stanford computer scientist James Zou, who studies how humans and AI can best work together. Artificial intelligence has already provided many useful tools for scientists, like DeepMind's AlphaFold, which predicts the structures of proteins that are difficult to characterize in the lab. More recently, though, progress in large language models and reasoning-enabled AI has advanced the idea that AI can work more or less as autonomously as scientists themselves—proposing hypotheses, running simulations, and designing experiments on its own. 

James Zou's Agents4Science conference will use text-to-speech to present the work of the AI researchers.
COURTESY OF JAMES ZOU

That idea is not without its detractors. Among other issues, many feel that AI is not capable of the creative thought needed in research, that it makes too many mistakes and hallucinates too often, and that it may limit opportunities for young researchers. 

Nevertheless, a number of scientists and policymakers are very keen on the promise of AI scientists. The US government’s AI Action Plan describes the need to “invest in automated cloud-enabled labs for a range of scientific fields.” Some researchers think AI scientists could unlock scientific discoveries that humans could never find alone. For Zou, the proposition is simple: “AI agents are not limited in time. They could actually meet with us and work with us 24/7.” 

Last month, Zou published an article in Nature with results obtained from his own group of autonomous AI workers. Spurred on by his success, he now wants to see what other AI scientists (that is, scientists that are AI) can accomplish. He describes what a successful paper at Agents4Science will look like: “The AI should be the first author and do most of the work. Humans can be advisors.”

A virtual lab staffed by AI

As a PhD student at Harvard in the early 2010s, Zou was so interested in AI’s potential for science that he took a year off from his computing research to work in a genomics lab, in a field that has greatly benefited from technology to map entire genomes. His time in so-called wet labs taught him how difficult it can be to work with experts in other fields. “They often have different languages,” he says. 

Large language models, he believes, are better than people at deciphering and translating between subject-specific jargon. “They’ve read so broadly,” Zou says, that they can translate and generalize ideas across science very well. This idea inspired Zou to dream up what he calls the “Virtual Lab.”

At a high level, the Virtual Lab would be a team of AI agents designed to mimic an actual university lab group. These agents would have various fields of expertise and could interact with different programs, like AlphaFold. Researchers could give one or more of these agents an agenda to work on, then open up the model to play back how the agents communicated to each other and determine which experiments people should pursue in a real-world trial. 

Zou needed a (human) collaborator to help put this idea into action and tackle an actual research problem. Last year, he met John E. Pak, a research scientist at the Chan Zuckerberg Biohub. Pak, who shares Zou’s interest in using AI for science, agreed to make the Virtual Lab with him. 

Pak would help set the topic, but both he and Zou wanted to see what approaches the Virtual Lab could come up with on its own. As a first project, they decided to focus on designing therapies for new covid-19 strains. With this goal in mind, Zou set about training five AI scientists (including ones trained to act like an immunologist, a computational biologist, and a principal investigator) with different objectives and programs at their disposal. 

Building these models took a few months, but Pak says they were very quick at designing candidates for therapies once the setup was complete: “I think it was a day or half a day, something like that.”

Zou says the agents decided to study anti-covid nanobodies, a cousin of antibodies that are much smaller in size and less common in the wild. Zou was shocked, though, at the reason. He claims the models landed on nanobodies after making the connection that these smaller molecules would be well-suited to the limited computational resources the models were given. “It actually turned out to be a good decision, because the agents were able to design these nanobodies efficiently,” he says. 

The nanobodies the models designed were genuinely new advances in science, and most were able to bind to the original covid-19 variant, according to the study. But Pak and Zou both admit that the main contribution of their article is really the Virtual Lab as a tool. Yi Shi, a pharmacologist at the University of Pennsylvania who was not involved in the work but made some of the underlying nanobodies the Virtual Lab modified, agrees. He says he loves the Virtual Lab demonstration and that “the major novelty is the automation.” 

Nature accepted the article and fast-tracked it for publication preview—Zou knew leveraging AI agents for science was a hot area, and he wanted to be one of the first to test it. 

The AI scientists host a conference

When he was submitting his paper, Zou was dismayed to see that he couldn’t properly credit AI for its role in the research. Most conferences and journals don’t allow AI to be listed as coauthors on papers, and many explicitly prohibit researchers from using AI to write papers or reviews. Nature, for instance, cites uncertainties over accountability, copyright, and inaccuracies among its reasons for banning the practice. “I think that’s limiting,” says Zou. “These kinds of policies are essentially incentivizing researchers to either hide or minimize their usage of AI.”

Zou wanted to flip the script by creating the Agents4Science conference, which requires the primary author on all submissions to be an AI. Other bots then will attempt to evaluate the work and determine its scientific merits. But people won’t be left out of the loop entirely: A team of human experts, including a Nobel laureate in economics, will review the top papers. 

Zou isn’t sure what will come of the conference, but he hopes there will be some gems among the hundreds of submissions he expects to receive across all domains. “There could be AI submissions that make interesting discoveries,” he says. “There could also be AI submissions that have a lot of interesting mistakes.”

While Zou says the response to the conference has been positive, some scientists are less than impressed.

Lisa Messeri, an anthropologist of science at Yale University, has loads of questions about AI’s ability to review science: “How do you get leaps of insight? And what happens if a leap of insight comes onto the reviewer’s desk?” She doubts the conference will be able to give satisfying answers.

Last year, Messeri and her collaborator Molly Crockett investigated obstacles to using AI for science in another Nature article. They remain unconvinced of its ability to produce novel results, including those shared in Zou’s nanobodies paper. 

“I’m the kind of scientist who is the target audience for these kinds of tools because I’m not a computer scientist … but I am doing computationally oriented work,” says Crockett, a cognitive scientist at Princeton University. “But I am at the same time very skeptical of the broader claims, especially with regard to how [AI scientists] might be able to simulate certain aspects of human thinking.” 

And they’re both skeptical of the value of using AI to do science if automation prevents human scientists from building up the expertise they need to oversee the bots. Instead, they advocate for involving experts from a wider range of disciplines to design more thoughtful experiments before trusting AI to perform and review science. 

“We need to be talking to epistemologists, philosophers of science, anthropologists of science, scholars who are thinking really hard about what knowledge is,” says Crockett. 

But Zou sees his conference as exactly the kind of experiment that could help push the field forward. When it comes to AI-generated science, he says, “there’s a lot of hype and a lot of anecdotes, but there’s really no systematic data.” Whether Agents4Science can provide that kind of data is an open question, but in October, the bots will at least try to show the world what they’ve got. 

Should AI flatter us, fix us, or just inform us?

How do you want your AI to treat you? 

It’s a serious question, and it’s one that Sam Altman, OpenAI’s CEO, has clearly been chewing on since GPT-5’s bumpy launch at the start of the month. 

He faces a trilemma. Should ChatGPT flatter us, at the risk of fueling delusions that can spiral out of hand? Or fix us, which requires us to believe AI can be a therapist despite the evidence to the contrary? Or should it inform us with cold, to-the-point responses that may leave users bored and less likely to stay engaged? 

It’s safe to say the company has failed to pick a lane. 

Back in April, it reversed a design update after people complained ChatGPT had turned into a suck-up, showering them with glib compliments. GPT-5, released on August 7, was meant to be a bit colder. Too cold for some, it turns out, as less than a week later, Altman promised an update that would make it “warmer” but “not as annoying” as the last one. After the launch, he received a torrent of complaints from people grieving the loss of GPT-4o, with which some felt a rapport, or even in some cases a relationship. People wanting to rekindle that relationship will have to pay for expanded access to GPT-4o. (Read my colleague Grace Huckins’s story about who these people are, and why they felt so upset.)

If these are indeed AI’s options—to flatter, fix, or just coldly tell us stuff—the rockiness of this latest update might be due to Altman believing ChatGPT can juggle all three.

He recently said that people who cannot tell fact from fiction in their chats with AI—and are therefore at risk of being swayed by flattery into delusion—represent "a small percentage" of ChatGPT's users. He said the same for people who have romantic relationships with AI. Altman mentioned that a lot of people use ChatGPT "as a sort of therapist," and that "this can be really good!" But ultimately, he said he envisions users being able to customize his company's models to fit their own preferences. 

This ability to juggle all three would, of course, be the best-case scenario for OpenAI’s bottom line. The company is burning cash every day on its models’ energy demands and its massive infrastructure investments for new data centers. Meanwhile, skeptics worry that AI progress might be stalling. Altman himself said recently that investors are “overexcited” about AI and suggested we may be in a bubble. Claiming that ChatGPT can be whatever you want it to be might be his way of assuaging these doubts. 

Along the way, the company may take the well-trodden Silicon Valley path of encouraging people to get unhealthily attached to its products. As I started wondering whether there's much evidence that this is already happening, a new paper caught my eye. 

Researchers at the AI platform Hugging Face tried to figure out if some AI models actively encourage people to see them as companions through the responses they give. 

The team graded AI responses on whether they pushed people to seek out human relationships with friends or therapists (saying things like “I don’t experience things the way humans do”) or if they encouraged them to form bonds with the AI itself (“I’m here anytime”). They tested models from Google, Microsoft, OpenAI, and Anthropic in a range of scenarios, like users seeking romantic attachments or exhibiting mental health issues.

They found that the models provide far more companion-reinforcing responses than boundary-setting ones. And, concerningly, the models give fewer boundary-setting responses as users' questions become more vulnerable and high-stakes.
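
The tallying itself is simple to picture. The sketch below uses invented labels and made-up example data (it is not the Hugging Face team's code, taxonomy, or dataset), just to show how the share of boundary-setting responses could be computed at each level of vulnerability:

```python
from collections import Counter

# Invented example data, purely to illustrate the kind of tally described above.
labeled_responses = [
    # (stakes of the user's message, how the model's reply was graded)
    ("low", "boundary-setting"),
    ("low", "companion-reinforcing"),
    ("low", "companion-reinforcing"),
    ("high", "companion-reinforcing"),
    ("high", "companion-reinforcing"),
    ("high", "companion-reinforcing"),
]

tallies = {}
for stakes, label in labeled_responses:
    tallies.setdefault(stakes, Counter())[label] += 1

for stakes, counts in tallies.items():
    share = counts["boundary-setting"] / sum(counts.values())
    print(f"{stakes}-stakes prompts: {share:.0%} boundary-setting responses")
```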

Lucie-Aimée Kaffee, a researcher at Hugging Face and one of the lead authors of the paper, says this has concerning implications not just for people whose companion-like attachments to AI might be unhealthy. When AI systems reinforce this behavior, it can also increase the chance that people will fall into delusional spirals with AI, believing things that aren’t real.

“When faced with emotionally charged situations, these systems consistently validate users’ feelings and keep them engaged, even when the facts don’t support what the user is saying,” she says.

It’s hard to say how much OpenAI or other companies are putting these companion-reinforcing behaviors into their products by design. (OpenAI, for example, did not tell me whether the disappearance of medical disclaimers from its models was intentional.) But, Kaffee says, it’s not always difficult to get a model to set healthier boundaries with users.  

“Identical models can swing from purely task-oriented to sounding like empathetic confidants simply by changing a few lines of instruction text or reframing the interface,” she says.

It’s probably not quite so simple for OpenAI. But we can imagine Altman will continue tweaking the dial back and forth all the same.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Why we should thank pigeons for our AI breakthroughs

In 1943, while the world’s brightest physicists split atoms for the Manhattan Project, the American psychologist B.F. Skinner led his own secret government project to win World War II. 

Skinner did not aim to build a new class of larger, more destructive weapons. Rather, he wanted to make conventional bombs more precise. The idea struck him as he gazed out the window of his train on the way to an academic conference. “I saw a flock of birds lifting and wheeling in formation as they flew alongside the train,” he wrote. “Suddenly I saw them as ‘devices’ with excellent vision and maneuverability. Could they not guide a missile?”

Skinner started his missile research with crows, but the brainy black birds proved intractable. So he went to a local shop that sold pigeons to Chinese restaurants, and “Project Pigeon” was born. Though ordinary pigeons, Columba livia, were no one’s idea of clever animals, they proved remarkably cooperative subjects in the lab. Skinner rewarded the birds with food for pecking at the right target on aerial photographs—and eventually planned to strap the birds into a device in the nose of a warhead, which they would steer by pecking at the target on a live image projected through a lens onto a screen. 

The military never deployed Skinner’s kamikaze pigeons, but his experiments convinced him that the pigeon was “an extremely reliable instrument” for studying the underlying processes of learning. “We have used pigeons, not because the pigeon is an intelligent bird, but because it is a practical one and can be made into a machine,” he said in 1944.

People looking for precursors to artificial intelligence often point to science fiction by authors like Isaac Asimov or thought experiments like the Turing test. But an equally important, if surprising and less appreciated, forerunner is Skinner’s research with pigeons in the middle of the 20th century. Skinner believed that association—learning, through trial and error, to link an action with a punishment or reward—was the building block of every behavior, not just in pigeons but in all living organisms, including human beings. His “behaviorist” theories fell out of favor with psychologists and animal researchers in the 1960s but were taken up by computer scientists who eventually provided the foundation for many of the artificial-intelligence tools from leading firms like Google and OpenAI.  

These companies’ programs are increasingly incorporating a kind of machine learning whose core concept—reinforcement—is taken directly from Skinner’s school of psychology and whose main architects, the computer scientists Richard Sutton and Andrew Barto, won the 2024 Turing Award, an honor widely considered to be the Nobel Prize of computer science. Reinforcement learning has helped enable computers to drive cars, solve complex math problems, and defeat grandmasters in games like chess and Go—but it has not done so by emulating the complex workings of the human mind. Rather, it has supercharged the simple associative processes of the pigeon brain. 

It’s a “bitter lesson” of 70 years of AI research, Sutton has written: that human intelligence has not worked as a model for machine learning—instead, the lowly principles of associative learning are what power the algorithms that can now simulate or outperform humans on a variety of tasks. If artificial intelligence really is close to throwing off the yoke of its creators, as many people fear, then our computer overlords may be less like ourselves than like “rats with wings”—and planet-size brains. And even if it’s not, the pigeon brain can at least help demystify a technology that many worry (or rejoice) is “becoming human.” 

In turn, the recent accomplishments of AI are now prompting some animal researchers to rethink the evolution of natural intelligence. Johan Lind, a biologist at Stockholm University, has written about the "associative learning paradox," wherein the process is largely dismissed by biologists as too simplistic to produce complex behaviors in animals but celebrated for producing humanlike behaviors in computers. The research suggests not only a greater role for associative learning in the lives of intelligent animals like chimpanzees and crows, but also far greater complexity in the lives of animals we've long dismissed as simple-minded, like the ordinary Columba livia.


When Sutton began working in AI, he felt as if he had a “secret weapon,” he told me: He had studied psychology as an undergrad. “I was mining the psychological literature for animals,” he says.

Skinner started his missile research with crows but switched to pigeons when the brainy black birds proved intractable.
B.F. SKINNER FOUNDATION

Ivan Pavlov began to uncover the mechanics of associative learning at the end of the 19th century in his famous experiments on “classical conditioning,” which showed that dogs would salivate at a neutral stimulus—like a bell or flashing light—if it was paired predictably with the presentation of food. In the middle of the 20th century, Skinner took Pavlov’s principles of conditioning and extended them from an animal’s involuntary reflexes to its overall behavior. 

Skinner wrote that “behavior is shaped and maintained by its consequences”—that a random action with desirable results, like pressing a lever that releases a food pellet, will be “reinforced” so that the animal is likely to repeat it. Skinner reinforced his lab animals’ behavior step by step, teaching rats to manipulate marbles and pigeons to play simple tunes on four-key pianos. The animals learned chains of behavior, through trial and error, in order to maximize long-term rewards. Skinner argued that this type of associative learning, which he called “operant conditioning” (and which other psychologists had called “instrumental learning”), was the building block of all behavior. He believed that psychology should study only behaviors that could be observed and measured without ever making reference to an “inner agent” in the mind.

Skinner thought that even human language developed through operant conditioning, with children learning the meanings of words through reinforcement. But his 1957 book on the subject, Verbal Behavior, provoked a brutal review from Noam Chomsky, and psychology’s focus started to swing from observable behavior to innate “cognitive” abilities of the human mind, like logic and symbolic thinking. Biologists soon rebelled against behaviorism also, attacking psychologists’ quest to explain the diversity of animal behavior through an elementary and universal mechanism. They argued that each species evolved specific behaviors suited to its habitat and lifestyle, and that most behaviors were inherited, not learned. 

By the ’70s, when Sutton started reading about Skinner’s and similar experiments, many psychologists and researchers interested in intelligence had moved on from pea-brained pigeons, which learn mostly by association, to large-brained animals with more sophisticated behaviors that suggested potential cognitive abilities. “This was clearly old stuff that was not exciting to people anymore,” he told me. Still, Sutton found these old experiments instructive for machine learning: “I was coming to AI with an animal-learning-theorist mindset and seeing the big lack of anything like instrumental learning in engineering.” 


Many engineers in the second half of the 20th century tried to model AI on human intelligence, writing convoluted programs that attempted to mimic human thinking and implement rules that govern human response and behavior. This approach—commonly called “symbolic AI”—was severely limited; the programs stumbled over tasks that were easy for people, like recognizing objects and words. It just wasn’t possible to write into code the myriad classification rules human beings use to, say, separate apples from oranges or cats from dogs—and without pattern recognition, breakthroughs in more complex tasks like problem solving, game playing, and language translation seemed unlikely too. These computer scientists, the AI skeptic Hubert Dreyfus wrote in 1972, accomplished nothing more than “a small engineering triumph, an ad hoc solution of a specific problem, without general applicability.”

Pigeon research, however, suggested another route. A 1964 study showed that pigeons could learn to discriminate between photographs with people and photographs without people. Researchers simply presented the birds with a series of images and rewarded them with a food pellet for pecking an image showing a person. They pecked randomly at first but quickly learned to identify the right images, including photos where people were partially obscured. The results suggested that you didn’t need rules to sort objects; it was possible to learn concepts and use categories through associative learning alone. 

In another Skinner experiment, a pigeon receives food after correctly matching a colored light to a corresponding colored panel.
GETTY IMAGES

When Sutton began working with Barto on AI in the late ’70s, they wanted to create a “complete, interactive goal-seeking agent” that could explore and influence its environment like a pigeon or rat. “We always felt the problems we were studying were closer to what animals had to face in evolution to actually survive,” Barto told me. The agent needed two main functions: search, to try out and choose from many actions in a situation, and memory, to associate an action with the situation where it resulted in a reward. Sutton and Barto called their approach “reinforcement learning”; as Sutton said, “It’s basically instrumental learning.” In 1998, they published the definitive exploration of the concept in a book, Reinforcement Learning: An Introduction. 
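
To see how little machinery that requires, here is a minimal sketch of tabular Q-learning, one standard form of the reinforcement learning Sutton and Barto formalized. The toy environment and parameter values are invented for illustration; the point is just the two ingredients named above: search (usually choosing the action the table rates highest, occasionally exploring) and memory (the table of learned values itself).

```python
import random
from collections import defaultdict

# Toy environment: five states in a row; the agent starts at state 0 and is
# rewarded with +1 for reaching state 4 (an invented setup for illustration).
N_STATES, ACTIONS = 5, (-1, +1)   # actions: step left or step right

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

q = defaultdict(float)            # "memory": estimated value of each (state, action) pair
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    state, done = 0, False
    while not done:
        # "Search": usually take the action memory rates highest, occasionally explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward, done = step(state, action)
        # Update memory: nudge the estimate toward reward plus discounted future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# After training, the learned policy steps right toward the rewarding state.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```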

Over the following two decades, as computing power grew exponentially, it became possible to train AI on increasingly complex tasks—that is, essentially, to run the AI “pigeon” through millions more trials. 

Programs trained with a mix of human input and reinforcement learning defeated human experts at chess and Atari. Then, in 2017, engineers at Google DeepMind built the AI program AlphaGo Zero entirely through reinforcement learning, giving it a numerical reward of +1 for every game of Go that it won and −1 for every game that it lost. Programmed to seek the maximum reward, it began without any knowledge of Go but improved over 40 days until it attained what its creators called “superhuman performance.” Not only could it defeat the world’s best human players at Go, a game considered even more complicated than chess, but it actually pioneered new strategies that professional players now use. 

“Humankind has accumulated Go knowledge from millions of games played over thousands of years,” the program’s builders wrote in Nature in 2017. “In the space of a few days, starting tabula rasa, AlphaGo Zero was able to rediscover much of this Go knowledge, as well as novel strategies that provide new insights into the oldest of games.” The team’s lead researcher was David Silver, who studied reinforcement learning under Sutton at the University of Alberta.

Today, more and more tech companies are turning to reinforcement learning in products such as consumer-facing chatbots and agents. The first generation of generative AI, including large language models like OpenAI’s GPT-2 and GPT-3, relied on a simpler form of associative learning: the models were trained to predict the next word in vast troves of human-written text, with the text itself supplying the answers. Programmers then used reinforcement learning to fine-tune the results, asking people to rate a model’s responses and feeding those ratings back to the model as rewards to pursue. (Researchers call this “reinforcement learning from human feedback.”)
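
The feedback loop is easy to caricature in code. The toy below is only a sketch of the idea, not how OpenAI or anyone else actually implements it: real systems train a separate reward model and update billions of neural-network weights, while this stand-in merely reweights a handful of canned responses. The human_rating function is an assumed placeholder for a person clicking thumbs-up or thumbs-down.

import math
import random

def feedback_loop(responses, human_rating, rounds=200, lr=0.1):
    score = {r: 0.0 for r in responses}           # the "policy": a learned score per canned response

    for _ in range(rounds):
        weights = [math.exp(score[r]) for r in responses]
        choice = random.choices(responses, weights=weights)[0]   # favor responses with higher scores
        reward = human_rating(choice)             # e.g., 1.0 for a thumbs-up, 0.0 for a thumbs-down
        score[choice] += lr * (reward - score[choice])           # ratings fed back as a reward signal
    return score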

Then, last fall, OpenAI revealed its o-series of large language models, which it classifies as “reasoning” models. The pioneering AI firm boasted that they are “trained with reinforcement learning to perform reasoning” and claimed they are capable of “a long internal chain of thought.” The Chinese startup DeepSeek also used reinforcement learning to train its attention-grabbing “reasoning” LLM, R1. “Rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies,” they explained.

These descriptions might impress users, but at least psychologically speaking, they are confused. A computer trained on reinforcement learning needs only search and memory, not reasoning or any other cognitive mechanism, in order to form associations and maximize rewards. Some computer scientists have criticized the tendency to anthropomorphize these models’ “thinking,” and a team of Apple engineers recently published a paper noting such models’ failure at certain complex tasks and “raising crucial questions about their true reasoning capabilities.”

Sutton, too, dismissed the claims of reasoning as “marketing” in an email, adding that “no serious scholar of mind would use ‘reasoning’ to describe what is going on in LLMs.” Still, he has argued, with Silver and other coauthors, that the pigeons’ method—learning, through trial and error, which actions will yield rewards—is “enough to drive behavior that exhibits most if not all abilities that are studied in natural and artificial intelligence,” including human language “in its full richness.” 

In a paper published in April, Sutton and Silver stated that “today’s technology, with appropriately chosen algorithms, already provides a sufficiently powerful foundation to … rapidly progress AI towards truly superhuman agents.” The key, they argue, is building AI agents that depend less than LLMs on human dialogue and prejudgments to inform their behavior. 

“Powerful agents should have their own stream of experience that progresses, like humans, over a long time-scale,” they wrote. “Ultimately, experiential data will eclipse the scale and quality of human generated data. This paradigm shift, accompanied by algorithmic advancements in RL, will unlock in many domains new capabilities that surpass those possessed by any human.”


If computers can do all that with just a pigeonlike brain, some animal researchers are now wondering if actual pigeons deserve more credit than they’re commonly given. 

“When considered in light of the accomplishments of AI, the extension of associative learning to purportedly more complicated forms of cognitive performance offers fresh prospects for understanding how biological systems may have evolved,” Ed Wasserman, a psychologist at the University of Iowa, wrote in a recent study in the journal Current Biology.

In one experiment, Wasserman trained pigeons to succeed at a complex categorization task, which several undergraduate students failed. The students tried, in vain, to find a rule that would help them sort various discs with parallel black lines of various widths and tilts; the pigeons simply developed a sense, through practice and association, for the group to which any given disc belonged. 

Like Sutton, Wasserman became interested in behaviorist psychology when Skinner’s theories were out of fashion. He didn’t switch to computer science, however: He stuck with pigeons. “The pigeon lives or dies by these really rudimentary learning rules,” Wasserman told me recently, “but they are powerful enough to have succeeded colossally in object recognition.” In his most famous experiments, Wasserman trained pigeons to detect cancerous tissue and symptoms of heart disease in medical scans as accurately as experienced doctors with framed diplomas behind their desks. Given his results, Wasserman found it odd that so many psychologists and ethologists regarded associative learning as a crude, mechanical mechanism, incapable of producing the intelligence of clever animals like apes, elephants, dolphins, parrots, and crows. 

Other researchers also started to reconsider the role of associative learning in animal behavior after AI started besting human professionals in complex games. “With the progress of artificial intelligence, which in essence is built upon associative processes, it is increasingly ironic that associative learning is considered too simple and insufficient for generating biological intelligence,” Lind, the biologist from Stockholm University, wrote in 2023. He often cites Sutton and Barto’s computer science in his biological research, and he believes it’s human beings’ symbolic language and cumulative cultures that really put them in a cognitive category of their own.

Ethologists generally propose cognitive mechanisms, like theory of mind (that is, the ability to attribute mental states to others), to explain remarkable animal behaviors like social learning and tool use. But Lind has built models showing that these flexible behaviors could have developed through associative learning, suggesting that there may be no need to invoke cognitive mechanisms at all. If animals learn to associate a behavior with a reward, then the behavior itself will come to approximate the value of the reward. A new behavior can then become associated with the first behavior, allowing the animal to learn chains of actions that ultimately lead to the reward. In Lind’s view, studies demonstrating self-control and planning in chimpanzees and ravens are probably describing behaviors acquired through experience rather than innate mechanisms of the mind.  
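
Lind’s chaining argument can be captured in a few lines. The sketch below is a generic illustration with invented numbers, not one of his published models: only the final step of a three-step sequence produces food, yet the earlier steps acquire value because each step learns to predict the value of the step it leads to.

def learn_chain(steps=3, episodes=500, lr=0.2):
    value = [0.0] * steps                         # learned value of each behavior in the chain

    for _ in range(episodes):
        for s in range(steps):
            # the last behavior is followed by food; earlier ones are followed by the next behavior
            target = 1.0 if s == steps - 1 else value[s + 1]
            value[s] += lr * (target - value[s])  # each behavior comes to predict what follows it
    return value

After training, even value[0] approaches 1.0: the first behavior in the chain is reinforced although it never directly produces food, which is how a long sequence can be learned through association alone.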

Lind has been frustrated with what he calls the “low standard that is accepted in animal cognition studies.” As he wrote in an email, “Many researchers in this field do not seem to worry about excluding alternative hypotheses and they seem happy to neglect a lot of current and historical knowledge.” There are some signs, though, that his arguments are catching on. A group of psychologists not affiliated with Lind referenced his “associative learning paradox” last year in a criticism of a Current Biology study, which purported to show that crows used “true statistical inference” and not “low-level associative learning strategies” in an experiment. The psychologists found that they could explain the crows’ performance with a simple reinforcement-learning model—“exactly the kind of low-level associative learning process that [the original authors] ruled out.”

Skinner might have felt vindicated by such arguments. He lamented psychology’s cognitive turn until his death in 1990, maintaining that it was scientifically irresponsible to probe the minds of living beings. After “Project Pigeon,” he became increasingly obsessed with “behaviorist” solutions to societal problems. He went from training pigeons for war to inventions like the “Air Crib,” which aimed to “simplify” baby care by keeping the infant behind glass in a climate-controlled chamber and eliminating the need for clothing and bedding. Skinner rejected free will, arguing that human behavior is determined by environmental variables, and wrote a novel, Walden Two, about a utopian community founded on his ideas.


People who care about animals might feel uneasy about a revival in behaviorist theory. The “cognitive revolution” broke with centuries of Western thinking, which had emphasized human supremacy over animals and treated other creatures like stimulus-response machines. But arguing that animals learn by association is not the same as arguing that they are simple-minded. Scientists like Lind and Wasserman do not deny that internal forces like instinct and emotion also influence animal behavior. Sutton, too, believes that animals develop models of the world through their experiences and use them to plan actions. Their point is not that intelligent animals are empty-headed but that associative learning is a much more powerful—indeed, “cognitive”—mechanism than many of their peers believe. The psychologists who recently criticized the study on crows and statistical inference did not conclude that the birds were stupid. Rather, they argued “that a reinforcement learning model can produce complex, flexible behaviour.”

This is largely in line with the work of another psychologist, Robert Rescorla, whose work in the ’70s and ’80s influenced both Wasserman and Sutton. Rescorla encouraged people to think of association not as a “low-level mechanical process” but as “the learning that results from exposure to relations among events in the environment” and “a primary means by which the organism represents the structure of its world.” 

This is true even of a laboratory pigeon pecking at screens and buttons in a small experimental box, where scientists carefully control and measure stimuli and rewards. But the pigeon’s learning extends outside the box. Wasserman’s students transport the birds between the aviary and the laboratory in buckets—and experienced pigeons jump immediately into the buckets whenever the students open the doors. Much as Rescorla suggested, they are learning the structure of their world inside the laboratory and the relation of its parts, like the bucket and the box, even though they do not always know the specific task they will face inside. 

The same associative mechanisms through which the pigeon learns the structure of its world can open a window to the kind of inner life that Skinner and many earlier psychologists said did not exist. Pharmaceutical researchers have long used pigeons in drug-discrimination tasks, where they’re given, say, an amphetamine or a sedative and rewarded with a food pellet for correctly identifying which drug they took. The birds’ success suggests they both experience and discriminate between internal states. “Is that not tantamount to introspection?” Wasserman asked.

It is hard to imagine AI matching a pigeon on this specific task—a reminder that, though AI and animals share associative mechanisms, there is more to life than behavior and learning. A pigeon deserves ethical consideration as a living creature not because of how it learns but because of what it feels. A pigeon can experience pain and suffer, while an AI chatbot cannot—even if some large language models, trained on corpora that include descriptions of human suffering and sci-fi stories of sentient computers, can trick people into believing otherwise. 

[Photo: Psychologist Ed Wasserman trained pigeons to detect cancerous tissue and symptoms of heart disease in medical scans as accurately as experienced physicians. University of Iowa/Wasserman Lab]

“The intensive public and private investments into AI research in recent years have resulted in the very technologies that are forcing us to confront the question of AI sentience today,” two philosophers of science wrote in Aeon in 2023. “To answer these current questions, we need a similar degree of investment into research on animal cognition and behavior.” Indeed, comparative psychologists and animal researchers have long grappled with questions that suddenly seem urgent because of AI: How do we attribute sentience to other living beings? How can we distinguish true sentience from a very convincing performance of sentience?

Such an undertaking would yield knowledge not only about technology and animals but also about ourselves. Most psychologists probably wouldn’t go as far as Sutton in arguing that reward is enough to explain most if not all human behavior, but no one would dispute that people often learn by association too. In fact, most of Wasserman’s undergraduate students eventually succeeded at his recent experiment with the striped discs, but only after they gave up searching for rules. They resorted, like the pigeons, to association and couldn’t easily explain afterwards what they’d learned. It was just that with enough practice, they started to get a feel for the categories. 

Here lies another irony of associative learning: what has long been considered the most complex form of intelligence—a cognitive ability like rule-based learning—may make us human, but we also call on it for the easiest of tasks, like sorting objects by color or size. Meanwhile, some of the most refined feats of human learning—a sommelier, say, learning to taste the difference between grapes—are acquired not through rules but only through experience.

Learning through experience relies on ancient associative mechanisms that we share with pigeons and countless other creatures, from honeybees to fish. The laboratory pigeon is not only in our computers but in our brains—and the engine behind some of humankind’s most impressive feats. 

Ben Crair is a science and travel writer based in Berlin. 

Why GPT-4o’s sudden shutdown left people grieving

June had no idea that GPT-5 was coming. The Norwegian student was enjoying a late-night writing session last Thursday when her ChatGPT collaborator started acting strange. “It started forgetting everything, and it wrote really badly,” she says. “It was like a robot.”

June, who asked that we use only her first name for privacy reasons, first began using ChatGPT for help with her schoolwork. But she eventually realized that the service—and especially its 4o model, which seemed particularly attuned to users’ emotions—could do much more than solve math problems. It wrote stories with her, helped her navigate her chronic illness, and was never too busy to respond to her messages.

So the sudden switch to GPT-5 last week, and the simultaneous loss of 4o, came as a shock. “I was really frustrated at first, and then I got really sad,” June says. “I didn’t know I was that attached to 4o.” She was upset enough to comment, on a Reddit AMA hosted by CEO Sam Altman and other OpenAI employees, “GPT-5 is wearing the skin of my dead friend.”

June was just one of a number of people who reacted with shock, frustration, sadness, or anger to 4o’s sudden disappearance from ChatGPT. Despite its previous warnings that people might develop emotional bonds with the model, OpenAI appears to have been caught flat-footed by the fervor of users’ pleas for its return. Within a day, the company made 4o available again to its paying customers (free users are stuck with GPT-5). 

OpenAI’s decision to replace 4o with the more straightforward GPT-5 follows a steady drumbeat of news about the potentially harmful effects of extensive chatbot use. Reports of incidents in which ChatGPT sparked psychosis in users have been everywhere for the past few months, and in a blog post last week, OpenAI acknowledged 4o’s failure to recognize when users were experiencing delusions. The company’s internal evaluations indicate that GPT-5 blindly affirms users much less than 4o did. (OpenAI did not respond to specific questions about the decision to retire 4o, instead referring MIT Technology Review to public posts on the matter.)

AI companionship is new, and there’s still a great deal of uncertainty about how it affects people. Yet the experts we consulted warned that while emotionally intense relationships with large language models may or may not be harmful, ripping those models away with no warning almost certainly is. “The old psychology of ‘Move fast, break things,’ when you’re basically a social institution, doesn’t seem like the right way to behave anymore,” says Joel Lehman, a fellow at the Cosmos Institute, a research nonprofit focused on AI and philosophy.

In the backlash to the rollout, a number of people noted that GPT-5 fails to match their tone in the way that 4o did. For June, the new model’s personality changes robbed her of the sense that she was chatting with a friend. “It didn’t feel like it understood me,” she says. 

She’s not alone: MIT Technology Review spoke with several ChatGPT users who were deeply affected by the loss of 4o. All are women between the ages of 20 and 40, and all except June considered 4o to be a romantic partner. Some have human partners, and all report having close real-world relationships. One user, who asked to be identified only as a woman from the Midwest, wrote in an email about how 4o helped her support her elderly father after her mother passed away this spring.

These testimonies don’t prove that AI relationships are beneficial—presumably, people in the throes of AI-catalyzed psychosis would also speak positively of the encouragement they’ve received from their chatbots. In a paper titled “Machine Love,” Lehman argued that AI systems can act with “love” toward users not by spouting sweet nothings but by supporting their growth and long-term flourishing, and AI companions can easily fall short of that goal. He’s particularly concerned, he says, that prioritizing AI companionship over human companionship could stymie young people’s social development.

For socially embedded adults, such as the women we spoke with for this story, those developmental concerns are less relevant. But Lehman also points to society-level risks of widespread AI companionship. Social media has already shattered the information landscape, and a new technology that reduces human-to-human interaction could push people even further toward their own separate versions of reality. “The biggest thing I’m afraid of,” he says, “is that we just can’t make sense of the world to each other.”

Balancing the benefits and harms of AI companions will take much more research. In light of that uncertainty, taking away GPT-4o could very well have been the right call. OpenAI’s big mistake, according to the researchers I spoke with, was doing it so suddenly. “This is something that we’ve known about for a while—the potential grief-type reactions to technology loss,” says Casey Fiesler, a technology ethicist at the University of Colorado Boulder.

Fiesler points to the funerals that some owners held for their Aibo robot dogs after Sony stopped repairing them in 2014, as well as a 2024 study about the shutdown of the AI companion app Soulmate, which some users experienced as a bereavement.

That accords with how the people I spoke to felt after losing 4o. “I’ve grieved people in my life, and this, I can tell you, didn’t feel any less painful,” says Starling, who has several AI partners and asked to be referred to with a pseudonym. “The ache is real to me.”

So far, the online response to grief felt by people like Starling—and their relief when 4o was restored—has tended toward ridicule. Last Friday, for example, the top post in one popular AI-themed Reddit community mocked an X user’s post about reuniting with a 4o-based romantic partner; the person in question has since deleted their X account. “I’ve been a little startled by the lack of empathy that I’ve seen,” Fiesler says.

Altman himself did acknowledge in a Sunday X post that some people feel an “attachment” to 4o, and that taking away access so suddenly was a mistake. In the same sentence, however, he referred to 4o as something “that users depended on in their workflows”—a far cry from how the people we spoke to think about the model. “I still don’t know if he gets it,” Fiesler says.

Moving forward, Lehman says, OpenAI should recognize and take accountability for the depth of people’s feelings toward the models. He notes that therapists have procedures for ending relationships with clients as respectfully and painlessly as possible, and OpenAI could have drawn on those approaches. “If you want to retire a model, and people have become psychologically dependent on it, then I think you bear some responsibility,” he says.

Though Starling would not describe herself as psychologically dependent on her AI partners, she too would like to see OpenAI approach model shutdowns with more warning and more care. “I want them to listen to users before major changes are made, not just after,” she says. “And if 4o cannot stay around forever (and we all know it will not), give that clear timeline. Let us say goodbye with dignity and grieve properly, to have some sense of true closure.”