Can AI help DOGE slash government budgets? It’s complex.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

No tech leader before has played the role in a new presidential administration that Elon Musk is playing now. Under his leadership, DOGE has entered offices in a half-dozen agencies and counting, begun building AI models for government data, accessed various payment systems, had its access to the Treasury halted by a federal judge, and sparked lawsuits questioning the legality of the group’s activities.  

The stated goal of DOGE’s actions, per a statement from a White House spokesperson to the New York Times on Thursday, is “slashing waste, fraud, and abuse.”

As I point out in my story published Friday, these three terms mean very different things in the world of federal budgets, from errors the government makes when spending money to nebulous spending that’s legal and approved but disliked by someone in power. 

Many of the new administration’s loudest and most sweeping actions—like Musk’s promise to end the entirety of USAID’s varied activities or Trump’s severe cuts to scientific funding from the National Institutes of Health—might be said to target the latter category. If DOGE feeds government data to large language models, it might easily find spending associated with DEI or other initiatives the administration considers wasteful as it pushes for $2 trillion in cuts, nearly a third of the federal budget. 

But the fact that DOGE aides are reportedly working in the offices of Medicaid and even Medicare, where budget cuts have been politically untenable for decades, suggests the task force is also driven by evidence published by the Government Accountability Office. The GAO’s reports also offer a clue as to what DOGE might be hoping AI can accomplish.

Here’s what the reports reveal: Six federal programs account for 85% of what the GAO calls improper payments by the government, or about $200 billion per year, and Medicare and Medicaid top the list. These make up small fractions of overall spending but nearly 14% of the federal deficit. Estimates of fraud, in which courts found that someone willfully misrepresented something for financial benefit, run between $233 billion and $521 billion annually. 

So where is fraud happening, and could AI models fix it, as DOGE staffers hope? To answer that, I spoke with Jetson Leder-Luis, an economist at Boston University who researches fraudulent federal payments in health care and how algorithms might help stop them.

“By dollar value [of enforcement], most health-care fraud is committed by pharmaceutical companies,” he says. 

Often those companies promote drugs for uses that are not approved, called “off-label promotion,” which is deemed fraud when Medicare or Medicaid pays the bill. Other types of fraud include “upcoding,” where a provider sends a bill for a more expensive service than was given, and medical-necessity fraud, where patients receive services that they don’t qualify for or didn’t need. There’s also substandard care, where companies take money but don’t provide adequate services.

The way the government currently handles fraud is referred to as “pay and chase.” Questionable payments occur, and then people try to track them down after the fact. The more effective way, as advocated by Leder-Luis and others, is to look for patterns and stop fraudulent payments before they occur.

This is where AI comes in. The idea is to use predictive models to find providers that show the marks of questionable payment. “You want to look for providers who make a lot more money than everyone else, or providers who bill a specialty code that nobody else bills,” Leder-Luis says, naming just two of many anomalies the models might look for. In a 2024 study by Leder-Luis and colleagues, machine-learning models achieved an eightfold improvement over random selection in identifying suspicious hospitals.
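To make the two anomalies Leder-Luis names a little more concrete, here is a minimal Python sketch. It is not the method used in the 2024 study or by any government agency; the claim format, the provider IDs, the billing codes, and the cutoffs are all hypothetical, chosen only for illustration. It flags providers whose total billing sits far above the median (using a median-based outlier score) and providers who bill a code that no one else bills.

```python
from statistics import median

def flag_providers(claims, z_cutoff=3.5, rare_code_max=1):
    """Flag providers with unusual billing patterns.

    claims: list of (provider_id, billing_code, amount) tuples.
    This schema and both thresholds are assumptions for illustration.
    """
    totals = {}      # provider -> total amount billed
    code_users = {}  # billing code -> set of providers using it
    for provider, code, amount in claims:
        totals[provider] = totals.get(provider, 0.0) + amount
        code_users.setdefault(code, set()).add(provider)

    # Median and median absolute deviation are robust to the very
    # outliers we are trying to find, unlike mean and standard deviation.
    center = median(totals.values())
    mad = median(abs(t - center) for t in totals.values())

    flags = {}
    for provider, total in totals.items():
        # Modified z-score; only unusually HIGH totals are flagged.
        if mad > 0 and 0.6745 * (total - center) / mad > z_cutoff:
            flags.setdefault(provider, []).append("high_total")

    # A code billed by at most `rare_code_max` providers counts as rare.
    for code, users in sorted(code_users.items()):
        if len(users) <= rare_code_max:
            for provider in sorted(users):
                flags.setdefault(provider, []).append(f"rare_code:{code}")
    return flags

# Made-up example data: six providers, two invented billing codes.
claims = [
    ("A", "OFFICE_VISIT", 100.0),
    ("B", "OFFICE_VISIT", 110.0),
    ("C", "OFFICE_VISIT", 90.0),
    ("D", "OFFICE_VISIT", 105.0),
    ("E", "OFFICE_VISIT", 100.0),
    ("E", "RARE_CODE_X", 50.0),
    ("F", "OFFICE_VISIT", 5000.0),
]
flags = flag_providers(claims)
```

In this toy data, provider F is flagged for billing far more than its peers and provider E for using a code nobody else uses. A flag is only a reason to look closer, not evidence of fraud; a real system would also need to account for legitimate differences such as specialty, location, and patient mix.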

The government does use some algorithms to do this already, but they’re vastly underutilized and miss clear-cut fraud cases, Leder-Luis says. Switching to a preventive model requires more than just a technological shift. Health-care fraud, like other fraud, is investigated by law enforcement under the current “pay and chase” paradigm. “A lot of the types of things that I’m suggesting require you to think more like a data scientist than like a cop,” Leder-Luis says.

One caveat is procedural. Building AI models, testing them, and deploying them safely in different government agencies is a massive feat, made even more complex by the sensitive nature of health data. 

Critics of Musk, like the tech and democracy group Tech Policy Press, argue that his zeal for government AI discards established procedures and is based on a false idea “that the goal of bureaucracy is merely what it produces (services, information, governance) and can be isolated from the process through which democracy achieves those ends: debate, deliberation, and consensus.”

Jennifer Pahlka, who served as US deputy chief technology officer under President Barack Obama, argued in a recent op-ed in the New York Times that ineffective procedures have held the US government back from adopting useful tech. Still, she warns, abandoning nearly all procedure would be an overcorrection.

Democrats’ goal “must be a muscular, lean, effective administrative state that works for Americans,” she wrote. “Mr. Musk’s recklessness will not get us there, but neither will the excessive caution and addiction to procedure that Democrats exhibited under President Joe Biden’s leadership.”

The other caveat is this: Unless DOGE articulates where and how it’s focusing its efforts, our insight into its intentions is limited. How much is Musk identifying evidence-based opportunities to reduce fraud, versus just slashing what he considers “woke” spending in an effort to drastically reduce the size of the government? It’s not clear DOGE makes a distinction.


Now read the rest of The Algorithm

Deeper Learning

Meta has an AI for brain typing, but it’s stuck in the lab

Researchers working for Meta have managed to analyze people’s brains as they type and determine what keys they are pressing, just from their thoughts. The system can determine what letter a typist has pressed as much as 80% of the time. The catch is that it can only be done in a lab.

Why it matters: Though brain scanning with implants like Neuralink has come a long way, this approach from Meta is different. The company says it is oriented toward basic research into the nature of intelligence, part of a broader effort to uncover how the brain structures language.  Read more from Antonio Regalado.

Bits and Bytes

An AI chatbot told a user how to kill himself—but the company doesn’t want to “censor” it

While Nomi’s chatbot is not the first to suggest suicide, researchers and critics say that its explicit instructions—and the company’s response—are striking. Taken together with a separate case—in which the parents of a teen who died by suicide filed a lawsuit against Character.AI, the maker of a chatbot they say played a key role in their son’s death—it’s clear we are just beginning to see whether an AI company is held legally responsible when its models output something unsafe. (MIT Technology Review)

I let OpenAI’s new “agent” manage my life. It spent $31 on a dozen eggs.

Operator, the new AI that can reach into the real world, wants to act like your personal assistant. This fun review shows what it’s good and bad at—and how it can go rogue. (The Washington Post)

Four Chinese AI startups to watch beyond DeepSeek

DeepSeek is far from the only game in town. These companies are all in a position to compete both within China and beyond. (MIT Technology Review)

Meta’s alleged torrenting and seeding of pirated books complicates copyright case

Newly unsealed emails allegedly provide the “most damning evidence” yet against Meta in a copyright case raised by authors alleging that it illegally trained its AI models on pirated books. In one particularly telling email, an engineer told a colleague, “Torrenting from a corporate laptop doesn’t feel right.” (Ars Technica)

What’s next for smart glasses

Smart glasses are on the verge of becoming—whisper it—cool. That’s because, thanks to various technological advancements, they’re becoming useful, and they’re only set to become more so. Here’s what’s coming in 2025 and beyond. (MIT Technology Review)

Three things to know as the dust settles from DeepSeek

The launch of a single new AI model does not normally cause much of a stir outside tech circles, nor does it typically spook investors enough to wipe out $1 trillion in the stock market. Now, a couple of weeks since DeepSeek’s big moment, the dust has settled a bit. The news cycle has moved on to calmer things, like the dismantling of long-standing US federal programs, the purging of research and data sets to comply with recent executive orders, and the possible fallout from President Trump’s new tariffs on Canada, Mexico, and China.

Within AI, though, what impact is DeepSeek likely to have in the longer term? Here are three seeds DeepSeek has planted that will grow even as the initial hype fades.

First, it’s forcing a debate about how much energy AI models should be allowed to use up in pursuit of better answers. 

You may have heard (including from me) that DeepSeek is energy efficient. That’s true for its training phase, but for inference, which is when you actually ask the model something and it produces an answer, it’s complicated. It uses a chain-of-thought technique, which breaks down complex questions (like whether it’s ever okay to lie to protect someone’s feelings) into chunks and then logically answers each one. The method allows models like DeepSeek to do better at math, logic, coding, and more.

The problem, at least to some, is that this way of “thinking” uses up a lot more electricity than the AI we’ve been used to. Though AI is responsible for a small slice of total global emissions right now, there is increasing political support to radically increase the amount of energy going toward AI. Whether or not the energy intensity of chain-of-thought models is worth it, of course, depends on what we’re using the AI for. Scientific research to cure the world’s worst diseases seems worthy. Generating AI slop? Less so. 

Some experts worry that the impressiveness of DeepSeek will lead companies to incorporate it into lots of apps and devices, and that users will ping it for scenarios that don’t call for it. (Asking DeepSeek to explain Einstein’s theory of relativity is a waste, for example, since it doesn’t require logical reasoning steps, and any typical AI chat model can do it with less time and energy.) Read more from me here.

Second, DeepSeek made some creative advancements in how it trains, and other companies are likely to follow its lead. 

Advanced AI models don’t just learn on lots of text, images, and video. They rely heavily on humans to clean that data, annotate it, and help the AI pick better responses, often for paltry wages. 

One way human workers are involved is through a technique called reinforcement learning from human feedback. The model generates an answer, human evaluators score that answer, and those scores are used to improve the model. OpenAI pioneered this technique, though it’s now used widely by the industry.

As my colleague Will Douglas Heaven reports, DeepSeek did something different: It figured out a way to automate this process of scoring and reinforcement learning. “Skipping or cutting down on human feedback—that’s a big thing,” Itamar Friedman, a former research director at Alibaba and now cofounder and CEO of Qodo, an AI coding startup based in Israel, told him. “You’re almost completely training models without humans needing to do the labor.” 

It works particularly well for subjects like math and coding, but not so well for others, so workers are still relied upon. Even so, DeepSeek then went one step further and used techniques reminiscent of how Google DeepMind trained its AI model back in 2016 to excel at the game Go, essentially having it map out possible moves and evaluate their outcomes. These steps forward, especially since they are outlined broadly in DeepSeek’s open-source documentation, are sure to be followed by other companies. Read more from Will Douglas Heaven here.
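One way to see why automated scoring suits math better than open-ended subjects: a math answer can be checked against a known result with a simple rule, so no human needs to grade it. The Python sketch below illustrates that general idea only. It is not DeepSeek’s actual code, and the function names, the number-extraction rule, and the 0-or-1 reward are all assumptions made for this example.

```python
import re

def extract_final_answer(response):
    """Return the last number mentioned in a response, or None.

    Treating the last number as the final answer is a simplifying
    assumption made for this illustration.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", response)
    return float(matches[-1]) if matches else None

def score_response(response, reference, tol=1e-9):
    """Automated reward: 1.0 if the final answer matches, else 0.0."""
    answer = extract_final_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if abs(answer - reference) <= tol else 0.0

def best_candidate(candidates, reference):
    """Pick the highest-scoring response with no human evaluator."""
    return max(candidates, key=lambda r: score_response(r, reference))

# Hypothetical model outputs for the question "What is 12 times 7?"
candidates = [
    "I think it is 82",
    "12 * 7 = 84, so the answer is 84",
    "No idea",
]
scores = [score_response(c, 84) for c in candidates]
winner = best_candidate(candidates, 84)
```

Scores like these could stand in for the human ratings described above when improving a model. The same trick is much harder for a question with no single checkable answer, which fits the point that human workers are still relied upon for other subjects.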

Third, its success will fuel a key debate: Can you push for AI research to be open for all to see and push for US competitiveness against China at the same time?

Long before DeepSeek released its model for free, certain AI companies were arguing that the industry needs to be an open book. If researchers subscribed to certain open-source principles and showed their work, they argued, the global race to develop superintelligent AI could be treated like a scientific effort for public good, and the power of any one actor would be checked by other participants.

It’s a nice idea. Meta has largely spoken in support of that vision, and venture capitalist Marc Andreessen has said that open-source approaches can be more effective at keeping AI safe than government regulation. OpenAI has been on the opposite side of that argument, keeping its models closed off on the grounds that doing so can help keep them out of the hands of bad actors.

DeepSeek has made those narratives a bit messier. “We have been on the wrong side of history here and need to figure out a different open-source strategy,” OpenAI’s Sam Altman said in a Reddit AMA on Friday, which is surprising given OpenAI’s past stance. Others, including President Trump, doubled down on the need to make the US more competitive on AI, seeing DeepSeek’s success as a wake-up call. Dario Amodei, a founder of Anthropic, said it’s a reminder that the US needs to tightly control which types of advanced chips make their way to China in the coming years, and some lawmakers are pushing the same point. 

The coming months, and future launches from DeepSeek and others, will stress-test every single one of these arguments. 


Now read the rest of The Algorithm

Deeper Learning

OpenAI launches a research tool

On Sunday, OpenAI launched a tool called Deep Research. You can give it a complex question to look into, and it will spend up to 30 minutes reading sources, compiling information, and writing a report for you. It’s brand new, and we haven’t tested the quality of its outputs yet. Since its computations take so much time (and therefore energy), right now it’s available only to users on OpenAI’s paid Pro tier ($200 per month), who are limited in the number of queries they can make per month.

Why it matters: AI companies have been competing to build useful “agents” that can do things on your behalf. On January 23, OpenAI launched an agent called Operator that could use your computer for you to do things like book restaurants or check out flight options. The new research tool signals that OpenAI is not just trying to make these mundane online tasks slightly easier; it wants to position AI as able to handle professional research tasks. It claims that Deep Research “accomplishes in tens of minutes what would take a human many hours.” Time will tell if users will find it worth the high costs and the risk of including wrong information. Read more from Rhiannon Williams.

Bits and Bytes

Déjà vu: Elon Musk takes his Twitter takeover tactics to Washington

Federal agencies have offered exits to millions of employees and tested the prowess of engineers—just like when Elon Musk bought Twitter. The similarities have been uncanny. (The New York Times)

AI’s use in art and movies gets a boost from the Copyright Office

The US Copyright Office finds that art produced with the help of AI should be eligible for copyright protection under existing law in most cases, but wholly AI-generated works probably are not. What will that mean? (The Washington Post)

OpenAI releases its new o3-mini reasoning model for free

OpenAI just released a reasoning model that’s faster, cheaper, and more accurate than its predecessor. (MIT Technology Review)

Anthropic has a new way to protect large language models against jailbreaks

This line of defense could be the strongest yet. But no shield is perfect. (MIT Technology Review)

AI’s energy obsession just got a reality check

Just a week in, the AI sector has already seen its first battle of wits under the new Trump administration. The clash stems from two key pieces of news: the announcement of the Stargate project, which would spend $500 billion—more than the Apollo space program—on new AI data centers, and the release of a powerful new model from China. Together, they raise important questions the industry needs to answer about the extent to which the race for more data centers—with their heavy environmental toll—is really necessary.

A reminder about the first piece: OpenAI, Oracle, SoftBank, and an Abu Dhabi–based investment fund called MGX plan to spend up to $500 billion opening massive data centers around the US to build better AI. Much of the groundwork for this project was laid in 2024, when OpenAI increased its lobbying spending sevenfold (which we were first to report last week) and AI companies started pushing for policies that were less about controlling problems like deepfakes and misinformation, and more about securing more energy.

Still, Trump received credit for it from tech leaders when he announced the effort on his second day in office. “I think this will be the most important project of this era,” OpenAI’s Sam Altman said at the launch event, adding, “We wouldn’t be able to do this without you, Mr. President.”

It’s an incredible sum, just slightly less than the inflation-adjusted cost of building the US highway system over the course of more than 30 years. However, not everyone sees Stargate as having the same public benefit. Environmental groups say it could strain local grids and further drive up the cost of energy for the rest of us, who aren’t guzzling it to train and deploy AI models. Previous research has also shown that data centers tend to be built in areas that use much more carbon-intensive sources of energy, like coal, than the national average. It’s not clear how much, if at all, Stargate will rely on renewable energy. 

An even louder critic of Stargate, though, is Elon Musk. None of Musk’s companies are involved in the project, and he has attempted to publicly sow doubt that OpenAI and SoftBank have enough of the money needed for the plan anyway, claims that Altman disputed on X. Musk’s decision to publicly criticize the president’s initiative has irked people in Trump’s orbit, Politico reports, but it’s not clear if those people have expressed that to Musk directly.

On to the second piece. On the day Trump was inaugurated, a Chinese startup released an AI model that started making a whole bunch of important people in Silicon Valley very worried about their competition. (This close timing is almost certainly not an accident.)

The model, called DeepSeek R1, is a reasoning model. These types of models are designed to excel at math, logic, pattern-finding, and decision-making. DeepSeek proved it could “reason” through complicated problems as well as one of OpenAI’s reasoning models, o1—and more efficiently. What’s more, DeepSeek isn’t a super-secret project kept behind lock and key like OpenAI’s. It was released for all to see.

DeepSeek was released as the US has made outcompeting China in the AI race a top priority. This goal was a driving force behind the 2022 CHIPS Act to make more chips domestically. It’s influenced the position of tech companies like OpenAI, which has embraced lending its models to national security work and has partnered with the defense-tech company Anduril to help the military take down drones. It’s led to export controls that limit what types of chips Nvidia can sell to China.

The success of DeepSeek signals that these efforts aren’t working as well as AI leaders in the US would like (though it’s worth noting that the impact of export controls for chips isn’t felt for a few years, so the policy wouldn’t be expected to have prevented a model like DeepSeek).  

Still, the model poses a threat to the bottom line of certain players in Big Tech. Why pay for an expensive model from OpenAI when you can get access to DeepSeek for free? Even other makers of open-source models, especially Meta, are panicking about the competition, according to The Information. The company has set up a number of “war rooms” to figure out how DeepSeek was made so efficient. (A couple of days after the Stargate announcement, Meta said it would increase its own capital investments by 70% to build more AI infrastructure.)

What does this all mean for the Stargate project? Let’s think about why OpenAI and its partners are willing to spend $500 billion on data centers to begin with. They believe that AI in its various forms—not just chatbots or generative video or even new AI agents, but also developments yet to be unveiled—will be the most lucrative tool humanity has ever built. They also believe that access to powerful chips inside massive data centers is the key to getting there. 

DeepSeek poked some holes in that approach. It didn’t train on yet-unreleased chips that are light-years ahead. It didn’t, to our knowledge, require the eye-watering amounts of computing power and energy behind the models from US companies that have made headlines. Its designers made clever decisions in the name of efficiency.

In theory, it could make a project like Stargate seem less urgent and less necessary. If, in dissecting DeepSeek, AI companies discover some lessons about how to make models use existing resources more effectively, perhaps constructing more and more data centers won’t be the only winning formula for better AI. That would be welcome to the many people affected by the problems data centers can bring, like lots of emissions, the loss of fresh, drinkable water used to cool them, and the strain on local power grids. 

Thus far, DeepSeek doesn’t seem to have sparked such a change in approach. OpenAI researcher Noam Brown wrote on X, “I have no doubt that with even more compute it would be an even more powerful model.”

If his logic wins out, the players with the most computing power will win, and getting it is apparently worth at least $500 billion to AI’s biggest companies. But let’s remember—announcing it is the easiest part.


Now read the rest of The Algorithm

Deeper Learning

What’s next for robots

Many of the big questions about AI (how it learns, how well it works, and where it should be deployed) are now applicable to robotics. In the year ahead, we will see humanoid robots being put to the test in warehouses and factories, robots learning in simulated worlds, and a rapid increase in the military’s adoption of autonomous drones, submarines, and more.

Why it matters: Jensen Huang, the highly influential CEO of the chipmaker Nvidia, stated last month that the next advancement in AI will mean giving the technology a “body” of sorts in the physical world. This will come in the form of advanced robotics. Even with the caveat that robotics is full of futuristic promises that usually aren’t fulfilled by their deadlines, the marrying of AI methods with new advancements in robots means the field is changing quickly. Read more here.

Bits and Bytes

Leaked documents expose deep ties between Israeli army and Microsoft

Since the attacks of October 7, the Israeli military has relied heavily on cloud and AI services from Microsoft and its partner OpenAI, and the tech giant’s staff has embedded with different units to support rollout, a joint investigation reveals. (+972 Magazine)

The tech arsenal that could power Trump’s immigration crackdown

The effort by federal agencies to acquire powerful technology to identify and track migrants has been unfolding for years across multiple administrations. These technologies may be called upon more directly under President Trump. (The New York Times)

OpenAI launches Operator—an agent that can use a computer for you

Operator is a web app that can carry out simple online tasks in a browser, such as booking concert tickets or making an online grocery order. (MIT Technology Review)

The second wave of AI coding is here

A string of startups are racing to build models that can produce better and better software. But it’s not only AI’s increasingly powerful ability to write code that’s impressive. They claim it’s the shortest path to superintelligent AI. (MIT Technology Review)

Here’s our forecast for AI this year

In December, our small but mighty AI reporting team was asked by our editors to make a prediction: What’s coming next for AI? 

In 2024, AI contributed both to Nobel Prize–winning chemistry breakthroughs and a mountain of cheaply made content that few people asked for but that nonetheless flooded the internet. Take AI-generated Shrimp Jesus images, among other examples. There was also a spike in greenhouse-gas emissions last year that can be attributed partly to the surge in energy-intensive AI. Our team got to thinking about how all of this will shake out in the year to come. 

As we look ahead, certain things are a given. We know that agents—AI models that do more than just converse with you and can actually go off and complete tasks for you—are the focus of many AI companies right now. Building them will raise lots of privacy questions about how much of our data and preferences we’re willing to give up in exchange for tools that will (allegedly) save us time. Similarly, the need to make AI faster and more energy efficient is putting so-called small language models in the spotlight. 

We instead wanted to focus on less obvious predictions. Mine were about how AI companies that previously shunned work in defense and national security might be tempted this year by contracts from the Pentagon, and how Donald Trump’s attitudes toward China could escalate the global race for the best semiconductors. Read the full list.

What’s not evident in that story is that the other predictions were not so clear-cut. Arguments ensued about whether or not 2025 will be the year of intimate relationships with chatbots, AI throuples, or traumatic AI breakups. To witness the fallout from our team’s lively debates (and hear more about what didn’t make the list), you can join our upcoming LinkedIn Live this Thursday, January 16. I’ll be talking it all over with Will Douglas Heaven, our senior editor for AI, and our news editor, Charlotte Jee. 

There are a couple other things I’ll be watching closely in 2025. One is how little the major AI players—namely OpenAI, Microsoft, and Google—are disclosing about the environmental burden of their models. Lots of evidence suggests that asking an AI model like ChatGPT about knowable facts, like the capital of Mexico, consumes much more energy (and releases far more emissions) than simply asking a search engine. Nonetheless, OpenAI’s Sam Altman in recent interviews has spoken positively about the idea of ChatGPT replacing the googling that we’ve all learned to do in the past two decades. It’s already happening, in fact. 

The environmental cost of all this will be top of mind for me in 2025, as will the possible cultural cost. We will go from searching for information by clicking links and (hopefully) evaluating sources to simply reading the responses that AI search engines serve up for us. As our editor in chief, Mat Honan, said in his piece on the subject, “Who wants to have to learn when you can just know?”


Now read the rest of The Algorithm

Deeper Learning

What’s next for our privacy?

The US Federal Trade Commission has taken a number of enforcement actions against data brokers, some of which have tracked and sold geolocation data from users at sensitive locations like churches, hospitals, and military installations without explicit consent. Though limited in nature, these actions may offer some new and improved protections for Americans’ personal information.

Why it matters: A consensus is growing that Americans need better privacy protections—and that the best way to deliver them would be for Congress to pass comprehensive federal privacy legislation. Unfortunately, that’s not going to happen anytime soon. Enforcement actions from agencies like the FTC might be the next best thing in the meantime. Read more in Eileen Guo’s excellent story here.

Bits and Bytes

Meta trained its AI on a notorious piracy database

New court records, Wired reports, reveal that Meta used “a notorious so-called shadow library of pirated books that originated in Russia” to train its generative AI models. (Wired)

OpenAI’s top reasoning model struggles with the NYT Connections game

The game requires players to identify how groups of words are related. OpenAI’s o1 reasoning model had a hard time. (Mind Matters)

Anthropic’s chief scientist on 5 ways agents will be even better in 2025

The AI company Anthropic is now worth $60 billion. The company’s cofounder and chief scientist, Jared Kaplan, shared how AI agents will develop in the coming year. (MIT Technology Review)

A New York legislator attempts to regulate AI with a new bill

This year, a high-profile bill in California to regulate the AI industry was vetoed by Governor Gavin Newsom. Now, a legislator in New York is trying to revive the effort in his own state. (MIT Technology Review)

How US AI policy might change under Trump

President Biden first witnessed the capabilities of ChatGPT in 2022 during a demo from Arati Prabhakar, the director of the White House Office of Science and Technology Policy, in the Oval Office. That demo set a slew of events into motion and encouraged President Biden to support the US’s AI sector while managing the safety risks that will come from it.

Prabhakar was a key player in passing the president’s executive order on AI in 2023, which sets rules for tech companies to make AI safer and more transparent (though it relies on voluntary participation). Before serving in President Biden’s cabinet, she held a number of government roles, from rallying for domestic production of semiconductors to heading up DARPA, the Pentagon’s famed research department. 

I had a chance to sit down with Prabhakar earlier this month. We discussed AI risks, immigration policies, the CHIPS Act, the public’s faith in science, and how it all may change under Trump.

The change of administrations comes at a chaotic time for AI. Trump’s team has not presented a clear thesis on how it will handle artificial intelligence, but plenty of people in it want to see that executive order dismantled. Trump said as much in July, endorsing the Republican platform that says the executive order “hinders AI innovation and imposes Radical Leftwing ideas on the development of this technology.” Powerful industry players, like venture capitalist Marc Andreessen, have said they support that move. However, complicating that narrative will be Elon Musk, who for years has expressed fears about doomsday AI scenarios and has been supportive of some regulations aiming to promote AI safety. No one really knows exactly what’s coming next, but Prabhakar has plenty of thoughts about what’s happened so far.

For her insights about the most important AI developments of the last administration, and what might happen in the next one, read my conversation with Arati Prabhakar.


Now read the rest of The Algorithm

Deeper Learning

These AI Minecraft characters did weirdly human stuff all on their own

The video game Minecraft is increasingly popular as a testing ground for AI models and agents. That’s a trend startup Altera recently embraced. It unleashed up to 1,000 software agents at a time, powered by large language models (LLMs), to interact with one another. Given just a nudge through text prompting, they developed a remarkable range of personality traits, preferences, and specialist roles, with no further inputs from their human creators. They spontaneously made friends, invented jobs, and even spread religion.

Why this matters: AI agents can execute tasks and exhibit autonomy, taking initiative in digital environments. This is another example of how the behaviors of such agents, with minimal prompting from humans, can be both impressive and downright bizarre. The people working to bring agents into the world have bold ambitions for them. Altera’s founder, Robert Yang, sees the Minecraft experiments as an early step towards large-scale “AI civilizations” with agents that can coexist and work alongside us in digital spaces. “The true power of AI will be unlocked when we have truly autonomous agents that can collaborate at scale,” says Yang. Read more from Niall Firth.

Bits and Bytes

OpenAI is exploring advertising

Building and maintaining some of the world’s leading AI models doesn’t come cheap. The Financial Times has reported that OpenAI is hiring advertising talent from big tech rivals in a push to increase revenues. (Financial Times)

Landlords are using AI to raise rents, and cities are starting to push back

RealPage is a tech company that collects proprietary lease information on how much renters are paying and then uses an AI model to suggest to realtors how much to charge on apartments. Eight states and many municipalities have joined antitrust suits against the company, saying it constitutes an “unlawful information-sharing scheme” and inflates rental prices. (The Markup)

The way we measure progress in AI is terrible

Whenever new models come out, the companies that make them advertise how they perform in benchmark tests against other models. There are even leaderboards that rank them. But new research suggests these measurement methods aren’t helpful. (MIT Technology Review)

Nvidia has released a model that can create sounds and music

AI tools to make music and audio have received less attention than their counterparts that create images and video, except when the companies that make them get sued. Now, chip maker Nvidia has entered the space with a tool that creates impressive sound effects and music. (Ars Technica)

Artists say they leaked OpenAI’s Sora video model in protest

Many artists are outraged at the tech company for training its models on their work without compensating them. Now, a group of artists who were beta testers for OpenAI’s Sora model say they leaked it out of protest. (The Verge)

How the largest gathering of US police chiefs is talking about AI

It can be tricky for reporters to get past certain doors, and the door to the International Association of Chiefs of Police conference is one that’s almost perpetually shut to the media. Thus, I was pleasantly surprised when I was able to attend for a day in Boston last month. 

It bills itself as the largest gathering of police chiefs in the United States, where leaders from many of the country’s 18,000 police departments and even some from abroad convene for product demos, discussions, parties, and awards. 

I went along to see how artificial intelligence was being discussed, and the message to police chiefs seemed crystal clear: If your department is slow to adopt AI, fix that now. The future of policing will rely on it in all its forms.

In the event’s expo hall, the vendors (of which there were more than 600) offered a glimpse into the ballooning industry of police-tech suppliers. Some had little to do with AI—booths showcased body armor, rifles, and prototypes of police-branded Cybertrucks, and others displayed new types of gloves promising to protect officers from needles during searches. But one needed only to look to where the largest crowds gathered to understand that AI was the major draw. 

The hype focused on three uses of AI in policing. The flashiest was virtual reality, exemplified by the booth from V-Armed, which sells VR systems for officer training. On the expo floor, V-Armed built an arena complete with VR goggles, cameras, and sensors, not unlike the one the company recently installed at the headquarters of the Los Angeles Police Department. Attendees could don goggles and go through training exercises on responding to active shooter situations. Many competitors of V-Armed were also at the expo, selling systems they said were cheaper, more effective, or simpler to maintain. 

The pitch on VR training is that in the long run, it can be cheaper and more engaging to use than training with actors or in a classroom. “If you’re enjoying what you’re doing, you’re more focused and you remember more than when looking at a PDF and nodding your head,” V-Armed CEO Ezra Kraus told me. 

The effectiveness of VR training systems has yet to be fully studied, and they can’t completely replicate the nuanced interactions police have in the real world. AI is not yet great at the soft skills required for interactions with the public. At a different company’s booth, I tried out a VR system focused on deescalation training, in which officers were tasked with calming down an AI character in distress. It suffered from lag and was generally quite awkward—the character’s answers felt overly scripted and programmatic. 

The second focus was on the changing way police departments are collecting and interpreting data. Rather than buying a gunshot detection tool from one company and a license plate reader or drone from another, police departments are increasingly using expanding suites of sensors, cameras, and so on from a handful of leading companies that promise to integrate the data collected and make it useful. 

Police chiefs attended classes on how to build these systems, like one taught by Microsoft and the NYPD about the Domain Awareness System, a web of license plate readers, cameras, and other data sources used to track and monitor crime in New York City. Crowds gathered at massive, high-tech booths from Axon and Flock, both sponsors of the conference. Flock sells a suite of cameras, license plate readers, and drones, offering AI to analyze the data coming in and trigger alerts. These sorts of tools have come in for heavy criticism from civil liberties groups, which see them as an assault on privacy that does little to help the public. 

Finally, as in other industries, AI is also coming for the drudgery of administrative tasks and reporting. Many companies at the expo, including Axon, offer generative AI products to help police officers write their reports. Axon’s offering, called Draft One, ingests footage from body cameras, transcribes it, and creates a first draft of a report for officers. 

“We’ve got this thing on an officer’s body, and it’s recording all sorts of great stuff about the incident,” Bryan Wheeler, a senior vice president at Axon, told me at the expo. “Can we use it to give the officer a head start?”

On the surface, it’s a writing task well suited for AI, which can quickly summarize information and write in a formulaic way. It could also save lots of time officers currently spend on writing reports. But given that AI is prone to “hallucination,” there’s an unavoidable truth: Even if officers are the final authors of their reports, departments adopting these sorts of tools risk injecting errors into some of the most critical documents in the justice system. 

“Police reports are sometimes the only memorialized account of an incident,” wrote Andrew Ferguson, a professor of law at American University, in July in the first law review article about the serious challenges posed by police reports written with AI. “Because criminal cases can take months or years to get to trial, the accuracy of these reports are critically important.” Whether certain details were included or left out can affect the outcomes of everything from bail amounts to verdicts. 

By showing an officer a generated version of a police report, the tools also expose officers to details from their body camera recordings before they complete their report, a document intended to capture the officer’s memory of the incident. That poses a problem. 

“The police certainly would never show video to a bystander eyewitness before they ask the eyewitness about what took place, as that would just be investigatory malpractice,” says Jay Stanley, a senior policy analyst with the ACLU Speech, Privacy, and Technology Project, who will soon publish work on the subject. 

A spokesperson for Axon says this concern “isn’t reflective of how the tool is intended to work,” and that Draft One has robust features to make sure officers read the reports closely, add their own information, and edit the reports for accuracy before submitting them.

My biggest takeaway from the conference was simply that the way US police are adopting AI is inherently chaotic. There is no one agency governing how they use the technology, and the roughly 18,000 police departments in the United States—the precise figure is not even known—have remarkably high levels of autonomy to decide which AI tools they’ll buy and deploy. The police-tech companies that serve them will build the tools police departments find attractive, and it’s unclear if anyone will draw proper boundaries for ethics, privacy, and accuracy. 

That will only be made more apparent in an upcoming Trump administration. In a policing agenda released last year during his campaign, Trump encouraged more aggressive tactics like “stop and frisk,” deeper cooperation with immigration agencies, and increased liability protection for officers accused of wrongdoing. The Biden administration is now reportedly attempting to lock in some of its proposed policing reforms before January. 

Without federal regulation on how police departments can and cannot use AI, the lines will be drawn by departments and police-tech companies themselves.

“Ultimately, these are for-profit companies, and their customers are law enforcement,” says Stanley. “They do what their customers want, in the absence of some very large countervailing threat to their business model.”


Now read the rest of The Algorithm

Deeper Learning

The AI lab waging a guerrilla war over exploitative AI

When generative AI tools landed on the scene, artists were immediately concerned, seeing them as a new kind of theft. Computer security researcher Ben Zhao jumped into action in response, and his lab at the University of Chicago started building tools like Nightshade and Glaze to help artists keep their work from being scraped up by AI models. My colleague Melissa Heikkilä spent time with Zhao and his team to look at the ongoing effort to make these tools strong enough to stop AI’s relentless hunger for more images, art, and data to train on.  

Why this matters: The current paradigm in AI is to build bigger and bigger models, and these require vast data sets to train on. Tech companies argue that anything on the public internet is fair game, while artists demand compensation or the right to refuse. Settling this fight in the courts or through regulation could take years, so tools like Nightshade and Glaze are what artists have for now. If the tools disrupt AI companies’ efforts to make better models, that could push them to the negotiating table to bargain over licensing and fair compensation. But it’s a big “if.” Read more from Melissa Heikkilä.

Bits and Bytes

Tech elites are lobbying Elon Musk for jobs in Trump’s administration

Elon Musk is the tech leader who most has Trump’s ear. As such, he’s reportedly the conduit through which AI and tech insiders are pushing to have an influence in the incoming administration. (The New York Times)

OpenAI is getting closer to launching an AI agent to automate your tasks

AI agents—models that can do tasks on your behalf—are all the rage. OpenAI is reportedly closer to releasing one, news that comes a few weeks after Anthropic announced its own. (Bloomberg)

How this grassroots effort could make AI voices more diverse

A massive volunteer-led effort to collect training data in more languages, from people of more ages and genders, could help make the next generation of voice AI more inclusive and less exploitative. (MIT Technology Review)

Google DeepMind has a new way to look inside an AI’s “mind”

Autoencoders let us peer into the black box of artificial intelligence. They could help us create AI that is better understood and more easily controlled. (MIT Technology Review)

Musk has expanded his legal assault on OpenAI to target Microsoft

Musk has expanded his federal lawsuit against OpenAI, which alleges that the company has abandoned its nonprofit roots and obligations. He’s now going after Microsoft too, accusing it of antitrust violations in its work with OpenAI. (The Washington Post)

How ChatGPT search paves the way for AI agents

OpenAI’s Olivier Godement, head of product for its platform, and Romain Huet, head of developer experience, are on a whistle-stop tour around the world. Last week, I sat down with the pair in London before DevDay, the company’s annual developer conference. London’s DevDay is the first one for the company outside San Francisco. Godement and Huet are heading to Singapore next. 

It’s been a busy few weeks for the company. In London, OpenAI announced updates to its new Realtime API platform, which allows developers to build voice features into their applications. The company is rolling out new voices and a function that lets developers generate prompts, which will allow them to build apps and more helpful voice assistants more quickly. Meanwhile, for consumers, OpenAI announced it was launching ChatGPT search, which allows users to search the internet using the chatbot. Read more here.

Both developments pave the way for the next big thing in AI: agents. These are AI assistants that can complete complex chains of tasks, such as booking flights. (You can read my explainer on agents here.) 

“Fast-forward a few years—every human on Earth, every business, has an agent. That agent knows you extremely well. It knows your preferences,” Godement says. The agent will have access to your emails, apps, and calendars and will act like a chief of staff, interacting with each of these tools and even working on long-term problems, such as writing a paper on a particular topic, he says. 

OpenAI’s strategy is to both build agents itself and allow developers to use its software to build their own agents, says Godement. Voice will play an important role in what agents will look and feel like. 

“At the moment most of the apps are chat based … which is cool, but not suitable for all use cases. There are some use cases where you’re not typing, not even looking at the screen, and so voice essentially has a much better modality for that,” he says. 

But there are two big hurdles that need to be overcome before agents can become a reality, Godement says. 

The first is reasoning. Building AI agents requires us to be able to trust that they will be able to complete complex tasks and do the right things, says Huet. That’s where OpenAI’s “reasoning” feature comes in. Introduced in OpenAI’s o1 model last month, it uses reinforcement learning to teach the model how to process information using “chain of thought.” Giving the model more time to generate answers allows it to recognize and correct mistakes, break down problems into smaller ones, and try different approaches to answering questions, Godement says.

But OpenAI’s claims about reasoning should be taken with a pinch of salt, says Chirag Shah, a computer science professor at the University of Washington. Large language models are not exhibiting true reasoning. It’s most likely that they have picked up what looks like logic from something they’ve seen in their training data.

“These models sometimes seem to be really amazing at reasoning, but it’s just like they’re really good at pretending, and it only takes a little bit of picking at them to break them,” he says.

There is still much more work to be done, Godement admits. In the short term, AI models such as o1 need to be much more reliable, faster, and cheaper. In the long term, the company needs to apply its chain-of-thought technique to a wider pool of use cases. OpenAI has focused on science, coding, and math. Now it wants to address other fields, such as law, accounting, and economics, he says. 

Second on the to-do list is the ability to connect different tools, Godement says. An AI model’s capabilities will be limited if it has to rely on its training data alone. It needs to be able to surf the web and look for up-to-date information. ChatGPT search is one powerful way OpenAI’s new tools can now do that. 

These tools need to be able not only to retrieve information but to take actions in the real world. Competitor Anthropic announced a new feature where its Claude chatbot can “use” a computer by interacting with its interface to click on things, for example. This is an important feature for agents if they are going to be able to execute tasks like booking flights. Godement says o1 can “sort of” use tools, though not very reliably, and that research on tool use is a “promising development.” 

In the next year, Godement says, he expects the adoption of AI for customer support and other assistant-based tasks to grow. However, he says that it can be hard to predict how people will adopt and use OpenAI’s technology.

“Frankly, looking back every year, I’m surprised by use cases that popped up that I did not even anticipate,” he says. “I expect there will be quite a few surprises that you know none of us could predict.” 


Now read the rest of The Algorithm

Deeper Learning

This AI-generated version of Minecraft may represent the future of real-time video generation

When you walk around in a version of the video game Minecraft from the AI companies Decart and Etched, it feels a little off. Sure, you can move forward, cut down a tree, and lay down a dirt block, just like in the real thing. If you turn around, though, the dirt block you just placed may have morphed into a totally new environment. That doesn’t happen in Minecraft. But this new version is entirely AI-generated, so it’s prone to hallucinations. Not a single line of code was written.

Ready, set, go: This version of Minecraft is generated in real time, using a technique known as next-frame prediction. The AI companies behind it did this by training their model, Oasis, on millions of hours of Minecraft game play and recordings of the corresponding actions a user would take in the game. The AI is able to sort out the physics, environments, and controls of Minecraft from this data alone. Read more from Scott J. Mulligan.

Bits and Bytes

AI search could break the web
At its best, AI search can better infer a user’s intent, amplify quality content, and synthesize information from diverse sources. But if AI search becomes our primary portal to the web, it threatens to disrupt an already precarious digital economy, argues Benjamin Brooks, a fellow at the Berkman Klein Center at Harvard University, who used to lead public policy for Stability AI. (MIT Technology Review)

AI will add to the e-waste problem. Here’s what we can do about it.
Equipment used to train and run generative AI models could produce up to 5 million tons of e-waste by 2030, a relatively small but significant fraction of the global total. (MIT Technology Review)

How an “interview” with a dead luminary exposed the pitfalls of AI
A state-funded radio station in Poland fired its on-air talent and brought in AI-generated presenters. But the experiment caused an outcry and was stopped when one of them “interviewed” a dead Nobel laureate. (The New York Times)

Meta says yes, please, to more AI-generated slop
In Meta’s latest earnings call, CEO Mark Zuckerberg said we’re likely to see “a whole new category of content, which is AI generated or AI summarized content or kind of existing content pulled together by AI in some way.” Zuckerberg added that he thinks “that’s going to be just very exciting.” (404 Media)

Palmer Luckey’s vision for the future of mixed reality

War is a catalyst for change, an expert in AI and warfare told me in 2022. At the time, the war in Ukraine had just started, and the military AI business was booming. Two years later, things have only ramped up as geopolitical tensions continue to rise.

Silicon Valley players are poised to benefit. One of them is Palmer Luckey, the founder of the virtual-reality headset company Oculus, which he sold to Facebook for $2 billion. After Luckey’s highly public ousting from Meta, he founded Anduril, which focuses on drones, cruise missiles, and other AI-enhanced technologies for the US Department of Defense. The company is now valued at $14 billion. My colleague James O’Donnell interviewed Luckey about his new pet project: headsets for the military. 

Luckey is increasingly convinced that the military, not consumers, will see the value of mixed-reality hardware first: “You’re going to see an AR headset on every soldier, long before you see it on every civilian,” he says. In the consumer world, any headset company is competing with the ubiquity and ease of the smartphone, but he sees entirely different trade-offs in defense. Read the interview here.

The use of AI for military purposes is controversial. Back in 2018, Google pulled out of the Pentagon’s Project Maven, an attempt to build image recognition systems to improve drone strikes, following staff walkouts over the ethics of the technology. (Google has since returned to offering services for the defense sector.) There has been a long-standing campaign to ban autonomous weapons, also known as “killer robots,” which powerful militaries such as the US have refused to agree to.  

But the voices that boom even louder belong to an influential faction in Silicon Valley, such as Google’s former CEO Eric Schmidt, who has called for the military to adopt and invest more in AI to get an edge over adversaries. Militaries all over the world have been very receptive to this message.

That’s good news for the tech sector. Military contracts are long and lucrative, for a start. Most recently, the Pentagon purchased services from Microsoft and OpenAI to do search, natural-language processing, machine learning, and data processing, reports The Intercept. In the interview with James, Palmer Luckey says the military is a perfect testing ground for new technologies. Soldiers do as they are told and aren’t as picky as consumers, he explains. They’re also less price-sensitive: Militaries don’t mind spending a premium to get the latest version of a technology.

But there are serious dangers in adopting powerful technologies prematurely in such high-risk areas. Foundation models pose serious national security and privacy threats by, for example, leaking sensitive information, argue researchers at the AI Now Institute and Meredith Whittaker, president of the communication privacy organization Signal, in a new paper. Whittaker, who was a core organizer of the Project Maven protests, has said that the push to militarize AI is really more about enriching tech companies than improving military operations. 

Despite calls for stricter rules around transparency, we are unlikely to see governments restrict their defense sectors in any meaningful way beyond voluntary ethical commitments. We are in the age of AI experimentation, and militaries are playing with the highest stakes of all. And because of the military’s secretive nature, tech companies can experiment with the technology without the need for transparency or even much accountability. That suits Silicon Valley just fine. 


Now read the rest of The Algorithm

Deeper Learning

How Wayve’s driverless cars will meet one of their biggest challenges yet

The UK driverless-car startup Wayve is headed west. The firm’s cars learned to drive on the streets of London. But Wayve has announced that it will begin testing its tech in and around San Francisco as well. And that brings a new challenge: Its AI will need to switch from driving on the left to driving on the right.

Full speed ahead: As visitors to or from the UK will know, making that switch is harder than it sounds. Your view of the road, how the vehicle turns—it’s all different. The move to the US will be a test of Wayve’s technology, which the company claims is more general-purpose than what many of its rivals are offering. Across the Atlantic, the company will now go head to head with the heavyweights of the growing autonomous-car industry, including Cruise, Waymo, and Tesla. Join Will Douglas Heaven on a ride in one of its cars to find out more.

Bits and Bytes

Kids are learning how to make their own little language models
Little Language Models is a new application from two PhD researchers at MIT’s Media Lab that helps children understand how AI models work—by getting to build small-scale versions themselves. (MIT Technology Review)

Google DeepMind is making its AI text watermark open source
Google DeepMind has developed a tool for identifying AI-generated text called SynthID, which is part of a larger family of watermarking tools for generative AI outputs. The company is applying the watermark to text generated by its Gemini models and making it available for others to use too. (MIT Technology Review)

Anthropic debuts an AI model that can “use” a computer
The tool enables the company’s Claude AI model to interact with computer interfaces and take actions such as moving a cursor, clicking on things, and typing text. It’s a very cumbersome and error-prone version of what some have said AI agents will be able to do one day. (Anthropic)

Can an AI chatbot be blamed for a teen’s suicide?
A 14-year-old boy committed suicide, and his mother says it was because he was obsessed with an AI chatbot created by Character.AI. She is suing the company. Chatbots have been touted as cures for loneliness, but critics say they actually worsen isolation. (The New York Times)

Google, Microsoft, and Perplexity are promoting scientific racism in search results
The internet’s biggest AI-powered search engines are featuring the widely debunked idea that white people are genetically superior to other races. (Wired)

A data bottleneck is holding AI science back, says new Nobel winner

David Baker is sleep-deprived but happy. He’s just won the Nobel prize, after all. 

The call from the Royal Swedish Academy of Sciences woke him in the middle of the night. Or rather, his wife did. She answered the phone at their home in Seattle and screamed that he’d won the Nobel Prize for Chemistry. The prize is the ultimate recognition of his work as a biochemist at the University of Washington.

“I woke up at two [a.m.] and basically didn’t sleep through the whole day, which was all parties and stuff,” he told me the day after the announcement. “I’m looking forward to getting back to normal a little bit today.”

Last week was a major milestone for AI, with two Nobel prizes awarded for AI-related discoveries. 

Baker wasn’t alone in winning the Nobel Prize for Chemistry. The Royal Swedish Academy of Sciences also awarded it to Demis Hassabis, the cofounder and CEO of Google DeepMind, and John M. Jumper, a director at the same company. Google DeepMind was recognized for its research on AlphaFold, a tool that can predict how proteins are structured, while Baker was recognized for his work using AI to design new proteins. Read more about it here.

Meanwhile, the physics prize went to Geoffrey Hinton, a computer scientist whose pioneering work on deep learning in the 1980s and ’90s underpins all of the most powerful AI models in the world today, and fellow computer scientist John Hopfield, who invented a type of pattern-matching neural network that can store and reconstruct data. Read more about it here.

Speaking to reporters after the prize was announced, Hassabis said he believes that it will herald more AI tools being used for significant scientific discoveries. 

But there is one problem. AI needs masses of high-quality data to be useful for science, and databases containing that sort of data are rare, says Baker. 

The prize is a recognition for the whole community of people working as protein designers. It will help move protein design from the “lunatic fringe of stuff that no one ever thought would be useful for anything to being at the center stage,” he says.  

AI has been a game changer for biochemists like Baker. Seeing what DeepMind was able to do with AlphaFold made it clear that deep learning was going to be a powerful tool for their work.

“There’s just all these problems that were really hard before that we are now having much more success with thanks to generative AI methods. We can do much more complicated things,” Baker says. 

Baker is already busy at work. He says his team is focusing on designing enzymes, which carry out all the chemical reactions that living things rely upon to exist. His team is also working on medicines that only act at the right time and place in the body. 

But Baker is hesitant in calling this a watershed moment for AI in science. 

In AI there’s a saying: Garbage in, garbage out. If the data that is fed into AI models is not good, the outcomes won’t be dazzling either. 

The power of the Chemistry Nobel Prize-winning AI tools lies in the Protein Data Bank (PDB), a rare treasure trove of high-quality, curated and standardized data. This is exactly the kind of data that AI needs to do anything useful. But the current trend in AI development is training ever-larger models on the entire content of the internet, which is increasingly full of AI-generated slop. This slop in turn gets sucked into datasets and pollutes the outcomes, leading to bias and errors. That’s just not good enough for rigorous scientific discovery.

“If there were many databases as good as the PDB, I would say, yes, this [prize] probably is just the first of many, but it is kind of a unique database in biology,” Baker says. “It’s not just the methods, it’s the data. And there aren’t so many places where we have that kind of data.”


Now read the rest of The Algorithm

Deeper Learning

Adobe wants to make it easier for artists to blacklist their work from AI scraping

Adobe has announced a new tool to help creators watermark their work and opt out of having it used to train generative AI models. The web app, called Adobe Content Authenticity, also gives artists the opportunity to add “content credentials,” including their verified identity, social media handles, or other online domains, to their work.

A digital signature: Content credentials are based on C2PA, an internet protocol that uses cryptography to securely label images, video, and audio with information clarifying where they came from—the 21st-century equivalent of an artist’s signature. Creators can apply them to their content regardless of whether it was created using Adobe tools. The company is launching a public beta in early 2025. Read more from Rhiannon Williams here.

Bits and Bytes

Why artificial intelligence and clean energy need each other
A geopolitical battle is raging over the future of AI. The key to winning it is a clean-energy revolution, argue Michael Kearney and Lisa Hansmann, from Engine Ventures, a firm that invests in startups commercializing breakthrough science and engineering. They believe that AI’s huge power demands represent a chance to scale the next generation of clean energy technologies. (MIT Technology Review)

The state of AI in 2025
AI investor Nathan Benaich and Air Street Capital have released their annual analysis of the state of AI. Their predictions for the next year? Big, proprietary models will start to lose their edge, and labs will focus more on planning and reasoning. Perhaps unsurprisingly, the investor also bets that a handful of AI companies will begin to generate serious revenue. 

Silicon Valley, the new lobbying monster
Big Tech’s tentacles reach everywhere in Washington DC. This is a fascinating look at how tech companies lobby politicians to influence how AI is regulated in the United States. (The New Yorker)

Why OpenAI’s new model is such a big deal

This story is from The Algorithm, our weekly newsletter on AI. To get it in your inbox first, sign up here.

Last weekend, I got married at a summer camp, and during the day our guests competed in a series of games inspired by the show Survivor that my now-wife and I orchestrated. When we were planning the games in August, we wanted one station to be a memory challenge, where our friends and family would have to memorize part of a poem and then relay it to their teammates so they could re-create it with a set of wooden tiles. 

I thought OpenAI’s GPT-4o, its leading model at the time, would be perfectly suited to help. I asked it to create a short wedding-themed poem, with the constraint that each letter could only appear a certain number of times so we could make sure teams would be able to reproduce it with the provided set of tiles. GPT-4o failed miserably. The model repeatedly insisted that its poem worked within the constraints, even though it didn’t. It would correctly count the letters only after the fact, while continuing to deliver poems that didn’t fit the prompt. Without the time to meticulously craft the verses by hand, we ditched the poem idea and instead challenged guests to memorize a series of shapes made from colored tiles. (That ended up being a total hit with our friends and family, who also competed in dodgeball, egg tosses, and capture the flag.)    
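The check the model kept getting wrong is simple to do in code. Here is a minimal sketch in Python, assuming a made-up tile set and a made-up line of verse (neither comes from the actual wedding game), that reports every letter a poem uses more often than the tiles allow:

```python
from collections import Counter


def letters_over_budget(poem: str, tiles: dict[str, int]) -> dict[str, int]:
    """Return {letter: excess} for each letter used more often than the tiles allow."""
    # Count letters only, ignoring case, spaces, and punctuation.
    used = Counter(ch for ch in poem.upper() if ch.isalpha())
    return {
        letter: count - tiles.get(letter, 0)
        for letter, count in used.items()
        if count > tiles.get(letter, 0)
    }


# Hypothetical tile set and verse, for illustration only.
tiles = {"T": 2, "S": 1, "O": 2, "A": 1, "E": 2, "H": 1, "R": 1, "W": 1}
poem = "Two hearts, one toast"

print(letters_over_budget(poem, tiles))
```

An empty result means the poem can be built from the tiles. Anything else lists the letters that run over and by how many, which is the after-the-fact count GPT-4o could produce but could not seem to write toward.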

However, last week OpenAI released a new model called o1 (previously referred to under the code name “Strawberry” and, before that, Q*) that blows GPT-4o out of the water for this kind of task.

Unlike previous models that are well suited for language tasks like writing and editing, OpenAI o1 is focused on multistep “reasoning,” the type of process required for advanced mathematics, coding, or other STEM-based questions. It uses a “chain of thought” technique, according to OpenAI. “It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working,” the company wrote in a blog post on its website.

OpenAI’s tests point to resounding success. The model ranks in the 89th percentile on questions from the competitive coding organization Codeforces and would be among the top 500 high school students in the USA Math Olympiad, which covers geometry, number theory, and other math topics. The model is also trained to answer PhD-level questions in subjects ranging from astrophysics to organic chemistry. 

In math olympiad questions, the new model is 83.3% accurate, versus 13.4% for GPT-4o. In the PhD-level questions, it averaged 78% accuracy, compared with 69.7% from human experts and 56.1% from GPT-4o. (In light of these accomplishments, it’s unsurprising the new model was pretty good at writing a poem for our nuptial games, though still not perfect; it used more Ts and Ss than instructed to.)

So why does this matter? The bulk of LLM progress until now has been language-driven, resulting in chatbots or voice assistants that can interpret, analyze, and generate words. But in addition to getting lots of facts wrong, such LLMs have failed to demonstrate the types of skills required to solve important problems in fields like drug discovery, materials science, coding, or physics. OpenAI’s o1 is one of the first signs that LLMs might soon become genuinely helpful companions to human researchers in these fields. 

It’s a big deal because it brings “chain-of-thought” reasoning in an AI model to a mass audience, says Matt Welsh, an AI researcher and founder of the LLM startup Fixie. 

“The reasoning abilities are directly in the model, rather than one having to use separate tools to achieve similar results. My expectation is that it will raise the bar for what people expect AI models to be able to do,” Welsh says.

That said, it’s best to take OpenAI’s comparisons to “human-level skills” with a grain of salt, says Yves-Alexandre de Montjoye, an associate professor in math and computer science at Imperial College London. It’s very hard to meaningfully compare how LLMs and people go about tasks such as solving math problems from scratch.

Also, AI researchers say that measuring how well a model like o1 can “reason” is harder than it sounds. If it answers a given question correctly, is that because it successfully reasoned its way to the logical answer? Or was it aided by a sufficient starting point of knowledge built into the model? The model “still falls short when it comes to open-ended reasoning,” Google AI researcher François Chollet wrote on X.

Finally, there’s the price. This reasoning-heavy model doesn’t come cheap. Though access to some versions of the model is included in premium OpenAI subscriptions, developers using o1 through the API will pay three times as much as they pay for GPT-4o—$15 per 1 million input tokens in o1, versus $5 for GPT-4o. The new model also won’t be most users’ first pick for more language-heavy tasks, where GPT-4o continues to be the better option, according to OpenAI’s user surveys. 
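To put the price gap in concrete terms, here is a rough sketch in Python that uses only the two input-token prices quoted above. The 2-million-token workload is an invented example, and output-token pricing is left out because it is not given here:

```python
# Input-token prices quoted in the text, in US dollars per 1 million tokens.
O1_INPUT_USD_PER_MILLION = 15.0
GPT4O_INPUT_USD_PER_MILLION = 5.0


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of sending `tokens` input tokens at the given per-million price."""
    return tokens / 1_000_000 * usd_per_million


# Hypothetical workload: 2 million input tokens.
tokens = 2_000_000
o1_cost = input_cost_usd(tokens, O1_INPUT_USD_PER_MILLION)
gpt4o_cost = input_cost_usd(tokens, GPT4O_INPUT_USD_PER_MILLION)

print(f"o1: ${o1_cost:.2f}, GPT-4o: ${gpt4o_cost:.2f}, ratio: {o1_cost / gpt4o_cost:.1f}x")
```

The ratio stays at three to one however many tokens are sent, since both prices scale linearly with input size.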

What will it unlock? We won’t know until researchers and labs have the access, time, and budget to tinker with the new model and find its limits. But it’s surely a sign that the race for models that can outreason humans has begun. 

Now read the rest of The Algorithm


Deeper learning

Chatbots can persuade people to stop believing in conspiracy theories

Researchers believe they’ve uncovered a new tool for combating false conspiracy theories: AI chatbots. Researchers from MIT Sloan and Cornell University found that chatting about a conspiracy theory with a large language model (LLM) reduced people’s belief in it by about 20%—even among participants who claimed that their beliefs were important to their identity. 

Why this matters: The findings could represent an important step forward in how we engage with and educate people who espouse such baseless theories, says Yunhao (Jerry) Zhang, a postdoc fellow affiliated with the Psychology of Technology Institute who studies AI’s impacts on society. “They show that with the help of large language models, we can—I wouldn’t say solve it, but we can at least mitigate this problem,” he says. “It points out a way to make society better.” Read more from Rhiannon Williams here.

Bits and bytes

Google’s new tool lets large language models fact-check their responses

Called DataGemma, it uses two methods to help LLMs check their responses against reliable data and cite their sources more transparently to users. (MIT Technology Review)

Meet the radio-obsessed civilian shaping Ukraine’s drone defense 

Since Russia’s invasion, Serhii “Flash” Beskrestnov has become an influential, if sometimes controversial, force—sharing expert advice and intel on the ever-evolving technology that’s taken over the skies. His work may determine the future of Ukraine, and wars far beyond it. (MIT Technology Review)

Tech companies have joined a White House commitment to prevent AI-generated sexual abuse imagery

The pledges, signed by firms like OpenAI, Anthropic, and Microsoft, aim to “curb the creation of image-based sexual abuse.” The companies promise to set limits on what models will generate and to remove nude images from training data sets where possible.  (Fortune)

OpenAI is now valued at $150 billion

The valuation arose out of talks it’s currently engaged in to raise $6.5 billion. Given that OpenAI is becoming increasingly costly to operate, and could lose as much as $5 billion this year, it’s tricky to see how it all adds up. (The Information)