EVs could be cheaper to own than gas cars in Africa by 2040

Electric vehicles could be economically competitive in Africa sooner than expected. Just 1% of new cars sold across the continent in 2025 were electric, but a new analysis finds that with solar off-grid charging, EVs could be cheaper to own than gas vehicles by 2040.

There are major barriers to higher EV uptake in many countries in Africa, including a sometimes unreliable grid, limited charging infrastructure, and a lack of access to affordable financing. As a result, some previous analyses have suggested that fossil-fuel vehicles would dominate in Africa through at least 2050.

But as batteries and the vehicles they power continue to get cheaper, the economic case for EVs is building. Electric two-wheelers, cars, larger passenger vehicles, and even minibuses could compete in most African countries in just 15 years, according to the new study, published in Nature Energy.

“EVs have serious economic potential in most African countries in the not-so-distant future,” says Bessie Noll, a senior researcher at ETH Zürich and one of the authors of the study.

The study considered the total cost of ownership over the lifetime of a vehicle. That includes the sticker price, financing costs, and the cost of fueling (or charging). The researchers didn’t consider policy-related costs like taxes, import fees, and government subsidies, choosing to focus instead on only the underlying economics.
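To make that framing concrete, here is a minimal sketch of the comparison in Python. Every number in it is invented for illustration; none of the figures come from the study.

```python
# A minimal sketch of the study's framing: total cost of ownership is the purchase
# price plus financing costs plus lifetime fueling or charging costs.
# Every number below is invented for illustration; none come from the paper.

def total_cost_of_ownership(purchase_price, financing_cost,
                            cost_per_km, km_per_year, lifetime_years):
    return purchase_price + financing_cost + cost_per_km * km_per_year * lifetime_years

gas_car = total_cost_of_ownership(purchase_price=12_000, financing_cost=5_000,
                                  cost_per_km=0.08, km_per_year=15_000, lifetime_years=12)
ev = total_cost_of_ownership(purchase_price=18_000, financing_cost=7_500,  # includes a solar charging kit
                             cost_per_km=0.02, km_per_year=15_000, lifetime_years=12)

print(f"Gas car: ${gas_car:,.0f}   EV: ${ev:,.0f}")  # cheaper charging can offset the higher sticker price
```

Even with a higher sticker price and higher financing costs, cheaper charging spread over the vehicle’s lifetime can tip the total in the EV’s favor, which is the basic dynamic the researchers model.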

EVs are getting cheaper every year as battery and vehicle manufacturing improve and production scales, and the researchers found that in most cases and in most places across Africa, EVs are expected to be cheaper than equivalent gas-powered vehicles by 2040. EVs should also be less expensive than vehicles that use synthetic fuels. 

For two-wheelers like electric scooters, EVs could be the cheaper option even sooner: with smaller, cheaper batteries, these vehicles will be economically competitive by the end of the decade. On the other hand, one of the most difficult segments for EVs to compete in is small cars, says Christian Moretti, a researcher at ETH Zürich and the Paul Scherrer Institute in Switzerland.

Because some countries still have limited or unreliable grid access, charging is a major barrier to EV uptake, Noll says. So for EVs, the authors analyzed the cost of buying not only the vehicle but also a solar off-grid charging system. This includes solar panels, batteries, and the inverter required to transform the electricity into a version that can charge an EV. (The additional batteries help the system store energy for charging at times when the sun isn’t shining.)
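For a rough sense of what such a system involves, here is a back-of-the-envelope sizing sketch. Every figure in it (daily driving distance, vehicle efficiency, conversion losses, sun hours) is an assumption chosen for illustration, not a number from the paper.

```python
# A back-of-the-envelope sizing of an off-grid solar charging setup.
# Every figure here is an assumption for illustration, not a number from the study.

daily_km = 40                        # assumed daily driving distance
ev_kwh_per_km = 0.15                 # assumed vehicle efficiency
system_efficiency = 0.85             # assumed losses in the inverter and storage battery
peak_sun_hours = 5.0                 # assumed daily solar resource

daily_charge_kwh = daily_km * ev_kwh_per_km / system_efficiency
solar_array_kw = daily_charge_kwh / peak_sun_hours   # panel capacity needed
storage_kwh = daily_charge_kwh                       # buffer so charging can happen after dark

print(f"Energy needed per day: {daily_charge_kwh:.1f} kWh")
print(f"Solar array: ~{solar_array_kw:.1f} kW, storage battery: ~{storage_kwh:.0f} kWh")
```

The study’s modeling is far more detailed, but the sketch shows why the panels and extra storage batteries add meaningfully to the up-front cost an EV buyer has to finance.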

Mini grids and other standalone systems that include solar panels and energy storage are increasingly common across Africa. It’s possible that this might be a primary way that EV owners in Africa will charge their vehicles in the future, Noll says.

One of the bigger barriers to EVs in Africa is financing costs, she adds. In some cases, the cost of financing can be more than the up-front cost of the vehicle, significantly driving up the cost of ownership.

Today, EVs are more expensive than equivalent gas-powered vehicles in much of the world. But in places where it’s relatively cheap to borrow money, that difference can be spread out across the course of a vehicle’s whole lifetime for little cost. Then, since it’s often cheaper to charge an EV than fuel a gas-powered car, the EV is less expensive over time. 

In some African countries, however, political instability and uncertain economic conditions make borrowing money more expensive. To some extent, the high financing costs affect the purchase of any vehicle, regardless of how it’s powered. But EVs are more expensive up front than equivalent gas-powered cars, and that higher up-front cost adds up to more interest paid over time. In some cases, financing an EV can also be more expensive than financing a gas vehicle—the technology is newer, and banks may see the purchase as more of a risk and charge a higher interest rate, says Kelly Carlin, a manager in the program on carbon-free transportation at the Rocky Mountain Institute, an energy think tank.
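A toy amortized-loan calculation illustrates how the two effects compound. The prices, interest rates, and loan term below are hypothetical, chosen only to show the shape of the problem Carlin describes.

```python
# A toy amortized-loan comparison. All prices, rates, and terms are assumptions
# chosen to illustrate the dynamic, not data from the study.

def total_interest(principal, annual_rate, years):
    """Total interest paid over the life of a standard amortized loan."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    payment = principal * r / (1 - (1 + r) ** -n)
    return payment * n - principal

# Hypothetical five-year loans.
print(f"Gas car, $12,000 at 10%: ${total_interest(12_000, 0.10, 5):,.0f} in interest")
print(f"EV,      $18,000 at 10%: ${total_interest(18_000, 0.10, 5):,.0f} in interest")
print(f"EV,      $18,000 at 25%: ${total_interest(18_000, 0.25, 5):,.0f} in interest")
print(f"EV,      $18,000 at 35%: ${total_interest(18_000, 0.35, 5):,.0f} in interest")
```

At the highest assumed rate, the interest alone exceeds the vehicle’s sticker price, the situation described above in which financing costs more than the car itself.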

The picture varies widely depending on the country, too. In South Africa, Mauritius, and Botswana, financing conditions are already close to levels required to allow EVs to reach cost parity, according to the study. In higher-risk countries (the study gives examples including Sudan, which is currently in a civil war, and Ghana, which is recovering from a major economic crisis), financing costs would need to be cut drastically for that to be the case. 

Making EVs an affordable option will be a key first step to putting more on the roads in Africa and around the world. “People will start to pick up these technologies when they’re competitive,” says Nelson Nsitem, lead Africa energy transition analyst at BloombergNEF, an energy consultancy. 

Solar-based charging systems, like the ones mentioned in the study, could help make electricity less of a constraint, bringing more EVs to the roads, Nsitem says. But there’s still a need for more charging infrastructure, a major challenge in many countries where the grid needs major upgrades for capacity and reliability, he adds. 

Globally, more EVs are hitting the roads every year. “The global trend is unmistakable,” Carlin says. There are questions about how quickly it’s happening in different places, he says, “but the momentum is there.”

What’s next for EV batteries in 2026

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

Demand for electric vehicles and the batteries that power them has never been hotter.

In 2025, EVs made up over a quarter of new vehicle sales globally, up from less than 5% in 2020. Some regions are seeing even higher uptake: In China, more than 50% of new vehicle sales last year were battery electric or plug-in hybrids. In Europe, more purely electric vehicles hit the roads in December than gas-powered ones. (The US is the notable exception here, dragging down the global average with a small sales decline from 2024.)

As EVs become increasingly common on the roads, the battery world is growing too. Looking ahead, we could soon see wider adoption of new chemistries, including some that deliver lower costs or higher performance. Meanwhile, the geopolitics of batteries are shifting, and so is the policy landscape. Here’s what’s coming next for EV batteries in 2026 and beyond.

A big opportunity for sodium-ion batteries

Lithium-ion batteries are the default chemistry used in EVs, personal devices, and even stationary storage systems on the grid today. But in a tough environment in some markets, like the US, there’s growing interest in cheaper alternatives. Right now, automakers largely care about batteries’ cost rather than performance improvements, says Kara Rodby, a technical principal at Volta Energy Technologies, a venture capital firm that focuses on energy storage technology.

Sodium-ion cells have long been held up as a potentially less expensive alternative to lithium. The batteries are limited in their energy density, so they deliver a shorter range than lithium-ion. But sodium is also more abundant, so they could be cheaper.

Sodium’s growth has been cursed, however, by the very success of lithium-based batteries, says Shirley Meng, a professor of molecular engineering at the University of Chicago. A lithium-ion battery cell cost $568 per kilowatt-hour in 2013, but that cost had fallen to just $74 per kilowatt-hour by 2025—quite the moving target for cheaper alternatives to chase.

Sodium-ion batteries currently cost about $59 per kilowatt-hour on average. That’s less expensive than the average lithium-ion battery. But if you consider only lithium iron phosphate (LFP) cells, a lower-end type of lithium-ion battery that averages $52 per kilowatt-hour, sodium is still more expensive today. 
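Applied to a hypothetical pack, the per-kilowatt-hour figures quoted above translate into rough pack costs like these (the 50 kWh pack size is an assumption for illustration):

```python
# Rough pack-level costs implied by the per-kilowatt-hour figures quoted above,
# for a hypothetical 50 kWh pack (the pack size is an assumption for illustration).

price_per_kwh = {
    "Lithium-ion cell, 2013": 568,
    "Lithium-ion average, 2025": 74,
    "Sodium-ion, today": 59,
    "LFP, today": 52,
}

pack_kwh = 50
for chemistry, price in price_per_kwh.items():
    print(f"{chemistry:26s} ${price * pack_kwh:>7,}")
```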

We could soon see an opening for sodium-ion batteries, though. Lithium prices have been ticking up in recent months, a shift that could soon slow or reverse the steady downward march of prices for lithium-based batteries.

Sodium-ion batteries are already being used commercially, largely for stationary storage on the grid. But we’re starting to see sodium-ion cells incorporated into vehicles, too. The Chinese companies Yadea, JMEV, and HiNa Battery have all started producing sodium-ion batteries in limited numbers for EVs, including small, short-range cars and electric scooters that don’t require a battery with high energy density. CATL, a Chinese battery company that’s the world’s largest, says it recently began producing sodium-ion cells, and the first EVs using the chemistry are expected to launch by the middle of this year.

Today, both production and demand for sodium-ion batteries are heavily centered in China. That’s likely to continue, especially after a cutback in tax credits and other financial support for the battery and EV industries in the US. One of the biggest sodium-battery companies in the US, Natron, ceased operations last year after running into funding issues.

We could also see progress in sodium-ion research: Companies and researchers are developing new materials for components including the electrolyte and electrodes, so the cells could get more comparable to lower-end lithium-ion cells in terms of energy density, Meng says. 

Major tests for solid-state batteries

As we enter the second half of this decade, many eyes in the battery world are on big promises and claims about solid-state batteries.

These batteries swap the liquid electrolyte, the material that ions move through as a battery charges and discharges, for a solid one, letting them pack more energy into a smaller package. With a higher energy density, they could unlock longer-range EVs.

Companies have been promising solid-state batteries for years. Toyota, for example, once planned to have them in vehicles by 2020. That timeline has been delayed several times, though the company says it’s now on track to launch the new cells in cars in 2027 or 2028.

Historically, battery makers have struggled to produce solid-state batteries at the scale needed to deliver a commercially relevant supply for EVs. There’s been progress in manufacturing techniques, though, and companies could soon actually make good on their promises, Meng says. 

Factorial Energy, a US-based company making solid-state batteries, provided cells for a Mercedes test vehicle that drove over 745 miles on a single charge in a real-world test in September. The company says it plans to bring its tech to market as soon as 2027. QuantumScape, another major solid-state player in the US, is testing its cells with automotive partners and plans to have its batteries in commercial production later this decade.

Before we see true solid-state batteries, we could see hybrid technologies, often referred to as semi-solid-state batteries. These commonly use materials like gel electrolytes, reducing the liquid inside cells without removing it entirely. Many Chinese companies are looking to build semi-solid-state batteries before transitioning to entirely solid-state ones, says Evelina Stoikou, head of battery technologies and supply chains at BloombergNEF, an energy consultancy.

A global patchwork

The picture for the near future of the EV industry looks drastically different depending on where you’re standing.

Last year, China overtook Japan as the country with the most global auto sales. And more than one in three EVs made in 2025 had a CATL battery in it. Simply put, China is dominating the global battery industry, and that doesn’t seem likely to change anytime soon.

China’s influence outside its domestic market is growing especially quickly. CATL is expected to begin production this year at its second European site; the factory, located in Hungary, is an $8.2 billion project that will supply automakers including BMW and the Mercedes-Benz group. Canada recently signed a deal that will lower the import tax on Chinese EVs from 100% to roughly 6%, effectively opening the Canadian market for Chinese EVs.

Some countries that haven’t historically been major EV markets could become bigger players in the second half of the decade. Annual EV sales in Thailand and Vietnam, where the market was virtually nonexistent just a few years ago, broke 100,000 in 2025. Brazil, in particular, could see its new EV sales more than double in 2026 as major automakers including Volkswagen and BYD set up or ramp up production in the country. 

On the flip side, EVs are facing a real test in 2026 in the US, as this will be the first calendar year after the sunset of federal tax credits that were designed to push more drivers to purchase the vehicles. With those credits gone, growth in sales is expected to continue lagging. 

One bright spot for batteries in the US is outside the EV market altogether. Battery manufacturers are starting to produce low-cost LFP batteries in the US, largely for energy storage applications. LG opened a massive factory to make LFP batteries in mid-2025 in Michigan, and the Korean battery company SK On plans to start making LFP batteries at its facility in Georgia later this year. Those plants could help battery companies cash in on investments as the US EV market faces major headwinds. 

Even as the US lags behind, the world is electrifying transportation. By 2030, 40% of new vehicles sold around the world are projected to be electric. As we approach that milestone, expect to see more global players, a wider selection of EVs, and an even wider menu of batteries to power them. 

The first human test of a rejuvenation method will begin “shortly” 

When Elon Musk was at Davos last week, an interviewer asked him if he thought aging could be reversed. Musk said he hasn’t put much time into the problem but suspects it is “very solvable” and that when scientists discover why we age, it’s going to be something “obvious.”

Not long after, the Harvard professor and life-extension evangelist David Sinclair jumped into the conversation on X to strongly agree with the world’s richest man. “Aging has a relatively simple explanation and is apparently reversible,” wrote Sinclair. “Clinical Trials begin shortly.”

“ER-100?” Musk asked.

“Yes” replied Sinclair.

ER-100 turns out to be the code name of a treatment created by Life Biosciences, a small Boston startup that Sinclair cofounded and which he confirmed today has won FDA approval to proceed with the first targeted attempt at age reversal in human volunteers. 

The company plans to try to treat eye disease with a radical rejuvenation concept called “reprogramming” that has recently attracted hundreds of millions in investment for Silicon Valley firms like Altos Labs, New Limit, and Retro Biosciences, backed by many of the biggest names in tech. 

The technique attempts to restore cells to a healthier state by broadly resetting their epigenetic controls—switches on our genes that determine which are turned on and off.  

“Reprogramming is like the AI of the bio world. It’s the thing everyone is funding,” says Karl Pfleger, an investor who backs a smaller UK startup, Shift Bioscience. He says Sinclair’s company has recently been seeking additional funds to keep advancing its treatment.

Reprogramming is so powerful that it sometimes creates risks, even causing cancer in lab animals, but the version of the technique being advanced by Life Biosciences passed initial safety tests in animals.

But it’s still very complex. The trial will initially test the treatment on about a dozen patients with glaucoma, a condition where high pressure inside the eye damages the optic nerve. In the tests, viruses carrying three powerful reprogramming genes will be injected into one eye of each patient, according to a description of the study first posted in December. 

To help make sure the process doesn’t go too far, the reprogramming genes will be under the control of a special genetic switch that turns them on only while the patients take a low dose of the antibiotic doxycycline. Initially, they will take the antibiotic for about two months while the effects are monitored. 

Executives at the company have said for months that a trial could begin this year, sometimes characterizing it as a starting bell for a new era of age reversal. “It’s an incredibly big deal for us as an industry,” Michael Ringel, chief operating officer at Life Biosciences, said at an event this fall. “It’ll be the first time in human history, in the millennia of human history, of looking for something that rejuvenates … So watch this space.”

The technology is based on the Nobel Prize–winning discovery, 20 years ago, that introducing a few potent genes into a cell will cause it to turn back into a stem cell, just like those found in an early embryo that develop into the different specialized cell types. These genes, known as Yamanaka factors, have been likened to a “factory reset” button for cells. 

But they’re dangerous, too. When turned on in a living animal, they can cause an eruption of tumors.

That is what led scientists to a new idea, termed “partial” or “transient” reprogramming. The idea is to limit exposure to the potent genes—or use only a subset of them—in the hope of making cells act younger without giving them complete amnesia about what their role in the body is.

In 2020, Sinclair claimed that such partial reprogramming could restore vision to mice after their optic nerves were smashed, saying there was even evidence that the nerves regrew. His report appeared on the cover of the influential journal Nature alongside the headline “Turning Back Time.”

Not all scientists agree that reprogramming really counts as age reversal. But Sinclair has doubled down. He’s been advancing the theory that the gradual loss of correct epigenetic information in our cells is, in fact, the ultimate cause of aging—just the kind of root cause that Musk was alluding to.

“Elon does seem to be paying attention to the field and [is] seemingly in sync with [my theory],” Sinclair said in an email.

Reprogramming isn’t the first longevity fix championed by Sinclair, who’s written best-selling books and commands stratospheric fees on the longevity lecture circuit. Previously, he touted the longevity benefits of molecules called sirtuins as well as resveratrol, a molecule found in red wine. But some critics say he greatly exaggerates scientific progress, pushback that culminated in a 2024 Wall Street Journal story that dubbed him a “reverse-aging guru” whose companies “have not panned out.” 

Life Biosciences has been among those struggling companies. Initially formed in 2017, it at first had a strategy of launching subsidiaries, each intended to pursue one aspect of the aging problem. But after these made limited progress, in 2021 it hired a new CEO, Jerry McLaughlin, who has refocused its efforts  on Sinclair’s mouse vision results and the push toward a human trial. 

The company has discussed the possibility of reprogramming other organs, including the brain. And Ringel, like Sinclair, entertains the idea that someday even whole-body rejuvenation might be feasible. But for now, it’s better to think of the study as a proof of concept that’s still far from a fountain of youth. “The optimistic case is this solves some blindness for certain people and catalyzes work in other indications,” says Pfleger, the investor. “It’s not like your doctor will be writing a prescription for a pill that will rejuvenate you.”

Life’s treatment also relies on an antibiotic switching mechanism that, while often used in lab animals, hasn’t been tried in humans before. Since the switch is built from gene components taken from E. coli and the herpes virus, it’s possible that it could cause an immune reaction in humans, scientists say. 

“I was always thinking that for widespread use you might need a different system,” says Noah Davidsohn, who helped Sinclair implement the technique and is now chief scientist at a different company, Rejuvenate Bio. And Life’s choice of reprogramming factors—it’s picked three, which go by the acronym OSK—may also be risky. They are expected to turn on hundreds of other genes, and in some circumstances the combination can cause cells to revert to a very primitive, stem-cell-like state.

Other companies studying reprogramming say their focus is on researching which genes to use, in order to achieve time reversal without unwanted side effects. New Limit, which has been carrying out an extensive search for such genes, says it won’t be ready for a human study for two years. At Shift, experiments on animals are only beginning now.

“Are their factors the best version of rejuvenation? We don’t think they are. I think they are working with what they’ve got,” Daniel Ives, the CEO of Shift, says of Life Biosciences. “But I think they’re way ahead of anybody else in terms of getting into humans. They have found a route forward in the eye, which is a nice self-contained system. If it goes wrong, you’ve still got one left.”

Inside OpenAI’s big play for science 

In the three years since ChatGPT’s explosive debut, OpenAI’s technology has upended a remarkable range of everyday activities at home, at work, in schools—anywhere people have a browser open or a phone out, which is everywhere.

Now OpenAI is making an explicit play for scientists. In October, the firm announced that it had launched a whole new team, called OpenAI for Science, dedicated to exploring how its large language models could help scientists and tweaking its tools to support them.

The last couple of months have seen a slew of social media posts and academic publications in which mathematicians, physicists, biologists, and others have described how LLMs (and OpenAI’s GPT-5 in particular) have helped them make a discovery or nudged them toward a solution they might otherwise have missed. In part, OpenAI for Science was set up to engage with this community.

And yet OpenAI is also late to the party. Google DeepMind, the rival firm behind groundbreaking scientific models such as AlphaFold and AlphaEvolve, has had an AI-for-science team for years. (When I spoke to Google DeepMind’s CEO and cofounder Demis Hassabis in 2023 about that team, he told me: “This is the reason I started DeepMind … In fact, it’s why I’ve worked my whole career in AI.”)

So why now? How does a push into science fit with OpenAI’s wider mission? And what exactly is the firm hoping to achieve?

I put these questions to Kevin Weil, a vice president at OpenAI who leads the new OpenAI for Science team, in an exclusive interview last week.

On mission

Weil is a product guy. He joined OpenAI a couple of years ago as chief product officer after being head of product at Twitter and Instagram. But he started out as a scientist. He got two-thirds of the way through a PhD in particle physics at Stanford University before ditching academia for the Silicon Valley dream. Weil is keen to highlight his pedigree: “I thought I was going to be a physics professor for the rest of my life,” he says. “I still read math books on vacation.”

Asked how OpenAI for Science fits with the firm’s existing lineup of white-collar productivity tools or the viral video app Sora, Weil recites the company mantra: “The mission of OpenAI is to try and build artificial general intelligence and, you know, make it beneficial for all of humanity.”

Just imagine the future impact this technology could have on science, he says: new medicines, new materials, new devices. “Think about it helping us understand the nature of reality, helping us think through open problems. Maybe the biggest, most positive impact we’re going to see from AGI will actually be from its ability to accelerate science.”

He adds: “With GPT-5, we saw that becoming possible.” 

As Weil tells it, LLMs are now good enough to be useful scientific collaborators. They can spitball ideas, suggest novel directions to explore, and find fruitful parallels between new problems and old solutions published in obscure journals decades ago or in foreign languages.

That wasn’t the case a year or so ago. Since it announced its first so-called reasoning model—a type of LLM that can break down problems into multiple steps and work through them one by one—in December 2024, OpenAI has been pushing the envelope of what the technology can do. Reasoning models have made LLMs far better at solving math and logic problems than they used to be. “You go back a few years and we were all collectively mind-blown that the models could get an 800 on the SAT,” says Weil.

But soon LLMs were acing math competitions and solving graduate-level physics problems. Last year, OpenAI and Google DeepMind both announced that their LLMs had achieved gold-medal-level performance at the International Mathematical Olympiad, one of the toughest math contests in the world. “These models are no longer just better than 90% of grad students,” says Weil. “They’re really at the frontier of human abilities.”

That’s a huge claim, and it comes with caveats. Still, there’s no doubt that GPT-5, which includes a reasoning model, is a big improvement on GPT-4 when it comes to complicated problem-solving. Measured against an industry benchmark known as GPQA, which includes more than 400 multiple-choice questions that test PhD-level knowledge in biology, physics, and chemistry, GPT-4 scores 39%, well below the human-expert baseline of around 70%. According to OpenAI, GPT-5.2 (the latest update to the model, released in December) scores 92%. 

Overhyped

The excitement is evident—and perhaps excessive. In October, senior figures at OpenAI, including Weil, boasted on X that GPT-5 had found solutions to several unsolved math problems. Mathematicians were quick to point out that in fact what GPT-5 appeared to have done was dig up existing solutions in old research papers, including at least one written in German. That was still useful, but it wasn’t the achievement OpenAI seemed to have claimed. Weil and his colleagues deleted their posts.

Now Weil is more careful. It is often enough to find answers that exist but have been forgotten, he says: “We collectively stand on the shoulders of giants, and if LLMs can kind of accumulate that knowledge so that we don’t spend time struggling on a problem that is already solved, that’s an acceleration all of its own.”

He plays down the idea that LLMs are about to come up with a game-changing new discovery. “I don’t think models are there yet,” he says. “Maybe they’ll get there. I’m optimistic that they will.”

But, he insists, that’s not the mission: “Our mission is to accelerate science. And I don’t think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field.”

For Weil, the question is this: “Does science actually happen faster because scientists plus models can do much more, and do it more quickly, than scientists alone? I think we’re already seeing that.”

In November, OpenAI published a series of anecdotal case studies contributed by scientists, both inside and outside the company, that illustrated how they had used GPT-5 and how it had helped. “Most of the cases were scientists that were already using GPT-5 directly in their research and had come to us one way or another saying, ‘Look at what I’m able to do with these tools,’” says Weil.

The key things that GPT-5 seems to be good at are finding references and connections to existing work that scientists were not aware of, which sometimes sparks new ideas; helping scientists sketch mathematical proofs; and suggesting ways for scientists to test hypotheses in the lab.  

“GPT 5.2 has read substantially every paper written in the last 30 years,” says Weil. “And it understands not just the field that a particular scientist is working in; it can bring together analogies from other, unrelated fields.”

“That’s incredibly powerful,” he continues. “You can always find a human collaborator in an adjacent field, but it’s difficult to find, you know, a thousand collaborators in all thousand adjacent fields that might matter. And in addition to that, I can work with the model late at night—it doesn’t sleep—and I can ask it 10 things in parallel, which is kind of awkward to do to a human.”

Solving problems

Most of the scientists OpenAI put me in touch with back up Weil’s position.

Robert Scherrer, a professor of physics and astronomy at Vanderbilt University, had only played around with ChatGPT for fun (“I used it to rewrite the theme song for Gilligan’s Island in the style of Beowulf, which it did very well,” he tells me) until his Vanderbilt colleague Alex Lupsasca, a fellow physicist who now works at OpenAI, told him that GPT-5 had helped solve a problem he’d been working on.

Lupsasca gave Scherrer access to GPT-5 Pro, OpenAI’s $200-a-month premium subscription. “It managed to solve a problem that I and my graduate student could not solve despite working on it for several months,” says Scherrer.

It’s not perfect, he says: “GPT-5 still makes dumb mistakes. Of course, I do too, but the mistakes GPT-5 makes are even dumber.” And yet it keeps getting better, he says: “If current trends continue—and that’s a big if—I suspect that all scientists will be using LLMs soon.”

Derya Unutmaz, a professor of biology at the Jackson Laboratory, a nonprofit research institute, uses GPT-5 to brainstorm ideas, summarize papers, and plan experiments in his work studying the immune system. In the case study he shared with OpenAI, Unutmaz used GPT-5 to analyze an old data set that his team had previously looked at. The model came up with fresh insights and interpretations.  

“LLMs are already essential for scientists,” he says. “When you can complete analysis of data sets that used to take months, not using them is not an option anymore.”

Nikita Zhivotovskiy, a statistician at the University of California, Berkeley, says he has been using LLMs in his research since the first version of ChatGPT came out.

Like Scherrer, he finds LLMs most useful when they highlight unexpected connections between his own work and existing results he did not know about. “I believe that LLMs are becoming an essential technical tool for scientists, much like computers and the internet did before,” he says. “I expect a long-term disadvantage for those who do not use them.”

But he does not expect LLMs to make novel discoveries anytime soon. “I have seen very few genuinely fresh ideas or arguments that would be worth a publication on their own,” he says. “So far, they seem to mainly combine existing results, sometimes incorrectly, rather than produce genuinely new approaches.”

I also contacted a handful of scientists who are not connected to OpenAI.

Andy Cooper, a professor of chemistry at the University of Liverpool and director of the Leverhulme Research Centre for Functional Materials Design, is less enthusiastic. “We have not found, yet, that LLMs are fundamentally changing the way that science is done,” he says. “But our recent results suggest that they do have a place.”

Cooper is leading a project to develop a so-called AI scientist that can fully automate parts of the scientific workflow. He says that his team doesn’t use LLMs to come up with ideas. But the tech is starting to prove useful as part of a wider automated system where an LLM can help direct robots, for example.

“My guess is that LLMs might stick more in robotic workflows, at least initially, because I’m not sure that people are ready to be told what to do by an LLM,” says Cooper. “I’m certainly not.”

Making errors

LLMs may be becoming more and more useful, but caution is still key. In December, Jonathan Oppenheim, a scientist who works on quantum mechanics, called out a mistake that had made its way into a scientific journal. “OpenAI leadership are promoting a paper in Physics Letters B where GPT-5 proposed the main idea—possibly the first peer-reviewed paper where an LLM generated the core contribution,” Oppenheim posted on X. “One small problem: GPT-5’s idea tests the wrong thing.”

He continued: “GPT-5 was asked for a test that detects nonlinear theories. It provided a test that detects nonlocal ones. Related-sounding, but different. It’s like asking for a COVID test, and the LLM cheerfully hands you a test for chickenpox.”

It is clear that a lot of scientists are finding innovative and intuitive ways to engage with LLMs. It is also clear that the technology makes mistakes that can be so subtle even experts miss them.

Part of the problem is the way ChatGPT can flatter you into letting down your guard. As Oppenheim put it: “A core issue is that LLMs are being trained to validate the user, while science needs tools that challenge us.” In an extreme case, one individual (who was not a scientist) was persuaded by ChatGPT into thinking for months that he’d invented a new branch of mathematics.

Of course, Weil is well aware of the problem of hallucination. But he insists that newer models are hallucinating less and less. Even so, focusing on hallucination might be missing the point, he says.

“One of my teammates here, an ex math professor, said something that stuck with me,” says Weil. “He said: ‘When I’m doing research, if I’m bouncing ideas off a colleague, I’m wrong 90% of the time and that’s kind of the point. We’re both spitballing ideas and trying to find something that works.’”

“That’s actually a desirable place to be,” says Weil. “If you say enough wrong things and then somebody stumbles on a grain of truth and then the other person seizes on it and says, ‘Oh, yeah, that’s not quite right, but what if we—’ You gradually kind of find your trail through the woods.”

This is Weil’s core vision for OpenAI for Science. GPT-5 is good, but it is not an oracle. The value of this technology is in pointing people in new directions, not coming up with definitive answers, he says.

In fact, one of the things OpenAI is now looking at is making GPT-5 dial down its confidence when it delivers a response. Instead of saying Here’s the answer, it might tell scientists: Here’s something to consider.

“That’s actually something that we are spending a bunch of time on,” says Weil. “Trying to make sure that the model has some sort of epistemological humility.”

Watching the watchers

Another thing OpenAI is looking at is how to use GPT-5 to fact-check GPT-5. It’s often the case that if you feed one of GPT-5’s answers back into the model, it will pick it apart and highlight mistakes.

“You can kind of hook the model up as its own critic,” says Weil. “Then you can get a workflow where the model is thinking and then it goes to another model, and if that model finds things that it could improve, then it passes it back to the original model and says, ‘Hey, wait a minute—this part wasn’t right, but this part was interesting. Keep it.’ It’s almost like a couple of agents working together and you only see the output once it passes the critic.”
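In code, that workflow is essentially a loop in which one model call drafts an answer and a second call critiques it, with revisions until the critic is satisfied. The sketch below is a generic illustration of the pattern Weil describes, not OpenAI’s implementation; call_model is a hypothetical placeholder for whatever LLM client you use.

```python
# A generic sketch of the generate-then-critique loop Weil describes. This is not
# OpenAI's implementation; call_model() is a hypothetical placeholder for whatever
# LLM client you use.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM client here")

def answer_with_critic(question: str, max_rounds: int = 3) -> str:
    draft = call_model(f"Propose an answer, showing your reasoning:\n{question}")
    for _ in range(max_rounds):
        critique = call_model(
            "You are a skeptical reviewer. List concrete errors or weak steps in "
            f"this answer, or reply OK if you find none:\n{draft}"
        )
        if critique.strip().upper() == "OK":
            break  # the critic has nothing left to flag
        draft = call_model(
            "Revise the answer to address the critique, keeping whatever was sound.\n"
            f"Question: {question}\nDraft: {draft}\nCritique: {critique}"
        )
    return draft  # only output that survives the critic is shown to the user
```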

What Weil is describing also sounds a lot like what Google DeepMind did with AlphaEvolve, a tool that wrapped the firm’s LLM, Gemini, inside a wider system that separated the good responses from the bad and fed them back in to be improved on. Google DeepMind has used AlphaEvolve to solve several real-world problems.

OpenAI faces stiff competition from rival firms, whose own LLMs can do most, if not all, of the things it claims for its own models. If that’s the case, why should scientists use GPT-5 instead of Gemini or Anthropic’s Claude, families of models that are themselves improving every year? Ultimately, OpenAI for Science may be as much an effort to plant a flag in new territory as anything else. The real innovations are still to come. 

“I think 2026 will be for science what 2025 was for software engineering,” says Weil. “At the beginning of 2025, if you were using AI to write most of your code, you were an early adopter. Whereas 12 months later, if you’re not using AI to write most of your code, you’re probably falling behind. We’re now seeing those same early flashes for science as we did for code.”

He continues: “I think that in a year, if you’re a scientist and you’re not heavily using AI, you’ll be missing an opportunity to increase the quality and pace of your thinking.”

Data centers are amazing. Everyone hates them.

Behold, the hyperscale data center! 

Massive structures, with thousands of specialized computer chips running in parallel to perform the complex calculations required by advanced AI models. A single facility can cover millions of square feet, be built with millions of pounds of steel, aluminum, and concrete, and feature hundreds of miles of wiring connecting hundreds of thousands of high-end GPUs, all while chewing through hundreds of megawatts of electricity. These facilities run so hot from all that computing power that their cooling systems are triumphs of engineering complexity in themselves. But the stars of the show are those chips, with their advanced processors. A single chip in these vast arrays can cost upwards of $30,000. Racked together and working in concert, they process hundreds of thousands of tokens—the chunks of text an AI model reads and writes—per second. Ooooomph.

Given the incredible amounts of capital that the world’s biggest companies have been pouring into building data centers, you can make the case (and many people have) that their construction is single-handedly propping up the US stock market and the economy.

So important are they to our way of life that none other than the President of the United States himself, on his very first full day in office, stood side by side with the CEO of OpenAI to announce a $500 billion private investment in data center construction.

Truly, the hyperscale data center is a marvel of our age. A masterstroke of engineering across multiple disciplines. Nothing short of a technological wonder.

People hate them. 

People hate them in Virginia, which leads the nation in their construction. They hate them in Nevada, where they slurp up the state’s precious water. They hate them in Michigan, and Arizona, and South Dakota, where the good citizens of Sioux Falls hurled obscenities at their city councilmembers following a vote to permit a data center on the city’s northeastern side. They hate them all around the world, it’s true. But they really hate them in Georgia. 

So, let’s go to Georgia. The purplest of purple states. A state with both woke liberal cities and MAGA-leaning suburbs and rural areas. The state of Stacey Abrams and Newt Gingrich. If there is one thing just about everyone there seemingly agrees on, it’s that they’ve had it with data centers.

Last year, the state’s Public Service Commission election became unexpectedly tight, and wound up delivering a stunning upset to incumbent Republican commissioners. Although there were likely shades of national politics at play (voters favored Democrats in an election cycle where many things went that party’s way), the central issue was skyrocketing power bills. And that power bill inflation was oft-attributed to a data center building boom rivaled only by Virginia’s. 

This boom did not come out of the blue. At one point, Georgia wanted data centers. Or at least, its political leadership did. In 2018 the state’s General Assembly passed legislation that provided data centers with tax breaks for their computer systems and cooling infrastructure, more tax breaks for job creation, and even more tax breaks for property taxes. And then… boom!   

But things have not played out the way the Assembly and other elected officials may have expected. 

Journey with me now to Bolingbroke, Georgia. Not far outside of Atlanta, in Monroe County (population 27,954), county commissioners were considering rezoning 900 acres of land to make room for a new data center near the town of Bolingbroke (population 492). Data centers have been popping up all across the state, but especially in areas close to Atlanta. Public opinion is, often enough, irrelevant. In nearby Twiggs County, despite strong and organized opposition, officials decided to allow a 300-acre data center to move forward. But at a packed meeting to discuss the Bolingbroke plans, some 900 people showed up to voice near-unanimous opposition to the proposed data center, according to The Telegraph of Macon, Georgia. Seeing which way the wind was blowing, the Monroe County commission shot the proposal down in August last year.

The would-be developers of the proposed site had claimed it would bring in millions of dollars for the county. That it would be hidden from view. That it would “uphold the highest environmental standards.” That it would bring jobs and prosperity. Yet still, people came gunning for it. 

Why!? Data centers have been around for years. So why does everyone hate them all of a sudden?

What is it about these engineering marvels that will allow us to build AI that will cure all diseases, bring unprecedented prosperity, and even cheat death (if you believe what the AI sellers are selling) that so infuriates their prospective neighbors? 

There are some obvious reasons. First is the sheer speed and scale of their construction, which is straining power grids. No one likes to see their power bills go up. The rate hikes that so incensed Georgians come as monthly reminders that the eyesore in your backyard profits California billionaires at your expense, on your grid. In Wyoming, for example, a planned Meta data center will require more electricity than every household in the state combined. To meet demand from power-hungry data centers, utilities are adding capacity to the grid. But although that added capacity may benefit tech companies, the cost is shared by local consumers.

Similarly, there are environmental concerns. To meet their electricity needs, data centers often turn to dirty forms of energy. xAI, for example, famously threw a bunch of polluting methane-powered generators at its data center in Memphis. While nuclear energy is oft-bandied about as a greener solution, traditional plants can take a decade or more to build; even new and more nimble reactors will take years to come online. In addition, data centers often require massive amounts of water. But the amount can vary widely depending on the facility, and is often shrouded in secrecy. (A number of states are attempting to require facilities to disclose water usage.) 

A different type of environmental consequence of data centers is that they are noisy. A low, constant, machine hum. Not just sometimes, but always. 24 hours a day. 365 days a year. “A highway that never stops.” 

And as to the jobs they bring to communities. Well, I have some bad news there too. Once construction ends, they tend to employ very few people, especially for such resource-intensive facilities. 

These are all logical reasons to oppose data centers. But I suspect there is an additional, emotional one. And it echoes one we’ve heard before. 

More than a decade ago, the large tech firms of Silicon Valley began operating buses to ferry workers to their campuses from San Francisco and other Bay Area cities. Like data centers, these buses used shared resources such as public roads without, people felt, paying their fair share. Protests erupted. But while the protests were certainly about shared resource use, they were also about something much bigger. 

Tech companies, big and small, were transforming San Francisco. The early 2010s were a time of rapid gentrification in the city. And what’s more, the tech industry itself was transforming society. Smartphones were newly ubiquitous. The way we interacted with the world was fundamentally changing, and people were, for the most part, powerless to do anything about it. You couldn’t stop Google. 

But you could stop a Google bus. 

You could stand in front of it and block its path. You could yell at the people getting on it. You could yell at your elected officials and tell them to do something. And in San Francisco, people did. The buses were eventually regulated. 

The data center pushback has a similar vibe. AI, we are told, is transforming society. It is suddenly everywhere. Even if you opt not to use ChatGPT or Claude or Gemini, generative AI is increasingly built into just about every app and service you likely use. People are worried AI will take their jobs in the coming years. Or even kill us all. And for what? So far, the returns have certainly not lived up to the hype.

You can’t stop Google. But maybe, just maybe, you can stop a Google data center. 

Then again, maybe not. The tech buses in San Francisco, though regulated, remain commonplace. And the city is more gentrified than ever. Meanwhile, in Monroe County, life goes on. In October, Google confirmed it had purchased 950 acres of land just off the interstate. It plans to build a data center there. 

A new CRISPR startup is betting regulators will ease up on gene-editing

Here at MIT Technology Review we’ve been writing about the gene-editing technology CRISPR since 2013, calling it the biggest biotech breakthrough of the century. Yet so far, there’s been only one gene-editing drug approved. It’s been used commercially on only about 40 patients, all with sickle-cell disease.

It’s becoming clear that the impact of CRISPR isn’t as big as we all hoped. In fact, there’s a pall of discouragement over the entire field—with some journalists saying the gene-editing revolution has “lost its mojo.”

So what will it take for CRISPR to help more people? A new startup says the answer could be an “umbrella approach” to testing and commercializing treatments. Aurora Therapeutics, which has $16 million from Menlo Ventures and counts CRISPR co-inventor Jennifer Doudna as an advisor, essentially hopes to win approval for gene-editing drugs that can be slightly adjusted, or personalized, without requiring costly new trials or approvals for every new version.

The need to change regulations around gene-editing treatments was endorsed in November by the head of the US Food and Drug Administration, Martin Makary, who said the agency would open a “new” regulatory pathway for “bespoke, personalized therapies” that can’t easily be tested in conventional ways. 

Aurora’s first target, the rare inherited disease phenylketonuria, also known as PKU, is a case in point. People with PKU lack a working version of an enzyme needed to break down the amino acid phenylalanine, a component of meat and pretty much any other protein-rich food. If the amino acid builds up, it causes brain damage. So patients usually go on an onerous “diet for life” of special formula drinks and vegetables.

In theory, gene editing can fix PKU. In mice, scientists have already restored the gene for the enzyme by rewriting DNA in liver cells, which both make the enzyme and are some of the easiest cells to reach with a gene-editing drug. The problem is that in human patients, many different mutations can affect the critical gene. According to Cory Harding, a researcher at Oregon Health & Science University, scientists know about 1,600 different DNA mutations that cause PKU.

There’s no way anyone will develop 1,600 different gene-editing drugs. Instead, Aurora’s goal is to eventually win approval for a single gene editor that, with minor adjustments, could be used to correct several of the most common mutations, including one that’s responsible for about 10% of the estimated 20,000 PKU cases in the US.

“We can’t have a separate clinical trial for each mutation,” says Edward Kaye, the CEO of Aurora. “The way the FDA approves gene editing has to change, and I think they’ve been very understanding that is the case.”

A gene editor is a special protein that can zero in on a specific location in the genome and change it. To prepare one, Aurora will put genetic code for the editor into a nanoparticle along with a targeting molecule. In total, it will involve about 5,000 gene letters. But only 20 of them need to change in order to redirect the treatment to repair a different mutation.

“Over 99% of the drug stays the same,” says Johnny Hu, a partner at Menlo Ventures, which put up the funding for the startup.

The new company came together after Hu met over pizza with Fyodor Urnov, an outspoken gene-editing scientist at the University of California, Berkeley, who is Aurora’s cofounder and sits on its board.

In 2022, Urnov had written a New York Times editorial bemoaning the “chasm” between what editing technology can do and the “legal, financial, and organizational” realities preventing researchers from curing people.

“I went to Fyodor and said, ‘Hey, we’re getting all these great results in the clinic with CRISPR, but why hasn’t it scaled?’” says Hu. Part of the reason is that most gene-editing companies are chasing the same few conditions, such as sickle-cell, where (as luck would have it) a single edit works for all patients. But that leaves around 400 million people who have 7,000 other inherited conditions without much hope of getting their DNA fixed, Urnov estimated in his editorial.

Then, last May, came the dramatic demonstration of the first fully “personalized” gene-editing treatment. A team in Philadelphia, assisted by Urnov and others, succeeded in correcting the DNA of a baby, named KJ Muldoon, who had an entirely unique mutation that caused a metabolic disease. Though it didn’t target PKU, the project showed that gene editing could theoretically fix some inherited diseases “on demand.” 

It also underscored a big problem. Treating a single child required a large team and cost millions in time, effort, and materials—all to create a drug that would never be used again. 

That’s exactly the sort of situation the new “umbrella” trials are supposed to address. Kiran Musunuru, who co-led the team at the University of Pennsylvania, says he’s been in discussions with the FDA to open a study of bespoke gene editors this year focusing on diseases of the type Baby KJ had, called urea cycle disorders. Each time a new patient appears, he says, they’ll try to quickly put together a variant of their gene-editing drug that’s tuned to fix that child’s particular genetic problem.

Musunuru, who isn’t involved with Aurora, does not think the company’s plans for PKU count as fully personalized editors. “These corporate PKU efforts have nothing whatsoever to do with Baby KJ,” he says. He says his center continues to focus on mutations “so ultra-rare that we don’t see any scenario where a for-profit gene-editing company would find that indication to be commercially viable.”

Instead, what’s occurring in PKU, says Musunuru, is that researchers have realized they can assemble “a bunch” of the most frequent mutations “into a large enough group of patients to make a platform PKU therapy commercially viable.” 

While that would still leave out many patients with extra-rare gene errors, Musunuru says any gene-editing treatment at all would still be “a big improvement over the status quo, which  is zero genetic therapies for PKU.”

Four bright spots in climate news in 2025

Climate news hasn’t been great in 2025. Global greenhouse-gas emissions hit record highs (again). This year is set to be either the second or third warmest on record. Climate-fueled disasters like wildfires in California and flooding in Indonesia and Pakistan devastated communities and caused billions in damage.

In addition to these worrying indicators of our continued contributions to climate change and their obvious effects, the world’s largest economy has made a sharp U-turn on climate policy this year. The US under the Trump administration withdrew from the Paris Agreement, cut funds for climate research, and scrapped billions of dollars in funding for climate tech projects.

We’re in a severe situation with climate change. But for those looking for bright spots, there was some good news in 2025. Here are a few of the positive stories our climate reporters noticed this year.

China’s flattening emissions

One of the most notable and encouraging signs of progress this year occurred in China. The world’s second-biggest economy and biggest climate polluter has managed to keep carbon dioxide emissions flat for the last year and a half, according to an analysis in Carbon Brief.

That’s happened before, but only when the nation’s economy was contracting, including in the midst of the covid-19 pandemic. But emissions are now falling even as China’s economy is on track to grow about 5% this year and electricity demand continues to rise.

So what’s changed? China has now installed so much solar and wind, and put so many EVs on the road, that its economy can continue to expand without increasing the amount of carbon dioxide it’s pumping into the atmosphere, decoupling the traditional link between emissions and growth.

Specifically, China added an astounding 240 gigawatts of solar power capacity and 61 gigawatts of wind power in the first nine months of the year, the Carbon Brief analysis noted. In those three quarters, China added nearly as much solar power as the US has installed in total.

It’s too early to say China’s emissions have peaked, but the country has said it will officially reach that benchmark before 2030.

To be clear, China still isn’t moving fast enough to keep the world on track for meeting relatively safe temperature targets. (Indeed, very few countries are.) But it’s now both producing most of the world’s clean energy technologies and curbing its emissions growth, providing a model for cleaning up industrial economies without sacrificing economic prosperity—and setting the stage for faster climate progress in the coming years.

Batteries on the grid

It’s hard to articulate just how quickly batteries for grid storage are coming online. These massive arrays of cells can soak up electricity when sources like solar are available and prices are low, and then discharge power back to the grid when it’s needed most.

Back in 2015, the battery storage industry had installed only a fraction of a gigawatt of battery storage capacity across the US. That year, it set a seemingly bold target of adding 35 gigawatts by 2035. The sector passed that goal a decade early this year and then hit 40 gigawatts a couple of months later. 

Costs are still falling, which could help maintain the momentum for the technology’s deployment. This year, battery prices for EVs and stationary storage fell yet again, reaching a record low, according to data from BloombergNEF. Battery packs specifically used for grid storage saw prices fall even faster than the average; they cost 45% less than last year.

We’re starting to see what happens on grids with lots of battery capacity, too: in California and Texas, batteries are already helping meet demand in the evenings, reducing the need to run natural-gas plants. The result: a cleaner, more stable grid.

AI’s energy funding influx

The AI boom is complicated for our energy system, as we covered at length this year. Electricity demand is ticking up: the amount of power that utilities supplied to US data centers jumped 22% this year and is expected to more than double by 2030.

But at least one positive shift is coming out of AI’s influence on energy: It’s driving renewed interest and investment in next-generation energy technologies.

In the near term, much of the energy needed for data centers, including those that power AI, will likely come from fossil fuels, especially new natural-gas power plants. But tech giants like Google, Microsoft, and Meta all have goals on the books to reduce their greenhouse-gas emissions, so they’re looking for alternatives.

Meta signed a deal with XGS Energy in June to purchase up to 150 megawatts of electricity from a geothermal plant. In October, Google signed an agreement that will help reopen the Duane Arnold Energy Center in Iowa, a previously shuttered nuclear power plant.

Geothermal and nuclear could be key pieces of the grid of the future, as they can provide constant power in a way that wind and solar don’t. There’s a long way to go for many of the new versions of the tech, but more money and interest from big, powerful players can’t hurt.

Good news, bad news

Perhaps the strongest evidence of collective climate progress so far: We’ve already avoided the gravest dangers that scientists feared just a decade ago.

The world is on track for about 2.6 °C of warming over preindustrial conditions by 2100, according to Climate Action Tracker, an independent scientific effort to track the policy progress that nations have made toward their goals under the Paris climate agreement.

That’s a lot warmer than we want the planet to ever get. But it’s also a whole degree better than the 3.6 °C path that we were on a decade ago, just before nearly 200 countries signed the Paris deal.

That progress occurred because more and more nations passed emissions mandates, funded subsidies, and invested in research and development—and private industry got busy cranking out vast amounts of solar panels, wind turbines, batteries, and EVs. 

The bad news is that progress has stalled. Climate Action Tracker notes that its warming projections have remained stubbornly fixed for the last four years, as nations have largely failed to take the additional action needed to bend that curve closer to the 2 °C goal set out in the international agreement.

But having shaved off a degree of danger is still demonstrable proof that we can pull together in the face of a global threat and address a very, very hard problem. And it means we’ve done the difficult work of laying down the technical foundation for a society that can largely run without spewing ever more greenhouse gas into the atmosphere.

Hopefully, as cleantech continues to improve and climate change steadily worsens, the world will find the collective will to pick up the pace again soon.

AI coding is now everywhere. But not everyone is convinced.

Depending on who you ask, AI-powered coding is either giving software developers an unprecedented productivity boost or churning out masses of poorly designed code that saps their attention and sets software projects up for serious long-term maintenance problems.

The problem is that, right now, it’s not easy to know which is true.

As tech giants pour billions into large language models (LLMs), coding has been touted as the technology’s killer app. Both Microsoft CEO Satya Nadella and Google CEO Sundar Pichai have claimed that around a quarter of their companies’ code is now AI-generated. And in March, Anthropic’s CEO, Dario Amodei, predicted that within six months 90% of all code would be written by AI. It’s an appealing and obvious use case. Code is a form of language, we need lots of it, and it’s expensive to produce manually. It’s also easy to tell if it works—run a program and it’s immediately evident whether it’s functional.


Executives enamored with the potential to break through human bottlenecks are pushing engineers to lean into an AI-powered future. But after speaking to more than 30 developers, technology executives, analysts, and researchers, MIT Technology Review found that the picture is not as straightforward as it might seem.  

For some developers on the front lines, initial enthusiasm is waning as they bump up against the technology’s limitations. And as a growing body of research suggests that the claimed productivity gains may be illusory, some are questioning whether the emperor is wearing any clothes.

The pace of progress is complicating the picture, though. A steady drumbeat of new model releases means these tools’ capabilities and quirks are constantly evolving. And their utility often depends on the tasks they are applied to and the organizational structures built around them. All of this leaves developers navigating confusing gaps between expectation and reality.

Is it the best of times or the worst of times (to channel Dickens) for AI coding? Maybe both.

A fast-moving field

It’s hard to avoid AI coding tools these days. There is a dizzying array of products available, both from model developers like Anthropic, OpenAI, and Google and from companies like Cursor and Windsurf, which wrap these models in polished code-editing software. And according to Stack Overflow’s 2025 Developer Survey, they’re being adopted rapidly, with 65% of developers now using them at least weekly.

AI coding tools first emerged around 2016 but were supercharged with the arrival of LLMs. Early versions functioned as little more than autocomplete for programmers, suggesting what to type next. Today they can analyze entire code bases, edit across files, fix bugs, and even generate documentation explaining how the code works. All this is guided through natural-language prompts via a chat interface.

“Agents”—autonomous LLM-powered coding tools that can take a high-level plan and build entire programs independently—represent the latest frontier in AI coding. This leap was enabled by the latest reasoning models, which can tackle complex problems step by step and, crucially, access external tools to complete tasks. “This is how the model is able to code, as opposed to just talk about coding,” says Boris Cherny, head of Claude Code, Anthropic’s coding agent.

These agents have made impressive progress on software engineering benchmarks—standardized tests that measure model performance. When OpenAI introduced the SWE-bench Verified benchmark in August 2024, offering a way to evaluate agents’ success at fixing real bugs in open-source repositories, the top model solved just 33% of issues. A year later, leading models consistently score above 70%.

In February, Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, coined the term “vibe coding”—meaning an approach where people describe software in natural language and let AI write, refine, and debug the code. Social media abounds with developers who have bought into this vision, claiming massive productivity boosts.

But while some developers and companies report such productivity gains, the hard evidence is more mixed. Early studies from GitHub, Google, and Microsoft—all vendors of AI tools—found developers completing tasks 20% to 55% faster. But a September report from the consultancy Bain & Company described real-world savings as “unremarkable.”

Data from the developer analytics firm GitClear shows that most engineers are producing roughly 10% more durable code—code that isn’t deleted or rewritten within weeks—since 2022, likely thanks to AI. But that gain has come with sharp declines in several measures of code quality. Stack Overflow’s survey also found trust and positive sentiment toward AI tools falling significantly for the first time. And most provocatively, a July study by the nonprofit research organization Model Evaluation & Threat Research (METR) showed that while experienced developers believed AI made them 20% faster, objective tests showed they were actually 19% slower.

Growing disillusionment

For Mike Judge, principal developer at the software consultancy Substantial, the METR study struck a nerve. He was an enthusiastic early adopter of AI tools, but over time he grew frustrated with their limitations and the modest boost they brought to his productivity. “I was complaining to people because I was like, ‘It’s helping me but I can’t figure out how to make it really help me a lot,’” he says. “I kept feeling like the AI was really dumb, but maybe I could trick it into being smart if I found the right magic incantation.”

When a friend asked, Judge had estimated that the tools were giving him roughly a 25% speedup. So when he saw similar estimates attributed to developers in the METR study, he decided to put his own to the test. For six weeks, he guessed how long a task would take, flipped a coin to decide whether to use AI or code manually, and timed himself. To his surprise, AI slowed him down by a median of 21%—mirroring the METR results.
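
For anyone curious to replicate that kind of self-experiment, here is a minimal sketch under stated assumptions: the numbers and helper names are invented, not Judge’s actual scripts. The method is what matters: randomize each task between AI and manual work, log estimated and actual times, and compare the median overruns.

```python
# A minimal sketch of a self-timing experiment like the one described above.
# The numbers are invented; the point is the method: randomize, log, compare medians.
import random
from statistics import median

def assign_condition() -> bool:
    """Coin flip: True means use AI on this task, False means code it by hand."""
    return random.random() < 0.5

# Before starting each task, you would call assign_condition() and record the outcome.
# Hypothetical log entries: (used_ai, estimated_minutes, actual_minutes)
task_log = [
    (True, 60, 80), (False, 45, 40), (True, 30, 38),
    (False, 90, 85), (True, 120, 150), (False, 20, 22),
]

def median_overrun(rows):
    """Median of actual/estimated time; values above 1.0 mean tasks ran long."""
    return median(actual / estimate for _, estimate, actual in rows)

ai = median_overrun([row for row in task_log if row[0]])
manual = median_overrun([row for row in task_log if not row[0]])
print(f"AI-assisted tasks: median actual/estimate = {ai:.2f}")
print(f"Manual tasks:      median actual/estimate = {manual:.2f}")
```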

This got Judge crunching the numbers. If these tools were really speeding developers up, he reasoned, you should see a massive boom in new apps, website registrations, video games, and projects on GitHub. He spent hours and several hundred dollars analyzing all the publicly available data and found flat lines everywhere.

“Shouldn’t this be going up and to the right?” says Judge. “Where’s the hockey stick on any of these graphs? I thought everybody was so extraordinarily productive.” The obvious conclusion, he says, is that AI tools provide little productivity boost for most developers. 

Developers interviewed by MIT Technology Review generally agree on where AI tools excel: producing “boilerplate code” (reusable chunks of code repeated in multiple places with little modification), writing tests, fixing bugs, and explaining unfamiliar code to new developers. Several noted that AI helps overcome the “blank page problem” by offering an imperfect first stab to get a developer’s creative juices flowing. It can also let nontechnical colleagues quickly prototype software features, easing the load on already overworked engineers.

These tasks can be tedious, and developers are typically  glad to hand them off. But they represent only a small part of an experienced engineer’s workload. For the more complex problems where engineers really earn their bread, many developers told MIT Technology Review, the tools face significant hurdles.

Perhaps the biggest problem is that LLMs can hold only a limited amount of information in their “context window”—essentially their working memory. This means they struggle to parse large code bases and are prone to forgetting what they’re doing on longer tasks. “It gets really nearsighted—it’ll only look at the thing that’s right in front of it,” says Judge. “And if you tell it to do a dozen things, it’ll do 11 of them and just forget that last one.”

LLMs’ myopia can lead to headaches for human coders. While an LLM-generated response to a problem may work in isolation, software is made up of hundreds of interconnected modules. If these aren’t built with consideration for other parts of the software, it can quickly lead to a tangled, inconsistent code base that’s hard for humans to parse and, more important, to maintain.

Developers have traditionally addressed this by following conventions—loosely defined coding guidelines that differ widely between projects and teams. “AI has this overwhelming tendency to not understand what the existing conventions are within a repository,” says Bill Harding, the CEO of GitClear. “And so it is very likely to come up with its own slightly different version of how to solve a problem.”

The models also just get things wrong. Like all LLMs, coding models are prone to “hallucinating”—it’s an issue built into how they work. But because the code they output looks so polished, errors can be difficult to detect, says James Liu, director of software engineering at the advertising technology company Mediaocean. Put all these flaws together, and using these tools can feel a lot like pulling a lever on a one-armed bandit. “Some projects you get a 20x improvement in terms of speed or efficiency,” says Liu. “On other things, it just falls flat on its face, and you spend all this time trying to coax it into granting you the wish that you wanted and it’s just not going to.”

Judge suspects this is why engineers often overestimate productivity gains. “You remember the jackpots. You don’t remember sitting there plugging tokens into the slot machine for two hours,” he says.

And it can be particularly pernicious if the developer is unfamiliar with the task. Judge remembers getting AI to help set up a Microsoft cloud service called Azure Functions, which he’d never used before. He thought it would take about two hours, but nine hours later he threw in the towel. “It kept leading me down these rabbit holes and I didn’t know enough about the topic to be able to tell it ‘Hey, this is nonsensical,’” he says.

The debt begins to mount up

Developers constantly make trade-offs between speed of development and the maintainability of their code—creating what’s known as “technical debt,” says Geoffrey G. Parker, professor of engineering innovation at Dartmouth College. Each shortcut adds complexity and makes the code base harder to manage, accruing “interest” that must eventually be repaid by restructuring the code. As this debt piles up, adding new features and maintaining the software becomes slower and more difficult.

Accumulating technical debt is inevitable in most projects, but AI tools make it much easier for time-pressured engineers to cut corners, says GitClear’s Harding. And GitClear’s data suggests this is happening at scale. Since 2020, the company has seen a significant rise in the amount of copy-pasted code—an indicator that developers are reusing more code snippets, most likely based on AI suggestions—and an even bigger decline in the amount of code moved from one place to another, which happens when developers clean up their code base.

And as models improve, the code they produce is becoming increasingly verbose and complex, says Tariq Shaukat, CEO of Sonar, which makes tools for checking code quality. This is driving down the number of obvious bugs and security vulnerabilities, he says, but at the cost of increasing the number of “code smells”—harder-to-pinpoint flaws that lead to maintenance problems and technical debt. 

Recent research by Sonar found that these make up more than 90% of the issues found in code generated by leading AI models. “Issues that are easy to spot are disappearing, and what’s left are much more complex issues that take a while to find,” says Shaukat. “That’s what worries us about this space at the moment. You’re almost being lulled into a false sense of security.”

If AI tools make it increasingly difficult to maintain code, that could have significant security implications, says Jessica Ji, a security researcher at Georgetown University. “The harder it is to update things and fix things, the more likely a code base or any given chunk of code is to become insecure over time,” says Ji.

There are also more specific security concerns, she says. Researchers have discovered a worrying class of hallucinations where models reference nonexistent software packages in their code. Attackers can exploit this by creating packages with those names that harbor vulnerabilities, which the model or developer may then unwittingly incorporate into software. 
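
One lightweight defense, sketched below for a hypothetical Python project, is to check every dependency an AI tool suggests against the public package index before installing it. The package names are illustrative, and existence on PyPI is not proof of safety, since attackers can register plausible-sounding names, so unfamiliar packages still deserve manual review.

```python
# A minimal sketch: verify that AI-suggested dependency names actually exist on PyPI
# before installing them. Existence alone is not a guarantee of safety, so any
# unknown or newly published package still warrants human review.
import requests

def exists_on_pypi(name: str) -> bool:
    """Return True if the name resolves via PyPI's public JSON API."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200

suggested = ["requests", "numpy", "totally-made-up-helper-lib"]  # illustrative list
for pkg in suggested:
    status = "found" if exists_on_pypi(pkg) else "NOT FOUND: review before installing"
    print(f"{pkg}: {status}")
```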

LLMs are also vulnerable to “data-poisoning attacks,” where hackers seed the publicly available data sets models train on with data that alters the model’s behavior in undesirable ways, such as generating insecure code when triggered by specific phrases. In October, research by Anthropic found that as few as 250 malicious documents can introduce this kind of back door into an LLM regardless of its size.

The converted

Despite these issues, though, there’s probably no turning back. “Odds are that writing every line of code on a keyboard by hand—those days are quickly slipping behind us,” says Kyle Daigle, chief operating officer at the Microsoft-owned code-hosting platform GitHub, which produces a popular AI-powered tool called Copilot (not to be confused with the Microsoft product of the same name).

The Stack Overflow report found that despite growing distrust in the technology, usage has increased rapidly and consistently over the past three years. Erin Yepis, a senior analyst at Stack Overflow, says this suggests that engineers are taking advantage of the tools with a clear-eyed view of the risks. The report also found that frequent users tend to be more enthusiastic, and that more than half of developers are not using the latest coding agents, which may explain why many remain underwhelmed by the technology.

Those latest tools can be a revelation. Trevor Dilley, CTO at the software development agency Twenty20 Ideas, says he had found some value in AI editors’ autocomplete functions, but when he tried anything more complex it would “fail catastrophically.” Then in March, while on vacation with his family, he set the newly released Claude Code to work on one of his hobby projects. It completed a four-hour task in two minutes, and the code was better than what he would have written.

“I was like, Whoa,” he says. “That, for me, was the moment, really. There’s no going back from here.” Dilley has since cofounded a startup called DevSwarm, which is creating software that can marshal multiple agents to work in parallel on a piece of software.

The challenge, says Armin Ronacher, a prominent open-source developer, is that the learning curve for these tools is shallow but long. Until March he’d remained unimpressed by AI tools, but after leaving his job at the software company Sentry in April to launch a startup, he started experimenting with agents. “I basically spent a lot of months doing nothing but this,” he says. “Now, 90% of the code that I write is AI-generated.”

Getting to that point involved extensive trial and error to figure out which problems tend to trip the tools up and which they can handle efficiently. Today’s models can tackle most coding tasks with the right guardrails, says Ronacher, but those guardrails can be very task- and project-specific.

To get the most out of these tools, developers must surrender control over individual lines of code and focus on the overall software architecture, says Nico Westerdale, chief technology officer at the veterinary staffing company IndeVets. He recently built a data science platform of some 100,000 lines of code almost exclusively by prompting models rather than writing the code himself.

Westerdale’s process starts with an extended conversation with the model to develop a detailed plan for what to build and how. He then guides it through each step. It rarely gets things right on the first try and needs constant wrangling, but if you force it to stick to well-defined design patterns, it can produce high-quality, easily maintainable code, says Westerdale. He reviews every line, and the code is as good as anything he’s ever produced, he says: “I’ve just found it absolutely revolutionary. It’s also frustrating, difficult, a different way of thinking, and we’re only just getting used to it.”

But while individual developers are learning how to use these tools effectively, getting consistent results across a large engineering team is significantly harder. AI tools amplify both the good and bad aspects of your engineering culture, says Ryan J. Salva, senior director of product management at Google. With strong processes, clear coding patterns, and well-defined best practices, these tools can shine. 

But if your development process is disorganized, they’ll only magnify the problems. It’s also essential to codify institutional knowledge so the models can draw on it effectively. “A lot of work needs to be done to help build up context and get the tribal knowledge out of our heads,” he says.

The cryptocurrency exchange Coinbase has been vocal about its adoption of AI tools. CEO Brian Armstrong made headlines in August when he revealed that the company had fired staff unwilling to adopt AI tools. But Coinbase’s head of platform, Rob Witoff, tells MIT Technology Review that while they’ve seen massive productivity gains in some areas, the impact has been patchy. For simpler tasks like restructuring the code base and writing tests, AI-powered workflows have achieved speedups of up to 90%. But gains are more modest for other tasks, and the disruption caused by overhauling existing processes often counteracts the increased coding speed, says Witoff.

One factor is that AI tools let junior developers produce far more code. As in almost all engineering teams, this code has to be reviewed by others, normally more senior developers, to catch bugs and ensure it meets quality standards. But the sheer volume of code now being churned out is quickly saturating the ability of midlevel staff to review changes. “This is the cycle we’re going through almost every month, where we automate a new thing lower down in the stack, which brings more pressure higher up in the stack,” he says. “Then we’re looking at applying automation to that higher-up piece.”

Developers also spend only 20% to 40% of their time coding, says Jue Wang, a partner at Bain, so even a significant speedup there often translates to more modest overall gains. Developers spend the rest of their time analyzing software problems and dealing with customer feedback, product strategy, and administrative tasks. To get significant efficiency boosts, companies may need to apply generative AI to all these other processes too, says Wang, and that is still in the works.

Rapid evolution

Programming with agents is a dramatic departure from previous working practices, though, so it’s not surprising that companies are facing some teething issues. These are also very new products that are changing by the day. “Every couple months the model improves, and there’s a big step change in the model’s coding capabilities and you have to get recalibrated,” says Anthropic’s Cherny.

For example, in June Anthropic introduced a built-in planning mode to Claude; it has since been replicated by other providers. In October, the company also enabled Claude to ask users questions when it needs more context or faces multiple possible solutions, which Cherny says helps it avoid the tendency to simply assume which path is the best way forward.

Most significant, Anthropic has added features that make Claude better at managing its own context. When it nears the limits of its working memory, it summarizes key details and uses them to start a new context window, effectively giving it an “infinite” one, says Cherny. Claude can also invoke sub-agents to work on smaller tasks, so it no longer has to hold all aspects of the project in its own head. The company claims that its latest model, Claude Sonnet 4.5, can now code autonomously for more than 30 hours without major performance degradation.
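
The underlying idea is simple enough to sketch. The snippet below is not Anthropic’s implementation, just a minimal illustration of the compaction pattern, assuming a token budget and hypothetical count_tokens and summarize helpers: when the transcript nears the context limit, replace it with a model-written summary and carry on from there.

```python
# A minimal sketch of context "compaction," not any vendor's actual code.
# `count_tokens` and `summarize` stand in for whatever the real system uses.
CONTEXT_LIMIT = 200_000   # assumed token budget
COMPACT_AT = 0.8          # compact once the transcript reaches 80% of the budget

def maybe_compact(history, count_tokens, summarize):
    """Return the history, compacted into a summary if it is close to the limit."""
    if count_tokens(history) < COMPACT_AT * CONTEXT_LIMIT:
        return history
    # Ask the model to distill key decisions, file paths, and open TODOs,
    # then seed a fresh window with that summary instead of the raw transcript.
    summary = summarize(history)
    return [{"role": "user", "content": f"Summary of the work so far:\n{summary}"}]
```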

Novel approaches to software development could also sidestep coding agents’ other flaws. MIT professor Max Tegmark has introduced something he calls “vericoding,” which could allow agents to produce entirely bug-free code from a natural-language description. It builds on an approach known as “formal verification,” where developers create a mathematical model of their software that can prove incontrovertibly that it functions correctly. This approach is used in high-stakes areas like flight-control systems and cryptographic libraries, but it remains costly and time-consuming, limiting its broader use.

Rapid improvements in LLMs’ mathematical capabilities have opened up the tantalizing possibility of models that produce not only software but the mathematical proof that it’s bug free, says Tegmark. “You just give the specification, and the AI comes back with provably correct code,” he says. “You don’t have to touch the code. You don’t even have to ever look at the code.”

When tested on about 2,000 vericoding problems in Dafny—a language designed for formal verification—the best LLMs solved over 60%, according to non-peer-reviewed research by Tegmark’s group. This was achieved with off-the-shelf LLMs, and Tegmark expects that training specifically for vericoding could improve scores rapidly.
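
To make the “code plus proof” idea concrete, here is a toy example written in Lean 4 (the research above used Dafny): a small function ships with machine-checked proofs that it meets its specification. Real vericoding targets far larger programs, but the shape is the same.

```lean
-- A toy illustration of provably correct code (Lean 4; the research above used Dafny).
-- The function comes with proofs of its specification that the proof checker verifies.
def maxOf (a b : Nat) : Nat :=
  if a ≥ b then a else b

-- Spec, part 1: the result is at least the first input.
theorem maxOf_ge_left (a b : Nat) : a ≤ maxOf a b := by
  unfold maxOf; split <;> omega

-- Spec, part 2: the result is at least the second input.
theorem maxOf_ge_right (a b : Nat) : b ≤ maxOf a b := by
  unfold maxOf; split <;> omega
```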

And counterintuitively, the speed at which AI generates code could also ease maintainability concerns. Alex Worden, principal engineer at the business software giant Intuit, notes that maintenance is often difficult because engineers reuse components across projects, creating a tangle of dependencies where one change triggers cascading effects across the code base. Reusing code used to save developers time, but in a world where AI can produce hundreds of lines of code in seconds, that imperative has gone, says Worden.

Instead, he advocates for “disposable code,” where each component is generated independently by AI without regard for whether it follows design patterns or conventions. They are then connected via APIs—sets of rules that let components request information or services from each other. Each component’s inner workings are not dependent on other parts of the code base, making it possible to rip them out and replace them without wider impact, says Worden. 
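
As a rough illustration of that pattern, with hypothetical names rather than anything Worden has described in detail, the sketch below keeps each component behind a narrow contract so a regenerated replacement can be swapped in without touching its callers. In a real system the boundary would more likely be a service API than an in-process interface, but the principle is the same.

```python
# A minimal sketch of the "disposable code" pattern: callers depend only on a small
# contract, so the implementation behind it can be regenerated and swapped wholesale.
# Names here are hypothetical illustrations, not a real product's API.
from typing import Protocol

class TaxEstimator(Protocol):
    """The contract callers rely on; everything behind it is disposable."""
    def estimate(self, income: float) -> float: ...

class FlatRateEstimator:
    """One generated implementation; it could be thrown away and regenerated tomorrow."""
    def estimate(self, income: float) -> float:
        return income * 0.2

def monthly_withholding(estimator: TaxEstimator, income: float) -> float:
    # Callers see only the contract, never the implementation details.
    return estimator.estimate(income) / 12

print(monthly_withholding(FlatRateEstimator(), 60_000))
```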

“The industry is still concerned about humans maintaining AI-generated code,” he says. “I question how long humans will look at or care about code.”

A narrowing talent pipeline

For the foreseeable future, though, humans will still need to understand and maintain the code that underpins their projects. And one of the most pernicious side effects of AI tools may be a shrinking pool of people capable of doing so. 

Early evidence suggests that fears around the job-destroying effects of AI may be justified. A recent Stanford University study found that employment among software developers aged 22 to 25 fell nearly 20% between 2022 and 2025, coinciding with the rise of AI-powered coding tools.

Experienced developers could face difficulties too. Luciano Nooijen, an engineer at the video-game infrastructure developer Companion Group, used AI tools heavily in his day job, where they were provided for free. But when he began a side project without access to those tools, he found himself struggling with tasks that previously came naturally. “I was feeling so stupid because things that used to be instinct became manual, sometimes even cumbersome,” says Nooijen.

Just as athletes still perform basic drills, he thinks the only way to maintain an instinct for coding is to regularly practice the grunt work. That’s why he’s largely abandoned AI tools, though he admits that deeper motivations are also at play. 

Part of the reason Nooijen and other developers MIT Technology Review spoke to are pushing back against AI tools is a sense that they are hollowing out the parts of their jobs that they love. “I got into software engineering because I like working with computers. I like making machines do things that I want,” Nooijen says. “It’s just not fun sitting there with my work being done for me.”

AI might not be coming for lawyers’ jobs anytime soon

When the generative AI boom took off in 2022, Rudi Miller and her law school classmates were suddenly gripped with anxiety. “Before graduating, there was discussion about what the job market would look like for us if AI became adopted,” she recalls. 

So when it came time to choose a specialty, Miller—now a junior associate at the law firm Orrick—decided to become a litigator, the kind of lawyer who represents clients in court. She hoped the courtroom would be the last human stage. “Judges haven’t allowed ChatGPT-enabled robots to argue in court yet,” she says.


She had reason to be worried. The artificial-intelligence job apocalypse seemed to be coming for lawyers. In March 2023, researchers reported that GPT-4 had smashed the Uniform Bar Exam. That same month, an industry report predicted that 44% of legal work could be automated. The legal tech industry entered a boom as law firms began adopting generative AI to mine mountains of documents and draft contracts, work ordinarily done by junior associates. Last month, the law firm Clifford Chance axed 10% of its staff in London, citing increased use of AI as a reason.

But for all the hype, LLMs are still far from thinking like lawyers—let alone replacing them. The models continue to hallucinate case citations, struggle to navigate gray areas of the law and reason about novel questions, and stumble when they attempt to synthesize information scattered across statutes, regulations, and court cases. And there are deeper institutional reasons to think the models could struggle to supplant legal jobs. While AI is reshaping the grunt work of the profession, the end of lawyers may not be arriving anytime soon.

The big experiment

The legal industry has long been defined by long hours and grueling workloads, so the promise of superhuman efficiency is appealing. Law firms are experimenting with general-purpose tools like ChatGPT and Microsoft Copilot and specialized legal tools like Harvey and Thomson Reuters’ CoCounsel, with some building their own in-house tools on top of frontier models. They’re rolling out AI boot camps and letting associates bill hundreds of hours to AI experimentation. As of 2024, 47.8% of attorneys at law firms employing 500 or more lawyers used AI, according to the American Bar Association. 

But lawyers say that LLMs are a long way from reasoning well enough to replace them. Lucas Hale, a junior associate at McDermott Will & Schulte, has been embracing AI for many routine chores. He uses Relativity to sift through long documents and Microsoft Copilot for drafting legal citations. But when he turns to ChatGPT with a complex legal question, he finds the chatbot spewing hallucinations, rambling off topic, or drawing a blank.

“In the case where we have a very narrow question or a question of first impression for the court,” he says, referring to a novel legal question that a court has never decided before, “that’s the kind of thinking that the tool can’t do.”

Much of Hale’s work involves creatively applying the law to new fact patterns. “Right now, I don’t think very much of the work that litigators do, at least not the work that I do, can be outsourced to an AI utility,” he says.

Allison Douglis, a senior associate at Jenner & Block, uses an LLM to kick off her legal research. But the tools only take her so far. “When it comes to actually fleshing out and developing an argument as a litigator, I don’t think they’re there,” she says. She has watched the models hallucinate case citations and fumble through ambiguous areas of the law.

“Right now, I would much rather work with a junior associate than an AI tool,” she says. “Unless they get extraordinarily good very quickly, I can’t imagine that changing in the near future.”

Beyond the bar

The legal industry has seemed ripe for an AI takeover ever since GPT-4’s triumph on the bar exam. But passing a standardized test isn’t the same as practicing law. The exam tests whether people can memorize legal rules and apply them to hypothetical situations—not whether they can exercise strategic judgment in complicated realities or craft arguments in uncharted legal territory. And models can be trained to ace benchmarks without genuinely improving their reasoning.

But new benchmarks are aiming to better measure the models’ ability to do legal work in the real world. The Professional Reasoning Benchmark, published by Scale AI in November, evaluated leading LLMs on legal and financial tasks designed by professionals in the field. The study found that the models have critical gaps in their reliability for professional adoption, with the best-performing model scoring only 37% on the most difficult legal problems, meaning it met just over a third of possible points on the evaluation criteria. The models frequently made inaccurate legal judgments, and when they did reach correct conclusions, they often did so through incomplete or opaque reasoning processes.

“The tools actually are not there to basically substitute [for] your lawyer,” says Afra Feyza Akyurek, the lead author of the paper. “Even though a lot of people think that LLMs have a good grasp of the law, it’s still lagging behind.” 

The paper builds on other benchmarks measuring the models’ performance on economically valuable work. The AI Productivity Index, published by the data firm Mercor in September and updated in December, found that the models have “substantial limitations” in performing legal work. The best-performing model scored 77.9% on legal tasks, meaning it satisfied roughly four out of five evaluation criteria. A model with such a score might generate substantial economic value in some industries, but in fields where errors are costly, it may not be useful at all, the early version of the study noted.  

Professional benchmarks are a big step forward in evaluating the LLMs’ real-world capabilities, but they may still not capture what lawyers actually do. “These questions, although more challenging than those in past benchmarks, still don’t fully reflect the kinds of subjective, extremely challenging questions lawyers tackle in real life,” says Jon Choi, a law professor at the University of Washington School of Law, who coauthored a study on legal benchmarks in 2023. 

Unlike math or coding, in which LLMs have made significant progress, legal reasoning may be challenging for the models to learn. The law deals with messy real-world problems, riddled with ambiguity and subjectivity, that often have no right answer, says Choi. Making matters worse, a lot of legal work isn’t recorded in ways that can be used to train the models, he says. When it is, documents can span hundreds of pages, scattered across statutes, regulations, and court cases that exist in a complex hierarchy.  

But a more fundamental limitation might be that LLMs are simply not trained to think like lawyers. “The reasoning models still don’t fully reason about problems like we humans do,” says Julian Nyarko, a law professor at Stanford Law School. The models may lack a mental model of the world—the ability to simulate a scenario and predict what will happen—and that capability could be at the heart of complex legal reasoning, he says. It’s possible that the current paradigm of LLMs trained on next-word prediction gets us only so far.  

The jobs remain

Despite early signs that AI is beginning to affect entry-level workers, labor statistics have yet to show that lawyers are being displaced. Some 93.4% of law school graduates in 2024 were employed within 10 months of graduation—the highest rate on record—according to the National Association for Law Placement. The number of graduates working in law firms rose by 13% from 2023 to 2024.

For now, law firms are slow to shrink their ranks. “We’re not reducing headcounts at this point,” says Amy Ross, the chief of attorney talent at the law firm Ropes & Gray.

Even looking ahead, the effects could be incremental. “I will expect some impact on the legal profession’s labor market, but not major,” says Mert Demirer, an economist at MIT. “AI is going to be very useful in terms of information discovery and summary,” he says, but for complex legal tasks, “the law’s low risk tolerance, plus the current capabilities of AI, are going to make that case less automatable at this point.” Capabilities may evolve over time, but that’s a big unknown.

It’s not just that the models themselves are not ready to replace junior lawyers. Institutional barriers may also shape how AI is deployed. Higher productivity reduces billable hours, challenging the dominant business model of law firms. Liability looms large for lawyers, and clients may still want a human on the hook. Regulations could also constrain how lawyers use the technology.

Still, as AI takes on some associate work, law firms may need to reinvent their training system. “When junior work dries up, you have to have a more formal way of teaching than hoping that an apprenticeship works,” says Ethan Mollick, a management professor at the Wharton School of the University of Pennsylvania.

Zach Couger, a junior associate at McDermott Will & Schulte, leans on ChatGPT to comb through piles of contracts he once slogged through by hand. He can’t imagine going back to doing the job himself, but he wonders what he’s missing. 

“I’m worried that I’m not getting the same reps that senior attorneys got,” he says, referring to the repetitive training that has long defined the early experiences of lawyers. “On the other hand, it is very nice to have a semi–knowledge expert to just ask questions to that’s not a partner who’s also very busy.” 

Even though an AI job apocalypse looks distant, the uncertainty sticks with him. Lately, Couger finds himself staying up late, wondering if he could be part of the last class of associates at big law firms: “I may be the last plane out.”

Solar geoengineering startups are getting serious

Solar geoengineering aims to manipulate the climate by bouncing sunlight back into space. In theory, it could ease global warming. But as interest in the idea grows, so do concerns about potential consequences.

A startup called Stardust Solutions recently raised a $60 million funding round, the largest known to date for a geoengineering startup. My colleague James Temple has a new story out about the company, and how its emergence is making some researchers nervous.

So far, the field has been limited to debates, proposed academic research, and—sure—a few fringe actors to keep an eye on. Now things are getting more serious. What does it mean for geoengineering, and for the climate?

Researchers have considered the possibility of addressing planetary warming this way for decades. We already know that volcanic eruptions, which spew sulfur dioxide into the atmosphere, can reduce temperatures. The thought is that we could mimic that natural process by spraying particles up there ourselves.

The prospect is a controversial one, to put it lightly. Many have concerns about unintended consequences and uneven benefits. Even public research led by top institutions has faced barriers—one famous Harvard research program was officially canceled last year after years of debate.

One of the difficulties of geoengineering is that in theory a single entity, like a startup company, could make decisions that have a widespread effect on the planet. And in the last few years, we’ve seen more interest in geoengineering from the private sector. 

Three years ago, James broke the story that Make Sunsets, a California-based company, was already releasing particles into the atmosphere in an effort to tweak the climate.

The company’s CEO, Luke Iseman, went to Baja California in Mexico, stuck some sulfur dioxide into a weather balloon, and sent it skyward. The amount of material was tiny, and it’s not clear that it even made it into the right part of the atmosphere to reflect any sunlight.

But fears that this group or others could go rogue and do their own geoengineering led to widespread backlash. Mexico announced plans to restrict geoengineering experiments in the country a few weeks after that news broke.

You can still buy cooling credits from Make Sunsets, and the company was just granted a patent for its system. But the startup is seen as something of a fringe actor.

Enter Stardust Solutions. The company has been working under the radar for a few years, but it has started talking about its work more publicly this year. In October, it announced a significant funding round, led by some top names in climate investing. “Stardust is serious, and now it’s raised serious money from serious people,” as James puts it in his new story.

That’s making some experts nervous. Even those who believe we should be researching geoengineering are concerned about what it means for private companies to do so.

“Adding business interests, profit motives, and rich investors into this situation just creates more cause for concern, complicating the ability of responsible scientists and engineers to carry out the work needed to advance our understanding,” write David Keith and Daniele Visioni, two leading figures in geoengineering research, in a recent opinion piece for MIT Technology Review.

Stardust insists that it won’t move forward with any geoengineering until and unless it’s commissioned to do so by governments and there are rules and bodies in place to govern use of the technology.

But there’s no telling how financial pressure might change that down the road. And we’re already seeing some of the challenges faced by a private company in this space: the need to keep trade secrets.

Stardust is currently not sharing information about the particles it intends to release into the sky, though it says it plans to do so once it secures a patent, which could happen as soon as next year. The company argues that its proprietary particles will be safe, cheap to manufacture, and easier to track than the already abundant sulfur dioxide. But at this point, there’s no way for external experts to evaluate those claims.

As Keith and Visioni put it: “Research won’t be useful unless it’s trusted, and trust depends on transparency.”
