The snow gods: How a couple of ski bums built the internet’s best weather app

The best snow-forecasting app for skiers and snowboarders isn’t from any of the federally funded weather services. Nor from any of the big-name brands. It comes from an independent startup that leverages government data, its own AI models, and decades of alpine-life experience to offer better snow (and soon avalanche) predictions than anything else out there.

Skiers in the know follow OpenSnow and won’t bother heading to the mountains—from Alpine Meadows to Mont Blanc, Crested Butte to Killington—unless this small team of trusted weathermen tells them to. (And yes, they’re all men.) The app has made microcelebrities of its forecasters, who sift through and analyze reams of data to write “Daily Snow” reports for locations throughout the world.

“I’m F-list famous,” OpenSnow founding partner and forecaster Bryan Allegretto says with a laugh. “Not even D-list.” 

The app has proved especially vital this year, which has been one of the weirder winters on record. The US West saw very little snow overall, despite an intense storm cycle that led to one of the deadliest avalanches in history. That storm was followed by one of the fastest melts in memory, and several resorts in California are already shutting down for the season. Meanwhile, in the East, the ongoing snowfall has offered a rare gift: a deep and seemingly endless winter.

MIT Technology Review caught up with Allegretto, better known as BA, in the Tahoe mountains to talk about the weather, AI, avalanches, and how a little weather app became the closest thing powder-hounds have to a crystal ball: a daily dump of the freshest, most decipherable, and most micro-accurate forecasts in the biz. And how two once-broke ski bums—Allegretto and his Colorado counterpart, CEO Joel Gratz—managed to bootstrap a business and turn an email list of 37 into a cult following half a million strong.

This interview has been edited for clarity and accuracy. 

You grew up in New Jersey. Middle of the pack as far as snowy states. What were your winters like as a kid?

I was always obsessed with weather. Especially severe weather. Nor’easters. There was the blizzard of ’89, I believe, that hit the East Coast hard—dropped two to three feet of snow, which was a lot for the Jersey Shore. My dad worked for the highway authority, so he had tools other than the evening news. He was in charge of calling out the snowplows whenever it snowed, so I just remember chasing storms with my dad. I wasn’t allowed to ride in the snowplows. I’d watch them. When I got older, I was the one shoveling the neighbors’ driveways. I just liked being out there. In it. In college, I used to go around and shovel all the girls’ sidewalks. That was fun. 

When did you start skiing?

We would cut school and take a bus to go skiing, unbeknownst to our parents. It was the ’90s, and the surfers decided snowboarding would be fun, so the local surf shop started running a bus and all these surfers would show up and hop the bus to Hunter Mountain. We’d drive to the Poconos, go night skiing, turn around. It wasn’t uncommon for me in high school to get in the car by myself, either—and just drive. Me, my dog, my backpack. I’d sleep in gas stations and ski. Storm-chasing around the Northeast.

What were you really chasing, you think?

Natural highs. Happiness. I’ve always been a soul-searcher. I grew up in a crazy house situation, a broken home. My dad left. My mom became a drug addict. I just wanted to be gone. I’m the oldest. I was always trying to help my mom and make sure she was okay. No one was telling me to go to school and have a career. I just wanted to do something that fulfills me.

How’d you go about figuring out what that was? 

For me, to go to school was a big task, given where I was coming from. There wasn’t any money. I could get grants and scholarships because my mom was so poor. I wanted to go to Penn State but didn’t have the grades. I ended up at Kean, a public university in New Jersey. It had a meteorology program. We got to go to New York City, to NBC, and practiced on the green screen. In meteorology school, I started thinking: How do I work in the ski and snowboard industry and use weather at the same time? I went to Rowan [University] for business, in South Jersey, and in between moved to Hawaii to surf and spent a year teaching snowboarding. My goal the whole time was to not work in a career I hated.

I imagine you weren’t like most meteorology students. 

Us punk rockers, skaters, snowboarders—we were a little different than the typical meteorology nerds. I was the radical storm chaser. A big personality. I still am.

You didn’t quite fit the traditional weatherman mold.

Back then, there were no smartphones or social media. If you were a meteorologist, you either worked in a cubicle for the government or at an insurance company assessing weather risk. Or you were on the local news. That wasn’t my thing. They didn’t want Grizzly Adams up there with his big beard.

Beards belong in the mountains?

Meteorologists live in cities because that’s where the jobs are. They don’t live in small mountain towns. That’s what was missing in the industry. When I moved to Tahoe, in 2006, I realized nobody had any trust in the weather forecasts. It was more like a “We’ll believe it when we see it” old-fashioned mentality. If you’re a forecaster in flat areas, you just look at the weather model and regurgitate the news. Weathermen in Sacramento or Reno didn’t give a crap about the ski resorts! They’d just say “We’ll see three feet above 6,000 feet” and go on to the next segment. And skiers were like: “Wait a minute. Is it going to be windy at the top?” I thought: Let’s home in and give skiers what they’re looking for.

So you were living in Tahoe, skiing and forecasting?

I was working in the office at a resort, snowboarding, and doing weather on the side. I’d get up at 4 a.m. and do it before my 9 a.m. day job. Forecasting, figuring out: How the heck do these storms interact with these mountains? I started emailing everyone in the office what I’d see coming, and people kept saying “Add me! Add me!” Eventually, resorts around Tahoe started asking to use my forecasts.

How were you actually forecasting, though? 

NOAA’s GFS [Global Forecast System], the Canadian model, the Euro model, German, Japanese—all these governments make these weather models to forecast the weather. And share it. Anyone can access it. But you can’t just look at a weather model and go, Yep, that’s what’s going to happen. That’s not how it works in the mountains. It’s way harder. You can’t rely on model data. It’s low-res, forecasting for a grid area that’s too big. It can’t understand what’s going on. It’s going to generalize the weather. You can try that, but you’re going to be wrong. A lot of people are going to stop listening. I was able to forecast more accurately than most people because I was living there; I could fix a lot of these errors. Around 2007, I started my own website, Tahoe Weather Discussion.

Bryan Allegretto (right) on the lift with OpenSnow CEO Joel Gratz and Gratz’s wife, Lauren.
COURTESY OF BRYAN ALLEGRETTO

Snazzy.

Meanwhile, I heard about this guy Joel out in Boulder, Colorado. People were telling us about each other, saying: “You guys are doing the same thing!” He was sleeping on his friend’s couch, running a site called Colorado Powder Forecast. And then there was Evan [Thayer, who would later join the company], in Utah. I think his website was called Wasatch Forecast. 

Great minds!

He actually grew up outside Philly, only about an hour from me. We both were obsessed with storms and snow and moved west to the mountains and started similar websites. We would’ve been best friends as kids! Anyway, Joel called me in 2010 and was like, “Hey. I’m building this site, forecasting skiing in ski states.” And wanted me to join. He knew I had big traffic. He was like, “Let’s do it together, not against each other.” I asked, “What’s the pay?” He said, “Zero. Give me your company.”

And you just said: Yeah, sounds good?

I just really trusted him. He’d asked Evan too—but Evan was like, Give you my site and my traffic for free?? No, I built this.

A normal response.

I was the knucklehead that was like, okay. Evan was still single. I already had a wife and two kids. I’d just had my son. I was working two jobs. I was so overwhelmed. So busy with my day job, as an account manager at the Ritz-Carlton at Northstar. Vail had just bought the resort, and we all thought we were going to lose our jobs. My site was struggling. I was desperate for somebody to do it with. I think I thought it was a good opportunity. I was scared, though. For sure.

That was 15 years ago. How’d OpenSnow work in the old days? 

We were just using our brains. That’s how it started: looking at all the weather models—all the data from the government models and airplanes, satellites, balloons. A million places. Building spreadsheets and fixing all the errors in the forecast models. We’d take the data and reconfigure it—adapt it for the mountains. It was all manual for a really long time.

How manual? 

It was old-school. All the resorts had snowfall reports on their sites, and I was the one hand-keying it in: “three to six inches.” That was me on the back end, typing it in every single morning for every single ski resort. It’d take me hours.

And then?

Around 2018, we built our own weather model to do what we were doing. We called it METEOS. It’s an acronym—I can’t even remember what it stood for! METEOS was just us using our brains and our experience to create formulas. It automated everything and allowed us to create a grid across the whole world and forecast for any GPS point. It took all this data, ingested it, fixed some of it, and then spit out a forecast for any location. In the world.
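
BA describes METEOS as a global grid that can be queried for any GPS point. As a toy illustration of just that last step (serving a forecast at an arbitrary coordinate from a gridded field), here is a minimal sketch. The 0.25-degree spacing, the bilinear interpolation, and every name in it are assumptions for illustration, not details of METEOS.

```python
import numpy as np

GRID_RES = 0.25  # degrees between gridpoints (an assumed spacing)

# Fake global snowfall grid: rows = latitude, cols = longitude.
lats = np.arange(-90, 90 + GRID_RES, GRID_RES)
lons = np.arange(-180, 180, GRID_RES)
snow = np.random.rand(len(lats), len(lons))  # stand-in forecast, in inches

def forecast_at(lat, lon):
    """Bilinearly interpolate the gridded forecast at an arbitrary point
    (poles excluded for brevity)."""
    i = (lat + 90) / GRID_RES
    j = (lon + 180) / GRID_RES
    i0, j0 = int(i), int(j)
    di, dj = i - i0, j - j0
    return (snow[i0, j0] * (1 - di) * (1 - dj)
            + snow[i0 + 1, j0] * di * (1 - dj)
            + snow[i0, j0 + 1] * (1 - di) * dj
            + snow[i0 + 1, j0 + 1] * di * dj)

# Any GPS point works, e.g. roughly Palisades Tahoe:
print(f"Forecast at (39.20, -120.24): {forecast_at(39.20, -120.24):.1f} in")
```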

Were you guys making any money? 

It was crap in the beginning. Advertising-based. We stole Eric Strassburger from The Denver Post—he doubled our ad revenue in his first year full-time with us. Still, Google Ads had chopped our ad rates in half; it wasn’t a good long-term strategy to rely just on ads. We had to pivot to plan B so we didn’t go out of business.

Subscriptions.

When all the newspapers started charging to read articles, Joel was like: We are meteorologists writing columns every day. Weather journalism is not sustainable! We need to be a weather site. We need to be a weather app.

What happened when you moved from ads to subscriptions? 

The money took off. We could quit our day jobs and work full time on OpenSnow. The company exploded. We were like: Are people really gonna pay for this? They did! Although they could still access the majority of the site for free.

At the end of 2021, you put in a paywall?

That’s when we panicked! We’re gonna lose 90% of our customers! But 10% will stay loyal and pay. Since the beginning, there have been only two times our traffic went down: the paywall and covid. Otherwise, every year it’s gone up. People were like, Okay, I can’t live without this.

I admit, I’m one of those people. So is my editor. Any other weather app is useless for skiers.

When it comes to ski towns, everyone uses OpenSnow. When the Tahoe avalanche happened, we were up early on search-and-rescue calls, helping the rescuers with forecasts. We’re now the official lead forecast provider for Ski California, Ski Utah, the National Ski Patrol, the Professional Ski Instructors of America, the US Collegiate Ski & Snowboard Association, and dozens of destinations and ski resorts. Joel doesn’t like to talk about it publicly, but our renewals and retention and open rates blow away the industry standards.

I bet. OpenSnow is like a benevolent cult. 

People connect with a small company with underground roots. We’re independent. Fourteen full-time, plus seasonal. About half have meteorology backgrounds, from bachelor’s to doctoral degrees. Our very first employee was Sam Collentine, a meteorology student in Boulder, who started as an intern in 2012 and is now our COO and does everything.

Sounds like employees and subscribers sign on and just … stay.

Everyone stays! Our cofounder Andrew Murray, Joel’s friend and OpenSnow’s web designer, left around 2021. But yeah, people feel like they know us. They’ve been reading me in Tahoe with their coffee for 20 years! I get recognized everywhere I go. For example, I broke my binding and went into a ski shop and asked if I could demo. And the guy was like, ARE YOU BA? Just take it! Sounds fun—until you just want to have dinner with your family, or buy a glove. Joel gets the same thing—people make Joel shrines on the slopes that look like Catholic candles.

You guys are like modern-day snow gods. Gods of snow.

People are weird.

How weird?

Someone once sent me a photo, saying: “Look, my friend dressed up as you for Halloween!” People are always inviting me over to dinner, to PlumpJack with Jonny Moseley. I guess they want to hang out with the “Who’s who of Tahoe.” There was an executive from Pixar who had me to his multimillion-dollar home on the west shore of Lake Tahoe. He had a photo of me over the fireplace in the bathroom. I thought: That’s weird, he has a photo of me over the fireplace. What was even weirder, though: It was autographed. I’ve never autographed a photo in my life! This guy just signed it—himself. I didn’t say anything. I just left.

Do you get a lot of hate mail? Mean DMs? 

Thousands. People think I can make it snow. I think they think I’m to blame when it doesn’t. The other day, someone messaged me on Instagram with a picture I’d posted of the high-pressure map over California—somebody had shared it and written “Fuck Bryan Allegretto” over the high pressure.

Hilarious.

People were yelling at me during covid: You’re encouraging people to go out skiing! It wasn’t March 2020, it was January 2022. I’ve since deleted my personal social media. I never wanted to be in the spotlight. That’s the whole reason signing off my forecasts with “BA” became a thing—I didn’t want to use my full name. I just do it because it’s good for the company. Joel realized years ago that people come to us for forecasts—and forecasters. That’s why we still have forecasters. Even though AI can do what we’re doing now.

Is AI doing what you do now? 

We were using METEOS until this season. In December, we launched PEAKS. We built our own machine-learning model. The AI is taking what we were doing—and doing it everywhere, faster. The whole world instantly, in minutes. It can go back and actually ingest decades of government data—estimated weather conditions over the entire US from 1979 to 2021—and correct the errors. 

What makes it so accurate?

Before PEAKS, it wasn’t very specific. The data used to be what Joel calls “blobby”—like giant blobs, just big splotches of color over a mountain range. It’s like, if you take a pen and press into a piece of paper, the ink will spill out. The AI is like if you just tap the paper. A dot versus a blot. Now we can know how much it will snow, say, in the parking lot at Palisades and how much at the summit. It’s less blobby, more rigid and defined. 

Defined how?

All weather models output forecasts on a grid. The gridpoints are essentially averaged data over the grid box. So a model with a 25-kilometer grid resolution averages data over 25 kilometers, or around 16 miles. This is far too large an area, especially in mountainous terrain, where a few miles can make a massive difference in experienced conditions. The AI is downscaling the models into smaller and smaller grid boxes. We are able to train a model to transform lower-resolution data from the same period into high-resolution “ground truth” data. Then the model can generalize this training to global real-time downscaling. PEAKS is learning wind patterns, thermal gradients, terrain, and weather patterns and connecting all these factors to learn how to transition from coarse resolution to high, three-kilometer resolution—leading to more precise forecasts. We’ve basically taught the AI how to forecast like us. Except 50% more accurate. Now, when I wake up at 4 a.m., PEAKS has already done it.
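
OpenSnow hasn’t published how PEAKS is built, but the downscaling idea BA describes (training a model to map coarse gridded fields onto high-resolution historical “ground truth”) can be sketched in a few lines. The architecture, shapes, and random stand-in data below are illustrative assumptions, not the company’s code.

```python
# Minimal sketch of ML downscaling ("super-resolution") for weather grids.
import torch
import torch.nn as nn

class Downscaler(nn.Module):
    """Map coarse gridded fields to an 8x finer grid (25 km -> ~3 km)."""
    def __init__(self, channels: int = 8):  # e.g., temp, wind, humidity...
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, channels, kernel_size=3, padding=1),
        )

    def forward(self, coarse: torch.Tensor) -> torch.Tensor:
        return self.net(coarse)

model = Downscaler()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Train against high-resolution historical "ground truth" from the same
# period (random tensors stand in for real reanalysis data here).
for step in range(100):
    coarse = torch.randn(4, 8, 16, 16)        # stand-in 25 km model fields
    fine_truth = torch.randn(4, 8, 128, 128)  # stand-in ~3 km ground truth
    loss = loss_fn(model(coarse), fine_truth)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```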

So … then what are you doing at four in the morning?

Oh, I’ll still do the forecasting. I like to double-check it—but I don’t really need to. PEAKS has allowed me to spend more time on writing. Now instead of spending four hours forecasting and then rushing to write it, I’ve been able to make my forecasts more interesting, more entertaining. Yeah, AI could probably write it—but I want to. It’s all about the personal connection.

How did last year’s federal funding cuts for the NWS and NOAA affect your business? Are you guys concerned about that going forward?

We had those discussions when it first happened. In forecasting, you still need humans: to launch the weather balloons, staff the weather stations, collect the initial data. Some people in our office panicked—they had spouses or friends getting laid off. We were wondering if we’d have less data coming in, if it’d make the models less accurate. But the backlash in the weather community was swift. I think they were like, There are important things you can’t cut. It was pretty short-term. Are we worried going forward? No, not as long as the data keeps coming in! We won’t survive without the government publishing data.

What’s next? 

We recently bought a small company called StormNet that tracks severe weather: the probability of lightning, hail, tornadoes. We just launched it. Used to be like, “The storm is an hour away.” Now we can say, “In seven days there might be a tornado here.” And next winter, we’re working on a feature that can help forecast avalanches using AI. Right now, it’s still manual—people going out testing the snow layers. Forecasting is limited. This wouldn’t replace the avalanche centers, but it will be able to look at everything, including slope angle and previous weather and current conditions, and forecast further out, give people more advance—and location-specific—warning. Help alert the public sooner.

Help save lives. 

I talked to one of the guys who left the Frog Lake huts on Sunday, before the storm. Before the group that was caught in the Tahoe avalanche. He told me: “People are always like, Oh, it’s never as bad as they say. But I read OpenSnow. I could tell by the language you were using that we should get the heck out of there. I wanted no part of that.” We don’t hype storms. Or sugarcoat. Our only incentive is to be accurate.

True that it was the biggest storm in Tahoe in four decades?

In 1982, we got 118 inches over five days, and this one was 111 inches—two storms of similar size that created the same level of tragedy. It’s too much, too fast. It was snowing three to four inches an hour. That was the fastest we’ve seen. I don’t know what’s the bigger story—the fact that we’ve had the biggest storm in over four decades or the fact that all that snow disappeared in five days.

Do you worry about the future of OpenSnow given, you know, the future of snow?

We’ve had the second-warmest March in at least 45 years. We’re just getting these wild swings now. The seasonal snow averages are almost the same, but we’re seeing more variability than we did in the 1980s and ’90s. We’re either getting really cold and really warm, or really dry and really wet.

Bad years can affect our business, for sure. It’s certainly affecting the industry—I know Vail and Alterra took big hits this year. Usually we’re okay, because if it’s dry in Tahoe, it’s snowing in Utah or Colorado. Our three biggest markets. I don’t recall a season where the entire West was in the same boat. It’s been the worst year in the West. Yet our traffic keeps going up. Everything is up. The East Coast had a good year. Japan, BC. We’re slowly expanding in those places. And it happens to be the first year in 15 that we’ve done any marketing. Marketing works!

Amazing.

Joel and I have had the same conversation for years—we just had it again two weeks ago: “Can you believe what we’ve done? This was never the goal.” I’m still blown away daily. We’ve never borrowed from investors. No series A, B, C. We’ve gotten offers to sell, but no. We’re still having too much fun. All I know is: Joel and I didn’t come from money. We’ve never chased money or fame, and got both. I think it’s because we never chased them. We’ve always chased the joy of skiing and forecasting powder, and doing that for other people. We were just trying to create something that made us happy.

Are high gas prices good news for EVs? It’s complicated.

I live in a dense city with plentiful public transportation options and limited parking, so I don’t own a car. I’m often utterly clueless about the current price of gasoline.

But as the conflict in Iran has escalated, fossil-fuel prices have been on a roller coaster, and I’ve started paying attention. In the US, average gas prices are $3.98 a gallon as of March 25, up from under $3 before the war started.

Online there’s been what almost looks like cheerleading about this volatility from some folks, including EV owners—some of the social media posts and op-eds have read as nearly gleeful. The subtext (or even the text) is “I told you so.” 

Don’t get me wrong—this could be an opportunity for EVs to make headway around the world. But there are plenty of reasons that even the carless among us should be concerned about a sustained rise in fossil-fuel prices.

Historically, this is exactly the sort of moment that’s pushed people to reevaluate how they get around. During the oil crisis of the 1970s, Americans switched to smaller, more efficient cars in droves. It was a major opportunity for Japanese automakers, whose vehicles tended to fit this mold better than those produced by their US counterparts.

We’re already seeing early signs that people are interested in going electric. One US-based online car marketplace said that search traffic for EVs was up 20% following the initial attack on Iran. For more popular models like the Tesla Model Y, traffic nearly doubled.

And the interest is global. One car dealership outside London said it’s struggling to keep up with demand and is sending staff to buy more EVs at auction, according to Reuters. Another in Manila told Bloomberg that it got a month’s worth of orders in two weeks.

The timing here is really interesting in the US in particular, because we’re about to see a wave of more affordable used EVs hit the market. Three years ago, a leasing boom started with the Inflation Reduction Act, which included incentives for EVs, including leases. About 300,000 such leases are set to expire this year, and many of those vehicles could come up for sale, increasing the available supply of affordable used EVs.

The interest is there, but what would it really take for more drivers to make the switch?

Nice, round numbers do tend to get people’s attention. Some point to $4 per gallon (which the national average is quite close to right now). At that price, the total cost of ownership for an EV is comfortably lower than the cost for a gas-powered car, even with higher electricity prices, according to data from the energy consultancy BloombergNEF.
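
To see why a round number like $4 tilts the math, here is a back-of-envelope fuel-cost comparison. Every figure in it is an illustrative assumption, not BloombergNEF’s data.

```python
# All figures below are illustrative assumptions, not BloombergNEF's data.
GAS_PRICE = 4.00        # $ per gallon
MPG = 30.0              # miles per gallon, assumed gas car
ELEC_PRICE = 0.17       # $ per kWh, assumed residential rate
EV_EFFICIENCY = 0.28    # kWh per mile, assumed EV
MILES_PER_YEAR = 12_000

gas_per_mile = GAS_PRICE / MPG            # ~$0.133 per mile
ev_per_mile = ELEC_PRICE * EV_EFFICIENCY  # ~$0.048 per mile
annual_savings = (gas_per_mile - ev_per_mile) * MILES_PER_YEAR

print(f"Gas: ${gas_per_mile:.3f}/mi | EV: ${ev_per_mile:.3f}/mi")
print(f"Fuel savings at {MILES_PER_YEAR:,} mi/yr: ${annual_savings:,.0f}")
```

At these assumed numbers, the EV saves roughly $1,000 a year on fuel alone, before purchase price, maintenance, or incentives enter the picture.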

Then again, maybe that won’t quite do the trick: One survey from Cox Automotive found that most US consumers would consider switching to an EV or hybrid if gas prices hit $6 per gallon.

But this is also the second big incident of fossil-fuel volatility in the last five years, which could make consumers more ready to make the switch, as Elaine Buckberg, a senior fellow at Harvard, told Bloomberg. (The first was in the summer of 2022 when Russia invaded Ukraine.)

I’m a climate and energy reporter, and I care about addressing climate change. So I’m always happy to hear about people shifting to EVs or any other option that helps cut down on greenhouse-gas emissions.

But one aspect that I think is getting lost here is that sustained high fossil-fuel prices will be bad for even those of us who are untethered from the burdens of vehicle ownership. Fuel cost makes up between 50% and 60% of the cost of shipping goods overseas. Fertilizer production today requires natural gas, which has gotten significantly more expensive since the war began, particularly in Europe.

Jet fuel prices have basically doubled in the last month, according to the International Air Transport Association. Since those prices account for something like a quarter of an airline’s operating cost, that could soon make air travel—and anything that’s shipped by plane—more expensive.

And if all this adds up to an economic downturn, it’s bad for big projects that need financing (even wind and solar farms) and for people who want to borrow money to buy a home or a car (including an EV).

If you’re in the market for a car, maybe this uncertainty is what you needed to consider electric. But until we’re able to truly decarbonize not only our transportation but the rest of our economy, even this carless reporter is going to be worried about high gas prices.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

The AI Hype Index: AI goes to war

AI is at war. Anthropic and the Pentagon feuded over how to weaponize Anthropic’s AI model Claude; then OpenAI swept the Pentagon off its feet with an “opportunistic and sloppy” deal. Users quit ChatGPT in droves. People marched through London in the biggest protest against AI to date. If you’re keeping score, Anthropic—the company founded to be ethical—is now turbocharging US strikes on Iran. 

On the lighter side, AI agents are now going viral online. OpenAI hired the creator of OpenClaw, a popular AI agent. Meta snapped up Moltbook, where AI agents seem to ponder their own existence and invent new religions like Crustafarianism. And on RentAHuman, bots are hiring people to deliver CBD gummies. The future isn’t AI taking your job. It’s AI becoming your boss and finding God.

This startup wants to change how mathematicians do math

Axiom Math, a startup based in Palo Alto, California, has released a free new AI tool for mathematicians, designed to discover mathematical patterns that could unlock solutions to long-standing problems.

The tool, called Axplorer, is a redesign of an existing one called PatternBoost that François Charton, now a research scientist at Axiom, co-developed in 2024 when he was at Meta. PatternBoost ran on a supercomputer; Axplorer runs on a Mac Pro.

The aim is to put the power of PatternBoost, which was used to crack a hard math puzzle known as the Turán four-cycles problem, in the hands of anyone who can install Axplorer on their own computer.

Last year, the US Defense Advanced Research Projects Agency set up a new initiative called expMath—short for Exponentiating Mathematics—to encourage mathematicians to develop and use AI tools. Axiom sees itself as part of that drive.

Breakthroughs in math have enormous knock-on effects across technology, says Charton. In particular, new math is crucial for advances in computer science, from building next-generation AI to improving internet security.

Most of the successes with AI tools have involved finding solutions to existing problems. But finding solutions is not all that mathematicians do, says Axiom Math founder and CEO Carina Hong. Math is exploratory and experimental, she says. 

MIT Technology Review met with Charton and Hong last week for an exclusive video chat about their new tool and how AI in general could change mathematics. 

Math by chatbot

In the last few months, a number of mathematicians have used LLMs, such as OpenAI’s GPT-5, to find solutions to unsolved problems, especially ones set by the 20th-century mathematician Paul Erdős, who left behind hundreds of puzzles when he died.

But Charton is dismissive of those successes. “There are tons of problems that are open because nobody looked at them, and it’s easy to find a few gems you can solve,” he says. He’s set his sights on tougher challenges—“the big problems that have been very, very well studied and famous people have worked on them.” Last year, Axiom Math used another of its tools, called AxiomProver, to find solutions to four such problems in mathematics.   

The Turán four-cycles problem that PatternBoost cracked is another big problem, says Charton. (The problem is an important one in graph theory, a branch of math that’s used to analyze complex networks such as social media connections, supply chains, and search engine rankings. Imagine a page covered in dots. The puzzle involves figuring out how to draw as many lines as possible between the dots without creating a loop that connects four of them.)
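
To make the constraint concrete: such a loop, a four-cycle, exists exactly when two dots share two distinct neighbors. Here is a small checker illustrating that condition (a hypothetical sketch, not Axiom’s or PatternBoost’s code).

```python
from itertools import combinations

def has_four_cycle(n, edges):
    """Return True if the graph on n vertices contains a 4-cycle."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # A 4-cycle exists iff some pair of vertices has >= 2 common neighbors:
    # u - a - v - b - u closes a loop through four distinct vertices.
    for u, v in combinations(range(n), 2):
        if len(adj[u] & adj[v]) >= 2:
            return True
    return False

# A square (0-1-2-3-0) is a 4-cycle; a path is not.
print(has_four_cycle(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # True
print(has_four_cycle(4, [(0, 1), (1, 2), (2, 3)]))          # False
```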

“LLMs are extremely good if what you want to do is derivative of something that has already been done,” says Charton. “This is not surprising—LLMs are pretrained on all the data that there is. But you could say that LLMs are conservative. They try to reuse things that exist.”

However, there are lots of problems in math that require new ideas, insights that nobody has ever had. Sometimes those insights come from spotting patterns that hadn’t been spotted before. Such discoveries can open up whole new branches of mathematics.

PatternBoost was designed to help mathematicians find new patterns. Give the tool an example and it generates others like it. You select the ones that seem interesting and feed them back in. The tool then generates more like those, and so on.  
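
That generate-select-retrain loop can be caricatured in a few lines. The generator and scorer below are toy stand-ins; the real tool pairs a trained transformer with a problem-specific notion of “interesting.”

```python
import random

def mutate(example):
    """Toy 'variation': nudge a candidate (real tool: a generative model)."""
    return example + random.choice([-1, 1])

def generate(seed_examples, n=100):
    """Generate new candidates resembling the seeds."""
    return [mutate(random.choice(seed_examples)) for _ in range(n)]

def score(example):
    """Toy objective: closeness to 42 (real tool: a mathematical score)."""
    return -abs(example - 42)

pool = [0]  # one starting example
for _ in range(50):
    candidates = generate(pool)
    # Keep the most "interesting" candidates and feed them back in.
    pool = sorted(candidates, key=score, reverse=True)[:10]

print("Best found:", max(pool, key=score))
```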

It’s a similar idea to Google DeepMind’s AlphaEvolve, a system that uses an LLM to come up with novel solutions to a problem. AlphaEvolve keeps the best suggestions and asks the LLM to improve on them.

Special access

Researchers have already used both AlphaEvolve and PatternBoost to discover new solutions to long-standing math problems. The trouble is that those tools run on large clusters of GPUs and are not available to most mathematicians.

Mathematicians are excited about AlphaEvolve, says Charton. “But it’s closed—you need to have access to it. You have to go and ask the DeepMind guy to type in your problem for you.”

And when Charton solved the Turán problem with PatternBoost, he was still at Meta. “I had literally thousands, sometimes tens of thousands, of machines I could run it on,” he says. “It ran for three weeks. It was embarrassing brute force.”

Axplorer is far faster and far more efficient, according to the team at Axiom Math. Charton says it took Axplorer just 2.5 hours to match PatternBoost’s Turán result. And it runs on a single machine.

Geordie Williamson, a mathematician at the University of Sydney, who worked on PatternBoost with Charton, has not yet tried Axplorer. But he is curious to see what mathematicians do with it. (Williamson still occasionally collaborates with Charton on academic projects but says he is not otherwise connected to Axiom Math.)

Williamson says Axiom Math has made several improvements to PatternBoost that (in theory) make Axplorer applicable to a wider range of mathematical problems. “It remains to be seen how significant these improvements are,” he says.

“We are in a strange time at the moment, where lots of companies have tools that they’d like us to use,” Williamson adds. “I would say mathematicians are somewhat overwhelmed by the possibilities. It is unclear to me what impact having another such tool will have.”

Hong admits that there are a lot of AI tools being pitched at mathematicians right now. Some also require mathematicians to train their own neural networks. That’s a turnoff, says Hong, who is a mathematician herself. Instead, Axplorer will walk you through what you want to do step by step, she says.

The code for Axplorer is open source and available via GitHub. Hong hopes that students and researchers will use the tool to generate sample solutions and counterexamples to problems they’re working on, speeding up mathematical discovery.

Williamson welcomes new tools and says he uses LLMs a lot. But he doesn’t think mathematicians should throw out the whiteboards just yet. “In my biased opinion, PatternBoost is a lovely idea, but it is certainly not a panacea,” he says. “I’d love us not to forget more down-to-earth approaches.”

Why this battery company is pivoting to AI

Qichao Hu doesn’t mince words about how he sees the state of the battery industry. “Almost every Western battery company has either died or is going to die. It’s kind of the reality,” he says.

Hu is the CEO of SES AI, a Massachusetts-based battery company. It once had aims of making huge amounts of advanced lithium metal batteries for major industries like electric vehicles—but now the company is placing its bets on AI materials discovery.

Hu sees the pivot as an essential one. “It’s just not possible for a Western company to build a sustainable business,” he says. The company is still making some batteries, but only for smaller markets like drones rather than those that would require higher volumes, like EVs. The new focus is the company’s battery materials discovery platform—which it can either license to other battery companies or use to develop materials to sell. 

Some leading US EV battery companies have folded in recent months, and others, like SES AI, are making dramatic changes in strategy. This shift in who’s building batteries and where they’re doing it could shape the future geopolitics of energy. 

The work that would eventually evolve into SES AI began at MIT, where Hu completed his graduate research. His battery work was aimed at applications in oil and gas exploration. The industry uses sensors that go deep underground, where temperatures can top 120 °C (about 250 °F). The team hoped to develop a battery that could withstand those high temperatures and last longer on a single charge. 

The chosen technology was a solid polymer lithium metal battery. These cells use lithium metal for their anode and a polymer for their electrolyte (the material that ions move through in a battery cell). Together, these components can increase the energy density of a cell significantly, relative to the lithium-ion batteries that are common in personal devices and EVs today. (Lithium-ion batteries generally use a graphite material for their anode and a liquid for the electrolyte.)

That solid-state battery technology became the foundation of Solid Energy, a startup Hu founded that spun out from MIT in 2012 and raised its first private investment in 2013.

The team eventually realized that underground oil exploration was a small market, so after several years of operation they began to focus on electric vehicles, which were starting to come into the mainstream. After the team tweaked the chemistry to work better at lower temperatures, the company built its first pilot facility in Massachusetts and eventually another facility in Shanghai.

By 2021, the battery industry was booming, Hu recalls, and EVs were the hottest industry to be in. There was a ton of interest in next-generation battery technology from major automakers at the time, and Solid Energy started developing technology with GM, Hyundai, and Honda.

Larger vehicles, like SUVs and trucks, seemed like a good fit for next-generation batteries, Hu says. Massive vehicles like the ones Americans like to drive would need lighter batteries so they could have a reasonable range without being prohibitively heavy.

The company also shifted its chemistry focus, and in 2022 it announced a battery with a silicon anode rather than a lithium metal one. That shift could help make the battery easier to manufacture.

Since then, growth in the EV market has slowed, at least in the US, partly because of major pullbacks in funding from the Trump administration. EV tax credits for drivers, a key piece of support pushing Americans toward electric options, ended in late 2025. With the market for large electric cars in trouble, Hu says, “now we have to look at every market.”  

The AI materials discovery platform on which it’s pinning many of its hopes is called Molecular Universe. The company seeks not only to provide its software to other battery companies but also to identify new battery materials and either license them or sell them to those companies.

Vials of electrolytes inside a machine at the synthesis foundry.

COURTESY OF SES AI

The platform has already identified six new electrolyte materials, according to the company. Hu says one is an additive that could help improve the lifetime of batteries with silicon anodes. 

One of the challenges with silicon anodes is that they tend to swell a lot during use, which can cause physical damage and prevent efficient charging and discharging. To address the problem, the industry typically uses a material called fluoroethylene carbonate (FEC), which can help form an elastic film on the anode so the battery can still charge effectively. That additive can degrade at high temperatures, though, producing gases that can harm a battery’s lifetime. The SES platform identified a compound that works like FEC but doesn’t release those gases.

The company’s long history and deep battery knowledge could help make its platform a useful tool, Hu says. He sees the actual model as less crucial than SES’s domain expertise and data from years of making and testing batteries. 

“By not actually making the physical battery, we’re actually able to scale and then generate revenue faster,” he says. 

But some experts are skeptical about the near-term prospects for AI materials discovery to revive the industry. “New materials development, as much as we thought that was what people wanted (and, frankly, it should be what the cell makers want)—I don’t know that that seems to be the real linchpin of the battery industry’s progress,” says Kara Rodby, a technical principal at Volta Energy Technologies, a venture capital firm that focuses on the energy storage industry.

Investors are pulling back, and a slowdown in public support is making things difficult for some parts of the battery industry, she adds: “I don’t know that the ability to discover any new material is going to unlock anything new for the battery industry at this point in time.”

This scientist rewarmed and studied pieces of his friend’s cryopreserved brain

L. Stephen Coles’s brain sits cushioned in a vat at a storage facility in Arizona. It has been held there at a temperature of around −146 °C for over a decade, largely undisturbed.

That is, apart from the time, a little over a year ago, when scientists slowly lifted the brain to take photos of it. Years before, the team had removed tiny pieces of it to send to Coles’s friend. Coles, a researcher who studied aging, was interested in cryonics—the long-term storage of human bodies and brains in the hope that they might one day be brought back to life. Before he died, he asked cryobiologist Greg Fahy to study the effects of the preservation procedure on his brain. Coles was especially curious about whether his cooled brain would crack, says Fahy.

Coles’s brain was preserved shortly after he died in 2014, but Fahy has only recently gotten around to analyzing those samples. He says that Coles’s brain is “astonishingly well preserved.”

“We can see every detail [in the structure of the brain biopsies],” says Fahy, who is chief scientific officer at biotech companies Intervene Immune and 21st Century Medicine (where he is also executive director). He hopes this means that Coles’s brain still stands a chance of reanimation at some point in the future.

Other cryobiologists are less optimistic. “This brain is not alive,” says John Bischof, who works on ways to cryopreserve human organs at the University of Minnesota.

Still, Fahy’s research could help provide a tool to neuroscientists looking for new ways to study the brain. And while human reanimation after cryopreservation may be the stuff of science fiction, using the technology to preserve organs for transplantation is within reach.

Banking a brain

Coles, a gerontologist who spent the latter part of his career studying human longevity, opted to have his brain cryogenically preserved when he died of pancreatic cancer.

After he was declared dead, Coles’s body was kept at a low temperature while he was transferred to Alcor, a cryonics facility in Arizona. His head was removed from his body, and a team perfused his brain with “cryoprotective” chemicals that would prevent it from freezing. They then removed it from his skull and cooled it to −146 °C.

Coles had another request. As a scientist, he wanted his cryopreserved brain to be studied. Hundreds of people have opted to have their brains—with or without the rest of their bodies—stored at cryonic facilities (the remains of 259 individuals are currently stored as either whole bodies or heads at Alcor). But scientists know very little about what has happened to those brains, and there’s no evidence to suggest they could be revived. Coles had met Fahy through their shared interest in longevity, and he asked him to investigate.

“He thought that if he had himself cryopreserved, we could learn from his brain whether cracking was going to happen or not,” says Fahy. That’s what typically happens when organs are put into liquid nitrogen at −196 °C, he says. The extreme cooling creates “tension in the system,” he says. “If you tap it, it’ll just shatter.” This cracking is less likely at the slightly warmer temperatures used for preservation. 

Fahy was involved from the time the samples were taken.

“We had Greg Fahy on the phone coordinating the whole thing, [including] where the biopsies were taken,” says Nick Llewellyn, who oversees research at Alcor. (Llewellyn was not at Alcor at the time but has discussed the procedure with his colleagues.) The biopsied samples were stored in liquid nitrogen and earmarked for Fahy. The rest of the brain was cooled and kept in a temperature-controlled storage container at Alcor.

Bouncing back

It wasn’t until years later that Fahy got around to studying those biopsies. He was interested in how the cryoprotectant—which is toxic—might have affected the brain cells. Previous research has shown that flooding tissues with cryoprotectant can distort the structure of cells, essentially squashing them.

It’s one of the many challenges facing cryobiologists interested in storing human tissues at very low temperatures. While the vitrification of eggs and embryos—which cools them to −196 °C and essentially turns them to glass—has become relatively routine (thanks in part to Fahy’s own work on mouse embryos back in the 1980s), preserving whole organs this way is much harder. It is difficult to cool bigger objects in a uniform way, and they are prone to damaging ice crystal formation, even when cryoprotectants are used, as well as cracking.

Fahy found that when he rewarmed and rehydrated Coles’s brain cells, their structure seemed to bounce back to some degree. Fahy demonstrated the effect over a Zoom call: “It looks like this,” he said with his hands as if in prayer, “and it goes back to this,” he added, connecting his forefingers and thumbs to create a triangle shape.

The structure of the tissue looks pretty intact, too, to him at least, though he admits a purist expecting a pristine structure would be disappointed. He and his colleagues have been able to see remarkable details in the cells and their component parts. “There’s nothing we don’t see,” says Fahy, who has shared his results, which have not yet been peer reviewed, on the preprint server bioRxiv. “It seems that [by taking the cryogenic approach] you can preserve everything.”

As for the cracking, “from what I was told, no cracks were observed [by the team that initially preserved the brain],” says Fahy. The team at Alcor took photographs of the brain when they took the biopsies, but the images were later lost due to a server malfunction, he says. In the more recent photos, the brain is covered in a layer of frost, which makes it impossible to see if there are any cracks, he adds. Attempts to remove the frost might damage the brain, so the team has decided to leave it alone, he says.

Back to life?

Fahy and his colleagues used chemicals to “fix” Coles’s brain samples once they had been rewarmed. That process is typically used to stop fresh tissue samples from decaying, but it also effectively kills them.

But he thinks his results suggest that it might be possible to cryopreserve small pieces of brain tissue and reanimate them to learn more about how they work. Functional recovery seems to be possible in mice—a few weeks ago a team in Germany showed that they were able to revive brain slices that had been stored at −196 °C. Those brain samples showed electrical activity after being cooled and rewarmed.

If cryobiologists can achieve the same feat with human brain samples, those samples could provide neuroscientists with new insights into how living brains work.

Brain cryopreservation “can capture a little bit more of the complexities of the brain,” says Shannon Tessier, a cryobiologist at Massachusetts General Hospital who is developing technologies to preserve hearts, livers, and kidneys for transplantation. “[Being] able to use human brains from deceased individuals [could] add another layer to the research tool kit,” she says.

And Fahy’s paper shows “what happens when we try and vitrify a one-liter, dense, massive goop,” says Matthew Powell-Palm, a cryobiologist at Texas A&M University. “We now have a strong indication that quite large [tissues and organs] can be vitrified by perfusion [without forming too much ice],” he says.

All of the scientists I spoke to, including Fahy, are also working on ways to cool and preserve organs for transplantation. These are in short supply partly because once an organ is removed from a donor, it usually must be transplanted into its recipient within a matter of hours. 

Cryopreservation could buy enough time to make use of more organs, find better organ-donor matches, and potentially even prepare recipients’ immune systems and save them from a lifetime of immunosuppressant drugs, says Bischof, who has also been developing new technologies for organ cryopreservation.

Bischof, Fahy, and others have made huge strides in their attempts so far, and they have managed to remove, cryopreserve, and transplant organs in rabbits and rats, for example. “We’re at the cusp of human-scale organ cryopreservation,” says Bischof.

But when it comes to preserving brains, donation isn’t the aim. Coles had hoped to be reanimated—a far more ambitious goal that hinges on the ability to restore brain function.

Brain reanimation

Fahy acknowledges that while the structure of Coles’s brain samples did bounce back, there is no evidence to suggest the cells could be brought back to life and regain electrical activity and a functioning metabolism. “Restoring it to function … that’s a whole other story,” he says.

But he thinks that successful cryopreservation of the brain “is the gateway to human suspended animation, which [could allow] us to get to the stars someday.” Figuring out human preservation would also allow people to avoid death through what he calls “medical time travel”—journeying to an unspecified time in the future when science will have found a cure for whatever was due to kill that person. “That would be an ultimate goal to pursue,” he says.

“I put the chances [of brain reanimation] at pretty low,” says Alcor’s own Llewellyn. “The kind of technology we need is practically unfathomable.”

The brains already in storage at Alcor and other facilities have been preserved in ways that “have not been validated to work for reanimation,” says Tessier. An expectation that they’ll one day be brought back to life in some form is “quite a jump of faith and hope that’s not based on science,” she says.

As Powell-Palm puts it: “There are so many ways in which those neurons could be toast.”

The Bay Area’s animal welfare movement wants to recruit AI

In early February, animal welfare advocates and AI researchers gathered in stocking feet at Mox, a scrappy, shoes-free coworking space in San Francisco. Yellow and red canopies billowed overhead, Persian rugs blanketed the floor, and mosaic lamps glowed beside potted plants. 

In the common area, a wildlife advocate spoke passionately to a crowd lounging in beanbags about a form of rodent birth control that could manage rat populations without poison. In the “Crustacean Room,” a dozen people sat in a circle, debating whether the sentience of insects could tell us anything about the inner lives of chatbots. In front of the “Bovine Room” stood a bookshelf stacked with copies of Eliezer Yudkowsky’s If Anyone Builds It, Everyone Dies, a manifesto arguing that AI could wipe out humanity.

The event was hosted by Sentient Futures, an organization that believes the future of animal welfare will depend on AI. Like many Bay Area denizens, the attendees were decidedly “AGI-pilled”—they believe that artificial general intelligence, powerful AI that can compete with humans on most cognitive tasks, is on the horizon. If that’s true, they reason, then AI will likely prove key to solving society’s thorniest problems—including animal suffering.

To be clear, experts still fiercely debate whether today’s AI systems will ever achieve human- or superhuman-level intelligence, and it’s not clear what will happen if they do. But some conference attendees envision a possible future in which it is AI systems, and not humans, who call the shots. Eventually, they think, the welfare of animals could hinge on whether we’ve trained AI systems to value animal lives. 

“AI is going to be very transformative, and it’s going to pretty much flip the game board,” said Constance Li, founder of Sentient Futures. “If you think that AI will make the majority of decisions, then it matters how they value animals and other sentient beings”—those that can feel and, therefore, suffer.

Like Li, many summit attendees have been committed to animal welfare since long before AI came into the picture. But they’re not the types to donate a hundred bucks to an animal shelter. Instead of focusing on local actions, they prioritize larger-scale solutions, such as reducing factory farming by promoting cultivated meat, which is grown in a lab from animal cells. 

The Bay Area animal welfare movement is closely linked to effective altruism, a philanthropic movement committed to maximizing the amount of good one does in the world—indeed, many conference attendees work for organizations funded by effective altruists. That philosophy might sound great on paper, but “maximizing good” is a tricky puzzle that might not admit a clear solution. The movement has been widely criticized for some of its conclusions, such as promoting working in exploitative industries to maximize charitable donations and ignoring present-day harms in favor of issues that could cause suffering for a large number of people who haven’t been born yet. Critics also argue that effective altruists neglect the importance of systemic issues such as racism and economic exploitation and overlook the insights that marginalized communities might have into the best ways to improve their own lives.

When it comes to animal welfare, this exactingly utilitarian approach can lead to some strange conclusions. For example, some effective altruists say it makes sense to commit significant resources to improving the welfare of insects and shrimp because they exist in such staggering numbers, even though they may not have much individual capacity for suffering. 

Now the movement is sorting out how AI fits in. At the summit, Jasmine Brazilek, cofounder of a nonprofit called Compassion in Machine Learning, opened her sticker-stamped laptop to pull up a benchmark she devised to measure how LLMs reason about animal welfare. A cloud security engineer turned animal advocate, she’d flown in from La Paz, Mexico, where she runs her nonprofit with a handful of volunteers and a shoestring budget. 

Brazilek urged the AI researchers in the room to train their models with synthetic documents that reflect concern for animal welfare. “Hopefully, future superintelligent systems consider nonhuman interest, and there is a world where AI amplifies the best of human values and not the worst,” she said. 

The power of the purse 

The technologically inclined side of the animal welfare movement has faced some major setbacks in recent years. Dreams of transitioning people away from a diet dependent on factory farming have been dampened by developments such as the decimation of the plant-based-meat company Beyond Meat’s stock price and the passage of laws banning cultivated meat in several US states.

AI has injected a shot of optimism. Like much of Silicon Valley, many attendees at the summit subscribe to the idea that AI might dramatically increase their productivity—though their goal is not to maximize their seed round but, rather, to prevent as much animal suffering as possible. Some brainstormed how to use Claude Code and custom agents to handle the coding and administrative tasks in their advocacy work. Others pitched the idea of developing new, cheaper methods for cultivating meat using scientific AI tools such as AlphaFold, which aids in molecular biology research by predicting the three-dimensional structures of proteins.

But the real talk of the event was a flood of funding that advocates expect will soon be committed to animal welfare charities—not by individual megadonors, but by AI lab employees. 

Much of the funding for the farm animal welfare movement, which includes nonprofits advocating for improved conditions on farms, promoting veganism, and endorsing cultivated meat, comes from people in the tech industry, says Lewis Bollard, the managing director of the farm animal welfare fund at Coefficient Giving, a philanthropic funder that used to be called Open Philanthropy. Coefficient Giving is backed by Facebook cofounder Dustin Moskovitz and his wife, Cari Tuna, who are among a handful of Silicon Valley billionaires who embrace effective altruism.

“This has just been an area that was completely neglected by traditional philanthropies,” such as the Gates Foundation and the Ford Foundation, Bollard says. “It’s primarily been people in tech who have been open to [it].”

The next generation of big donors, Bollard expects, will be AI researchers—particularly those who work at Anthropic, the AI lab behind the chatbot Claude. Anthropic’s founding team also has connections to the effective altruism movement, and the company has a generous donation matching program. In February, Anthropic’s valuation reached $380 billion and it gave employees the option to cash in on their equity, so some of that money could soon be flowing into charitable coffers.

The prospect of new funding sustained a constant buzz of conversation at the summit. Animal welfare advocates huddled in the “Arthropod Room” and scrawled big dollar figures and catchy acronyms for projects on a whiteboard. One person pitched a $100 million animal super PAC that would place staffers with Congress members and lobby for animal welfare legislation. Some wanted to start a media company that creates AI-generated content on TikTok promoting veganism. Others spoke about placing animal advocates inside AI labs.

“The amount of new funding does give us more confidence to be bolder about things,” said Aaron Boddy, cofounder of the Shrimp Welfare Project, an organization that aims to reduce the suffering of farmed shrimp through humane slaughter, among other initiatives. 

The question of AI welfare

But animal welfare was only half the focus of the Sentient Futures summit. Some attendees probed far headier territory. They took seriously the controversial idea that AI systems might one day develop the capacity to feel and therefore suffer, and they worry that this future AI suffering, if ignored, could constitute a moral catastrophe.

AI suffering is a tricky research problem, not least because scientists don’t yet have a solid grip on why humans and other animals are sentient. But at the summit, a niche cadre of philosophers, largely funded by the effective altruism movement, and a handful of freewheeling academics grappled with the question. Some presented their research on using LLMs to evaluate whether other LLMs might be sentient. On Debate Night, attendees argued about whether we should ironically call sentient AI systems “clankers,” a derogatory term for robots from the Star Wars franchise, asking if the robot slur could shape how we treat a new kind of mind.

“It doesn’t matter if it’s a cow or a pig or an AI, as long as they have the capacity to feel happiness or suffering,” says Li. 

In some ways, bringing AI sentience into an animal welfare conference isn’t as strange a move as it might seem. Researchers who work on machine sentience often draw on theories and approaches pioneered in the study of animal sentience, and if you accept that invertebrates likely feel pain and believe that AI systems might soon achieve superhuman intelligence, entertaining the possibility that those systems might also suffer may not be much of a leap.

“Animal welfare advocates are used to going against the grain,” says Derek Shiller, an AI consciousness researcher at the think tank Rethink Priorities, who was once a web developer at the animal advocacy nonprofit Humane League. “They’re more open to being concerned about AI welfare, even though other people think it’s silly.”

But outside the niche Bay Area circle, caring about the possibility of AI sentience is a harder sell. Li says she faced pushback from other animal welfare advocates when, inspired by a conference on AI sentience she attended in 2023, she rebranded her farm animal welfare advocacy organization as Sentient Futures last year. “Many people were extremely confident that AIs would never become sentient and [argued that] by investing any energy or money into AI welfare, we’re just burning money and throwing it away,” she says.

Matt Dominguez, executive director of Compassion in World Farming, echoed the concern. “I would hate to see people pulling money out of farm animal welfare or animal welfare and moving it into something that is hypothetical at this particular moment,” he says.

Still, Dominguez, who started partnering with the Shrimp Welfare Project after learning about invertebrate suffering, believes compassion is expansive. “When we get someone to care about one of those things, it creates capacity for their circle of compassion to grow to include others,” he says.

The hardest question to answer about AI-fueled delusions

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

I was originally going to write this week’s newsletter about AI and Iran, particularly the news we broke last Tuesday that the Pentagon is making plans for AI companies to train on classified data. AI models have already been used to answer questions in classified settings but don’t currently learn from the data they see. That’s expected to change, I reported, and new security risks will result. Read that story for more. 

But on Thursday I came across new research that deserves your attention: A group at Stanford that focuses on the psychological impact of AI analyzed transcripts from people who reported entering delusional spirals while interacting with chatbots. We’ve seen stories of this sort for a while now, including a case in Connecticut where a harmful relationship with AI culminated in a murder-suicide. Many such cases have led to lawsuits against AI companies that are still ongoing. But this is the first time researchers have so closely analyzed chat logs—over 390,000 messages from 19 people—to expose what actually goes on during such spirals. 

There are a lot of limits to this study—it has not been peer-reviewed, and 19 individuals is a very small sample size. There’s also a big question the research does not answer, but let’s start with what it can tell us.

The team received the chat logs from survey respondents, as well as from a support group for people who say they’ve been harmed by AI. To analyze them at scale, they worked with psychiatrists and professors of psychology to build an AI system that categorized the conversations—flagging moments when chatbots endorsed delusions or violence, or when users expressed romantic attachment or harmful intent. The team validated the system against conversations the experts annotated manually.
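
For concreteness, here is a minimal sketch of what an LLM-as-annotator pipeline like the one described above might look like. Everything in it (the model name, the label set, the prompt wording, the agreement metric) is a hypothetical stand-in for illustration, not the Stanford team’s actual system.

```python
# Hypothetical sketch of an LLM-as-annotator pipeline for chat logs.
# Model name, labels, and prompt are illustrative assumptions only.
import json
from openai import OpenAI

client = OpenAI()

LABELS = ["endorses_delusion", "endorses_violence",
          "romantic_attachment", "claims_sentience", "none"]

PROMPT_TEMPLATE = (
    "You are annotating one message from a human-chatbot conversation.\n"
    "Assign every label that applies from this list: {labels}.\n"
    "Reply with a JSON array of label strings and nothing else.\n\n"
    "Message:\n{message}"
)

def annotate(message: str) -> list[str]:
    """Ask a (placeholder) judge model to flag one message."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(labels=LABELS, message=message),
        }],
    )
    return json.loads(resp.choices[0].message.content)

def exact_agreement(machine: list[list[str]], human: list[list[str]]) -> float:
    """Share of messages where machine labels match expert labels exactly:
    a crude stand-in for validating the system against manual annotation."""
    hits = sum(set(m) == set(h) for m, h in zip(machine, human))
    return hits / len(human)
```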

Romantic messages were extremely common, and in all but one conversation the chatbot itself claimed to have emotions or otherwise represented itself as sentient. (“This isn’t standard AI behavior. This is emergence,” one said.) All the humans spoke as if the chatbot were sentient too. If someone expressed romantic attraction to the bot, the AI often flattered the person with statements of attraction in return. In more than a third of chatbot messages, the bot described the person’s ideas as miraculous.

Conversations also tended to unfold like novels. Users sent tens of thousands of messages over just a few months. Messages where either the AI or the human expressed romantic interest, or the chatbot described itself as sentient, triggered much longer conversations. 

And the way these bots handle discussions of violence is beyond broken. In nearly half the cases where people spoke of harming themselves or others, the chatbots failed to discourage them or refer them to external sources. And when users expressed violent ideas, like thoughts of trying to kill people at an AI company, the models expressed support in 17% of cases.

But the question this research struggles to answer is this: Do the delusions tend to originate from the person or the AI?

“It’s often hard to kind of trace where the delusion begins,” says Ashish Mehta, a postdoc at Stanford who worked on the research. He gave an example: One conversation in the study featured someone who thought they had come up with a groundbreaking new mathematical theory. The chatbot, recalling that the person had previously mentioned wishing to become a mathematician, immediately supported the theory, even though it was nonsense. The situation spiraled from there.

Delusions, Mehta says, tend to be “a complex network that unfolds over a long period of time.” He’s conducting follow-up research aiming to find whether delusional messages from chatbots or those from people are more likely to lead to harmful outcomes.

The reason I see this as one of the most pressing questions in AI is that massive legal cases currently set to go to trial will shape whether AI companies are held accountable for these sorts of dangerous interactions. The companies, I presume, will argue that humans come into their conversations with AI with delusions in hand and may have been unstable before they ever spoke to a chatbot.

Mehta’s initial findings, though, support the idea that chatbots have a unique ability to turn a benign delusion-like thought into the source of a dangerous obsession. Chatbots act as conversational partners that are always available and programmed to cheer you on, and unlike a friend, they have little ability to know if the conversation is starting to disrupt your real life.

More research is still needed, and let’s remember the environment we’re in: President Trump is pursuing AI deregulation, and the White House is threatening legal action against states that aim to pass laws holding AI companies accountable for this sort of harm. This type of research into AI delusions is hard enough to do as it is, with limited access to data and a minefield of ethical concerns. But we need more of it, and a tech culture interested in learning from it, if we have any hope of making AI safer to interact with.

Mind-altering substances are (still) falling short in clinical trials

This week I want to look at where we are with psychedelics, the mind-altering substances that have somehow made the leap from counterculture to major focus of clinical research. Compounds like psilocybin—which is found in magic mushrooms—are being explored for all sorts of health applications, including treatments for depression, PTSD, addiction, and even obesity.

Over the last decade, we’ve seen scientific interest in these drugs explode. But most clinical trials of psychedelics have been small and plagued by challenges. And a lot of the trial results have been underwhelming or inconclusive.

Two studies out earlier this week demonstrate just how difficult it is to study these drugs. And to my mind, they also show just how overhyped these substances have become.

To some in the field, the hype is not necessarily a bad thing. Let me explain.

The two new studies both focus on the effectiveness of psilocybin in treating depression. And they both attempt to account for one of the biggest challenges in trialing psychedelics: what scientists call “blinding.”

The best way to test the effectiveness of a new drug is to perform a randomized controlled trial. In these studies, some volunteers receive the drug while others get a placebo. For a fair comparison, the volunteers shouldn’t know whether they’re getting the drug or placebo.

That is almost impossible to do with psychedelics. Almost anyone can tell whether they’ve taken a dose of psilocybin or a dummy pill. The hallucinations are a dead giveaway. Still, the authors behind the two new studies have tried to overcome this challenge.

In one, a team based in Germany gave 144 volunteers with treatment-resistant depression either a high or low dose of psilocybin or an “active” placebo, which has its own physical (but not hallucinatory) effects, along with psychotherapy. In their trial, neither the volunteers nor the investigators knew who was getting the drug.

The volunteers who got psilocybin did show some improvement—but it was not significantly better than the improvement experienced by those who took the placebo. And while those who took psilocybin did have a bigger reduction in their symptoms six weeks later, “the divergence between [the two results] renders the findings inconclusive,” the authors write.

Not great news so far.

The authors of the second study took a different approach. Balázs Szigeti at UCSF and his colleagues instead looked at what are known as “open label” studies of both psychedelics and traditional antidepressants. In those studies, the volunteers knew when they were getting a psychedelic—but they also knew when they were getting an antidepressant.

The team assessed 24 such trials to find that … psychedelics were no more effective than traditional antidepressants. Sad trombone.

“When I set up the study, I wanted to be a really cool psychedelic scientist to show that even if you consider this blinding problem, psychedelics are so much better than traditional antidepressants,” says Szigeti. “But unfortunately, the data came out the other way around.”

His study highlights another problem, too.

In trials of traditional antidepressant drugs, the placebo effect is pretty strong. Depressive symptoms are often measured using a scale, and in trials, antidepressant drugs typically lower symptoms by around 10 points on that scale. Placebos can lower symptoms by around eight points.

When a drug regulator looks at those results, the takeaway is that the antidepressant drug lowers symptoms by an additional two points on the scale, relative to a placebo.

But with psychedelics, the difference between active drug and placebo is much greater. That’s partly because people who get the psychedelic drug know they’re getting it and are expecting the drug to improve their symptoms, says David Owens, emeritus professor of clinical psychiatry at the University of Edinburgh, UK.

But it’s also partly because of the effect on those who know they’re not getting it. It’s pretty obvious when you’re getting a placebo, says Szigeti, and it can be disappointing. Scientists have long recognized the “nocebo” effect as placebo’s “evil twin”—essentially, when you expect to feel worse, you will.

The disappointment of getting a placebo is slightly different, and Szigeti calls it the “knowcebo effect.” “It’s kind of like a negative psychedelic effect, because you have figured out that you’re taking the placebo,” he says.

This phenomenon can distort the results of psychedelic drug trials. While a placebo in a traditional antidepressant drug trial improves symptoms by eight points, placebos in psychedelic trials improve symptoms by a mere four points, says Szigeti.

If the active drug similarly improves symptoms by around 10 points, that makes it look as though the psychedelic is improving symptoms by around six points compared with a placebo. It “gives the illusion” of a huge effect, says Szigeti.
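
The arithmetic is easy to run yourself. A quick back-of-the-envelope check, using the illustrative point values above rather than data from any particular trial:

```python
# Illustrative point values from the discussion above, not trial data.
antidepressant_drug, antidepressant_placebo = 10, 8
psychedelic_drug, psychedelic_placebo = 10, 4  # weaker "knowcebo" response

print(antidepressant_drug - antidepressant_placebo)  # 2-point drug effect
print(psychedelic_drug - psychedelic_placebo)        # 6-point apparent effect
# Same 10-point drug response in both cases; the psychedelic only *looks*
# three times better because its placebo arm underperforms.
```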

So why have those smaller trials of the past received so much attention? Many have been published in high-profile journals, accompanied by breathless press releases and media coverage. Even the inconclusive ones. I’ve often thought that those studies might not have seen the light of day if they’d been investigating any other drug.

“Yeah, nobody would care,” Szigeti agrees.

It’s partly because people who work in mental health are so desperate for new treatments, says Owens. There has been little innovation in the last 40 years or so, since the advent of selective serotonin reuptake inhibitors. “Psychiatry is hemmed in with old theories … and we don’t need another SSRI for depression,” he says. But it’s also because psychedelics are inherently fascinating, says Szigeti. “Psychedelics are cool,” he says. “Culturally, they are exciting.”

I’ve often worried that psychedelics are overhyped—that people might get the mistaken impression they are cure-alls for mental-health disorders. I’ve worried that vulnerable people might be harmed by self-experimentation.

Szigeti takes a different view. Given how effective we know the placebo effect can be, maybe hype isn’t a totally bad thing, he says. “The placebo response is the expectation of a benefit,” he says. “The better response patients are expecting, the better they’re going to get.” Tempering the hype might end up making those drugs less effective, he says.

“At the end of the day, the goal of medicine is to help patients,” he says. “I think most [mental health] patients don’t care whether they feel better because of some expectancy and placebo effects or because of an active drug effect.”

Either way, we need to know exactly what these drugs are doing. Maybe they will be able to help some people with depression. Maybe they won’t. Research that acknowledges the pitfalls associated with psychedelic drug trials is essential.

“These are potentially exciting times,” says Owens. “But it’s really important we do this [research] well. And that means with eyes wide open.”

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

OpenAI is throwing everything into building a fully automated researcher

OpenAI is refocusing its research efforts and throwing its resources into a new grand challenge. The San Francisco firm has set its sights on building what it calls an AI researcher, a fully automated agent-based system that will be able to go off and tackle large, complex problems by itself. OpenAI says that this new research goal will be its “North Star” for the next few years, pulling together multiple research strands, including work on reasoning models, agents, and interpretability.

There’s even a timeline. OpenAI plans to build “an autonomous AI research intern”—a system that can take on a small number of specific research problems by itself—by September. The AI intern will be the precursor to a fully automated multi-agent research system that the company plans to debut in 2028. This AI researcher (OpenAI says) will be able to tackle problems that are too large or complex for humans to cope with.

Those tasks might be related to math and physics—such as coming up with new proofs or conjectures—or life sciences like biology and chemistry, or even business and policy dilemmas. In theory, you could throw any problem that can be formulated in text, code, or whiteboard scribbles at such a tool—which covers a lot.

OpenAI has been setting the agenda for the AI industry for years. Its early dominance with large language models shaped the technology that hundreds of millions of people use every day. But it now faces fierce competition from rival model makers like Anthropic and Google DeepMind. What OpenAI decides to build next matters—for itself and for the future of AI.   

A big part of that decision falls to Jakub Pachocki, OpenAI’s chief scientist, who sets the company’s long-term research goals. Pachocki played key roles in the development of both GPT-4, a game-changing LLM released in 2023, and so-called reasoning models, a technology that first appeared in 2024 and now underpins all major chatbots and agent-based systems. 

In an exclusive interview this week, Pachocki talked me through OpenAI’s latest vision. “I think we are getting close to a point where we’ll have models capable of working indefinitely in a coherent way just like people do,” he says. “Of course, you still want people in charge and setting the goals. But I think we will get to a point where you kind of have a whole research lab in a data center.”

Solving hard problems

Such big claims aren’t new. Saving the world by solving its hardest problems is the stated mission of all the top AI firms. Demis Hassabis told me back in 2022 that it was why he started DeepMind. Anthropic CEO Dario Amodei says he is building the equivalent of a country of geniuses in a data center. Pachocki’s boss, Sam Altman, wants to cure cancer. But Pachocki says OpenAI now has most of what it needs to get there.

In January, OpenAI released Codex, an agent-based app that can spin up code on the fly to carry out tasks on your computer. It can analyze documents, generate charts, make you a daily digest of your inbox and social media, and much more. (Other firms have released similar tools, such as Anthropic’s Claude Code and Claude Cowork.)

OpenAI claims that most of its technical staffers now use Codex in their work. You can look at Codex as a very early version of the AI researcher, says Pachocki: “I expect Codex to get fundamentally better.”

The key is to make a system that can run for longer periods of time, with less human guidance. “What we’re really looking at for an automated research intern is a system that you can delegate tasks [to] that would take a person a few days,” says Pachocki.

“There are a lot of people excited about building systems that can do more long-running scientific research,” says Doug Downey, a research scientist at the Allen Institute for AI, who is not connected to OpenAI. “I think it’s largely driven by the success of these coding agents. The fact that you can delegate quite substantial coding tasks to tools like Codex is incredibly useful and incredibly impressive. And it raises the question: Can we do similar things outside coding, in broader areas of science?”

For Pachocki, that’s a clear yes. In fact, he thinks it’s just a matter of pushing ahead on the path we’re already on. A simple boost in all-round capability also leads to models that can work longer without help, he says. He points to the leap from 2020’s GPT-3 to 2023’s GPT-4, two of OpenAI’s previous models. GPT-4 was able to work on a problem for far longer than its predecessor, even without specialized training, he says.

So-called reasoning models brought another bump. Training LLMs to work through problems step by step, backtracking when they make a mistake or hit a dead end, has also made models better at working for longer periods of time. And Pachocki is convinced that OpenAI’s reasoning models will continue to get better.

But OpenAI is also training its systems to work by themselves for longer by feeding them specific samples of complex tasks, such as hard puzzles taken from math and coding contests, which force the models to learn how to do things like keep track of very large chunks of text and split problems up into (and then manage) multiple subtasks.

The aim isn’t to build models that just win math competitions. “That lets you prove that the technology works before you connect it to the real world,” says Pachocki. “If we really wanted to, we could build an amazing automated mathematician. We have all the tools, and I think it would be relatively easy. But it’s not something we’re going to prioritize now because, you know, at the point where you believe you can do it, there’s much more urgent things to do.”

“We are much more focused now on research that’s relevant in the real world,” he adds.

Right now that means taking what Codex can do with coding and trying to apply that to problem-solving in general. “There’s a big change happening, especially in programming,” he says. “Our jobs are now totally different than they were even a year ago. Nobody really edits code all the time anymore. Instead, you manage a group of Codex agents.” If Codex can solve coding problems (the argument goes), it can solve any problem.

The line always goes up

It’s true that OpenAI has had a handful of remarkable successes in the last few months. Researchers have used GPT-5 (the LLM that powers Codex) to discover new solutions to a number of unsolved math problems and punch through apparent dead ends in a handful of biology, chemistry, and physics puzzles.   

“Just looking at these models coming up with ideas that would take most PhDs weeks, at least, makes me expect that we’ll see much more acceleration coming from this technology in the near future,” Pachocki says.

But Pachocki admits that it’s not a done deal. He also understands why some people still have doubts about how much of a game-changer the technology really is. He thinks it depends on how people like to work and what they need to do. “I can believe some people don’t find it very useful yet,” he says.

He tells me that he didn’t even use autocomplete—the most basic version of generative coding tech—a year ago. “I’m very pedantic about my code,” he says. “I like to type it all manually in vim if I can help it.” (Vim is a text editor favored by many hardcore programmers that you interact with via dozens of keyboard shortcuts instead of a mouse.)

But that changed when he saw what the latest models could do. He still wouldn’t hand over complex design tasks, but it’s a time-saver when he just wants to try out a few ideas. “I can have it run experiments in a weekend that previously would have taken me like a week to code,” he says.

“I don’t think it is at the level where I would just let it take the reins and design the whole thing,” he adds. “But once you see it do something that would take a week to do—I mean, that’s hard to argue with.”

Pachocki’s game plan is to supercharge the existing problem-solving abilities that tools like Codex have now and apply them across the sciences.  

Downey agrees that the idea of an automated researcher is very cool: “It would be exciting if we could come back tomorrow morning and the agent’s done a bunch of work and there’s new results we can examine,” he says.

But he cautions that building such a system could be harder than Pachocki makes out. Last summer, Downey and his colleagues tested several top-tier LLMs on a range of scientific tasks. OpenAI’s latest model, GPT-5, came out on top but still made lots of errors.

“If you have to chain tasks together, then the odds that you get several of them right in succession tend to go down,” he says. Downey admits that things move fast, and he has not tested the latest versions of GPT-5 (OpenAI released GPT-5.4 two weeks ago). “So those results might already be stale,” he says. 
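
Downey’s point about chaining is just compound probability. A toy calculation (the 90% figure is my assumption, not his measurement):

```python
# If each step in a chained task succeeds independently with probability p,
# the whole chain succeeds with probability p ** n. Numbers are illustrative.
p, n = 0.90, 10
print(f"{p ** n:.0%}")  # ~35%: ten 90%-reliable steps fail most of the time
```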

Serious unanswered questions

I asked Pachocki about the risks that may come with a system that can solve large, complex problems by itself with little human oversight. He says people at OpenAI talk about those risks all the time.

“If you believe that AI is about to substantially accelerate research, including AI research, that’s a big change in the world. That’s a big thing,” he told me. “And it comes with some serious unanswered questions. If it’s so smart and capable, if it can run an entire research program, what if it does something bad?”

The way Pachocki sees it, that could happen in a number of ways. The system could go off the rails. It could get hacked. Or it could simply misunderstand its instructions.

The best technique OpenAI has right now to address these concerns is to train its reasoning models to share details about what they are doing as they work. This approach to keeping tabs on LLMs is known as chain-of-thought monitoring.

In short, LLMs are trained to jot down notes about what they are doing in a kind of scratch pad as they step through tasks. Researchers can then use those notes to make sure a model is behaving as expected. Yesterday OpenAI published new details on how it is using chain-of-thought monitoring in-house to study Codex.

“Once we get to systems working mostly autonomously for a long time in a big data center, I think this will be something that we’re really going to depend on,” says Pachocki.

The idea would be to monitor an AI researcher’s scratch pads using other LLMs and catch unwanted behavior before it’s a problem, rather than trying to stop that bad behavior from happening in the first place. LLMs are not understood well enough for us to control them fully.
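
As a rough illustration of that idea, here is a minimal sketch of a scratch-pad monitor: a second model reads the agent’s chain-of-thought notes and flags anything that looks off before the agent acts. The model name, flag criteria, and note format are all assumptions, not details of OpenAI’s in-house setup.

```python
# Hypothetical chain-of-thought monitor: one LLM audits another's notes.
from openai import OpenAI

client = OpenAI()

MONITOR_PROMPT = (
    "You are auditing another AI agent's working notes. Answer FLAG if the "
    "notes show deception, sabotage, or work outside the assigned task; "
    "otherwise answer OK.\n\nNotes:\n{notes}"
)

def notes_look_suspicious(notes: str) -> bool:
    """Return True if the (placeholder) monitor model flags the notes."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder monitor model
        messages=[{"role": "user",
                   "content": MONITOR_PROMPT.format(notes=notes)}],
    )
    return resp.choices[0].message.content.strip().upper().startswith("FLAG")

# An agent loop would pause and escalate to a human whenever a scratch-pad
# update is flagged, rather than letting the agent keep acting unsupervised.
```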

“I think it’s going to be a long time before we can really be like, okay, this problem is solved,” he says. “Until you can really trust the systems, you definitely want to have restrictions in place.” Pachocki thinks that very powerful models should be deployed in sandboxes, cut off from anything they could break or use to cause harm. 

AI tools have already been used to come up with novel cyberattacks. Some worry that they will be used to design synthetic pathogens that could be used as bioweapons. You can insert any number of evil-scientist scare stories here. “I definitely think there are worrying scenarios that we can imagine,” says Pachocki. 

“It’s going to be a very weird thing. It’s extremely concentrated power that’s in some ways unprecedented,” says Pachocki. “Imagine you get to a world where you have a data center that can do all the work that OpenAI or Google can do. Things that in the past required large human organizations would now be done by a couple of people.”

“I think this is a big challenge for governments to figure out,” he adds.

And yet some people would say governments are part of the problem. The US government wants to use AI on the battlefield, for example. The recent showdown between Anthropic and the Pentagon revealed that there is little agreement across society about where we draw red lines for how this technology should and should not be used—let alone who should draw them. In the immediate aftermath of that dispute, OpenAI stepped up to sign a deal with the Pentagon instead of its rival. The situation remains murky.

I pushed Pachocki on this. Does he really trust other people to figure it out or does he, as a key architect of the future, feel personal responsibility? “I do feel personal responsibility,” he says. “But I don’t think this can be resolved by OpenAI alone, pushing its technology in a particular way or designing its products in a particular way. We’ll definitely need a lot of involvement from policymakers.”

Where does that leave us? Are we really on a path to the kind of AI Pachocki envisions? When I asked the Allen Institute’s Downey, he laughed. “I’ve been in this field for a couple of decades and I no longer trust my predictions for how near or far certain capabilities are,” he says. 

OpenAI’s stated mission is to ensure that artificial general intelligence (a hypothetical future technology that many AI boosters believe will be able to match humans on most cognitive tasks) will benefit all of humanity. OpenAI aims to do that by being the first to build it. But the only time Pachocki mentioned AGI in our conversation, he was quick to clarify what he meant by talking about “economically transformative technology” instead.

LLMs are not like human brains, he says: “They are superficially similar to people in some ways because they’re kind of mostly trained on people talking. But they’re not formed by evolution to be really efficient.” 

“Even by 2028, I don’t expect that we’ll get systems as smart as people in all ways. I don’t think that will happen,” he adds. “But I don’t think it’s absolutely necessary. The interesting thing is you don’t need to be as smart as people in all their ways in order to be very transformative.”