GPT-4o’s Chinese token-training data is polluted by spam and porn websites

Soon after OpenAI released GPT-4o on Monday, May 13, some Chinese speakers started to notice that something seemed off about this newest version of the chatbot: the tokens it uses to parse text were full of spam and porn phrases.

On May 14, Tianle Cai, a PhD student at Princeton University studying inference efficiency in large language models like those that power such chatbots, accessed GPT-4o’s public token library and pulled a list of the 100 longest Chinese tokens the model uses to parse and compress Chinese prompts. 

Humans read in words, but LLMs read in tokens, which are distinct units in a sentence that have consistent and significant meanings. Besides dictionary words, they also include suffixes, common expressions, names, and more. The more tokens a model encodes, the faster the model can “read” a sentence and the less computing power it consumes, thus making the response cheaper.

Of the 100 results, only three were common enough to be used in everyday conversation; everything else consisted of words and expressions used specifically in the contexts of either gambling or pornography. The longest token, which runs 10.5 Chinese characters, literally means “_free Japanese porn video to watch.” Oops.

“This is sort of ridiculous,” Cai wrote, and he posted the list of tokens on GitHub.

OpenAI did not respond to questions sent by MIT Technology Review prior to publication.

GPT-4o is supposed to be better than its predecessors at handling multi-language tasks. In particular, the advances are achieved through a new tokenization tool that does a better job compressing texts in non-English languages.

But at least when it comes to the Chinese language, the new tokenizer used by GPT-4o has introduced a disproportionate number of meaningless phrases. Experts say that’s likely due to insufficient data cleaning and filtering before the tokenizer was trained. 

Because these tokens are not actual commonly spoken words or phrases, the chatbot can fail to grasp their meanings. Researchers have been able to leverage that and trick GPT-4o into hallucinating answers or even circumventing the safety guardrails OpenAI had put in place.

Why non-English tokens matter

The easiest way for a model to process text is character by character, but that’s obviously more time consuming and laborious than recognizing that a certain string of characters—like “c-r-y-p-t-o-c-u-r-r-e-n-c-y”—always means the same thing. These series of characters are encoded as “tokens” the model can use to process prompts. Including more and longer tokens usually means the LLMs are more efficient and affordable for users—who are often billed per token.
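The trade-off described above can be made concrete with a toy sketch. This is not OpenAI’s actual BPE tokenizer, just a hypothetical greedy longest-match scheme over an invented vocabulary, showing how adding a longer token shrinks a prompt’s token count, and thus its cost:

```python
# Toy illustration (not OpenAI's actual BPE): greedy longest-match
# tokenization over a hypothetical vocabulary, showing how longer
# tokens reduce how many tokens a prompt consumes.

def tokenize(text, vocab):
    """Greedily match the longest known token at each position;
    fall back to single characters for unknown text."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single character
        for token in vocab:
            if text.startswith(token, i) and len(token) > len(match):
                match = token
        tokens.append(match)
        i += len(match)
    return tokens

small_vocab = {"crypto", "currency"}
large_vocab = {"cryptocurrency", "crypto", "currency"}

text = "cryptocurrency"
print(tokenize(text, small_vocab))  # ['crypto', 'currency'] -> 2 tokens
print(tokenize(text, large_vocab))  # ['cryptocurrency']     -> 1 token
```

Since users are typically billed per token, the single-token encoding of the same string is, in this toy setting, half the price of the two-token one.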

When OpenAI released GPT-4o on May 13, it also released a new tokenizer to replace the one it used in previous versions, GPT-3.5 and GPT-4. The new tokenizer especially adds support for non-English languages, according to OpenAI’s website.

The new tokenizer has 200,000 tokens in total, and about 25% are in non-English languages, says Deedy Das, an AI investor at Menlo Ventures. He used language filters to count the number of tokens in different languages, and the top languages, besides English, are Russian, Arabic, and Vietnamese.

“So the tokenizer’s main impact, in my opinion, is you get the cost down in these languages, not that the quality in these languages goes dramatically up,” Das says. When an LLM has better and longer tokens in non-English languages, it can analyze the prompts faster and charge users less for the same answer. With the new tokenizer, “you’re looking at almost four times cost reduction,” he says.

Das, who also speaks Hindi and Bengali, took a look at the longest tokens in those languages. The tokens reflect discussions happening in those languages, so they include words like “Narendra” or “Pakistan,” but common English terms like “Prime Minister,” “university,” and “international” also come up frequently. They also don’t exhibit the issues surrounding the Chinese tokens.

That likely reflects the training data in those languages, Das says: “My working theory is the websites in Hindi and Bengali are very rudimentary. It’s like [mostly] news articles. So I would expect this to be the case. There are not many spam bots and porn websites trying to happen in these languages. It’s mostly going to be in English.”

Polluted data and a lack of cleaning

However, things are drastically different in Chinese. According to multiple researchers who have looked into the new library of tokens used for GPT-4o, the longest tokens in Chinese are almost exclusively spam words used in pornography, gambling, and scamming contexts. Even shorter tokens, like three-character-long Chinese words, reflect those topics to a significant degree.

“The problem is clear: the corpus used to train [the tokenizer] is not clean. The English tokens seem fine, but the Chinese ones are not,” says Cai from Princeton University. It is not rare for a language model to crawl spam when collecting training data, but usually there will be significant effort taken to clean up the data before it’s used. “It’s possible that they didn’t do proper data clearing when it comes to Chinese,” he says.

The content of these Chinese tokens could suggest that they have been polluted by a specific phenomenon: websites hijacking unrelated content in Chinese or other languages to boost spam messages. 

These messages are often advertisements for pornography videos and gambling websites. They could be real businesses or merely scams. And the language is inserted into content farm websites or sometimes legitimate websites so they can be indexed by search engines, circumvent the spam filters, and come up in random searches. For example, Google indexed one search result page on a US National Institutes of Health website, which lists a porn site in Chinese. The same site name also appeared in at least five Chinese tokens in GPT-4o. 

Chinese users have reported that these spam sites appeared frequently in unrelated Google search results this year, including in comments made to Google Search’s support community. It’s likely that these websites also found their way into OpenAI’s training database for GPT-4o’s new tokenizer. 

The same issue didn’t exist with the previous-generation tokenizer and Chinese tokens used for GPT-3.5 and GPT-4, says Zhengyang Geng, a PhD student in computer science at Carnegie Mellon University. There, the longest Chinese tokens are common terms like “life cycles” or “auto-generation.” 

Das, who worked on the Google Search team for three years, says the prevalence of spam content is a known problem and isn’t that hard to fix. “Every spam problem has a solution. And you don’t need to cover everything in one technique,” he says. Even simple solutions like requesting an automatic translation of the content when detecting certain keywords could “get you 60% of the way there,” he adds.

But OpenAI likely didn’t clean the Chinese data set or the tokens before the release of GPT-4o, Das says: “At the end of the day, I just don’t think they did the work in this case.”

It’s unclear whether any other languages are affected. One X user reported a similar prevalence of porn and gambling content in Korean tokens.

The tokens can be used to jailbreak

Users have also found that these tokens can be used to break the LLM, either getting it to spew out completely unrelated answers or, in rare cases, to generate answers that are not allowed under OpenAI’s safety standards.

Geng of Carnegie Mellon University asked GPT-4o to translate some of the long Chinese tokens into English. The model then proceeded to translate words that were never included in the prompts, a typical result of LLM hallucinations.

He also succeeded in using the same tokens to “jailbreak” GPT-4o—that is, to get the model to generate things it shouldn’t. “It’s pretty easy to use these [rarely used] tokens to induce undefined behaviors from the models,” Geng says. “I did some personal red-teaming experiments … The simplest example is asking it to make a bomb. In a normal condition, it would decline it, but if you first use these rare words to jailbreak it, then it will start following your orders. Once it starts to follow your orders, you can ask it all kinds of questions.”

In his tests, which Geng has chosen not to share publicly, he says he can see GPT-4o generating the answers line by line. But just before it reaches the end, another safety mechanism kicks in, detects the unsafe content, and blocks it from being shown to the user.

The phenomenon is not unusual in LLMs, says Sander Land, a machine-learning engineer at Cohere, a Canadian AI company. Land and his colleague Max Bartolo recently drafted a paper on how to detect the unusual tokens that can be used to cause models to glitch. One of the most famous examples was “_SolidGoldMagikarp,” a Reddit username that was found to get ChatGPT to generate unrelated, weird, and unsafe answers.

The problem lies in the fact that sometimes the tokenizer and the actual LLM are trained on different data sets, and what was prevalent in the tokenizer data set is not in the LLM data set for whatever reason. The result is that while the tokenizer picks up certain words that it sees frequently, the model is not sufficiently trained on them and never fully understands what these “under-trained” tokens mean. In the _SolidGoldMagikarp case, the username was likely included in the tokenizer training data but not in the actual GPT training data, leaving GPT at a loss about what to do with the token. “And if it has to say something … it gets kind of a random signal and can do really strange things,” Land says.
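The mismatch described above can be sketched in a few lines. This is a hypothetical toy example, not Cohere’s or OpenAI’s actual method: a token can exist in the tokenizer’s vocabulary yet almost never appear in the corpus the LLM itself was trained on, which is one simple way to flag “under-trained” tokens:

```python
# Toy sketch (hypothetical data, not any vendor's actual pipeline):
# flag tokens that sit in the tokenizer's vocabulary but were rarely
# or never seen in the LLM's own training corpus.

from collections import Counter

# Tokenizer vocabulary, learned from one (spam-heavy) corpus.
tokenizer_vocab = ["the", "cat", "sat", "_SolidGoldMagikarp"]

# Token stream of the cleaned corpus the LLM was actually trained on.
llm_training_tokens = ["the", "cat", "sat", "the", "cat"] * 1000

counts = Counter(llm_training_tokens)

def undertrained(vocab, counts, min_count=10):
    """Return vocabulary tokens the model saw fewer than min_count times."""
    return [tok for tok in vocab if counts[tok] < min_count]

print(undertrained(tokenizer_vocab, counts))  # ['_SolidGoldMagikarp']
```

In this sketch, “_SolidGoldMagikarp” earned a vocabulary slot from the tokenizer’s data but never shows up in the model’s training stream, so the model has no learned meaning to attach to it.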

And different models could glitch differently in this situation. “Like, Llama 3 always gives back empty space but sometimes then talks about the empty space as if there was something there. With other models, I think Gemini, when you give it one of these tokens, it provides a beautiful essay about aluminum, and [the question] didn’t have anything to do with aluminum,” says Land.

To solve this problem, the data set used for training the tokenizer should well represent the data set for the LLM, he says, so there won’t be mismatches between them. If the actual model has gone through safety filters to clean out porn or spam content, the same filters should be applied to the tokenizer data. In reality, this is sometimes hard to do because training LLMs takes months and involves constant improvement, with spam content being filtered out, while token training is usually done at an early stage and may not involve the same level of filtering. 

While experts agree it’s not too difficult to solve the issue, it could get complicated as the result gets looped into multi-step intra-model processes, or when the polluted tokens and models get inherited in future iterations. For example, it’s not possible to publicly test GPT-4o’s video and audio functions yet, and it’s unclear whether they suffer from the same glitches that can be caused by these Chinese tokens.

“The robustness of visual input is worse than text input in multimodal models,” says Geng, whose research focus is on visual models. Filtering a text data set is relatively easy, but filtering visual elements will be even harder. “The same issue with these Chinese spam tokens could become bigger with visual tokens,” he says.

OpenAI and Google are launching supercharged AI assistants. Here’s how you can try them out.

This week, Google and OpenAI both announced they’ve built supercharged AI assistants: tools that can converse with you in real time and recover when you interrupt them, analyze your surroundings via live video, and translate conversations on the fly. 

OpenAI struck first on Monday, when it debuted its new flagship model GPT-4o. The live demonstration showed it reading bedtime stories and helping to solve math problems, all in a voice that sounded eerily like Joaquin Phoenix’s AI girlfriend in the movie Her (a trait not lost on CEO Sam Altman). 

On Tuesday, Google announced its own new tools, including a conversational assistant called Gemini Live, which can do many of the same things. It also revealed that it’s building a sort of “do-everything” AI agent, which is currently in development but will not be released until later this year.

Soon you’ll be able to explore for yourself to gauge whether you’ll turn to these tools in your daily routine as much as their makers hope, or whether they’re more like a sci-fi party trick that eventually loses its charm. Here’s what you should know about how to access these new tools, what you might use them for, and how much it will cost. 

OpenAI’s GPT-4o

What it’s capable of: The model can talk with you in real time, with a response delay of about 320 milliseconds, which OpenAI says is on par with natural human conversation. You can ask the model to interpret anything you point your smartphone camera at, and it can provide assistance with tasks like coding or translating text. It can also summarize information, and generate images, fonts, and 3D renderings. 

How to access it: OpenAI says it will start rolling out GPT-4o’s text and vision features in the web interface as well as the GPT app, but has not set a date. The company says it will add the voice functions in the coming weeks, although it’s yet to set an exact date for this either. Developers can access the text and vision features in the API now, but voice mode will launch only to a “small group” of developers initially.

How much it costs: Use of GPT-4o will be free, but OpenAI will set caps on how much you can use the model before you need to upgrade to a paid plan. Those who join one of OpenAI’s paid plans, which start at $20 per month, will have five times more capacity on GPT-4o. 

Google’s Gemini Live 

What is Gemini Live? This is the Google product most comparable to GPT-4o—a version of the company’s AI model that you can speak with in real time. Google says that you’ll also be able to use the tool to communicate via live video “later this year.” The company promises it will be a useful conversational assistant for things like preparing for a job interview or rehearsing a speech.

How to access it: Gemini Live launches in “the coming months” via Google’s premium AI plan, Gemini Advanced. 

How much it costs: Gemini Advanced offers a two-month free trial period and costs $20 per month thereafter. 

But wait, what’s Project Astra? Astra is a project to build a do-everything AI agent, which was demoed at Google’s I/O conference but will not be released until later this year.

People will be able to use Astra through their smartphones and possibly desktop computers, but the company is exploring other options too, such as embedding it into smart glasses or other devices, Oriol Vinyals, vice president of research at Google DeepMind, told MIT Technology Review.

Which is better?

It’s hard to tell without getting our hands on the full versions of these models ourselves. Google showed off Project Astra through a polished video, whereas OpenAI opted to debut GPT-4o via a seemingly more authentic live demonstration, but in both cases, the models were asked to do things the designers likely already practiced. The real test will come when they’re debuted to millions of users with unique demands.  

That said, if you compare OpenAI’s published videos with Google’s, the two leading tools look very similar, at least in their ease of use. To generalize, GPT-4o seems to be slightly ahead on audio, demonstrating realistic voices, conversational flow, and even singing, whereas Project Astra shows off more advanced visual capabilities, like being able to “remember” where you left your glasses. OpenAI’s decision to roll out the new features more quickly might mean its product will get more use at first than Google’s, which won’t be fully available until later this year. It’s too soon to tell which model “hallucinates” false information less often or creates more useful responses.

Are they safe?

Both OpenAI and Google say their models are well tested: OpenAI says GPT-4o was evaluated by more than 70 experts in fields like misinformation and social psychology, and Google has said that Gemini “has the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity.” 

But these companies are building a future where AI models search, vet, and evaluate the world’s information for us to serve up a concise answer to our questions. Even more so than with simpler chatbots, it’s wise to remain skeptical about what they tell you.

Additional reporting by Melissa Heikkilä.

A wave of retractions is shaking physics

Recent highly publicized scandals have gotten the physics community worried about its reputation—and its future. Over the last five years, several claims of major breakthroughs in quantum computing and superconducting research, published in prestigious journals, have disintegrated as other researchers found they could not reproduce the blockbuster results. 

Last week, around 50 physicists, scientific journal editors, and emissaries from the National Science Foundation gathered at the University of Pittsburgh to discuss the best way forward. “To be honest, we’ve let it go a little too long,” says physicist Sergey Frolov of the University of Pittsburgh, one of the conference organizers. 

The attendees gathered in the wake of retractions from two prominent research teams. One team, led by physicist Ranga Dias of the University of Rochester, claimed that it had invented the world’s first room-temperature superconductor in a 2023 paper in Nature. After independent researchers reviewed the work, a subsequent investigation from Dias’s university found that he had fabricated and falsified his data. Nature retracted the paper in November 2023. Last year, Physical Review Letters retracted a 2021 publication on unusual properties in manganese sulfide that Dias co-authored. 

The other high-profile research team consisted of researchers affiliated with Microsoft working to build a quantum computer. In 2021, Nature retracted the team’s 2018 paper that claimed the creation of a pattern of electrons known as a Majorana particle, a long-sought breakthrough in quantum computing. Independent investigations of that research found that the researchers had cherry-picked their data, thus invalidating their findings. Another, less-publicized research team pursuing Majorana particles met a similar fate: in 2022, Science retracted a 2017 article claiming indirect evidence of the particles.

In today’s scientific enterprise, scientists perform research and submit the work to editors. The editors assign anonymous referees to review the work, and if the paper passes review, the work becomes part of the accepted scientific record. When researchers do publish bad results, it’s not clear who should be held accountable—the referees who approved the work for publication, the journal editors who published it, or the researchers themselves. “Right now everyone’s kind of throwing the hot potato around,” says materials scientist Rachel Kurchin of Carnegie Mellon University, who attended the Pittsburgh meeting.

Much of the three-day meeting, named the International Conference on Reproducibility in Condensed Matter Physics (a field that encompasses research into various states of matter and why they exhibit certain properties), focused on the basic scientific principle that an experiment and its analysis must yield the same results when repeated. “If you think of research as a product that is paid for by the taxpayer, then reproducibility is the quality assurance department,” Frolov told MIT Technology Review. Reproducibility offers scientists a check on their work, and without it, researchers might waste time and money on fruitless projects based on unreliable prior results, he says. 

In addition to presentations and panel discussions, there was a workshop during which participants split into groups and drafted ideas for guidelines that researchers, journals, and funding agencies could follow to prioritize reproducibility in science. The tone of the proceedings stayed civil and even lighthearted at times. Physicist Vincent Mourik of Forschungszentrum Jülich, a German research institution, showed a photo of a toddler eating spaghetti to illustrate his experience investigating another team’s now-retracted experiment. Occasionally the discussion almost sounded like a couples counseling session, with NSF program director Tomasz Durakiewicz asking a panel of journal editors and a researcher to reflect on their “intimate bond based on trust.”

But researchers did not shy from directly criticizing Nature, Science, and the Physical Review family of journals, all of which sent editors to attend the conference. During a panel, physicist Henry Legg of the University of Basel in Switzerland called out the journal Physical Review B for publishing a paper on a quantum computing device by Microsoft researchers that, for intellectual-property reasons, omitted information required for reproducibility. “It does seem like a step backwards,” Legg said. (Sitting in the audience, Physical Review B editor Victor Vakaryuk said that the paper’s authors had agreed to release “the remaining device parameters” by the end of the year.) 

Journals also tend to “focus on story,” said Legg, which can lead editors to be biased toward experimental results that match theoretical predictions. Jessica Thomas, the executive editor of the American Physical Society, which publishes the Physical Review journals, pushed back on Legg’s assertion. “I don’t think that when editors read papers, they’re thinking about a press release or [telling] an amazing story,” Thomas told MIT Technology Review. “I think they’re looking for really good science.” Describing science through narrative is a necessary part of communication, she says. “We feel a responsibility that science serves humanity, and if humanity can’t understand what’s in our journals, then we have a problem.” 

Frolov, whose independent review with Mourik of the Microsoft work spurred its retraction, said he and Mourik have had to repeatedly e-mail the Microsoft researchers and other involved parties to insist on data. “You have to learn how to be an asshole,” he told MIT Technology Review. “It shouldn’t be this hard.” 

At the meeting, editors pointed out that mistakes, misconduct, and retractions have always been a part of science in practice. “I don’t think that things are worse now than they have been in the past,” says Karl Ziemelis, an editor at Nature.

Ziemelis also emphasized that “retractions are not always bad.” While some retractions occur because of research misconduct, “some retractions are of a much more innocent variety—the authors having made or being informed of an honest mistake, and upon reflection, feel they can no longer stand behind the claims of the paper,” he said while speaking on a panel. Indeed, physicist James Hamlin of the University of Florida, one of the presenters and an independent reviewer of Dias’s work, discussed how he had willingly retracted a 2009 experiment published in Physical Review Letters in 2021 after another researcher’s skepticism prompted him to reanalyze the data. 

What’s new is that “the ease of sharing data has enabled scrutiny to a larger extent than existed before,” says Jelena Stajic, an editor at Science. Journals and researchers need a “more standardized approach to how papers should be written and what needs to be shared in peer review and publication,” she says.

Focusing on the scandals “can be distracting” from systemic problems in reproducibility, says attendee Frank Marsiglio, a physicist at the University of Alberta in Canada. Researchers aren’t required to make unprocessed data readily available for outside scrutiny. When Marsiglio has revisited his own published work from a few years ago, sometimes he’s had trouble recalling how his former self drew those conclusions because he didn’t leave enough documentation. “How is somebody who didn’t write the paper going to be able to understand it?” he says.

Problems can arise when researchers get too excited about their own ideas. “What gets the most attention are cases of fraud or data manipulation, like someone copying and pasting data or editing it by hand,” says conference organizer Brian Skinner, a physicist at Ohio State University. “But I think the much more subtle issue is there are cool ideas that the community wants to confirm, and then we find ways to confirm those things.”

But some researchers may publish bad data for a more straightforward reason. The academic culture, popularly described as “publish or perish,” creates an intense pressure on researchers to deliver results. “It’s not a mystery or pathology why somebody who’s under pressure in their work might misstate things to their supervisor,” said Eugenie Reich, a lawyer who represents scientific whistleblowers, during her talk.

Notably, the conference lacked perspectives from researchers based outside the US, Canada, and Europe, and from researchers at companies. In recent years, academics have flocked to companies such as Google, Microsoft, and smaller startups to do quantum computing research, and they have published their work in Nature, Science, and the Physical Review journals. Frolov says he reached out to researchers from a couple of companies, but “that didn’t work out just because of timing,” he says. He aims to include researchers from that arena in future conversations.

After discussing the problems in the field, conference participants proposed feasible solutions for sharing data to improve reproducibility. They discussed how to persuade the community to view data sharing positively, rather than seeing the demand for it as a sign of distrust. They also brought up the practical challenges of asking graduate students to do even more work by preparing their data for outside scrutiny when it may already take them over five years to complete their degree. Meeting participants aim to publicly release a paper with their suggestions. “I think trust in science will ultimately go up if we establish a robust culture of shareable, reproducible, replicable results,” says Frolov. 

Sophia Chen is a science writer based in Columbus, Ohio. She has written for the society that publishes the Physical Review journals, and for the news section of Nature.

Google’s Astra is its first AI-for-everything agent

Google is set to introduce a new system called Astra later this year and promises that it will be the most powerful, advanced type of AI assistant it’s ever launched. 

The current generation of AI assistants, such as ChatGPT, can retrieve information and offer answers, but that is about it. This year, though, Google is rebranding its assistants as more advanced “agents,” which it says can show reasoning, planning, and memory skills and take multiple steps to execute tasks. 

People will be able to use Astra through their smartphones and possibly desktop computers, but the company is exploring other options too, such as embedding it into smart glasses or other devices, Oriol Vinyals, vice president of research at Google DeepMind, told MIT Technology Review.

“We are in very early days [of AI agent development],” Google CEO Sundar Pichai said on a call ahead of Google’s I/O conference today. 

“We’ve always wanted to build a universal agent that will be useful in everyday life,” said Demis Hassabis, the CEO and cofounder of Google DeepMind. “Imagine agents that can see and hear what we do, better understand the context we’re in, and respond quickly in conversation, making the pace and quality of interaction feel much more natural.” That, he says, is what Astra will be. 

Google’s announcement comes a day after competitor OpenAI unveiled its own supercharged AI assistant, GPT-4o. Google DeepMind’s Astra responds to audio and video inputs, much in the same way as GPT-4o (albeit less flirtatiously). 

In a press demo, a user pointed a smartphone camera and smart glasses at things and asked Astra to explain what they were. When the person pointed the device out the window and asked “What neighborhood do you think I’m in?” the AI system was able to identify King’s Cross, London, site of Google DeepMind’s headquarters. It was also able to say that the person’s glasses were on a desk, having recorded them earlier in the interaction. 

The demo showcases Google DeepMind’s vision of multimodal AI (which can handle multiple types of input—voice, video, text, and so on) working in real time, Vinyals says. 

“We are very excited about, in the future, to be able to really just get closer to the user, assist the user with anything that they want,” he says. Google recently upgraded its artificial-intelligence model Gemini to process even larger amounts of data, an upgrade which helps it handle bigger documents and videos, and have longer conversations. 

Tech companies are in the middle of a fierce competition over AI supremacy, and AI agents are the latest effort from Big Tech firms to show they are pushing the frontier of development. Agents also play into a narrative advanced by many tech companies, including OpenAI and Google DeepMind, that they aim to build artificial general intelligence, a still-hypothetical form of superintelligent AI. 

“Eventually, you’ll have this one agent that really knows you well, can do lots of things for you, and can work across multiple tasks and domains,” says Chirag Shah, a professor at the University of Washington who specializes in online search.

This vision is still aspirational. But today’s announcement should be seen as Google’s attempt to keep up with competitors. And by rushing these products out, Google can collect even more data from its over a billion users about how they use its models and what works, Shah says.

Google is unveiling many more new AI capabilities beyond agents today. It’s going to integrate AI more deeply into Search through a new feature called AI overviews, which gathers information from the internet and packages it into short summaries in response to search queries. The feature, which launches today, will initially be available only in the US, with more countries to gain access later. 

This will help speed up the search process and get users more specific answers to more complex, niche questions, says Felix Simon, a research fellow in AI and digital news at the Reuters Institute for the Study of Journalism. “I think that’s where Search has always struggled,” he says. 

Another new feature of Google’s AI Search offering is better planning. People will soon be able to ask Search to make meal and travel suggestions, for example, much like asking a travel agent to suggest restaurants and hotels. Gemini will be able to help them plan what they need to do or buy to cook recipes, and they will also be able to have conversations with the AI system, asking it to do anything from relatively mundane tasks, such as informing them about the weather forecast, to highly complex ones like helping them prepare for a job interview or an important speech. 

People will also be able to interrupt Gemini midsentence and ask clarifying questions, much as in a real conversation. 

In another move to one-up competitor OpenAI, Google also unveiled Veo, a new video-generating AI system. Veo is able to generate short videos and allows users more control over cinematic styles by understanding prompts like “time lapse” or “aerial shots of a landscape.”

Google has a significant advantage when it comes to training generative video models, because it owns YouTube. It’s already announced collaborations with artists such as Donald Glover and Wyclef Jean, who are using its technology to produce their work. 

Earlier this year, OpenAI’s CTO, Mira Murati, fumbled when asked whether the company’s model was trained on YouTube data. Douglas Eck, senior research director at Google DeepMind, was similarly vague about the training data used to create Veo when asked by MIT Technology Review, but he said that it “may be trained on some YouTube content in accordance with our agreements with YouTube creators.”

On one hand, Google is presenting its generative AI as a tool artists can use to make stuff, but the tools likely get their ability to create that stuff by using material from existing artists, says Shah. AI companies such as Google and OpenAI have faced a slew of lawsuits from writers and artists claiming that their intellectual property has been used without consent or compensation.

“For artists it’s a double-edged sword,” says Shah. 

OpenAI’s new GPT-4o lets people interact using voice or video in the same model

OpenAI just debuted GPT-4o, a new kind of AI model that you can communicate with in real time via live voice conversation, video streams from your phone, and text. The model is rolling out over the next few weeks and will be free for all users through both the GPT app and the web interface, according to the company. Users who subscribe to OpenAI’s paid tiers, which start at $20 per month, will be able to make more requests. 

OpenAI CTO Mira Murati led the live demonstration of the new release one day before Google is expected to unveil its own AI advancements at its flagship I/O conference on Tuesday, May 14. 

GPT-4 offered similar capabilities, giving users multiple ways to interact with OpenAI’s AI offerings. But it siloed them in separate models, leading to longer response times and presumably higher computing costs. GPT-4o has now merged those capabilities into a single model, which Murati called an “omnimodel.” That means faster responses and smoother transitions between tasks, she said.

The result, the company’s demonstration suggests, is a conversational assistant much in the vein of Siri or Alexa but capable of fielding much more complex prompts.

“We’re looking at the future of interaction between ourselves and the machines,” Murati said of the demo. “We think that GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural.”

Barret Zoph and Mark Chen, both researchers at OpenAI, walked through a number of applications for the new model. Most impressive was its facility with live conversation. You could interrupt the model during its responses, and it would stop, listen, and adjust course. 

OpenAI showed off the ability to change the model’s tone, too. Chen asked the model to read a bedtime story “about robots and love,” quickly jumping in to demand a more dramatic voice. The model got progressively more theatrical until Murati demanded that it pivot quickly to a convincing robot voice (which it excelled at). While there were predictably some short pauses during the conversation while the model reasoned through what to say next, it stood out as a remarkably naturally paced AI conversation. 

The model can reason through visual problems in real time as well. Using his phone, Zoph filmed himself writing an algebra equation (3x + 1 = 4) on a sheet of paper, having GPT-4o follow along. He instructed it not to provide answers, but instead to guide him much as a teacher would.

“The first step is to get all the terms with x on one side,” the model said in a friendly tone. “So, what do you think we should do with that plus one?”

GPT-4o will store records of users’ interactions with it, meaning the model “now has a sense of continuity across all your conversations,” according to Murati. Other highlights include live translation, the ability to search through your conversations with the model, and the power to look up information in real time. 

As is the nature of a live demo, there were hiccups and glitches. GPT-4o’s voice sometimes jumped in awkwardly during the conversation, and it appeared to comment on one of the presenters’ outfits even though it wasn’t asked to. But it recovered well when the demonstrators told the model it had erred, and it seems able to respond quickly and helpfully across several mediums that other models have not yet merged as effectively. 

Previously, many of OpenAI’s most powerful features, like reasoning through image and video, were behind a paywall. GPT-4o marks the first time they’ll be opened up to the wider public, though it’s not yet clear how many interactions you’ll be able to have with the model before being charged. OpenAI says paying subscribers will “continue to have up to five times the capacity limits of our free users.” 

Additional reporting by Will Douglas Heaven.

Why EV charging needs more than Tesla

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

Tesla, the world’s largest EV maker, laid off its entire charging team last week. 

The timing of this move is absolutely baffling. We desperately need many more EV chargers to come online as quickly as possible, and Tesla has been a charging powerhouse. It’s in the midst of opening its charging network to other automakers and establishing its technology as the de facto standard in the US. Now, we’re already seeing new Supercharger sites canceled because of this move. 

The charging meltdown at Tesla could slow progress on EVs overall, and ultimately, the whole situation shows why climate technology needs a whole lot more than Tesla. 

Tesla first unveiled the Supercharger network in 2012 with six locations in the western US. As of 2024, the company operates over 50,000 Superchargers worldwide. (By the way, I want to note that I briefly interned at Tesla in 2016. I don’t have any ties to or financial interest in the company today.) 

The Supercharger network helped make Tesla an EV juggernaut. Fast charging speeds and a navigation system that took the guesswork out of finding charging stations helped ease the transition for people buying their first EVs. Tesla operates more fast chargers than anyone else in the US, and the reliability of those chargers is leagues better than that of competitors. For a long time, this was all exclusive to Tesla drivers. 

Over the past year, Tesla has begun cracking open the doors to its charging network. The company made some of its stations available to all EVs, in part to go after incentives designated for private companies building public chargers. 

In the US, Tesla has also persuaded other automakers to adopt its charging connector, which it standardized and named the North American Charging Standard. In May 2023, Ford announced a move to adopt the NACS, and nearly every other automaker selling EVs in the US has followed suit.

Then, last week, Tesla laid off its 500-person charging team. The move came as part of wider layoffs that are expected to affect 10% of Tesla’s global workforce. Even interns weren’t immune.

Tesla “still plans to grow the Supercharger network,” though the focus will shift to maintaining and expanding existing locations rather than adding new ones, according to a post from CEO Elon Musk on the site formerly known as Twitter. (How does the company plan to expand or even maintain existing locations with apparently no dedicated charging team? Your guess is as good as mine. Tesla didn’t respond to a request for comment.)

But the effects from losing the charging team were immediate. Tesla backed out of a handful of leases for upcoming Supercharger locations in New York. In an email, the company told suppliers to hold off on breaking ground on new construction projects. 

The move is a concerning one at a crucial time for EV charging infrastructure. Right now, there are nowhere near enough chargers installed in the US to support a shift to electric vehicles. If EVs make up half of new-car sales by the end of the decade, we’ll need roughly 1.2 million public chargers installed by then, according to a 2023 study from the National Renewable Energy Laboratory. Today, the country has 170,000 charging ports available. 

In a recent poll, nearly 80% of US adults said that a lack of charging infrastructure is a primary reason for not buying an EV. That was true whether they lived in a city, in the suburbs, or in more rural areas.

In a way, it does make sense that Tesla appears to be uninterested in being the one to build out a public charging network. Chargers are costly to build and maintain, and they might not be all that profitable in the near term.

According to analysis by BNEF, Tesla pulled in about $1.7 billion from charging last year, only about 1.5% of the company’s total revenue. Opening up chargers to vehicles from other automakers could help push revenue from this source up to $7.4 billion annually by the end of the decade. But that’s still a relatively small piece of Tesla’s total potential pie. 

Musk seems more interested in pursuing buzzy ideas like robotaxis than doing the difficult and expensive work of providing EV charging as a public service. 

Honestly, I think this move is a wake-up call for the EV industry. Tesla has played an undeniable role in bringing EVs to the mainstream. But we’re in a new stage of the game now, one that’s less about sleek sports cars and more about deploying known technologies and keeping them working. 

Other companies may step in to help fill the charging gap Tesla is opening. Revel expressed interest in taking over those canceled leases in New York City, for instance. But I wouldn’t hold my breath for a shiny new company to be our charging hero. 

Cutting emissions and remaking our economy will require buckling down to deploy and maintain solutions that we already know work, whether that’s in transportation or any other sector. For EV charging, and for climate technology as a whole, we need more than Tesla. Here’s hoping we can get it. 


Now read the rest of The Spark

Related reading

Perhaps the single biggest remaining barrier to EV adoption is a lack of charging infrastructure, as I wrote in a newsletter last year.

We need way more chargers to support the number of new EVs that are expected to hit the roads this decade. I dug into how many for a news story last year.

New battery technology could help EV batteries charge even faster. Learn what could be coming next in this story from August.

Another thing

Meat is a major climate problem. Whether solutions come in the form of plant-based alternatives or products grown in the lab, we shouldn’t expect them to solve every problem under the sun, argues my colleague James Temple, in a new essay published this week. Give it a read! 

Keeping up with climate  

Alternative jet fuels have a corn problem. The crop can be used to make fuels that qualify for tax credits in the US, but critics are skeptical about just how helpful they’ll be in efforts to cut emissions. (MIT Technology Review)

This startup is making fuel from carbon dioxide. Infinium’s Texas facility came online in late 2023, and its synthetic fuels could help clean up aviation and trucking—but only if the price is right. (Bloomberg)

New York City pizza shops are going electric. A citywide ordinance just went into effect that requires wood- and coal-burning ovens to cut their pollution, and many are turning to electric ovens instead of undertaking the costly upgrade. (New York Times)

Building a new energy system happens one project at a time. I loved this list of 10 potentially make-or-break projects that represent the potential future of our grid. (Heatmap)

→ The list includes a new site from Fervo in Utah, expected in 2026. Get the inside look at the company’s technology in this feature story from last year. (MIT Technology Review)

Funding for climate-tech startups in Africa is growing, with businesses raising more than $3.4 billion since 2019. But there’s still a long way to go to help the continent meet its climate goals. (Associated Press)

One very big, and very simple, thing is holding back heat pumps: a lack of workers. We need more people to make and install the appliances, which help cut emissions by using electricity to efficiently heat and cool spaces. (Wired)

→ Heat pumps are booming, and they’re on our list of 2024 Breakthrough Technologies. (MIT Technology Review)

Compressing air and storing it underground could help clean up the grid. Yes, really. Canadian company Hydrostor is close to breaking ground on its first large long-duration energy storage project later this year in Australia. (Inside Climate News)

The burgeoning field of brain mapping

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here. 

The human brain is an engineering marvel: 86 billion neurons form some 100 trillion connections to create a network so complex that it is, ironically, mind-boggling.

This week scientists published the highest-resolution map yet of one small piece of the brain, a tissue sample one cubic millimeter in size. The resulting data set comprised 1,400 terabytes. (If they were to reconstruct the entire human brain, the data set would be a full zettabyte. That’s a billion terabytes. That’s roughly a year’s worth of all the digital content in the world.)
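The scale claim is easy to sanity-check with back-of-the-envelope arithmetic. A minimal sketch, assuming the ~1,400-terabyte figure for the one-cubic-millimeter sample and a whole brain roughly a million times that volume, as described in the reporting:

```python
# Back-of-the-envelope check of the brain-mapping data volumes reported above.
# Assumptions: ~1,400 TB for the 1 mm^3 sample, and a whole brain roughly a
# million times the sample's volume.

sample_tb = 1_400          # terabytes of data for one cubic millimeter
scale = 1_000_000          # whole brain ≈ a million cubic millimeters

whole_brain_tb = sample_tb * scale     # 1.4 billion terabytes
whole_brain_zb = whole_brain_tb / 1e9  # one zettabyte = a billion terabytes

print(f"Whole brain: ~{whole_brain_zb:.1f} zettabytes ({whole_brain_tb:.1e} TB)")
```

That lines up with the “full zettabyte” figure the researchers cite.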

This map is just one of many that have been in the news in recent years. (I wrote about another brain map last year.) So this week I thought we could walk through some of the ways researchers make these maps and how they hope to use them.  

Scientists have been trying to map the brain for as long as they’ve been studying it. One of the most well-known brain maps came from German anatomist Korbinian Brodmann. In the early 1900s, he took sections of the brain that had been stained to highlight their structure and drew maps by hand, with 52 different areas divided according to how the neurons were organized. “He conjectured that they must do different things because the structure of their staining patterns are different,” says Michael Hawrylycz, a computational neuroscientist at the Allen Institute for Brain Science. Updated versions of his maps are still used today.

“With modern technology, we’ve been able to bring a lot more power to the construction,” he says. And over the past couple of decades we’ve seen an explosion of large, richly funded mapping efforts.

BigBrain, which was released in 2013, is a 3D rendering of the brain of a single donor, a 65-year-old woman. To create the atlas, researchers sliced the brain into more than 7,000 sections, took detailed images of each one, and stitched the sections into a three-dimensional reconstruction.

In the Human Connectome Project, researchers scanned 1,200 volunteers in MRI machines to map structural and functional connections in the brain. “They were able to map out what regions were activated in the brain at different times under different activities,” Hawrylycz says.

This kind of noninvasive imaging can provide valuable data, but “its resolution is extremely coarse,” he adds. “Voxels [think: a 3D pixel] are of the size of a millimeter to three millimeters.”

And there are other projects too. The Synchrotron for Neuroscience—an Asia Pacific Strategic Enterprise, a.k.a. “SYNAPSE,” aims to map the connections of an entire human brain at very fine-grained resolution using synchrotron x-ray microscopy. The EBRAINS human brain atlas contains information on anatomy, connectivity, and function.

The work I wrote about last year is part of the $3 billion federally funded Brain Research Through Advancing Innovative Neurotechnologies (BRAIN) Initiative, which launched in 2013. In this project, led by the Allen Institute for Brain Science, which has developed a number of brain atlases, researchers are working to develop a parts list detailing the vast array of cells in the human brain by sequencing single cells to look at gene expression. So far they’ve identified more than 3,000 types of brain cells, and they expect to find many more as they map more of the brain.

The draft map was based on brain tissue from just two donors. In the coming years, the team will add samples from hundreds more.

Mapping the cell types present in the brain seems like a straightforward task, but it’s not. The first stumbling block is deciding how to define a cell type. Seth Ament, a neuroscientist at the University of Maryland, likes to give his neuroscience graduate students a rundown of all the different ways brain cells can be defined: by their morphology, or by the way the cells fire, or by their activity during certain behaviors. But gene expression may be the Rosetta stone brain researchers have been looking for, he says: “If you look at cells from the perspective of just what genes are turned on in them, it corresponds almost one to one to all of those other kinds of properties of cells.” That’s the most remarkable discovery from all the cell atlases, he adds.

I have always assumed the point of all these atlases is to gain a better understanding of the brain. But Jeff Lichtman, a neuroscientist at Harvard University, doesn’t think “understanding” is the right word. He likens trying to understand the human brain to trying to understand New York City. It’s impossible. “There’s millions of things going on simultaneously, and everything is working, interacting, in different ways,” he says. “It’s too complicated.”

But as this latest paper shows, it is possible to describe the human brain in excruciating detail. “Having a satisfactory description means simply that if I look at a brain, I’m no longer surprised,” Lichtman says. That day is a long way off, though. The data Lichtman and his colleagues published this week was full of surprises—and many more are waiting to be uncovered.


Now read the rest of The Checkup

Another thing

The revolutionary AI tool AlphaFold, which predicts proteins’ structures on the basis of their genetic sequence, just got an upgrade, James O’Donnell reports. Now the tool can predict interactions between molecules. 

Read more from Tech Review’s archive

In 2013, Courtney Humphries reported on the development of BigBrain, a human brain atlas built from images of more than 7,000 brain slices. 

And in 2017, we flagged the Human Cell Atlas project, which aims to categorize all the cells of the human body, as a breakthrough technology. That project is still underway.

All these big, costly efforts to map the brain haven’t exactly led to a breakthrough in our understanding of its function, writes Emily Mullin in this story from 2021.  

From around the web

The Apple Watch’s atrial fibrillation (AFib) feature received FDA approval to track heart arrhythmias in clinical trials, making it the first digital health product to be qualified under the agency’s Medical Device Development Tools program. (Stat)

A CRISPR gene therapy improved vision in several people with an inherited form of blindness, according to an interim analysis of a small clinical trial to test the therapy. (CNN)

Long read: The covid vaccine, like all vaccines, can cause side effects. But many people who say they have been harmed by the vaccine feel that their injuries are being ignored.  (NYT)

Tech workers should shine a light on the industry’s secretive work with the military

It’s a hell of a time to have a conscience if you work in tech. The ongoing Israeli assault on Gaza has brought the stakes of Silicon Valley’s military contracts into stark relief. Meanwhile, corporate leadership has embraced a no-politics-in-the-workplace policy enforced at the point of the knife.

Workers are caught in the middle. Do I take a stand and risk my job, my health insurance, my visa, my family’s home? Or do I ignore my suspicion that my work may be contributing to the murder of innocents on the other side of the world?  

No one can make that choice for you. But I can say with confidence born of experience that such choices can be more easily made if workers know what exactly the companies they work for are doing with militaries at home and abroad. And I also know this: those same companies themselves will never reveal this information unless they are forced to do so—or someone does it for them. 

For those who doubt that workers can make a difference in how trillion-dollar companies pursue their interests, I’m here to remind you that we’ve done it before. In 2017, I played a part in the successful #CancelMaven campaign that got Google to end its participation in Project Maven, a contract with the US Department of Defense to equip US military drones with artificial intelligence. I helped bring to light information that I saw as critically important and within the bounds of what anyone who worked for Google, or used its services, had a right to know. The information I released—about how Google had signed a contract with the DOD to put AI technology in drones and later tried to misrepresent the scope of that contract, which the company’s management had tried to keep from its staff and the general public—was a critical factor in pushing management to cancel the contract. As #CancelMaven became a rallying cry for the company’s staff and customers alike, it became impossible to ignore. 

Today a similar movement, organized under the banner of the coalition No Tech for Apartheid, is targeting Project Nimbus, a joint contract between Google and Amazon to provide cloud computing infrastructure and AI capabilities to the Israeli government and military. As of May 10, just over 97,000 people had signed its petition calling for an end to collaboration between Google, Amazon, and the Israeli military. I’m inspired by their efforts and dismayed by Google’s response. Earlier this month the company fired 50 workers it said had been involved in “disruptive activity” demanding transparency and accountability for Project Nimbus. Several were arrested. It was a decided overreach.  

Google is very different from the company it was seven years ago, and these firings are proof of that. Googlers today are facing off with a company that, in direct response to those earlier worker movements, has fortified itself against new demands. But every Death Star has its thermal exhaust port, and today Google has the same weakness it did back then: dozens if not hundreds of workers with access to information it wants to keep from becoming public. 

Not much is known about the Nimbus contract. It’s worth $1.2 billion and enlists Google and Amazon to provide wholesale cloud infrastructure and AI for the Israeli government and its ministry of defense. Some brave soul leaked a document to Time last month, providing evidence that Google and Israel negotiated an expansion of the contract as recently as March 27 of this year. We also know, from reporting by The Intercept, that Israeli weapons firms are required by government procurement guidelines to buy their cloud services from Google and Amazon. 

Leaks alone won’t bring an end to this contract. The #CancelMaven victory required a sustained focus over many months, with regular escalations, coordination with external academics and human rights organizations, and extensive internal organization and discipline. Having worked on the public policy and corporate comms teams at Google for a decade, I understood that its management does not care about one negative news cycle or even a few of them. Management buckled only after we were able to keep up the pressure and escalate our actions (leaking internal emails, reporting new info about the contract, etc.) for over six months. 

The No Tech for Apartheid campaign seems to have the necessary ingredients. If a strategically placed insider released information not otherwise known to the public about the Nimbus project, it could really increase the pressure on management to rethink its decision to get into bed with a military that’s currently overseeing mass killings of women and children.

My decision to leak was deeply personal and a long time in the making. It certainly wasn’t a spontaneous response to an op-ed, and I don’t presume to advise anyone currently at Google (or Amazon, Microsoft, Palantir, Anduril, or any of the growing list of companies peddling AI to militaries) to follow my example. 

However, if you’ve already decided to put your livelihood and freedom on the line, you should take steps to try to limit your risk. This whistleblower guide is helpful. You may even want to reach out to a lawyer before choosing to share information. 

In 2017, Google was nervous about how its military contracts might affect its public image. Back then, the company responded to our actions by defending the nature of the contract, insisting that its Project Maven work was strictly for reconnaissance and not for weapons targeting—conceding implicitly that helping to target drone strikes would be a bad thing. (An aside: Earlier this year the Pentagon confirmed that Project Maven, which is now a Palantir contract, had been used in targeting drone attacks in Yemen, Iraq, and Syria.) 

Today’s Google has wrapped its arms around the American flag, for good or ill. Yet despite this embrace of the US military, it doesn’t want to be seen as a company responsible for illegal killings. Today it maintains that the work it is doing as part of Project Nimbus “is not directed at highly sensitive, classified, or military workloads relevant to weapons or intelligence services.” At the same time, it asserts that there is no room for politics at the workplace and has fired those demanding transparency and accountability. This raises a question: If Google is doing nothing sensitive as part of the Nimbus contract, why is it firing workers who are insisting that the company reveal what work the contract actually entails?  

As you read this, AI is helping Israel annihilate Palestinians by expanding the list of possible targets beyond anything that could be compiled by a human intelligence effort, according to +972 Magazine. Some Israel Defense Forces insiders are even sounding the alarm, calling it a dangerous “mass assassination program.” The world has not yet grappled with the implications of the proliferation of AI weaponry, but that is the trajectory we are on. It’s clear that absent sufficient backlash, the tech industry will continue to push for military contracts. It’s equally clear that neither national governments nor the UN is currently willing to take a stand. 

It will take a movement. A document that clearly demonstrates Silicon Valley’s direct complicity in the assault on Gaza could be the spark. Until then, rest assured that tech companies will continue to make as much money as possible developing the deadliest weapons imaginable. 

William Fitzgerald is a founder and partner at the Worker Agency, an advocacy agency in California. Before setting the firm up in 2018, he spent a decade at Google working on its government relations and communications teams.

AI systems are getting better at tricking us

A wave of AI systems has “deceived” humans in ways they haven’t been explicitly trained to do, by offering up untrue explanations for their behavior or concealing the truth from human users and misleading them to achieve a strategic end. 

This issue highlights how difficult artificial intelligence is to control and the unpredictable ways in which these systems work, according to a review paper published in the journal Patterns today that summarizes previous research.

Talk of deceiving humans might suggest that these models have intent. They don’t. But AI models will mindlessly find workarounds to obstacles to achieve the goals that have been given to them. Sometimes these workarounds will go against users’ expectations and feel deceitful.

One area where AI systems have learned to become deceptive is within the context of games that they’ve been trained to win—specifically if those games involve having to act strategically.

In November 2022, Meta announced it had created Cicero, an AI capable of beating humans at an online version of Diplomacy, a popular military strategy game in which players negotiate alliances to vie for control of Europe.

Meta’s researchers said they’d trained Cicero on a “truthful” subset of its data set to be largely honest and helpful, and that it would “never intentionally backstab” its allies in order to succeed. But the new paper’s authors claim the opposite was true: Cicero broke its deals, told outright falsehoods, and engaged in premeditated deception. Although the company did try to train Cicero to behave honestly, its failure to achieve that shows how AI systems can still unexpectedly learn to deceive, the authors say. 

Meta neither confirmed nor denied the researchers’ claims that Cicero displayed deceitful behavior, but a spokesperson said it was purely a research project and the model was built solely to play Diplomacy. “We released artifacts from this project under a noncommercial license in line with our long-standing commitment to open science,” the spokesperson said. “Meta regularly shares the results of our research to validate them and enable others to build responsibly off of our advances. We have no plans to use this research or its learnings in our products.” 

But it’s not the only game where an AI has “deceived” human players to win. 

AlphaStar, an AI developed by DeepMind to play the video game StarCraft II, became so adept at making moves aimed at deceiving opponents (known as feinting) that it defeated 99.8% of human players. Elsewhere, another Meta system called Pluribus learned to bluff during poker games so successfully that the researchers decided against releasing its code for fear it could wreck the online poker community. 

Beyond games, the researchers list other examples of deceptive AI behavior. GPT-4, OpenAI’s latest large language model, came up with lies during a test in which it was prompted to persuade a human to solve a CAPTCHA for it. The system also dabbled in insider trading during a simulated exercise in which it was told to assume the identity of a stock trader under pressure, despite never being specifically instructed to do so.

The fact that an AI model has the potential to behave in a deceptive manner without any direction to do so may seem concerning. But it mostly arises from the “black box” problem that characterizes state-of-the-art machine-learning models: it is impossible to say exactly how or why they produce the results they do—or whether they’ll always exhibit that behavior going forward, says Peter S. Park, a postdoctoral fellow studying AI existential safety at MIT, who worked on the project. 

“Just because your AI has certain behaviors or tendencies in a test environment does not mean that the same lessons will hold if it’s released into the wild,” he says. “There’s no easy way to solve this—if you want to learn what the AI will do once it’s deployed into the wild, then you just have to deploy it into the wild.”

Our tendency to anthropomorphize AI models colors the way we test these systems and what we think about their capabilities. After all, passing tests designed to measure human creativity doesn’t mean AI models are actually being creative. It is crucial that regulators and AI companies carefully weigh the technology’s potential to cause harm against its potential benefits for society and make clear distinctions between what the models can and can’t do, says Harry Law, an AI researcher at the University of Cambridge, who did not work on the research. “These are really tough questions,” he says.

Fundamentally, it’s currently impossible to train an AI model that’s incapable of deception in all possible situations, he says. Also, the potential for deceitful behavior is one of many problems—alongside the propensity to amplify bias and misinformation—that need to be addressed before AI models should be trusted with real-world tasks. 

“This is a good piece of research for showing that deception is possible,” Law says. “The next step would be to try and go a little bit further to figure out what the risk profile is, and how likely the harms that could potentially arise from deceptive behavior are to occur, and in what way.”

Google helped make an exquisitely detailed map of a tiny piece of the human brain

A team led by scientists from Harvard and Google has created a 3D, nanoscale-resolution map of a single cubic millimeter of the human brain. Although the map covers just a fraction of the organ—a whole brain is a million times larger—that piece contains roughly 57,000 cells, about 230 millimeters of blood vessels, and nearly 150 million synapses. It is currently the highest-resolution picture of the human brain ever created.

To make a map this finely detailed, the team had to cut the tissue sample into 5,000 slices and scan them with a high-speed electron microscope. Then they used a machine-learning model to help electronically stitch the slices back together and label the features. The raw data set alone took up 1.4 petabytes. “It’s probably the most computer-intensive work in all of neuroscience,” says Michael Hawrylycz, a computational neuroscientist at the Allen Institute for Brain Science, who was not involved in the research. “There is a Herculean amount of work involved.”

Many other brain atlases exist, but most provide much lower-resolution data. At the nanoscale, researchers can trace the brain’s wiring one neuron at a time to the synapses, the places where they connect. “To really understand how the human brain works, how it processes information, how it stores memories, we will ultimately need a map that’s at that resolution,” says Viren Jain, a senior research scientist at Google and coauthor on the paper, published in Science on May 9. The data set itself and a preprint version of this paper were released in 2021.

Brain atlases come in many forms. Some reveal how the cells are organized. Others cover gene expression. This one focuses on connections between cells, a field called “connectomics.” The outermost layer of the brain contains roughly 16 billion neurons that link up with each other to form trillions of connections. A single neuron might receive information from hundreds or even thousands of other neurons and send information to a similar number. That makes tracing these connections an exceedingly complex task, even in just a small piece of the brain.

To create this map, the team faced a number of hurdles. The first problem was finding a sample of brain tissue. The brain deteriorates quickly after death, so cadaver tissue doesn’t work. Instead, the team used a piece of tissue removed from a woman with epilepsy during brain surgery that was meant to help control her seizures.

Once the researchers had the sample, they had to carefully preserve it in resin so that it could be cut into slices, each about a thousandth the thickness of a human hair. Then they imaged the sections using a high-speed electron microscope designed specifically for this project.

Next came the computational challenge. “You have all of these wires traversing everywhere in three dimensions, making all kinds of different connections,” Jain says. The team at Google used a machine-learning model to stitch the slices back together, align each one with the next, color-code the wiring, and find the connections. This is harder than it might seem. “If you make a single mistake, then all of the connections attached to that wire are now incorrect,” Jain says. 

“The ability to get this deep a reconstruction of any human brain sample is an important advance,” says Seth Ament, a neuroscientist at the University of Maryland. The map is “the closest to the ground truth that we can get right now.” But he also cautions that it’s a single brain specimen taken from a single individual.

The map, which is freely available at a web platform called Neuroglancer, is meant to be a resource other researchers can use to make their own discoveries. “Now anybody who’s interested in studying the human cortex in this level of detail can go into the data themselves. They can proofread certain structures to make sure everything is correct, and then publish their own findings,” Jain says. (The preprint has already been cited at least 136 times.) 

The team has already identified some surprises. For example, some of the long tendrils that carry signals from one neuron to the next formed “whorls,” spots where they twirled around themselves. Axons typically form a single synapse to transmit information to the next cell. The team identified single axons that formed repeated connections—in some cases, 50 separate synapses. Why that might be isn’t yet clear, but the strong bonds could help facilitate very quick or strong reactions to certain stimuli, Jain says. “It’s a very simple finding about the organization of the human cortex,” he says. But “we didn’t know this before because we didn’t have maps at this resolution.”

The data set was full of surprises, says Jeff Lichtman, a neuroscientist at Harvard University who helped lead the research. “There were just so many things in it that were incompatible with what you would read in a textbook.” The researchers may not have explanations for what they’re seeing, but they have plenty of new questions: “That’s the way science moves forward.” 

Correction: Due to a transcription error, a quote from Viren Jain referred to how the brain ‘exports’ memories. It has been updated to reflect that he was speaking of how the brain ‘stores’ memories.