What this futuristic Olympics video says about the state of generative AI

The Olympic Games in Paris just finished last month and the Paralympics are still underway, so the 2028 Summer Olympics in Los Angeles feel like a lifetime from now. But the prospect of watching the games in his home city has Josh Kahn, a filmmaker in the sports entertainment world who has worked in content creation for both LeBron James and the Chicago Bulls, thinking even further into the future: What might an LA Olympics in the year 3028 look like?

It’s the perfect type of creative exercise for AI video generation, which came into the mainstream with the debut of OpenAI’s Sora earlier this year. By typing prompts into generators like Runway or Synthesia, users can generate fairly high-definition video in minutes. It’s fast and cheap, and it presents few technical obstacles compared with traditional creation techniques like CGI or animation. Even if every frame isn’t perfect—distortions like hands with six fingers or objects that disappear are common—there are, at least in theory, a host of commercial applications for ad agencies, companies, and content creators.

Kahn, who has been toying with AI video tools for some time, used the latest version of Runway to dream up what the Olympics of the future could look like, entering a new prompt in the model for each shot. The video is just over one minute long and features sweeping aerial views of a futuristic version of LA where sea levels have risen sharply, leaving the city crammed right up to the coastline. A football stadium sits perched on top of a skyscraper, while a dome in the middle of the harbor contains courts for beach volleyball. 

The video, which was shared exclusively with MIT Technology Review, is meant less as a road map for the city and more as a demonstration of what’s possible now with AI.

“We were watching the Olympics and the amount of care that goes into the cultural storytelling of the host city,” Kahn says. “There’s a culture of imagination and storytelling in Los Angeles that has kind of set the tone for the rest of the world. Wouldn’t it be cool if we could showcase what the Olympics would look like if they returned to LA 1,000 years from now?”

More than anything, the video shows what a boon the generative technology may be for creators. However, it also indicates what’s holding it back. Though Kahn declined to share his prompts for the shots or specify how many prompts it took to get each take right, he did caution that anyone wishing to create good content with AI must be comfortable with trial and error. Particularly challenging in his futuristic project was getting the AI model to think outside the box in terms of architecture. A stadium hovering above water, for example, is not something most AI models have seen many examples of in their training data. 

With each shot requiring a new set of prompts, it’s also hard to instill a sense of continuity throughout a video. The color, angle of the sun, and shapes of buildings are difficult for a video generation model to keep consistent. The video also lacks any close-ups of people, which Kahn says AI models still tend to struggle with. 

“These technologies are always better on large-scale things right now as opposed to really nuanced human interaction,” he says. For this reason, Kahn imagines that early filmmaking applications of generative video might be for wide shots of landscapes or crowds. 

Alex Mashrabov, an AI video expert who left his role as director of generative AI at Snap last year to found a new AI video company called Higgsfield AI, agrees on the current failures and flaws of AI video. He also points out that good dialogue-heavy content is hard to produce with AI, as it tends to hinge upon subtle facial expressions and body language. 

Some content creators may be reluctant to adopt generative video simply because of the amount of time required to prompt the models again and again to get the end result right.

“Typically, the success rate is one out of 20,” Mashrabov says, but it’s not uncommon to need 50 or 100 attempts. 

For many purposes, though, that’s good enough. Mashrabov says he’s seen an uptick in AI-generated video advertisements from massive suppliers like Temu. In goods-producing countries like China, video generators are in high demand to quickly make in-your-face video ads for particular products. Even if an AI model might require lots of prompts to yield a usable ad, filming it with real people, cameras, and equipment might be 100 times more expensive. Applications like this might be the first use of generative video at scale as the technology slowly improves, he says. 

“Although I think this is a very long path, I’m very confident there are low-hanging fruits,” Mashrabov says. “We’re figuring out the genres where generative AI is already good today.”

Coming soon: Our 2024 list of Innovators Under 35

To tackle complex global problems such as preventing disease and mitigating climate change, we’re going to need new ideas from our brightest minds. Every year, MIT Technology Review identifies a new class of Innovators Under 35 taking on these and other challenges. 

On September 10, we will honor the 2024 class of Innovators Under 35. These 35 researchers and entrepreneurs are rising stars in their fields pursuing ambitious projects: One is unraveling the mysteries of how our immune system works, while another is engineering microbes to someday replace chemical pesticides.

Each is doing groundbreaking work to advance one of five areas: materials science, biotechnology, robotics, artificial intelligence, or climate and energy. Some have found clever ways to integrate these disciplines. One innovator, for example, enlists tiny robots to reduce the amount of antibiotics required to treat infections.

MIT Technology Review has published its Innovators Under 35 list since 1999. The first edition was created for our 100th anniversary and was meant to give readers a glimpse into the future, by highlighting what some of the world’s most talented young scientists are working on today.

This year, we’re celebrating our 125th anniversary and honoring this 25th class of innovators with the same goal in mind. (Note: The 2024 list will be made available exclusively to subscribers. If you’re not a subscriber, you can sign up here.)

Keep an eye on The Download newsletter next week for our announcement of the new class. You can also meet some of them at EmTech MIT, which will take place on September 30 and October 1 on MIT’s campus in Cambridge, Massachusetts.

If you can’t wait until then, we’ll reveal our Innovator of the Year during a live broadcast on LinkedIn on Monday, September 9. This person stood out for using their ingenuity to address a power imbalance in the tech sector (and that’s the only hint you get). They’ll join me on screen to talk about their work and share what’s next for their research.

A new way to build neural networks could make AI more understandable

A tweak to the way artificial neurons work in neural networks could make AIs easier to decipher.

Artificial neurons—the fundamental building blocks of deep neural networks—have survived almost unchanged for decades. While these networks give modern artificial intelligence its power, they are also inscrutable. 

Existing artificial neurons, used in large language models like GPT-4, work by taking in a large number of inputs, adding them together, and converting the sum into an output using another mathematical operation inside the neuron. Combinations of such neurons make up neural networks, and their combined workings can be difficult to decode.

But the new way to combine neurons works a little differently. Some of the complexity of the existing neurons is simplified and moved outside the neurons. Inside, the new neurons simply sum up their inputs and produce an output, without the need for the extra hidden operation. Networks of such neurons are called Kolmogorov-Arnold Networks (KANs), after the Russian mathematicians who inspired them.

The simplification, studied in detail by a group led by researchers at MIT, could make it easier to understand why neural networks produce certain outputs, help verify their decisions, and even probe for bias. Preliminary evidence also suggests that as KANs are made bigger, their accuracy increases faster than networks built of traditional neurons.

“It’s interesting work,” says Andrew Wilson, who studies the foundations of machine learning at New York University. “It’s nice that people are trying to fundamentally rethink the design of these [networks].”

The basic elements of KANs were actually proposed in the 1990s, and researchers kept building simple versions of such networks. But the MIT-led team has taken the idea further, showing how to build and train bigger KANs, performing empirical tests on them, and analyzing some KANs to demonstrate how their problem-solving ability could be interpreted by humans. “We revitalized this idea,” says team member Ziming Liu, a PhD student in Max Tegmark’s lab at MIT. “And, hopefully, with the interpretability… we [may] no longer [have to] think neural networks are black boxes.”

While it’s still early days, the team’s work on KANs is attracting attention. GitHub pages have sprung up that show how to use KANs for myriad applications, such as image recognition and solving fluid dynamics problems. 

Finding the formula

The current advance came when Liu and colleagues at MIT, Caltech, and other institutes were trying to understand the inner workings of standard artificial neural networks. 

Today, almost all types of AI, including those used to build large language models and image recognition systems, include sub-networks known as multilayer perceptrons (MLPs). In an MLP, artificial neurons are arranged in dense, interconnected “layers.” Each neuron has within it something called an “activation function”—a mathematical operation that takes in a bunch of inputs and transforms them in some pre-specified manner into an output.

In an MLP, each artificial neuron receives inputs from all the neurons in the previous layer and multiplies each input with a corresponding “weight” (a number signifying the importance of that input). These weighted inputs are added together and fed to the activation function inside the neuron to generate an output, which is then passed on to neurons in the next layer. An MLP learns to distinguish between images of cats and dogs, for example, by choosing the correct values for the weights of the inputs for all the neurons. Crucially, the activation function is fixed and doesn’t change during training.
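
To make the distinction concrete, here is a minimal sketch of what a single MLP layer computes, written in Python with NumPy. The function names are ours, and ReLU stands in for whichever fixed activation function a given network uses:

```python
import numpy as np

def relu(x):
    # A typical fixed activation function; it never changes during training.
    return np.maximum(0.0, x)

def mlp_layer(inputs, weights, biases):
    """One dense MLP layer: weight each input, sum, apply the fixed activation.

    inputs:  activations from the previous layer, shape (n_in,)
    weights: learned importance of each input, shape (n_out, n_in)
    biases:  learned offsets, shape (n_out,)
    """
    weighted_sum = weights @ inputs + biases  # each neuron sums its weighted inputs
    return relu(weighted_sum)  # only weights and biases are learned; relu is fixed

# Example: three inputs feeding a layer of two neurons
x = np.array([0.5, -1.0, 2.0])
W = np.random.randn(2, 3) * 0.1
b = np.zeros(2)
print(mlp_layer(x, W, b))
```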

Once trained, all the neurons of an MLP and their connections taken together essentially act as another function that takes an input (say, tens of thousands of pixels in an image) and produces the desired output (say, 0 for cat and 1 for dog). Understanding what that function looks like, meaning its mathematical form, is an important part of being able to understand why it produces some output. For example, why does it tag someone as creditworthy given inputs about their financial status? But MLPs are black boxes. Reverse-engineering the network is nearly impossible for complex tasks such as image recognition.

And even when Liu and colleagues tried to reverse-engineer an MLP for simpler tasks that involved bespoke “synthetic” data, they struggled. 

“If we cannot even interpret these synthetic datasets from neural networks, then it’s hopeless to deal with real-world data sets,” says Liu. “We found it really hard to try to understand these neural networks. We wanted to change the architecture.”

Mapping the math

The main change was to remove the fixed activation function and introduce a much simpler learnable function to transform each incoming input before it enters the neuron. 

Unlike the activation function in an MLP neuron, which takes in numerous inputs, each simple function outside the KAN neuron takes in one number and spits out another number. Now, during training, instead of learning the individual weights, as happens in an MLP, the KAN just learns how to represent each simple function. In a paper posted this year on the preprint server arXiv, Liu and colleagues showed that these simple functions outside the neurons are much easier to interpret, making it possible to reconstruct the mathematical form of the function being learned by the entire KAN.
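
For contrast, here is an equally minimal sketch of a KAN-style neuron. The team’s paper parameterizes each of these one-input functions with B-splines; the piecewise-linear stand-in below keeps the same spirit, with the values at fixed grid points serving as the learnable parameters:

```python
import numpy as np

class LearnableEdgeFunction:
    """A KAN-style learnable function that maps one number to another.

    The actual paper uses B-splines; this sketch uses piecewise-linear
    interpolation on a fixed grid, so the y-values at the grid points
    are the trainable parameters.
    """
    def __init__(self, grid_min=-2.0, grid_max=2.0, num_points=11):
        self.grid = np.linspace(grid_min, grid_max, num_points)
        self.values = np.random.randn(num_points) * 0.1  # learned during training

    def __call__(self, x):
        # Interpolate between grid points; inputs beyond the grid are clamped.
        return np.interp(x, self.grid, self.values)

def kan_neuron(inputs, edge_functions):
    # Each input passes through its own learned function;
    # the neuron itself just sums the results.
    return sum(f(x) for f, x in zip(edge_functions, inputs))

# Example: a neuron with three inputs, each with its own edge function
edges = [LearnableEdgeFunction() for _ in range(3)]
print(kan_neuron([0.5, -1.0, 1.5], edges))
```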

The team, however, has only tested the interpretability of KANs on simple, synthetic data sets, not on real-world problems, such as image recognition, which are more complicated. “[We are] slowly pushing the boundary,” says Liu. “Interpretability can be a very challenging task.”

Liu and colleagues have also shown that KANs get more accurate at their tasks with increasing size faster than MLPs do. The team proved the result theoretically and showed it empirically for science-related tasks (such as learning to approximate functions relevant to physics). “It’s still unclear whether this observation will extend to standard machine learning tasks, but at least for science-related tasks, it seems promising,” Liu says.

Liu acknowledges that KANs come with one important downside: it takes more time and compute power to train a KAN, compared to an MLP.

“This limits the application efficiency of KANs on large-scale data sets and complex tasks,” says Di Zhang, of Xi’an Jiaotong-Liverpool University in Suzhou, China. But he suggests that more efficient algorithms and hardware accelerators could help.

Anil Ananthaswamy is a science journalist and author who writes about physics, computational neuroscience, and machine learning. His new book, Why Machines Learn: The Elegant Math Behind Modern AI, was published by Dutton (Penguin Random House US) in July.

A new smart mask analyzes your breath to monitor your health

Your breath can give away a lot about you. Each exhalation contains all sorts of compounds, including possible biomarkers for disease or lung conditions, that could give doctors a valuable insight into your health.

Now a new smart mask, developed by a team at the California Institute of Technology, could help doctors check your breath for these signals continuously and in a noninvasive way. A patient could wear the mask at home, measure their own levels, and then go to the doctor if a flare-up is likely. 

“They don’t have to come to the clinic to assess their inflammation level,” says Wei Gao, professor of Medical Engineering at Caltech and one of the smart mask’s creators. “This can be lifesaving.”

The smart mask, details of which were published in Science today, uses a two-part cooling system to chill the breath of its wearer. The cooling turns the breath into exhaled breath condensate (EBC). 

EBC, essentially a liquid version of someone’s breath, is easier to analyze, because biomarkers like nitrite and alcohol content are more concentrated in a liquid than in a gas. The mask design takes inspiration from capillary action in plants, using a series of microfluidic modules that create pressure to push the EBC fluid around to sensors in the mask.

The sensors are connected via Bluetooth to a device like a phone, where the patient has access to real-time health readings.

“The biggest challenge has always been collecting real-time samples. This problem has been solved. That’s a paradigm shift,” says Rajan Chakrabarty, professor of Environmental and Chemical Engineering at Washington University in St. Louis, who was not involved in the research.

The Caltech team tested the smart mask with patients, including several who had chronic obstructive pulmonary disease (COPD) or asthma or had just gotten over a covid-19 infection. They were testing the masks for comfort and breathability, but they also wanted to see if the masks actually worked at tracking useful biomarkers throughout a patient’s daily activities, such as exercise and work. 

The mask picked up on higher levels of nitrite in patients who had asthma or other conditions that involved inflamed airways. It also picked up on higher alcohol content after a patient went out drinking, which demonstrates another potential application of the mask. Analyzing breath this way is more accurate than the typical breathalyzer test, which involves a patient blowing into a device. Blowing can produce imprecise results, because alcohol in the saliva can be spit into the device and skew the reading.

The researchers hope this is just the beginning. They plan to test the masks on a larger population, and if all goes well, commercialize the masks to get them out to a wider audience. They hope the mask will be a platform for broader application, where sensors for a range of biomarkers could be slotted in and out. 

“What I would like to be able to do is take off their sensors, put in my sensors, and this becomes the building block for doing all other types of development,” says Albert Titus, professor and chair of the Department of Biomedical Engineering at the University at Buffalo, who wasn’t part of the Caltech team. “That’s where I’d like to see it go.”

For example, the mask might one day measure ketones in the breath (high levels are a sign of diabetes) or glucose, to help people with diabetes monitor their condition.

“The mask can be reconfigured for many different applications,” says Gao.

How machine learning is helping us probe the secret names of animals

Do animals have names? According to the poet T.S. Eliot, cats have three: the name their owner calls them (like George); a second, more noble one (like Quaxo or Coricopat); and, finally, a “deep and inscrutable” name known only to themselves “that no human research can discover.”

But now, researchers armed with audio recorders and pattern-recognition software are making unexpected discoveries about the secrets of animal names—at least with small monkeys called marmosets.  

That’s according to a team at Hebrew University in Israel, who claim in the journal Science this week they’ve discovered that marmosets “vocally label” their monkey friends with specific sounds.

Until now, only humans, dolphins, elephants, and probably parrots had been known to use specific sounds to call out to other individuals.

Marmosets are highly social creatures that maintain contact through high-pitched chirps and twitters called “phee-calls.” By recording different pairs of monkeys placed near each other, the team in Israel says they found the animals will adjust their sounds toward a vocal label that’s specific to their conversation partner. 

“It’s similar to names in humans,” says David Omer, the neuroscientist who led the project. “There’s a typical time structure to their calls, and what we report is that the monkey fine-tunes it to encode an individual.”

These names aren’t really recognizable to the human ear; instead, they were identified via a “random forest,” the statistical machine learning technique Omer’s team used to cluster, classify, and analyze the sounds.
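
The paper’s exact features and pipeline aren’t described here, but the general shape of such an analysis is easy to sketch with scikit-learn. Everything in the following snippet—the feature count, the labels, the data itself—is hypothetical stand-in material, meant only to show the pattern:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical setup: each row is a vector of acoustic features extracted
# from one phee-call (durations, pitch statistics, and so on), and each
# label is the identity of the monkey being addressed.
rng = np.random.default_rng(0)
n_calls, n_features = 400, 12
X = rng.normal(size=(n_calls, n_features))  # stand-in acoustic features
y = rng.integers(0, 4, size=n_calls)        # stand-in labels: 4 possible addressees

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# If calls really carry addressee-specific labels, accuracy should beat
# chance (0.25 with four addressees); with this random stand-in data it
# will hover near chance.
print("accuracy:", model.score(X_test, y_test))
```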

To prove they’d cracked the monkey code—and learned the secret names—the team played recordings at the marmosets through a speaker and found they responded more often when their label, or name, was in the recording.

This sort of research could provide clues to the origins of human language, which is arguably the most powerful innovation in our species’ evolution, right up there with opposable thumbs. In years past, it’s been argued that human language is unique and that animals lack both the brains and vocal apparatus to converse.

But there’s growing evidence that isn’t the case, especially now that the use of names has been found in at least four distantly related species. “This is very strong evidence that the evolution of language was not a singular event,” says Omer.

Some similar research tactics were reported earlier this year by Mickey Pardo, a postdoctoral researcher, now at Cornell University, who spent 14 months in Kenya recording elephant calls. Elephants sound alarms by trumpeting, but in reality most of their vocalizations are deep rumbles that are only partly audible to humans.

Pardo also found evidence that elephants use vocal labels, and he says he can definitely get an elephant’s attention by playing the sound of another elephant addressing it. But does this mean researchers are now “speaking animal”? 

Not quite, says Pardo. Real language, he thinks, would mean the ability to discuss things that happened in the past or string together more complex ideas. Pardo says he’s hoping to determine next if elephants have specific sounds for deciding which watering hole to visit—that is, whether they employ place names.

Several efforts are underway to discover if there’s still more meaning in animal sounds than we thought. This year, a group called Project CETI that’s studying the songs of sperm whales found they are far more complex than previously recognized. It means the animals, in theory, could be using a kind of grammar—although whether they actually are saying anything specific isn’t known.

Another effort, the Earth Species Project, aims to use “artificial intelligence to decode nonhuman communication” and has started helping researchers collect more data on animal sounds to feed into those models. 

The team in Israel say they will also be giving the latest types of artificial intelligence a try. Their marmosets live in a laboratory facility, and Omer says he’s already put microphones in monkeys’ living space in order to record everything they say, 24 hours a day.

Their chatter, Omer says, will be used to train a large language model that could, in theory, be used to finish a series of calls that a monkey started, or produce what it predicts is an appropriate reply. But will a primate language model actually make sense, or will it just gibber away without meaning? 

Only the monkeys will be able to say for sure.  

“I don’t have any delusional expectations that they will talk about Nietzsche,” says Omer. “I don’t expect it to be extremely complex like a human, but I would expect it to help us understand something about how our language developed.” 

AI’s growth needs the right interface

If you took a walk in Hayes Valley, San Francisco’s epicenter of AI froth, and asked the first dude-bro you saw wearing a puffer vest about the future of the interface, he’d probably say something about the movie Her, about chatty virtual assistants that will help you do everything from organize your email to book a trip to Coachella to sort your text messages.

Nonsense. Setting aside that Her (a still from the film is shown above) was about how technology manipulates us into a one-sided relationship, you’d have to be pudding-brained to believe that chatbots are the best way to use computers. The real opportunity is close, but it isn’t chatbots.

Instead, it’s computers built atop the visual interfaces we know, but which we can interact with more fluidly, through whatever combination of voice and touch is most natural. Crucially, this won’t just be a computer that we can use. It’ll also be a computer that empowers us to break and remake it, to whatever ends we want. 

Chatbots fail because they ignore a simple fact that’s sold 20 billion smartphones: For a computer to be useful, we need an easily absorbed mental model of both its capabilities and its limitations. The smartphone’s victory was built on the graphical user interface, which revolutionized how we use computers—and how many computers we use!—because it made it easy to understand what a computer could do. There was no mystery. In a blink, you saw the icons and learned without realizing it.

Today we take the GUI for granted. Meanwhile, chatbots can feel like magic, letting you say anything and get a reasonable-sounding response. But magic is also the power to mislead. Chatbots and open-ended conversational systems are doomed as general-purpose interfaces because while they may seem able to understand anything, they can’t actually do everything.

In that gap between anything and everything sits a teetering mound of misbegotten ideas and fatally hyped products.

“But dude, maybe a chatbot could help you book that flight to Coachella?” Sure. But could it switch your reservation when you have a problem? Could it ask you, in turn, which flight is best given your need to be back in Hayes Valley by Friday at 2? 

We take interactive features for granted because of the GUI’s genius. But with a chatbot, you can never know up front where its abilities begin and end. Yes, the list of things chatbots can do is growing every day. But how do you remember what does and doesn’t work, or what’s supposed to work soon? And how are you supposed to constantly update your mental model as those capabilities grow?

If you’ve ever used a digital assistant or smart speaker, you already know that mismatched expectations create products we’ll never use to their full potential. When you first tried one, you probably asked it to do whatever you could think of. Some things worked; most didn’t. So you eventually settled on asking for just the few things you could remember that always worked: timers and music. LLMs, when used as primary interfaces, re-create the trouble that arises when your mental model isn’t quite right. 

Chatbots have their uses and their users. But their usefulness is still capped because they are open-ended computer interfaces that challenge you to figure them out through trial and error. Instead, we need to combine the ease of natural-language input with machines that will simply show us what they are capable of.

For example, imagine if, instead of stumbling around trying to talk to the smart devices in your home like a doofus, you could simply look at something with your smart glasses (or whatever) and see a right-click for the real world, giving you a menu of what you can control in all the devices that increasingly surround us. It won’t be a voice that tells you what’s possible—it’ll be an old-fashioned computer screen, and an old-fashioned GUI, which you can operate with your voice or with your hands, or both in combination if you want.

But that’s still not the big opportunity! 

I think the future interface we want is made from computers and apps that work in ways similar to the phones and laptops we have now—but that we can remake to suit whatever uses we want. Compare this with the world we have now: If you don’t like your hotel app, you can’t make a new one. If you don’t want all the bloatware in your banking app, tough luck. We’re surrounded by apps that are nominally tools. But unlike any tool previously known to man, these are tools that serve only the purpose that someone else defined for them. Why shouldn’t we be able to not merely consume technology, like the gelatinous former Earthlings in Wall-E, but instead architect technology to suit our own ends?

That world seemed close in the 1970s, to Steve Wozniak and the Homebrew Computer Club. It seemed to approach again in the 1990s, with the World Wide Web. But today, the imbalance between people who own computers and people who remake them has never been greater. We, the heirs of the original tool-using primates, have been reduced from wielders of those tools to passive consumers of technology delivered in slick buttons we can use but never change. This runs against what it is to be Homo sapiens, a species defined by our love and instinct for repurposing tools to whatever ends we like.

Imagine if you didn’t have to accept the features some tech genius announced on a wave of hype. Imagine if, instead of downloading some app someone else built, you could describe the app you wanted and then make it with a computer’s help, by reassembling features from any other apps ever created. Comp sci geeks call this notion of recombining capabilities “composability.” I think the future is composability—but composability that anyone can command. 

This idea is already lurching to life. Notion—originally meant as enterprise software that let you collect and create various docs in one place—has exploded with Gen Z, because unlike most software, which serves only a narrow or rigid purpose, it allows you to make and share templates for how to do things of all kinds. You can manage your finances or build a kindergarten lesson plan in one place, with whatever tools you need. 

Now imagine if you could tell your phone what kinds of new templates you want. An LLM can already assemble all the things you need and draw the right interface for them. Want a how-to app about knitting? Sure. Or your own guide to New York City? Done. That computer will probably be using an LLM to assemble these apps. Great. That just means that you, as a normie, can inspect and tinker with the prompt powering the software you just created, like a mechanic looking under the hood.

One day, hopefully soon, we’ll look back on this sad and weird era when our digital tools were both monolithic and ungovernable as a blip when technology conflicted with the human urge to constantly tinker with the world around us. And we’ll realize that the key to building a different relationship with technology was simply to give each of us power over how the interface of the future is designed. 

Cliff Kuang is a user-experience designer and the author of User Friendly: How the Hidden Rules of Design Are Changing the Way We Live, Work, and Play.

The author who listens to the sound of the cosmos

In 1983, while on a field recording assignment in Kenya, the musician and soundscape ecologist Bernie Krause noticed something remarkable. Lying in his tent late one night, listening to the calls of hyenas, tree frogs, elephants, and insects in the surrounding old-growth forest, Krause heard what seemed to be a kind of collective orchestra. Rather than a chaotic cacophony of nighttime noises, it was as if each animal was singing within a defined acoustic bandwidth, like living instruments in a larger sylvan ensemble. 

Unsure of whether this structured musicality was real or the invention of an exhausted mind, Krause analyzed his soundscape recordings on a spectrogram when he returned home. Sure enough, the insects occupied one frequency niche, the frogs another, and the mammals a completely separate one. Each group had claimed a unique part of the larger sonic spectrum, a fact that not only made communication easier, Krause surmised, but also helped convey important information about the health and history of the ecosystem.

A Book of Noises: Notes on the Auraculous
Caspar Henderson
University of Chicago Press, 2024

Krause describes his “niche hypothesis” in the 2012 book The Great Animal Orchestra, dubbing these symphonic soundscapes the “biophony”—his term for all the sounds generated by nonhuman organisms in a specific biome. Along with his colleague Stuart Gage from Michigan State University, he also coins two more terms—“anthropophony” and “geophony”—to describe sounds associated with humanity (think music, language, traffic jams, jetliners) and those originating from Earth’s natural processes (wind, waves, volcanoes, and thunder).

In A Book of Noises: Notes on the Auraculous, the Oxford-based writer and journalist Caspar Henderson makes an addition to Krause’s soundscape triumvirate: the “cosmophony,” or the sounds of the cosmos. Together, these four categories serve as the basis for a brief but fascinating tour through the nature of sound and music with 48 stops (in the form of short essays) that explore everything from human earworms to whale earwax.

We start, appropriately enough, with a bang. Sound, Henderson explains, is a pressure wave in a medium. The denser the medium, the faster it travels. For hundreds of thousands of years after the Big Bang, the universe was so dense that it trapped light but allowed sound to pass through it freely. As the primordial plasma of this infant universe cooled and expansion continued, matter collected along the ripples of these cosmic waves, which eventually became the loci for galaxies like our own. “The universe we see today is an echo of those early years,” Henderson writes, “and the waves help us measure [its] size.” 

The Big Bang may seem like a logical place to start a journey into sound, but cosmophony is actually an odd category to invent for a book about noise. After all, there’s not much of it in the vacuum of space. Henderson gets around this by keeping the section short and focusing more on how humans have historically thought about sound in the heavens. For example, there are two separate essays on our multicentury obsession with “the music of the spheres,” the idea that there exists a kind of ethereal harmony produced by the movements of heavenly objects.

Since matter matters when it comes to sound—there can be none of the latter without the former—we also get an otherworldly examination of what human voices would sound like on different terrestrial and gas planets in our solar system, as well as some creative efforts from musicians and scientists who have transmuted visual data from space into music and other forms of audio. These are fun and interesting forays, but it isn’t until the end of the equally short “Sounds of Earth” (geophony) section that readers start to get a sense of the “auraculousness”—ear-related wonder—Henderson references in the subtitle.

Judging by the quantity and variety of entries in the “biophony” and “anthropophony” sections, you get the impression Henderson himself might be more attuned to these particular wonders as well. You really can’t blame him. 

The sheer number of fascinating ways that sound is employed across the human and nonhuman animal kingdom is mind-boggling, and it’s in these final two sections of the book that Henderson’s prose and curatorial prowess really start to shine—or should I say sing?

We learn, for example, about female frogs that have devised their own biological noise-canceling system to tune out the male croaks of other species; crickets that amplify their chirps by “chewing a hole in a leaf, sticking their heads through it, and using it as a megaphone”; elephants that listen and communicate with each other seismically; plants that react to the buzz of bees by increasing the concentration of sugar in their flowers’ nectar; and moths with tiny bumps on their exoskeletons that jam the high-frequency echolocation pulses bats use to hunt them. 

Henderson has a knack for crisp characterization (“Singing came from winging”) and vivid, playful descriptions (“Through [the cochlea], the booming and buzzing confusion of the world, all its voices and music, passes into the three pounds of wobbly blancmange inside the nutshell numbskulls that are our kingdoms of infinite space”). He also excels at injecting a sense of wonder into aspects of sound that many of us take for granted. 

In an essay about sound’s power to heal, he marvels at ultrasound’s twin uses as a medical treatment and a method of examination. In addition to its kidney-stone-blasting and tumor-ablating powers, sound, Henderson says, can also be a literal window into our bodies. “It is, truly, an astonishing thing that our first glimpse of the greatest wonder and trial of our lives, parenthood, comes in the form of a fuzzy black and white smudge made from sound.”

While you can certainly quibble with some of the topical choices and their treatment in A Book of Noises, what you can’t argue with is the clear sense of awe that permeates almost every page. It’s an infectious and edifying kind of energy. So much so that by the time Henderson wraps up the book’s final essay, on silence, all you want to do is immerse yourself in more noise.

Singing in the key of sea

For the multiple generations who grew up watching his Academy Award–winning 1956 documentary film, The Silent World, Jacques-Yves Cousteau’s mischaracterization of the ocean as a place largely devoid of sound seems to have calcified into common knowledge. The science writer Amorina Kingdon offers a thorough and convincing rebuttal of this idea in her new book, Sing Like Fish: How Sound Rules Life Under Water.

Sing Like Fish: How Sound Rules Life Under Water
Amorina Kingdon
Crown, 2024

Beyond serving as a 247-page refutation of this unfortunate trope, Kingdon’s book aims to open our ears to all the marvels of underwater life by explaining how sound behaves in this watery underworld, why it’s so important to the animals that live there, and what we can learn when we start listening to them.

It turns out that sound is not just a great way to communicate and navigate underwater—it may be the best way. For one thing, it travels four and a half times faster there than it does on land. It can also go farther (across entire seas, under the right conditions) and provide critical information about everything from who wants to eat you to who wants to mate with you. 

To take advantage of the unique way sound propagates in the world’s oceans, fish rely on a variety of methods to “hear” what’s going on around them. These mechanisms range from so-called lateral lines—rows of tiny hair cells along the outside of their body that can sense small movements and vibrations in the water around them—to otoliths, dense lumps of calcium carbonate that form inside their inner ears. 

Because fish are more or less the same density as water, these denser otoliths move at a different amplitude and phase in response to vibrations passing through their body. The movement is then registered by patches of hair cells that line the chambers where otoliths are embedded, which turn the vibrations of sound into nerve impulses. The philosopher of science Peter Godfrey-Smith may have put it best: “It is not too much to say that a fish’s body is a giant pressure-sensitive ear.” 

While there are some minor topical overlaps with Henderson’s book—primarily around whale-related sound and communication—one of the more admirable attributes of Sing Like Fish is Kingdon’s willingness to focus on some of the oceans’ … let’s say, less charismatic noisemakers. We learn about herring (“the inveterate farters of the sea”), which use their flatuosity much as a fighter jet might use countermeasures to avoid an incoming missile. When these silvery fish detect the sound of a killer whale, they’ll fire off a barrage of toots, quickly decreasing both their bodily buoyancy and their vulnerability to the location-revealing clicks of the whale hunting them. “This strategic fart shifts them deeper and makes them less reflective to sound,” writes Kingdon.

Readers are also introduced to the plainfin midshipman, a West Coast fish with “a booming voice” and “a perpetual look of accusation.” In addition to having “a fishy case of resting bitch face,” the male midshipman also has a unique hum, which it uses to attract gravid females in the spring. That hum became the subject of various conspiracy theories in the mid-’80s, when houseboat owners in Sausalito, California, started complaining about a mysterious seasonal drone. Thanks to a hydrophone and a level-headed local aquarium director, the sound was eventually revealed to be not aliens or a secret government experiment, but simply a small, brownish-green fish looking for love.

Kingdon’s command of, and enthusiasm for, the science of underwater sound is uniformly impressive. But it’s her recounting of how and why we started listening to the oceans in the first place that’s arguably one of the book’s most fascinating topics. It’s a wide-ranging tale, one that spans “firearm-happy Victorian-era gentleman” and “whales that sounded suspiciously like Soviet submarines.” It’s also a powerful reminder of how war and military research can both spur and stifle scientific discovery in surprising ways.

The fact that Sing Like Fish ends up being both an exquisitely reported piece of journalism and a riveting exploration of a sense that tends to get short shrift only amplifies Kingdon’s ultimate message—that we all need to start paying more attention to the ways in which our own sounds are impinging on life underwater. As we’ve started listening more to the seas, what we’re increasingly hearing is ourselves, she writes: “Piercing sonar, thudding seismic air guns for geological imaging, bangs from pile drivers, buzzing motorboats, and shipping’s broadband growl. We make a lot of noise.”

That noise affects underwater communication, mating, migrating, and bonding in all sorts of subtle and obvious ways. And its impact is often made worse when combined with other threats, like climate change. The good news is that while noise can be a frustratingly hard thing to regulate, there are efforts underway to address our poor underwater aural etiquette. The International Maritime Organization is currently updating its ship noise guidelines for member nations. At the same time, the International Organization for Standardization is creating more guidelines for measuring underwater noise. 

“The ocean is not, and has never been, a silent place,” writes Kingdon. But to keep it filled with the right kinds of noise (i.e., the kinds that are useful to the creatures living there), we’ll have to recommit ourselves to doing two things that humans sometimes aren’t so great at: learning to listen and knowing when to shut up.   

Music to our ears (and minds)

We tend to do both (shut up and listen) when music is being played—at least if it’s the kind we like. And yet the nature of what the composer Edgard Varèse famously called “organized sound” largely remains a mystery to us. What exactly is music? What distinguishes it from other sounds? Why do we enjoy making it? Why do we prefer certain kinds? Why is it so effective at influencing our emotions and (often) our memories?  

In their recent book Every Brain Needs Music: The Neuroscience of Making and Listening to Music, Larry Sherman and Dennis Plies look inside our heads to try to find some answers to these vexing questions. Sherman is a professor of neuroscience at the Oregon Health and Science University, and Plies is a professional musician and teacher. Unfortunately, if the book reveals anything, it’s that limiting your exploration of music to one lens (neuroscience) also limits the insights you can gain into its nature. 

Every Brain Needs Music: The Neuroscience of Making and Listening to Music
Larry Sherman and Dennis Plies
Columbia University Press, 2023

That’s not to say that getting a better sense of how specific patterns of vibrating air molecules get translated into feelings of joy and happiness isn’t valuable. There are some genuinely interesting explanations of what happens in our brains when we play, listen to, and compose music—supported by some truly great watercolor-based illustrations by Susi Davis that help to clarify the text. But much of this gets bogged down in odd editorial choices (there are, for some reason, three chapters on practicing music) and conclusions that aren’t exactly earth-shattering (humans like music because it connects us).

Every Brain Needs Music purports to be for all readers, but unless you’re a musician who’s particularly interested in the brain and its inner workings, I think most people will be far better served by A Book of Noises or other, more in-depth explorations of the importance of music to humans, like Michael Spitzer’s The Musical Human: A History of Life on Earth.

“We have no earlids,” the late composer and naturalist R. Murray Schafer once observed. He also noted that despite this anatomical omission, we’ve become quite good at ignoring or tuning out large portions of the sonic world around us. Some of this tendency may be tied to our supposed preference for other sensory modalities. Most of us are taught from an early age that we are primarily visual creatures—that seeing is believing, that a picture is worth a thousand words. This idea is likely reinforced by a culture that also tends to focus primarily on the visual experience.

Yet while it may be true that we rely heavily on our eyes to make sense of the world, we do a profound disservice to ourselves and the rest of the natural world when we underestimate or downplay sound. Indeed, if there’s a common message that runs through all three of these books, it’s that attending to sound in all its forms isn’t just personally rewarding or edifying; it’s a part of what makes us fully human. As Bernie Krause discovered one night more than 40 years ago, once you start listening, it’s amazing what you can hear. 

Bryan Gardiner is a writer based in Oakland, California.

Job title of the future: Weather maker

Much of the western United States relies on winter snowpack to supply its rivers and reservoirs through the summer months. But with warming temperatures, less and less snow is falling—a recent study showed a 23% decline in annual snowpack since 1955. By some estimates, runoff from snowmelt in the western US could decrease by a third between now and the end of the century, meaning less water will be available for agriculture, hydroelectric projects, and urban use in a region already dealing with water scarcity. 

That’s where Frank McDonough comes in. An atmospheric research scientist, McDonough leads a cloud-seeding program at the Desert Research Institute (DRI) that aims to increase snowfall in Nevada and the Eastern Sierra. Snowmakers like McDonough and others who generate rain represent a growing sector in a parched world.

Instant snow: Cloud seeding for snow works by injecting a tiny amount of silver iodide dust into a cloud to help its water vapor condense into ice crystals that grow into snowflakes. In other conditions, water molecules drawn to such particles coalesce into raindrops. McDonough uses custom-made, remotely operated machines on the ground to heat a powdered form of silver iodide and release it into the air. Dust—or sometimes table salt—can also be released from planes.

Old tech, new urgency: The precipitation-catalyzing properties of silver iodide were first explored in the 1940s by American chemists and engineers, but the field remained a small niche. Now, with 40% of people worldwide affected by water scarcity and a growing number of reservoirs facing climate stress, cloud seeding is receiving global interest. “It’s becoming almost like, hey, we have to do this, because there’s just too many people and too many demands on these water resources,” says McDonough. A growing number of government-run cloud-seeding programs around the world are now working to increase rainfall and snowpack, and even to manipulate the timing of precipitation to prevent large hailstorms, reduce air pollution, and minimize flood risk. The private sector is also taking note: One cloud-seeding startup, Rainmaker, recently raised millions.

Generating results: At the end of each winter, the snowmakers dig into the data to see what impact they’ve had. In the past, McDonough says, his seeding has increased snowpack by 5% to 10%. That’s not enough to end a drought, but the DRI estimates that the cloud seeding around Reno, Nevada, alone adds enough precipitation to keep about 40,000 households supplied. And for some hydroelectric projects, “a 1% increase is worth millions of dollars,” McDonough says. “Water is really valuable out here in the West.”

This startup is making coffee without coffee beans

DJ Tan, cofounder of the Singaporean startup Prefer Coffee, pops open a bottle of oat latte and pours some into my cup. The chilled drink feels wonderfully refreshing in Singapore’s heat—and it tastes just like coffee. And that’s impressive, because there isn’t a single ounce of coffee in it. 

It turns out that our beloved cup of joe may not be sustainable the way it’s produced now. Rising temperatures, droughts, floods, typhoons, and new diseases are endangering coffee crops. A 2022 study published in the journal PLOS One projects a general decline in land suitable for growing coffee by 2050. Modern coffee production involves clearing forests and uses a lot of water (as well as fertilizers and pesticides). It also consumes a lot of energy, generates greenhouse-gas emissions, and ruins native ecosystems. The situation “presents an existential crisis for the global coffee industry,” says Tan—and for all those who love their morning wake-up shot.

Tan had an idea that could fix it: a “coffee” brewed entirely from leftovers of the local food industry. 

For a few years before starting Prefer, Tan was working in the food industry with Singapore’s top chefs. His clients were in search of new flavors, which he created using fermentation—feeding various organic substances to microbes. Humans have been using microorganisms to create foods for ages: microbes and yeast produce some of our favorite foods and drinks, like yogurt, kimchi, beer, and kombucha. But Tan was pushing the process in new directions. “Fermentation is a way to create flavors that don’t exist,” he says. 

In 2022, at a local startup accelerator in Singapore, Tan met Jake Berber, a neuroscientist turned entrepreneur. Both men were coffee lovers, and they joined forces to create a beanless drink. In doing so, they joined a growing movement of upcyclers who believe that we can reduce our footprint by putting food leftovers back onto our plates after making them appealing and palatable once again. 

They spent months experimenting with various ingredients. “From my previous work, I had an inkling of what might work,” says Tan, but narrowing it down to the exact proportions, processes, and types of leftovers took a while. They tried roasting chicory root, which had been used as a coffee substitute before, but while the result was reminiscent of coffee, the taste wasn’t close enough. They tried grinding date seeds, which yielded a fruity tea-like drink, a far cry from coffee.

Then some batches brewed from mixtures of food leftovers showed promise. They used gas chromatography–mass spectrometry, a technique that identifies individual molecular compounds in mixtures, to identify and analyze the molecules responsible for the desired taste. The results guided them in tweaking new iterations of the brew. After a few months and several hundred different mixes and methods, they zeroed in on the right combination: stale bread from bakeries, soybean pulp from tofu making, and spent barley grains from local breweries. “We combine them in roughly equal amounts, ferment for 24 hours, and then roast,” Tan says.

Out comes a naturally caffeine-free “coffee” that can be enjoyed with plant-based or regular milk. Or added to a martini—local bartenders jumped on the novelty. Without milk, the drink “tastes a little more chocolatey and retains the notes of herbaceous bitterness,” according to Tan. Price-wise it’s comparable to your average coffee, Berber says. Prefer sells a powder to be brewed like any other coffee, as well as bottled cold brew and bottled latte. The products can be bought online and ordered at various Singaporean cafés.

For those who want their kick, the startup adds caffeine powder from tea leaves. On a warming planet, tea plants are a better bet, Tan explains: “You’re harvesting the leaves, which are a lot more plentiful than the coffee berries.” 

Prefer ferments and then roasts its upcycled mixture, and has started selling its bottled products online.

Currently, Prefer Coffee sells its brew in Singapore only, but it hopes to expand to other places while still upcycling local waste. In the Philippines, for example, leftover cassava, sugarcane, or pineapple might be used, Tan says. Although adjustments will have to be made, the company’s fermentation process should be able to deliver something similarly coffee-like: “Our technology doesn’t rely on soy, bread, and barley but tries to use whatever is available.”

Journalist Lina Zeldovich is the author of The Living Medicine: How a Lifesaving Cure Was Nearly Lost and Why It Will Rescue Us When Antibiotics Fail, to be published by St. Martin’s in October 2024.

Will computers ever feel responsible?

“If a machine is to interact intelligently with people, it has to be endowed with an understanding of human life.” 

—Dreyfus and Dreyfus

Bold technology predictions pave the road to humility. Even titans like Albert Einstein own a billboard or two along that humbling freeway. In a classic example, John von Neumann, who pioneered modern computer architecture, wrote in 1949, “It would appear that we have reached the limits of what is possible to achieve with computer technology.” Among the myriad manifestations of computational limit-busting that have defied von Neumann’s prediction is the social psychologist Frank Rosenblatt’s 1958 model of a human brain’s neural network. He called his device, based on the IBM 704 mainframe computer, the “Perceptron” and trained it to recognize simple patterns. Perceptrons eventually led to deep learning and modern artificial intelligence.
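
Rosenblatt’s Perceptron was simulated on the IBM 704 before being built as dedicated hardware, and its learning rule is simple enough to fit in a few lines of modern Python. Here is a minimal sketch (our variable names), trained on the logical-OR pattern:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Rosenblatt's perceptron learning rule.

    X: input patterns, shape (n_samples, n_features)
    y: target labels, +1 or -1
    Weights are nudged only when the current prediction is wrong.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            prediction = 1 if (w @ xi + b) > 0 else -1
            if prediction != target:
                w += lr * target * xi
                b += lr * target
    return w, b

# Logical OR: output is +1 if either input is 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, 1, 1])
w, b = train_perceptron(X, y)
print([1 if (w @ xi + b) > 0 else -1 for xi in X])  # matches y after training
```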

In a similarly bold but flawed prediction, brothers Hubert and Stuart Dreyfus—professors at UC Berkeley with very different specialties, Hubert’s in philosophy and Stuart’s in engineering—wrote in a January 1986 story in Technology Review that “there is almost no likelihood that scientists can develop machines capable of making intelligent decisions.” The article drew from the Dreyfuses’ soon-to-be-published book, Mind Over Machine (Macmillan, February 1986), which described their five-stage model for human “know-how,” or skill acquisition. Hubert (who died in 2017) had long been a critic of AI, penning skeptical papers and books as far back as the 1960s. 

Stuart Dreyfus, who is still a professor at Berkeley, is impressed by the progress made in AI. “I guess I’m not surprised by reinforcement learning,” he says, adding that he remains skeptical and concerned about certain AI applications, especially large language models, or LLMs, like ChatGPT. “Machines don’t have bodies,” he notes. And he believes that being disembodied is limiting and creates risk: “It seems to me that in any area which involves life-and-death possibilities, AI is dangerous, because it doesn’t know what death means.”

According to the Dreyfus skill acquisition model, an intrinsic shift occurs as human know-how advances through five stages of development: novice, advanced beginner, competent, proficient, and expert. “A crucial difference between beginners and more competent performers is their level of involvement,” the researchers explained. “Novices and beginners feel little responsibility for what they do because they are only applying the learned rules.” If they fail, they blame the rules. Expert performers, however, feel responsibility for their decisions because as their know-how becomes deeply embedded in their brains, nervous systems, and muscles—an embodied skill—they learn to manipulate the rules to achieve their goals. They own the outcome.

That inextricable relationship between intelligent decision-making and responsibility is an essential ingredient for a well-functioning, civilized society, and some say it’s missing from today’s expert systems. Also missing is the ability to care, to share concerns, to make commitments, to have and read emotions—all the aspects of human intelligence that come from having a body and moving through the world.

As AI continues to infiltrate so many aspects of our lives, can we teach future generations of expert systems to feel responsible for their decisions? Is responsibility—or care or commitment or emotion—something that can be derived from statistical inferences or drawn from the problematic data used to train AI? Perhaps, but even then machine intelligence would not equate to human intelligence—it would still be something different, as the Dreyfus brothers also predicted nearly four decades ago. 

Bill Gourgey is a science writer based in Washington, DC.