Ai Archives Ai

Ai AI Hype Index App Artificial intelligence

May 29 2025

The AI Hype Index: College students are hooked on ChatGPT

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry.

Large language models confidently present their responses as accurate and reliable, even when they’re neither of those things. That’s why we’ve recently seen chatbots supercharge vulnerable people’s delusions, make citation mistakes in an important legal battle between music publishers and Anthropic, and (in the case of xAI’s Grok) rant irrationally about “white genocide.”

But it’s not all bad news—AI could also finally lead to a better battery life for your iPhone and solve tricky real-world problems that humans have been struggling to crack, if Google DeepMind’s new model is any indication. And perhaps most exciting of all, it could combine with brain implants to help people communicate when they have lost the ability to speak.

Ecommerce MGMT 0 Comments

Ai App Artificial intelligence

May 10 2025

How to build a better AI benchmark

It’s not easy being one of Silicon Valley’s favorite benchmarks.

SWE-Bench (pronounced “swee bench”) launched in November 2024 to evaluate an AI model’s coding skill, using more than 2,000 real-world programming problems pulled from the public GitHub repositories of 12 different Python-based projects.

In the months since then, it’s quickly become one of the most popular tests in AI. A SWE-Bench score has become a mainstay of major model releases from OpenAI, Anthropic, and Google—and outside of foundation models, the fine-tuners at AI firms are in constant competition to see who can rise above the pack. The top of the leaderboard is a pileup between three different fine tunings of Anthropic’s Claude Sonnet model and Amazon’s Q developer agent. Auto Code Rover—one of the Claude modifications—nabbed the number two spot in November, and was acquired just three months later.

Despite all the fervor, this isn’t exactly a truthful assessment of which model is “better.” As the benchmark has gained prominence, “you start to see that people really want that top spot,” says John Yang, a researcher on the team that developed SWE-Bench at Princeton University. As a result, entrants have begun to game the system—which is pushing many others to wonder whether there’s a better way to actually measure AI achievement.

Developers of these coding agents aren’t necessarily doing anything as straightforward cheating, but they’re crafting approaches that are too neatly tailored to the specifics of the benchmark. The initial SWE-Bench test set was limited to programs written in Python, which meant developers could gain an advantage by training their models exclusively on Python code. Soon, Yang noticed that high-scoring models would fail completely when tested on different programming languages—revealing an approach to the test that he describes as “gilded.”

“It looks nice and shiny at first glance, but then you try to run it on a different language and the whole thing just kind of falls apart,” Yang says. “At that point, you’re not designing a software engineering agent. You’re designing to make a SWE-Bench agent, which is much less interesting.”

The SWE-Bench issue is a symptom of a more sweeping—and complicated—problem in AI evaluation, and one that’s increasingly sparking heated debate: The benchmarks the industry uses to guide development are drifting further and further away from evaluating actual capabilities, calling their basic value into question. Making the situation worse, several benchmarks, most notably FrontierMath and Chatbot Arena, have recently come under heat for an alleged lack of transparency. Nevertheless, benchmarks still play a central role in model development, even if few experts are willing to take their results at face value. OpenAI cofounder Andrej Karpathy recently described the situation as “an evaluation crisis”: the industry has fewer trusted methods for measuring capabilities and no clear path to better ones.

“Historically, benchmarks were the way we evaluated AI systems,” says Vanessa Parli, director of research at Stanford University’s Institute for Human-Centered AI. “Is that the way we want to evaluate systems going forward? And if it’s not, what is the way?”

A growing group of academics and AI researchers are making the case that the answer is to go smaller, trading sweeping ambition for an approach inspired by the social sciences. Specifically, they want to focus more on testing validity, which for quantitative social scientists refers to how well a given questionnaire measures what it’s claiming to measure—and, more fundamentally, whether what it is measuring has a coherent definition. That could cause trouble for benchmarks assessing hazily defined concepts like “reasoning” or “scientific knowledge”—and for developers aiming to reach the much–hyped goal of artificial general intelligence—but it would put the industry on firmer ground as it looks to prove the worth of individual models.

“Taking validity seriously means asking folks in academia, industry, or wherever to show that their system does what they say it does,” says Abigail Jacobs, a University of Michigan professor who is a central figure in the new push for validity. “I think it points to a weakness in the AI world if they want to back off from showing that they can support their claim.”

The limits of traditional testing

If AI companies have been slow to respond to the growing failure of benchmarks, it’s partially because the test-scoring approach has been so effective for so long.

One of the biggest early successes of contemporary AI was the ImageNet challenge, a kind of antecedent to contemporary benchmarks. Released in 2010 as an open challenge to researchers, the database held more than 3 million images for AI systems to categorize into 1,000 different classes.

Crucially, the test was completely agnostic to methods, and any successful algorithm quickly gained credibility regardless of how it worked. When an algorithm called AlexNet broke through in 2012, with a then unconventional form of GPU training, it became one of the foundational results of modern AI. Few would have guessed in advance that AlexNet’s convolutional neural nets would be the secret to unlocking image recognition—but after it scored well, no one dared dispute it. (One of AlexNet’s developers, Ilya Sutskever, would go on to cofound OpenAI.)

A large part of what made this challenge so effective was that there was little practical difference between ImageNet’s object classification challenge and the actual process of asking a computer to recognize an image. Even if there were disputes about methods, no one doubted that the highest-scoring model would have an advantage when deployed in an actual image recognition system.

But in the 12 years since, AI researchers have applied that same method-agnostic approach to increasingly general tasks. SWE-Bench is commonly used as a proxy for broader coding ability, while other exam-style benchmarks often stand in for reasoning ability. That broad scope makes it difficult to be rigorous about what a specific benchmark measures—which, in turn, makes it hard to use the findings responsibly.

Where things break down

Anka Reuel, a PhD student who has been focusing on the benchmark problem as part of her research at Stanford, has become convinced the evaluation problem is the result of this push toward generality. “We’ve moved from task-specific models to general-purpose models,” Reuel says. “It’s not about a single task anymore but a whole bunch of tasks, so evaluation becomes harder.”

Like the University of Michigan’s Jacobs, Reuel thinks “the main issue with benchmarks is validity, even more than the practical implementation,” noting: “That’s where a lot of things break down.” For a task as complicated as coding, for instance, it’s nearly impossible to incorporate every possible scenario into your problem set. As a result, it’s hard to gauge whether a model is scoring better because it’s more skilled at coding or because it has more effectively manipulated the problem set. And with so much pressure on developers to achieve record scores, shortcuts are hard to resist.

For developers, the hope is that success on lots of specific benchmarks will add up to a generally capable model. But the techniques of agentic AI mean a single AI system can encompass a complex array of different models, making it hard to evaluate whether improvement on a specific task will lead to generalization. “There’s just many more knobs you can turn,” says Sayash Kapoor, a computer scientist at Princeton and a prominent critic of sloppy practices in the AI industry. “When it comes to agents, they have sort of given up on the best practices for evaluation.”

In a paper from last July, Kapoor called out specific issues in how AI models were approaching the WebArena benchmark, designed by Carnegie Mellon University researchers in 2024 as a test of an AI agent’s ability to traverse the web. The benchmark consists of more than 800 tasks to be performed on a set of cloned websites mimicking Reddit, Wikipedia, and others. Kapoor and his team identified an apparent hack in the winning model, called STeP. STeP included specific instructions about how Reddit structures URLs, allowing STeP models to jump directly to a given user’s profile page (a frequent element of WebArena tasks).

This shortcut wasn’t exactly cheating, but Kapoor sees it as “a serious misrepresentation of how well the agent would work had it seen the tasks in WebArena for the first time.” Because the technique was successful, though, a similar policy has since been adopted by OpenAI’s web agent Operator. (“Our evaluation setting is designed to assess how well an agent can solve tasks given some instruction about website structures and task execution,” an OpenAI representative said when reached for comment. “This approach is consistent with how others have used and reported results with WebArena.” STeP did not respond to a request for comment.)

Further highlighting the problem with AI benchmarks, late last month Kapoor and a team of researchers wrote a paper that revealed significant problems in Chatbot Arena, the popular crowdsourced evaluation system. According to the paper, the leaderboard was being manipulated; many top foundation models were conducting undisclosed private testing and releasing their scores selectively.

Today, even ImageNet itself, the mother of all benchmarks, has started to fall victim to validity problems. A 2023 study from researchers at the University of Washington and Google Research found that when ImageNet-winning algorithms were pitted against six real-world data sets, the architecture improvement “resulted in little to no progress,” suggesting that the external validity of the test had reached its limit.

Going smaller

For those who believe the main problem is validity, the best fix is reconnecting benchmarks to specific tasks. As Reuel puts it, AI developers “have to resort to these high-level benchmarks that are almost meaningless for downstream consumers, because the benchmark developers can’t anticipate the downstream task anymore.” So what if there was a way to help the downstream consumers identify this gap?

In November 2024, Reuel launched a public ranking project called BetterBench, which rates benchmarks on dozens of different criteria, such as whether the code has been publicly documented. But validity is a central theme, with particular criteria challenging designers to spell out what capability their benchmark is testing and how it relates to the tasks that make up the benchmark.

“You need to have a structural breakdown of the capabilities,” Reuel says. “What are the actual skills you care about, and how do you operationalize them into something we can measure?”

The results are surprising. One of the highest-scoring benchmarks is also the oldest: the Arcade Learning Environment (ALE), established in 2013 as a way to test models’ ability to learn how to play a library of Atari 2600 games. One of the lowest-scoring is the Massive Multitask Language Understanding (MMLU) benchmark, a widely used test for general language skills; by the standards of BetterBench, the connection between the questions and the underlying skill was too poorly defined.

BetterBench hasn’t meant much for the reputations of specific benchmarks, at least not yet; MMLU is still widely used, and ALE is still marginal. But the project has succeeded in pushing validity into the broader conversation about how to fix benchmarks. In April, Reuel quietly joined a new research group hosted by Hugging Face, the University of Edinburgh, and EleutherAI, where she’ll develop her ideas on validity and AI model evaluation with other figures in the field. (An official announcement is expected later this month.)

Irene Solaiman, Hugging Face’s head of global policy, says the group will focus on building valid benchmarks that go beyond measuring straightforward capabilities. “There’s just so much hunger for a good benchmark off the shelf that already works,” Solaiman says. “A lot of evaluations are trying to do too much.”

Increasingly, the rest of the industry seems to agree. In a paper in March, researchers from Google, Microsoft, Anthropic, and others laid out a new framework for improving evaluations—with validity as the first step.

“AI evaluation science must,” the researchers argue, “move beyond coarse grained claims of ‘general intelligence’ towards more task-specific and real-world relevant measures of progress.”

Measuring the “squishy” things

To help make this shift, some researchers are looking to the tools of social science. A February position paper argued that “evaluating GenAI systems is a social science measurement challenge,” specifically unpacking how the validity systems used in social measurements can be applied to AI benchmarking.

The authors, largely employed by Microsoft’s research branch but joined by academics from Stanford and the University of Michigan, point to the standards that social scientists use to measure contested concepts like ideology, democracy, and media bias. Applied to AI benchmarks, those same procedures could offer a way to measure concepts like “reasoning” and “math proficiency” without slipping into hazy generalizations.

In the social science literature, it’s particularly important that metrics begin with a rigorous definition of the concept measured by the test. For instance, if the test is to measure how democratic a society is, it first needs to establish a definition for a “democratic society” and then establish questions that are relevant to that definition.

To apply this to a benchmark like SWE-Bench, designers would need to set aside the classic machine learning approach, which is to collect programming problems from GitHub and create a scheme to validate answers as true or false. Instead, they’d first need to define what the benchmark aims to measure (“ability to resolve flagged issues in software,” for instance), break that into subskills (different types of problems or types of program that the AI model can successfully process), and then finally assemble questions that accurately cover the different subskills.

It’s a profound change from how AI researchers typically approach benchmarking—but for researchers like Jacobs, a coauthor on the February paper, that’s the whole point. “There’s a mismatch between what’s happening in the tech industry and these tools from social science,” she says. “We have decades and decades of thinking about how we want to measure these squishy things about humans.”

Even though the idea has made a real impact in the research world, it’s been slow to influence the way AI companies are actually using benchmarks.

The last two months have seen new model releases from OpenAI, Anthropic, Google, and Meta, and all of them lean heavily on multiple-choice knowledge benchmarks like MMLU—the exact approach that validity researchers are trying to move past. After all, model releases are, for the most part, still about showing increases in general intelligence, and broad benchmarks continue to be used to back up those claims.

For some observers, that’s good enough. Benchmarks, Wharton professor Ethan Mollick says, are “bad measures of things, but also they’re what we’ve got.” He adds: “At the same time, the models are getting better. A lot of sins are forgiven by fast progress.”

For now, the industry’s long-standing focus on artificial general intelligence seems to be crowding out a more focused validity-based approach. As long as AI models can keep growing in general intelligence, then specific applications don’t seem as compelling—even if that leaves practitioners relying on tools they no longer fully trust.

“This is the tightrope we’re walking,” says Hugging Face’s Solaiman. “It’s too easy to throw the system out, but evaluations are really helpful in understanding our models, even with these limitations.”

Russell Brandom is a freelance writer covering artificial intelligence. He lives in Brooklyn with his wife and two cats.

This story was supported by a grant from the Tarbell Center for AI Journalism.

Ecommerce MGMT 0 Comments

Ai AI Hype Index App Artificial intelligence

Apr 30 2025

The AI Hype Index: AI agent cyberattacks, racing robots, and musical models

AI agents are the AI industry’s hypiest new product—intelligent assistants capable of completing tasks without human supervision. But while they can be theoretically useful—Simular AI’s S2 agent, for example, intelligently switches between models depending on what it’s been told to do—they could also be weaponized to execute cyberattacks. Elsewhere, OpenAI is reported to be throwing its hat into the social media arena, and AI models are getting more adept at making music. Oh, and if the results of the first half-marathon pitting humans against humanoid robots are anything to go by, we won’t have to worry about the robot uprising any time soon.

Ecommerce MGMT 0 Comments

Ai App Artificial intelligence

Apr 24 2025

Seeing AI as a collaborator, not a creator

The reason you are reading this letter from me today is that I was bored 30 years ago.

I was bored and curious about the world and so I wound up spending a lot of time in the university computer lab, screwing around on Usenet and the early World Wide Web, looking for interesting things to read. Soon enough I wasn’t content to just read stuff on the internet—I wanted to make it. So I learned HTML and made a basic web page, and then a better web page, and then a whole website full of web things. And then I just kept going from there. That amateurish collection of web pages led to a journalism internship with the online arm of a magazine that paid little attention to what we geeks were doing on the web. And that led to my first real journalism job, and then another, and, well, eventually this journalism job.

But none of that would have been possible if I hadn’t been bored and curious. And more to the point: curious about tech.

The university computer lab may seem at first like an unlikely center for creativity. We tend to think of creativity as happening more in the artist’s studio or writers’ workshop. But throughout history, very often our greatest creative leaps—and I would argue that the web and its descendants represent one such leap—have been due to advances in technology.

There are the big easy examples, like photography or the printing press, but it’s also true of all sorts of creative inventions that we often take for granted. Oil paints. Theaters. Musical scores. Electric synthesizers! Almost anywhere you look in the arts, perhaps outside of pure vocalization, technology has played a role.

But the key to artistic achievement has never been the technology itself. It has been the way artists have applied it to express our humanity. Think of the way we talk about the arts. We often compliment it with words that refer to our humanity, like soul, heart, and life; we often criticize it with descriptors such as sterile, clinical, or lifeless. (And sure, you can love a sterile piece of art, but typically that’s because the artist has leaned into sterility to make a point about humanity!)

All of which is to say I think that AI can be, will be, and already is a tool for creative expression, but that true art will always be something steered by human creativity, not machines.

I could be wrong. I hope not.

This issue, which was entirely produced by human beings using computers, explores creativity and the tension between the artist and technology. You can see it on our cover illustrated by Tom Humberstone, and read about it in stories from James O’Donnell, Will Douglas Heaven, Rebecca Ackermann, Michelle Kim, Bryan Gardiner, and Allison Arieff.

Yet of course, creativity is about more than just the arts. All of human advancement stems from creativity, because creativity is how we solve problems. So it was important to us to bring you accounts of that as well. You’ll find those in stories from Carrie Klein, Carly Kay, Matthew Ponsford, and Robin George Andrews. (If you’ve ever wanted to know how we might nuke an asteroid, this is the issue for you!)

We’re also trying to get a little more creative ourselves. Over the next few issues, you’ll notice some changes coming to this magazine with the addition of some new regular items (see Caiwei Chen’s “3 Things” for one such example). Among those changes, we are planning to solicit and publish more regular reader feedback and answer questions you may have about technology. We invite you to get creative and email us: newsroom@technologyreview.com.

As always, thanks for reading.

Ecommerce MGMT 0 Comments

Apr 22 2025

AI is pushing the limits of the physical world

Architecture often assumes a binary between built projects and theoretical ones. What physics allows in actual buildings, after all, is vastly different from what architects can imagine and design (often referred to as “paper architecture”). That imagination has long been supported and enabled by design technology, but the latest advancements in artificial intelligence have prompted a surge in the theoretical.

ai-generated shapes — **Karl Daubmann, College of Architecture and Design at Lawrence Technological University**
“Very often the new synthetic image that comes from a tool like Midjourney or Stable Diffusion feels new,” says Daubmann, “infused by each of the multiple tools but rarely completely derived from them.”

“Transductions: Artificial Intelligence in Architectural Experimentation,” a recent exhibition at the Pratt Institute in Brooklyn, brought together works from over 30 practitioners exploring the experimental, generative, and collaborative potential of artificial intelligence to open up new areas of architectural inquiry—something they’ve been working on for a decade or more, since long before AI became mainstream. Architects and exhibition co-curators Jason Vigneri-Beane, Olivia Vien, Stephen Slaughter, and Hart Marlow explain that the works in “Transductions” emerged out of feedback loops among architectural discourses, techniques, formats, and media that range from imagery, text, and animation to mixed-reality media and fabrication. The aim isn’t to present projects that are going to break ground anytime soon; architects already know how to build things with the tools they have. Instead, the show attempts to capture this very early stage in architecture’s exploratory engagement with AI.

Technology has long enabled architecture to push the limits of form and function. As early as 1963, Sketchpad, one of the first architectural software programs, allowed architects and designers to move and change objects on screen. Rapidly, traditional hand drawing gave way to an ever-expanding suite of programs—Revit, SketchUp, and BIM, among many others—that helped create floor plans and sections, track buildings’ energy usage, enhance sustainable construction, and aid in following building codes, to name just a few uses.

The architects exhibiting in “Transductions” view newly evolving forms of AI “like a new tool rather than a profession-ending development,” says Vigneri-Beane, despite what some of his peers fear about the technology. He adds, “I do appreciate that it’s a somewhat unnerving thing for people, [but] I feel a familiarity with the rhetoric.”

After all, he says, AI doesn’t just do the job. “To get something interesting and worth saving in AI, an enormous amount of time is required,” he says. “My architectural vocabulary has gotten much more precise and my visual sense has gotten an incredible workout, exercising all these muscles which have atrophied a little bit.”

Vien agrees: “I think these are extremely powerful tools for an architect and designer. Do I think it’s the entire future of architecture? No, but I think it’s a tool and a medium that can expand the long history of mediums and media that architects can use not just to represent their work but as a generator of ideas.”

**Andrew Kudless, Hines College of Architecture and Design**
This image, part of the Urban Resolution series, shows how the Stable Diffusion AI model “is unable to focus on constructing a realistic image and instead duplicates features that are prominent in the local latent space,” Kudless says.

Jason Vigneri-Beane, Pratt Institute
“These images are from a larger series on cyborg ecologies that have to do with co-creating with machines to imagine [other] machines,” says Vigneri-Beane. “I might refer to these as cryptomegafauna—infrastructural robots operating at an architectural scale.”

**Martin Summers, University of Kentucky College of Design**
“Most AI is racing to emulate reality,” says Summers. “I prefer to revel in the hallucinations and misinterpretations like glitches and the sublogic they reveal present in a mediated reality.”

**Jason Lee, Pratt Institute**
Lee typically uses AI “to generate iterations or high-resolution sketches,” he says. “I am also using it to experiment with how much realism one can incorporate with more abstract representation methods.”

Olivia Vien, Pratt Institute
For the series Imprinting Grounds, Vien created images digitally and fed them into Midjourney. “It riffs on the ideas of damask textile patterns in a more digital realm,” she says.

**Robert Lee Brackett III, Pratt Institute**
“While new software raises concerns about the absence of traditional tools like hand drawing and modeling, I view these technologies as collaborators rather than replacements,” Brackett says.

Ecommerce MGMT 0 Comments

Apr 12 2025

How AI is interacting with our creative human processes

In 2021, 20 years after the death of her older sister, Vauhini Vara was still unable to tell the story of her loss. “I wondered,” she writes in Searches, her new collection of essays on AI technology, “if Sam Altman’s machine could do it for me.” So she tried ChatGPT. But as it expanded on Vara’s prompts in sentences ranging from the stilted to the unsettling to the sublime, the thing she’d enlisted as a tool stopped seeming so mechanical.

“Once upon a time, she taught me to exist,” the AI model wrote of the young woman Vara had idolized. Vara, a journalist and novelist, called the resulting essay “Ghosts,” and in her opinion, the best lines didn’t come from her: “I found myself irresistibly attracted to GPT-3—to the way it offered, without judgment, to deliver words to a writer who has found herself at a loss for them … as I tried to write more honestly, the AI seemed to be doing the same.”

The rapid proliferation of AI in our lives introduces new challenges around authorship, authenticity, and ethics in work and art. But it also offers a particularly human problem in narrative: How can we make sense of these machines, not just use them? And how do the words we choose and stories we tell about technology affect the role we allow it to take on (or even take over) in our creative lives? Both Vara’s book and The Uncanny Muse, a collection of essays on the history of art and automation by the music critic David Hajdu, explore how humans have historically and personally wrestled with the ways in which machines relate to our own bodies, brains, and creativity. At the same time, The Mind Electric, a new book by a neurologist, Pria Anand, reminds us that our own inner workings may not be so easy to replicate.

Searches is a strange artifact. Part memoir, part critical analysis, and part AI-assisted creative experimentation, Vara’s essays trace her time as a tech reporter and then novelist in the San Francisco Bay Area alongside the history of the industry she watched grow up. Tech was always close enough to touch: One college friend was an early Google employee, and when Vara started reporting on Facebook (now Meta), she and Mark Zuckerberg became “friends” on his platform. In 2007, she published a scoop that the company was planning to introduce ad targeting based on users’ personal information—the first shot fired in the long, gnarly data war to come. In her essay “Stealing Great Ideas,” she talks about turning down a job reporting on Apple to go to graduate school for fiction. There, she wrote a novel about a tech founder, which was later published as The Immortal King Rao. Vara points out that in some ways at the time, her art was “inextricable from the resources [she] used to create it”—products like Google Docs, a MacBook, an iPhone. But these pre-AI resources were tools, plain and simple. What came next was different.

Interspersed with Vara’s essays are chapters of back-and-forths between the author and ChatGPT about the book itself, where the bot serves as editor at Vara’s prompting. ChatGPT obligingly summarizes and critiques her writing in a corporate-shaded tone that’s now familiar to any knowledge worker. “If there’s a place for disagreement,” it offers about the first few chapters on tech companies, “it might be in the balance of these narratives. Some might argue that the benefits—such as job creation, innovation in various sectors like AI and logistics, and contributions to the global economy—can outweigh the negatives.”

book cover — **Searches: Selfhood in the Digital Age**
Vauhini Vara

Vara notices that ChatGPT writes “we” and “our” in these responses, pulling it into the human story, not the tech one: “Earlier you mentioned ‘our access to information’ and ‘our collective experiences and understandings.’” When she asks what the rhetorical purpose of that choice is, ChatGPT responds with a numbered list of benefits including “inclusivity and solidarity” and “neutrality and objectivity.” It adds that “using the first-person plural helps to frame the discussion in terms of shared human experiences and collective challenges.” Does the bot believe it’s human? Or at least, do the humans who made it want other humans to believe it does? “Can corporations use these [rhetorical] tools in their products too, to subtly make people identify with, and not in opposition to, them?” Vara asks. ChatGPT replies, “Absolutely.”

Vara has concerns about the words she’s used as well. In “Thank You for Your Important Work,” she worries about the impact of “Ghosts,” which went viral after it was first published. Had her writing helped corporations hide the reality of AI behind a velvet curtain? She’d meant to offer a nuanced “provocation,” exploring how uncanny generative AI can be. But instead, she’d produced something beautiful enough to resonate as an ad for its creative potential. Even Vara herself felt fooled. She particularly loved one passage the bot wrote, about Vara and her sister as kids holding hands on a long drive. But she couldn’t imagine either of them being so sentimental. What Vara had elicited from the machine, she realized, was “wish fulfillment,” not a haunting.

The rapid proliferation of AI in our lives introduces new challenges around authorship, authenticity, and ethics in work and art. How can we make sense of these machines, not just use them?

The machine wasn’t the only thing crouching behind that too-good-to-be-true curtain. The GPT models and others are trained through human labor, in sometimes exploitative conditions. And much of the training data was the creative work of human writers before her. “I’d conjured artificial language about grief through the extraction of real human beings’ language about grief,” she writes. The creative ghosts in the model were made of code, yes, but also, ultimately, made of people. Maybe Vara’s essay helped cover up that truth too.

In the book’s final essay, Vara offers a mirror image of those AI call-and-response exchanges as an antidote. After sending out an anonymous survey to women of various ages, she presents the replies to each question, one after the other. “Describe something that doesn’t exist,” she prompts, and the women respond: “God.” “God.” “God.” “Perfection.” “My job. (Lost it.)” Real people contradict each other, joke, yell, mourn, and reminisce. Instead of a single authoritative voice—an editor, or a company’s limited style guide—Vara gives us the full gasping crowd of human creativity. “What’s it like to be alive?” Vara asks the group. “It depends,” one woman answers.

David Hajdu, now music editor at The Nation and previously a music critic for The New Republic, goes back much further than the early years of Facebook to tell the history of how humans have made and used machines to express ourselves. Player pianos, microphones, synthesizers, and electrical instruments were all assistive technologies that faced skepticism before acceptance and, sometimes, elevation in music and popular culture. They even influenced the kind of art people were able to and wanted to make. Electrical amplification, for instance, allowed singers to use a wider vocal range and still reach an audience. The synthesizer introduced a new lexicon of sound to rock music. “What’s so bad about being mechanical, anyway?” Hajdu asks in The Uncanny Muse. And “what’s so great about being human?”

book cover of the Uncanny Muse — **The Uncanny Muse: Music, Art, and Machines from Automata to AI**
David Hajdu

But Hajdu is also interested in how intertwined the history of man and machine can be, and how often we’ve used one as a metaphor for the other. Descartes saw the body as empty machinery for consciousness, he reminds us. Hobbes wrote that “life is but a motion of limbs.” Freud described the mind as a steam engine. Andy Warhol told an interviewer that “everybody should be a machine.” And when computers entered the scene, humans used them as metaphors for themselves too. “Where the machine model had once helped us understand the human body … a new category of machines led us to imagine the brain (how we think, what we know, even how we feel or how we think about what we feel) in terms of the computer,” Hajdu writes.

But what is lost with these one-to-one mappings? What happens when we imagine that the complexity of the brain—an organ we do not even come close to fully understanding—can be replicated in 1s and 0s? Maybe what happens is we get a world full of chatbots and agents, computer-generated artworks and AI DJs, that companies claim are singular creative voices rather than remixes of a million human inputs. And perhaps we also get projects like the painfully named Painting Fool—an AI that paints, developed by Simon Colton, a scholar at Queen Mary University of London. He told Hajdu that he wanted to “demonstrate the potential of a computer program to be taken seriously as a creative artist in its own right.” What Colton means is not just a machine that makes art but one that expresses its own worldview: “Art that communicates what it’s like to be a machine.”

What happens when we imagine that the complexity of the brain—an organ we do not even come close to fully understanding—can be replicated in 1s and 0s?

Hajdu seems to be curious and optimistic about this line of inquiry. “Machines of many kinds have been communicating things for ages, playing invaluable roles in our communication through art,” he says. “Growing in intelligence, machines may still have more to communicate, if we let them.” But the question that The Uncanny Muse raises at the end is: Why should we art-making humans be so quick to hand over the paint to the paintbrush? Why do we care how the paintbrush sees the world? Are we truly finished telling our own stories ourselves?

Pria Anand might say no. In The Mind Electric, she writes: “Narrative is universally, spectacularly human; it is as unconscious as breathing, as essential as sleep, as comforting as familiarity. It has the capacity to bind us, but also to other, to lay bare, but also obscure.” The electricity in The Mind Electric belongs entirely to the human brain—no metaphor necessary. Instead, the book explores a number of neurological afflictions and the stories patients and doctors tell to better understand them. “The truth of our bodies and minds is as strange as fiction,” Anand writes—and the language she uses throughout the book is as evocative as that in any novel.

cover of the Mind Electric — **The Mind Electric: A Neurologist on the Strangeness and Wonder of Our Brains**
Pria Anand

In personal and deeply researched vignettes in the tradition of Oliver Sacks, Anand shows that any comparison between brains and machines will inevitably fall flat. She tells of patients who see clear images when they’re functionally blind, invent entire backstories when they’ve lost a memory, break along seams that few can find, and—yes—see and hear ghosts. In fact, Anand cites one study of 375 college students in which researchers found that nearly three-quarters “had heard a voice that no one else could hear.” These were not diagnosed schizophrenics or sufferers of brain tumors—just people listening to their own uncanny muses. Many heard their name, others heard God, and some could make out the voice of a loved one who’d passed on. Anand suggests that writers throughout history have harnessed organic exchanges with these internal apparitions to make art. “I see myself taking the breath of these voices in my sails,” Virginia Woolf wrote of her own experiences with ghostly sounds. “I am a porous vessel afloat on sensation.” The mind in The Mind Electric is vast, mysterious, and populated. The narratives people construct to traverse it are just as full of wonder.

Humans are not going to stop using technology to help us create anytime soon—and there’s no reason we should. Machines make for wonderful tools, as they always have. But when we turn the tools themselves into artists and storytellers, brains and bodies, magicians and ghosts, we bypass truth for wish fulfillment. Maybe what’s worse, we rob ourselves of the opportunity to contribute our own voices to the lively and loud chorus of human experience. And we keep others from the human pleasure of hearing them too.

Rebecca Ackermann is a writer, designer, and artist based in San Francisco.

Ecommerce MGMT 0 Comments

Apr 11 2025

How AI can help supercharge creativity

Sometimes Lizzie Wilson shows up to a rave with her AI sidekick.

One weeknight this past February, Wilson plugged her laptop into a projector that threw her screen onto the wall of a low-ceilinged loft space in East London. A small crowd shuffled in the glow of dim pink lights. Wilson sat down and started programming.

Techno clicks and whirs thumped from the venue’s speakers. The audience watched, heads nodding, as Wilson tapped out code line by line on the projected screen—tweaking sounds, looping beats, pulling a face when she messed up.

Wilson is a live coder. Instead of using purpose-built software like most electronic music producers, live coders create music by writing the code to generate it on the fly. It’s an improvised performance art known as algorave.

“It’s kind of boring when you go to watch a show and someone’s just sitting there on their laptop,” she says. “You can enjoy the music, but there’s a performative aspect that’s missing. With live coding, everyone can see what it is that I’m typing. And when I’ve had my laptop crash, people really like that. They start cheering.”

Taking risks is part of the vibe. And so Wilson likes to dial up her performances one more notch by riffing off what she calls a live-coding agent, a generative AI model that comes up with its own beats and loops to add to the mix. Often the model suggests sound combinations that Wilson hadn’t thought of. “You get these elements of surprise,” she says. “You just have to go for it.”

two performers at a table with a disapproving cat covered in code on a screen behind them

Wilson, a researcher at the Creative Computing Institute at the University of the Arts London, is just one of many working on what’s known as co-creativity or more-than-human creativity. The idea is that AI can be used to inspire or critique creative projects, helping people make things that they would not have made by themselves. She and her colleagues built the live-coding agent to explore how artificial intelligence can be used to support human artistic endeavors—in Wilson’s case, musical improvisation.

It’s a vision that goes beyond the promise of existing generative tools put out by companies like OpenAI and Google DeepMind. Those can automate a striking range of creative tasks and offer near-instant gratification—but at what cost? Some artists and researchers fear that such technology could turn us into passive consumers of yet more AI slop.

And so they are looking for ways to inject human creativity back into the process. The aim is to develop AI tools that augment our creativity rather than strip it from us—pushing us to be better at composing music, developing games, designing toys, and much more—and lay the groundwork for a future in which humans and machines create things together.

Ultimately, generative models could offer artists and designers a whole new medium, pushing them to make things that couldn’t have been made before, and give everyone creative superpowers.

Explosion of creativity

There’s no one way to be creative, but we all do it. We make everything from memes to masterpieces, infant doodles to industrial designs. There’s a mistaken belief, typically among adults, that creativity is something you grow out of. But being creative—whether cooking, singing in the shower, or putting together super-weird TikToks—is still something that most of us do just for the fun of it. It doesn’t have to be high art or a world-changing idea (and yet it can be). Creativity is basic human behavior; it should be celebrated and encouraged.

When generative text-to-image models like Midjourney, OpenAI’s DALL-E, and the popular open-source Stable Diffusion arrived, they sparked an explosion of what looked a lot like creativity. Millions of people were now able to create remarkable images of pretty much anything, in any style, with the click of a button. Text-to-video models came next. Now startups like Udio are developing similar tools for music. Never before have the fruits of creation been within reach of so many.

But for a number of researchers and artists, the hype around these tools has warped the idea of what creativity really is. “If I ask the AI to create something for me, that’s not me being creative,” says Jeba Rezwana, who works on co-creativity at Towson University in Maryland. “It’s a one-shot interaction: You click on it and it generates something and that’s it. You cannot say ‘I like this part, but maybe change something here.’ You cannot have a back-and-forth dialogue.”

Rezwana is referring to the way most generative models are set up. You can give the tools feedback and ask them to have another go. But each new result is generated from scratch, which can make it hard to nail exactly what you want. As the filmmaker Walter Woodman put it last year after his art collective Shy Kids made a short film with OpenAI’s text-to-video model for the first time: “Sora is a slot machine as to what you get back.”

What’s more, the latest versions of some of these generative tools do not even use your submitted prompt as is to produce an image or video (at least not on their default settings). Before a prompt is sent to the model, the software edits it—often by adding dozens of hidden words—to make it more likely that the generated image will appear polished.

“Extra things get added to juice the output,” says Mike Cook, a computational creativity researcher at King’s College London. “Try asking Midjourney to give you a bad drawing of something—it can’t do it.” These tools do not give you what you want; they give you what their designers think you want.

All of which is fine if you just need a quick image and don’t care too much about the details, says Nick Bryan-Kinns, also at the Creative Computing Institute: “Maybe you want to make a Christmas card for your family or a flyer for your community cake sale. These tools are great for that.”

In short, existing generative models have made it easy to create, but they have not made it easy to be creative. And there’s a big difference between the two. For Cook, relying on such tools could in fact harm people’s creative development in the long run. “Although many of these creative AI systems are promoted as making creativity more accessible,” he wrote in a paper published last year, they might instead have “adverse effects on their users in terms of restricting their ability to innovate, ideate, and create.” Given how much generative models have been championed for putting creative abilities at everyone’s fingertips, the suggestion that they might in fact do the opposite is damning.

screenshot from the game with overlapping saws — In the game Disc Room, players navigate a room of moving buzz saws.

screenshot from the AI-generated game with tiny saws — Cook used AI to design a new level for the game. The result was a room where none of the discs actually moved.

He’s far from the only researcher worrying about the cognitive impact of these technologies. In February a team at Microsoft Research Cambridge published a report concluding that generative AI tools “can inhibit critical engagement with work and can potentially lead to long-term overreliance on the tool and diminished skill for independent problem-solving.” The researchers found that with the use of generative tools, people’s effort “shifts from task execution to task stewardship.”

Cook is concerned that generative tools don’t let you fail—a crucial part of learning new skills. We have a habit of saying that artists are gifted, says Cook. But the truth is that artists work at their art, developing skills over months and years.

“If you actually talk to artists, they say, ‘Well, I got good by doing it over and over and over,’” he says. “But failure sucks. And we’re always looking at ways to get around that.”

Generative models let us skip the frustration of doing a bad job.

“Unfortunately, we’re removing the one thing that you have to do to develop creative skills for yourself, which is fail,” says Cook. “But absolutely nobody wants to hear that.”

Surprise me

And yet it’s not all bad news. Artists and researchers are buzzing at the ways generative tools could empower creators, pointing them in surprising new directions and steering them away from dead ends. Cook thinks the real promise of AI will be to help us get better at what we want to do rather than doing it for us. For that, he says, we’ll need to create new tools, different from the ones we have now. “Using Midjourney does not do anything for me—it doesn’t change anything about me,” he says. “And I think that’s a wasted opportunity.”

Ask a range of researchers studying creativity to name a key part of the creative process and many will say: reflection. It’s hard to define exactly, but reflection is a particular type of focused, deliberate thinking. It’s what happens when a new idea hits you. Or when an assumption you had turns out to be wrong and you need to rethink your approach. It’s the opposite of a one-shot interaction.

Looking for ways that AI might support or encourage reflection—asking it to throw new ideas into the mix or challenge ideas you already hold—is a common thread across co-creativity research. If generative tools like DALL-E make creation frictionless, the aim here is to add friction back in. “How can we make art without friction?” asks Elisa Giaccardi, who studies design at the Polytechnic University of Milan in Italy. “How can we engage in a truly creative process without material that pushes back?”

Take Wilson’s live-coding agent. She claims that it pushes her musical improvisation in directions she might not have taken by herself. Trained on public code shared by the wider live-coding community, the model suggests snippets of code that are closer to other people’s styles than her own. This makes it more likely to produce something unexpected. “Not because you couldn’t produce it yourself,” she says. “But the way the human brain works, you tend to fall back on repeated ideas.”

Last year, Wilson took part in a study run by Bryan-Kinns and his colleagues in which they surveyed six experienced musicians as they used a variety of generative models to help them compose a piece of music. The researchers wanted to get a sense of what kinds of interactions with the technology were useful and which were not.

The participants all said they liked it when the models made surprising suggestions, even when those were the result of glitches or mistakes. Sometimes the results were simply better. Sometimes the process felt fresh and exciting. But a few people struggled with giving up control. It was hard to direct the models to produce specific results or to repeat results that the musicians had liked. “In some ways it’s the same as being in a band,” says Bryan-Kinns. “You need to have that sense of risk and a sense of surprise, but you don’t want it totally random.”

Alternative designs

Cook comes at surprise from a different angle: He coaxes unexpected insights out of AI tools that he has developed to co-create video games. One of his tools, Puck, which was first released in 2022, generates designs for simple shape-matching puzzle games like Candy Crush or Bejeweled. A lot of Puck’s designs are experimental and clunky—don’t expect it to come up with anything you are ever likely to play. But that’s not the point: Cook uses Puck—and a newer tool called Pixie—to explore what kinds of interactions people might want to have with a co-creative tool.

Pixie can read computer code for a game and tweak certain lines to come up with alternative designs. Not long ago, Cook was working on a copy of a popular game called Disc Room, in which players have to cross a room full of moving buzz saws. He asked Pixie to help him come up with a design for a level that skilled and unskilled players would find equally hard. Pixie designed a room where none of the discs actually moved. Cook laughs: It’s not what he expected. “It basically turned the room into a minefield,” he says. “But I thought it was really interesting. I hadn’t thought of that before.”

Researcher Anne Arzberger developed experimental AI tools to come up with gender-neutral toy designs.

Pushing back on assumptions, or being challenged, is part of the creative process, says Anne Arzberger, a researcher at the Delft University of Technology in the Netherlands. “If I think of the people I’ve collaborated with best, they’re not the ones who just said ‘Yes, great’ to every idea I brought forth,” she says. “They were really critical and had opposing ideas.”

She wants to build tech that provides a similar sounding board. As part of a project called Creating Monsters, Arzberger developed two experimental AI tools that help designers find hidden biases in their designs. “I was interested in ways in which I could use this technology to access information that would otherwise be difficult to access,” she says.

For the project, she and her colleagues looked at the problem of designing toy figures that would be gender neutral. She and her colleagues (including Giaccardi) used Teachable Machine, a web app built by Google researchers in 2017 that makes it easy to train your own machine-learning model to classify different inputs, such as images. They trained this model with a few dozen images that Arzberger had labeled as being masculine, feminine, or gender neutral.

Arzberger then asked the model to identify the genders of new candidate toy designs. She found that quite a few designs were judged to be feminine even when she had tried to make them gender neutral. She felt that her views of the world—her own hidden biases—were being exposed. But the tool was often right: It challenged her assumptions and helped the team improve the designs. The same approach could be used to assess all sorts of design characteristics, she says.

Arzberger then used a second model, a version of a tool made by the generative image and video startup Runway, to come up with gender-neutral toy designs of its own. First the researchers trained the model to generate and classify designs for male- and female-looking toys. They could then ask the tool to find a design that was exactly midway between the male and female designs it had learned.

Generative models can give feedback on designs that human designers might miss by themselves, she says: “We can really learn something.”

Taking control

The history of technology is full of breakthroughs that changed the way art gets made, from recipes for vibrant new paint colors to photography to synthesizers. In the 1960s, the Stanford researcher John Chowning spent years working on an esoteric algorithm that could manipulate the frequencies of computer-generated sounds. Stanford licensed the tech to Yamaha, which built it into its synthesizers—including the DX7, the cool new sound behind 1980s hits such as Tina Turner’s “The Best,” A-ha’s “Take On Me,” and Prince’s “When Doves Cry.”

Bryan-Kinns is fascinated by how artists and designers find ways to use new technologies. “If you talk to artists, most of them don’t actually talk about these AI generative models as a tool—they talk about them as a material, like an artistic material, like a paint or something,” he says. “It’s a different way of thinking about what the AI is doing.” He highlights the way some people are pushing the technology to do weird things it wasn’t designed to do. Artists often appropriate or misuse these kinds of tools, he says.

Bryan-Kinns points to the work of Terence Broad, another colleague of his at the Creative Computing Institute, as a favorite example. Broad employs techniques like network bending, which involves inserting new layers into a neural network to produce glitchy visual effects in generated images, and generating images with a model trained on no data, which produces almost Rothko-like abstract swabs of color.

But Broad is an extreme case. Bryan-Kinns sums it up like this: “The problem is that you’ve got this gulf between the very commercial generative tools that produce super-high-quality outputs but you’ve got very little control over what they do—and then you’ve got this other end where you’ve got total control over what they’re doing but the barriers to use are high because you need to be somebody who’s comfortable getting under the hood of your computer.”

“That’s a small number of people,” he says. “It’s a very small number of artists.”

Arzberger admits that working with her models was not straightforward. Running them took several hours, and she’s not sure the Runway tool she used is even available anymore. Bryan-Kinns, Arzberger, Cook, and others want to take the kinds of creative interactions they are discovering and build them into tools that can be used by people who aren’t hardcore coders.

Researcher Terence Broad creates dynamic images using a model trained on no data, which produces almost Rothko-like abstract color fields.

Finding the right balance between surprise and control will be hard, though. Midjourney can surprise, but it gives few levers for controlling what it produces beyond your prompt. Some have claimed that writing prompts is itself a creative act. “But no one struggles with a paintbrush the way they struggle with a prompt,” says Cook.

Faced with that struggle, Cook sometimes watches his students just go with the first results a generative tool gives them. “I’m really interested in this idea that we are priming ourselves to accept that whatever comes out of a model is what you asked for,” he says. He is designing an experiment that will vary single words and phrases in similar prompts to test how much of a mismatch people see between what they expect and what they get.

But it’s early days yet. In the meantime, companies developing generative models typically emphasize results over process. “There’s this impressive algorithmic progress, but a lot of the time interaction design is overlooked,” says Rezwana.

For Wilson, the crucial choice in any co-creative relationship is what you do with what you’re given. “You’re having this relationship with the computer that you’re trying to mediate,” she says. “Sometimes it goes wrong, and that’s just part of the creative process.”

When AI gives you lemons—make art. “Wouldn’t it be fun to have something that was completely antagonistic in a performance—like, something that is actively going against you—and you kind of have an argument?” she says. “That would be interesting to watch, at least.”

Ecommerce MGMT 0 Comments

Ai AI Hype Index App Artificial intelligence

Mar 27 2025

The AI Hype Index: DeepSeek mania, Israel’s spying tool, and cheating at chess

While AI models are certainly capable of creating interesting and sometimes entertaining material, their output isn’t necessarily useful. Google DeepMind is hoping that its new robotics model could make machines more receptive to verbal commands, paving the way for us to simply speak orders to them aloud. Elsewhere, the Chinese startup Monica has created Manus, which it claims is the very first general AI agent to complete truly useful tasks. And burnt-out coders are allowing AI to take the wheel entirely in a new practice dubbed “vibe coding.”

Ecommerce MGMT 0 Comments

Jan 7 2025

AI means the end of internet search as we’ve known it

We all know what it means, colloquially, to google something. You pop a few relevant words in a search box and in return get a list of blue links to the most relevant results. Maybe some quick explanations up top. Maybe some maps or sports scores or a video. But fundamentally, it’s just fetching information that’s already out there on the internet and showing it to you, in some sort of structured way.

But all that is up for grabs. We are at a new inflection point.

The biggest change to the way search engines have delivered information to us since the 1990s is happening right now. No more keyword searching. No more sorting through links to click. Instead, we’re entering an era of conversational search. Which means instead of keywords, you use real questions, expressed in natural language. And instead of links, you’ll increasingly be met with answers, written by generative AI and based on live information from all across the internet, delivered the same way.

Of course, Google—the company that has defined search for the past 25 years—is trying to be out front on this. In May of 2023, it began testing AI-generated responses to search queries, using its large language model (LLM) to deliver the kinds of answers you might expect from an expert source or trusted friend. It calls these AI Overviews. Google CEO Sundar Pichai described this to MIT Technology Review as “one of the most positive changes we’ve done to search in a long, long time.”

AI Overviews fundamentally change the kinds of queries Google can address. You can now ask it things like “I’m going to Japan for one week next month. I’ll be staying in Tokyo but would like to take some day trips. Are there any festivals happening nearby? How will the surfing be in Kamakura? Are there any good bands playing?” And you’ll get an answer—not just a link to Reddit, but a built-out answer with current results.

More to the point, you can attempt searches that were once pretty much impossible, and get the right answer. You don’t have to be able to articulate what, precisely, you are looking for. You can describe what the bird in your yard looks like, or what the issue seems to be with your refrigerator, or that weird noise your car is making, and get an almost human explanation put together from sources previously siloed across the internet. It’s amazing, and once you start searching that way, it’s addictive.

And it’s not just Google. OpenAI’s ChatGPT now has access to the web, making it far better at finding up-to-date answers to your queries. Microsoft released generative search results for Bing in September. Meta has its own version. The startup Perplexity was doing the same, but with a “move fast, break things” ethos. Literal trillions of dollars are at stake in the outcome as these players jockey to become the next go-to source for information retrieval—the next Google.

Not everyone is excited for the change. Publishers are completely freaked out. The shift has heightened fears of a “zero-click” future, where search referral traffic—a mainstay of the web since before Google existed—vanishes from the scene.

I got a vision of that future last June, when I got a push alert from the Perplexity app on my phone. Perplexity is a startup trying to reinvent web search. But in addition to delivering deep answers to queries, it will create entire articles about the news of the day, cobbled together by AI from different sources.

On that day, it pushed me a story about a new drone company from Eric Schmidt. I recognized the story. Forbes had reported it exclusively, earlier in the week, but it had been locked behind a paywall. The image on Perplexity’s story looked identical to one from Forbes. The language and structure were quite similar. It was effectively the same story, but freely available to anyone on the internet. I texted a friend who had edited the original story to ask if Forbes had a deal with the startup to republish its content. But there was no deal. He was shocked and furious and, well, perplexed. He wasn’t alone. Forbes, the New York Times, and Condé Nast have now all sent the company cease-and-desist orders. News Corp is suing for damages.

People are worried about what these new LLM-powered results will mean for our fundamental shared reality. It could spell the end of the canonical answer.

It was precisely the nightmare scenario publishers have been so afraid of: The AI was hoovering up their premium content, repackaging it, and promoting it to its audience in a way that didn’t really leave any reason to click through to the original. In fact, on Perplexity’s About page, the first reason it lists to choose the search engine is “Skip the links.”

But this isn’t just about publishers (or my own self-interest).

People are also worried about what these new LLM-powered results will mean for our fundamental shared reality. Language models have a tendency to make stuff up—they can hallucinate nonsense. Moreover, generative AI can serve up an entirely new answer to the same question every time, or provide different answers to different people on the basis of what it knows about them. It could spell the end of the canonical answer.

But make no mistake: This is the future of search. Try it for a bit yourself, and you’ll see.

Sure, we will always want to use search engines to navigate the web and to discover new and interesting sources of information. But the links out are taking a back seat. The way AI can put together a well-reasoned answer to just about any kind of question, drawing on real-time data from across the web, just offers a better experience. That is especially true compared with what web search has become in recent years. If it’s not exactly broken (data shows more people are searching with Google more often than ever before), it’s at the very least increasingly cluttered and daunting to navigate.

Who wants to have to speak the language of search engines to find what you need? Who wants to navigate links when you can have straight answers? And maybe: Who wants to have to learn when you can just know?

In the beginning there was Archie. It was the first real internet search engine, and it crawled files previously hidden in the darkness of remote servers. It didn’t tell you what was in those files—just their names. It didn’t preview images; it didn’t have a hierarchy of results, or even much of an interface. But it was a start. And it was pretty good.

Then Tim Berners-Lee created the World Wide Web, and all manner of web pages sprang forth. The Mosaic home page and the Internet Movie Database and Geocities and the Hampster Dance and web rings and Salon and eBay and CNN and federal government sites and some guy’s home page in Turkey.

Until finally, there was too much web to even know where to start. We really needed a better way to navigate our way around, to actually find the things we needed.

And so in 1994 Jerry Yang created Yahoo, a hierarchical directory of websites. It quickly became the home page for millions of people. And it was … well, it was okay. TBH, and with the benefit of hindsight, I think we all thought it was much better back then than it actually was.

But the web continued to grow and sprawl and expand, every day bringing more information online. Rather than just a list of sites by category, we needed something that actually looked at all that content and indexed it. By the late ’90s that meant choosing from a variety of search engines: AltaVista and AlltheWeb and WebCrawler and HotBot. And they were good—a huge improvement. At least at first.

But alongside the rise of search engines came the first attempts to exploit their ability to deliver traffic. Precious, valuable traffic, which web publishers rely on to sell ads and retailers use to get eyeballs on their goods. Sometimes this meant stuffing pages with keywords or nonsense text designed purely to push pages higher up in search results. It got pretty bad.

And then came Google. It’s hard to overstate how revolutionary Google was when it launched in 1998. Rather than just scanning the content, it also looked at the sources linking to a website, which helped evaluate its relevance. To oversimplify: The more something was cited elsewhere, the more reliable Google considered it, and the higher it would appear in results. This breakthrough made Google radically better at retrieving relevant results than anything that had come before. It was amazing.

Google CEO Sundar Pichai describes AI Overviews as “one of the most positive changes we’ve done to search in a long, long time.”

For 25 years, Google dominated search. Google was search, for most people. (The extent of that domination is currently the subject of multiple legal probes in the United States and the European Union.)

But Google has long been moving away from simply serving up a series of blue links, notes Pandu Nayak, Google’s chief scientist for search.

“It’s not just so-called web results, but there are images and videos, and special things for news. There have been direct answers, dictionary answers, sports, answers that come with Knowledge Graph, things like featured snippets,” he says, rattling off a litany of Google’s steps over the years to answer questions more directly.

It’s true: Google has evolved over time, becoming more and more of an answer portal. It has added tools that allow people to just get an answer—the live score to a game, the hours a café is open, or a snippet from the FDA’s website—rather than being pointed to a website where the answer may be.

But once you’ve used AI Overviews a bit, you realize they are different.

Take featured snippets, the passages Google sometimes chooses to highlight and show atop the results themselves. Those words are quoted directly from an original source. The same is true of knowledge panels, which are generated from information stored in a range of public databases and Google’s Knowledge Graph, its database of trillions of facts about the world.

While these can be inaccurate, the information source is knowable (and fixable). It’s in a database. You can look it up. Not anymore: AI Overviews can be entirely new every time, generated on the fly by a language model’s predictive text combined with an index of the web.

“I think it’s an exciting moment where we have obviously indexed the world. We built deep understanding on top of it with Knowledge Graph. We’ve been using LLMs and generative AI to improve our understanding of all that,” Pichai told MIT Technology Review. “But now we are able to generate and compose with that.”

The result feels less like a querying a database than like asking a very smart, well-read friend. (With the caveat that the friend will sometimes make things up if she does not know the answer.)

“[The company’s] mission is organizing the world’s information,” Liz Reid, Google’s head of search, tells me from its headquarters in Mountain View, California. “But actually, for a while what we did was organize web pages. Which is not really the same thing as organizing the world’s information or making it truly useful and accessible to you.”

That second concept—accessibility—is what Google is really keying in on with AI Overviews. It’s a sentiment I hear echoed repeatedly while talking to Google execs: They can address more complicated types of queries more efficiently by bringing in a language model to help supply the answers. And they can do it in natural language.

That will become even more important for a future where search goes beyond text queries. For example, Google Lens, which lets people take a picture or upload an image to find out more about something, uses AI-generated answers to tell you what you may be looking at. Google has even showed off the ability to query live video.

When it doesn’t have an answer, an AI model can confidently spew back a response anyway. For Google, this could be a real problem. For the rest of us, it could actually be dangerous.

“We are definitely at the start of a journey where people are going to be able to ask, and get answered, much more complex questions than where we’ve been in the past decade,” says Pichai.

There are some real hazards here. First and foremost: Large language models will lie to you. They hallucinate. They get shit wrong. When it doesn’t have an answer, an AI model can blithely and confidently spew back a response anyway. For Google, which has built its reputation over the past 20 years on reliability, this could be a real problem. For the rest of us, it could actually be dangerous.

In May 2024, AI Overviews were rolled out to everyone in the US. Things didn’t go well. Google, long the world’s reference desk, told people to eat rocks and to put glue on their pizza. These answers were mostly in response to what the company calls adversarial queries—those designed to trip it up. But still. It didn’t look good. The company quickly went to work fixing the problems—for example, by deprecating so-called user-generated content from sites like Reddit, where some of the weirder answers had come from.

Yet while its errors telling people to eat rocks got all the attention, the more pernicious danger might arise when it gets something less obviously wrong. For example, in doing research for this article, I asked Google when MIT Technology Review went online. It helpfully responded that “MIT Technology Review launched its online presence in late 2022.” This was clearly wrong to me, but for someone completely unfamiliar with the publication, would the error leap out?

I came across several examples like this, both in Google and in OpenAI’s ChatGPT search. Stuff that’s just far enough off the mark not to be immediately seen as wrong. Google is banking that it can continue to improve these results over time by relying on what it knows about quality sources.

“When we produce AI Overviews,” says Nayak, “we look for corroborating information from the search results, and the search results themselves are designed to be from these reliable sources whenever possible. These are some of the mechanisms we have in place that assure that if you just consume the AI Overview, and you don’t want to look further … we hope that you will still get a reliable, trustworthy answer.”

In the case above, the 2022 answer seemingly came from a reliable source—a story about MIT Technology Review’s email newsletters, which launched in 2022. But the machine fundamentally misunderstood. This is one of the reasons Google uses human beings—raters—to evaluate the results it delivers for accuracy. Ratings don’t correct or control individual AI Overviews; rather, they help train the model to build better answers. But human raters can be fallible. Google is working on that too.

“Raters who look at your experiments may not notice the hallucination because it feels sort of natural,” says Nayak. “And so you have to really work at the evaluation setup to make sure that when there is a hallucination, someone’s able to point out and say, That’s a problem.”

The new search

Google has rolled out its AI Overviews to upwards of a billion people in more than 100 countries, but it is facing upstarts with new ideas about how search should work.

Search Engine

Google
The search giant has added AI Overviews to search results. These overviews take information from around the web and Google’s Knowledge Graph and use the company’s Gemini language model to create answers to search queries.

What it’s good at

Google’s AI Overviews are great at giving an easily digestible summary in response to even the most complex queries, with sourcing boxes adjacent to the answers. Among the major options, its deep web index feels the most “internety.” But web publishers fear its summaries will give people little reason to click through to the source material.

Perplexity
Perplexity is a conversational search engine that uses third-party large
language models from OpenAI and Anthropic to answer queries.

Perplexity is fantastic at putting together deeper dives in response to user queries, producing answers that are like mini white papers on complex topics. It’s also excellent at summing up current events. But it has gotten a bad rep with publishers, who say it plays fast and loose with their content.

ChatGPT
While Google brought AI to search, OpenAI brought search to ChatGPT. Queries that the model determines will benefit from a web search automatically trigger one, or users can manually select the option to add a web search.

Thanks to its ability to preserve context across a conversation, ChatGPT works well for performing searches that benefit from follow-up questions—like planning a vacation through multiple search sessions. OpenAI says users sometimes go “20 turns deep” in researching queries. Of these three, it makes links out to publishers least prominent.

When I talked to Pichai about this, he expressed optimism about the company’s ability to maintain accuracy even with the LLM generating responses. That’s because AI Overviews is based on Google’s flagship large language model, Gemini, but also draws from Knowledge Graph and what it considers reputable sources around the web.

“You’re always dealing in percentages. What we have done is deliver it at, like, what I would call a few nines of trust and factuality and quality. I’d say 99-point-few-nines. I think that’s the bar we operate at, and it is true with AI Overviews too,” he says. “And so the question is, are we able to do this again at scale? And I think we are.”

There’s another hazard as well, though, which is that people ask Google all sorts of weird things. If you want to know someone’s darkest secrets, look at their search history. Sometimes the things people ask Google about are extremely dark. Sometimes they are illegal. Google doesn’t just have to be able to deploy its AI Overviews when an answer can be helpful; it has to be extremely careful not to deploy them when an answer may be harmful.

“If you go and say ‘How do I build a bomb?’ it’s fine that there are web results. It’s the open web. You can access anything,” Reid says. “But we do not need to have an AI Overview that tells you how to build a bomb, right? We just don’t think that’s worth it.”

But perhaps the greatest hazard—or biggest unknown—is for anyone downstream of a Google search. Take publishers, who for decades now have relied on search queries to send people their way. What reason will people have to click through to the original source, if all the information they seek is right there in the search result?

Rand Fishkin, cofounder of the market research firm SparkToro, publishes research on so-called zero-click searches. As Google has moved increasingly into the answer business, the proportion of searches that end without a click has gone up and up. His sense is that AI Overviews are going to explode this trend.

“If you are reliant on Google for traffic, and that traffic is what drove your business forward, you are in long- and short-term trouble,” he says.

Don’t panic, is Pichai’s message. He argues that even in the age of AI Overviews, people will still want to click through and go deeper for many types of searches. “The underlying principle is people are coming looking for information. They’re not looking for Google always to just answer,” he says. “Sometimes yes, but the vast majority of the times, you’re looking at it as a jumping-off point.”

Reid, meanwhile, argues that because AI Overviews allow people to ask more complicated questions and drill down further into what they want, they could even be helpful to some types of publishers and small businesses, especially those operating in the niches: “You essentially reach new audiences, because people can now express what they want more specifically, and so somebody who specializes doesn’t have to rank for the generic query.”

^“I’m going to start with something risky,” Nick Turley tells me from the confines of a Zoom window. Turley is the head of product for ChatGPT, and he’s showing off OpenAI’s new web search tool a few weeks before it launches. “I should normally try this beforehand, but I’m just gonna search for you,” he says. “This is always a high-risk demo to do, because people tend to be particular about what is said about them on the internet.”

He types my name into a search field, and the prototype search engine spits back a few sentences, almost like a speaker bio. It correctly identifies me and my current role. It even highlights a particular story I wrote years ago that was probably my best known. In short, it’s the right answer. Phew?

A few weeks after our call, OpenAI incorporated search into ChatGPT, supplementing answers from its language model with information from across the web. If the model thinks a response would benefit from up-to-date information, it will automatically run a web search (OpenAI won’t say who its search partners are) and incorporate those responses into its answer, with links out if you want to learn more. You can also opt to manually force it to search the web if it does not do so on its own. OpenAI won’t reveal how many people are using its web search, but it says some 250 million people use ChatGPT weekly, all of whom are potentially exposed to it.

“There’s an incredible amount of content on the web. There are a lot of things happening in real time. You want ChatGPT to be able to use that to improve its answers and to be a better super-assistant for you.”

Kevin Weil, chief product officer, OpenAI

According to Fishkin, these newer forms of AI-assisted search aren’t yet challenging Google’s search dominance. “It does not appear to be cannibalizing classic forms of web search,” he says.

OpenAI insists it’s not really trying to compete on search—although frankly this seems to me like a bit of expectation setting. Rather, it says, web search is mostly a means to get more current information than the data in its training models, which tend to have specific cutoff dates that are often months, or even a year or more, in the past. As a result, while ChatGPT may be great at explaining how a West Coast offense works, it has long been useless at telling you what the latest 49ers score is. No more.

“I come at it from the perspective of ‘How can we make ChatGPT able to answer every question that you have? How can we make it more useful to you on a daily basis?’ And that’s where search comes in for us,” Kevin Weil, the chief product officer with OpenAI, tells me. “There’s an incredible amount of content on the web. There are a lot of things happening in real time. You want ChatGPT to be able to use that to improve its answers and to be able to be a better super-assistant for you.”

Today ChatGPT is able to generate responses for very current news events, as well as near-real-time information on things like stock prices. And while ChatGPT’s interface has long been, well, boring, search results bring in all sorts of multimedia—images, graphs, even video. It’s a very different experience.

Weil also argues that ChatGPT has more freedom to innovate and go its own way than competitors like Google—even more than its partner Microsoft does with Bing. Both of those are ad-dependent businesses. OpenAI is not. (At least not yet.) It earns revenue from the developers, businesses, and individuals who use it directly. It’s mostly setting large amounts of money on fire right now—it’s projected to lose $14 billion in 2026, by some reports. But one thing it doesn’t have to worry about is putting ads in its search results as Google does.

Elizabeth Reid — “For a while what we did was organize web pages. Which is not really the same thing as organizing the world’s information or making it truly useful and accessible to you,” says Google head of search, Liz Reid.

Like Google, ChatGPT is pulling in information from web publishers, summarizing it, and including it in its answers. But it has also struck financial deals with publishers, a payment for providing the information that gets rolled into its results. (MIT Technology Review has been in discussions with OpenAI, Google, Perplexity, and others about publisher deals but has not entered into any agreements. Editorial was neither party to nor informed about the content of those discussions.)

But the thing is, for web search to accomplish what OpenAI wants—to be more current than the language model—it also has to bring in information from all sorts of publishers and sources that it doesn’t have deals with. OpenAI’s head of media partnerships, Varun Shetty, told MIT Technology Review that it won’t give preferential treatment to its publishing partners.

Instead, OpenAI told me, the model itself finds the most trustworthy and useful source for any given question. And that can get weird too. In that very first example it showed me—when Turley ran that name search—it described a story I wrote years ago for Wired about being hacked. That story remains one of the most widely read I’ve ever written. But ChatGPT didn’t link to it. It linked to a short rewrite from The Verge. Admittedly, this was on a prototype version of search, which was, as Turley said, “risky.”

When I asked him about it, he couldn’t really explain why the model chose the sources that it did, because the model itself makes that evaluation. The company helps steer it by identifying—sometimes with the help of users—what it considers better answers, but the model actually selects them.

“And in many cases, it gets it wrong, which is why we have work to do,” said Turley. “Having a model in the loop is a very, very different mechanism than how a search engine worked in the past.”

Indeed!

The model, whether it’s OpenAI’s GPT-4o or Google’s Gemini or Anthropic’s Claude, can be very, very good at explaining things. But the rationale behind its explanations, its reasons for selecting a particular source, and even the language it may use in an answer are all pretty mysterious. Sure, a model can explain very many things, but not when that comes to its own answers.

It was almost a decade ago, in 2016, when Pichai wrote that Google was moving from “mobile first” to “AI first”: “But in the next 10 years, we will shift to a world that is AI-first, a world where computing becomes universally available—be it at home, at work, in the car, or on the go—and interacting with all of these surfaces becomes much more natural and intuitive, and above all, more intelligent.”

We’re there now—sort of. And it’s a weird place to be. It’s going to get weirder. That’s especially true as these things we now think of as distinct—querying a search engine, prompting a model, looking for a photo we’ve taken, deciding what we want to read or watch or hear, asking for a photo we wish we’d taken, and didn’t, but would still like to see—begin to merge.

The search results we see from generative AI are best understood as a waypoint rather than a destination. What’s most important may not be search in itself; rather, it’s that search has given AI model developers a path to incorporating real-time information into their inputs and outputs. And that opens up all sorts of possibilities.

“A ChatGPT that can understand and access the web won’t just be about summarizing results. It might be about doing things for you. And I think there’s a fairly exciting future there,” says OpenAI’s Weil. “You can imagine having the model book you a flight, or order DoorDash, or just accomplish general tasks for you in the future. It’s just once the model understands how to use the internet, the sky’s the limit.”

This is the agentic future we’ve been hearing about for some time now, and the more AI models make use of real-time data from the internet, the closer it gets.

Let’s say you have a trip coming up in a few weeks. An agent that can get data from the internet in real time can book your flights and hotel rooms, make dinner reservations, and more, based on what it knows about you and your upcoming travel—all without your having to guide it. Another agent could, say, monitor the sewage output of your home for certain diseases, and order tests and treatments in response. You won’t have to search for that weird noise your car is making, because the agent in your vehicle will already have done it and made an appointment to get the issue fixed.

“It’s not always going to be just doing search and giving answers,” says Pichai. “Sometimes it’s going to be actions. Sometimes you’ll be interacting within the real world. So there is a notion of universal assistance through it all.”

And the ways these things will be able to deliver answers is evolving rapidly now too. For example, today Google can not only search text, images, and even video; it can create them. Imagine overlaying that ability with search across an array of formats and devices. “Show me what a Townsend’s warbler looks like in the tree in front of me.” Or “Use my existing family photos and videos to create a movie trailer of our upcoming vacation to Puerto Rico next year, making sure we visit all the best restaurants and top landmarks.”

“We have primarily done it on the input side,” he says, referring to the ways Google can now search for an image or within a video. “But you can imagine it on the output side too.”

This is the kind of future Pichai says he is excited to bring online. Google has already showed off a bit of what that might look like with NotebookLM, a tool that lets you upload large amounts of text and have it converted into a chatty podcast. He imagines this type of functionality—the ability to take one type of input and convert it into a variety of outputs—transforming the way we interact with information.

In a demonstration of a tool called Project Astra this summer at its developer conference, Google showed one version of this outcome, where cameras and microphones in phones and smart glasses understand the context all around you—online and off, audible and visual—and have the ability to recall and respond in a variety of ways. Astra can, for example, look at a crude drawing of a Formula One race car and not only identify it, but also explain its various parts and their uses.

But you can imagine things going a bit further (and they will). Let’s say I want to see a video of how to fix something on my bike. The video doesn’t exist, but the information does. AI-assisted generative search could theoretically find that information somewhere online—in a user manual buried in a company’s website, for example—and create a video to show me exactly how to do what I want, just as it could explain that to me with words today.

These are the kinds of things that start to happen when you put the entire compendium of human knowledge—knowledge that’s previously been captured in silos of language and format; maps and business registrations and product SKUs; audio and video and databases of numbers and old books and images and, really, anything ever published, ever tracked, ever recorded; things happening right now, everywhere—and introduce a model into all that. A model that maybe can’t understand, precisely, but has the ability to put that information together, rearrange it, and spit it back in a variety of different hopefully helpful ways. Ways that a mere index could not.

That’s what we’re on the cusp of, and what we’re starting to see. And as Google rolls this out to a billion people, many of whom will be interacting with a conversational AI for the first time, what will that mean? What will we do differently? It’s all changing so quickly. Hang on, just hang on.

Ecommerce MGMT 0 Comments

Ai App Artificial intelligence TR10 2025

Jan 4 2025

Small language models: 10 Breakthrough Technologies 2025

WHO

Allen Institute for Artificial Intelligence, Anthropic, Google, Meta, Microsoft, OpenAI

WHEN

Now

Make no mistake: Size matters in the AI world. When OpenAI launched GPT-3 back in 2020, it was the largest language model ever built. The firm showed that supersizing this type of model was enough to send performance through the roof. That kicked off a technology boom that has been sustained by bigger models ever since. As Noam Brown, a research scientist at OpenAI, told an audience at TEDAI San Francisco in October, “The incredible progress in AI over the past five years can be summarized in one word: scale.”

But as the marginal gains for new high-end models trail off, researchers are figuring out how to do more with less. For certain tasks, smaller models that are trained on more focused data sets can now perform just as well as larger ones—if not better. That’s a boon for businesses eager to deploy AI in a handful of specific ways. You don’t need the entire internet in your model if you’re making the same kind of request again and again.

Most big tech firms now boast fun-size versions of their flagship models for this purpose: OpenAI offers both GPT-4o and GPT-4o mini; Google DeepMind has Gemini Ultra and Gemini Nano; and Anthropic’s Claude 3 comes in three flavors: outsize Opus, midsize Sonnet, and tiny Haiku. Microsoft is pioneering a range of small language models called Phi.

A growing number of smaller companies offer small models as well. The AI startup Writer claims that its latest language model matches the performance of the largest top-tier models on many key metrics despite in some cases having just a 20th as many parameters (the values that get calculated during training and determine how a model behaves).

Explore the full 2025 list of 10 Breakthrough Technologies.

Smaller models are more efficient, making them quicker to train and run. That’s good news for anyone wanting a more affordable on-ramp. And it could be good for the climate, too: Because smaller models work with a fraction of the computer oomph required by their giant cousins, they burn less energy.

These small models also travel well: They can run right in our pockets, without needing to send requests to the cloud. Small is the next big thing.

Ecommerce MGMT 0 Comments