Introducing: The AI Hype Index

There’s no denying that the AI industry moves fast. Each week brings a bold new announcement, product release, or lofty claim that pushes the bounds of what we previously thought was possible. Separating AI fact from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry.

Our first index is a white-knuckle ride that ranges from the outright depressing—rising numbers of sexually explicit deepfakes; the complete lack of rules governing Elon Musk’s Grok AI model—to the bizarre, including AI-powered dating wingmen and startup Friend’s dorky intelligent-jewelry line. 

But it’s not all a horror show—at least not entirely. AI is being used for more wholesome endeavors, too, like simulating the classic video game Doom without a traditional gaming engine. Elsewhere, AI models have gotten so good at table tennis they can now beat beginner-level human opponents. They’re also giving us essential insight into the secret names monkeys use to communicate with one another. Because while AI may be a lot of things, it’s never boring. 

How Wayve’s driverless cars will meet one of their biggest challenges yet

The UK driverless-car startup Wayve is headed west. The firm’s cars learned to drive on the streets of London. But Wayve has announced that it will begin testing its tech in and around San Francisco as well. And that brings a new challenge: Its AI will need to switch from driving on the left to driving on the right.

As visitors to or from the UK will know, making that switch is harder than it sounds. Your view of the road, how the vehicle turns—it’s all different, says Wayve’s vice president of software, Silvius Rus. Rus himself learned to drive on the left for the first time last year after years in the US. “Even for a human who has driven a long time, it’s not trivial,” he says.

Wayve’s US fleet of Ford Mustang Mach-Es.
WAYVE

The move to the US will be a test of Wayve’s technology, which the company claims is more general-purpose than what many of its rivals are offering. Wayve’s approach has attracted massive investment—including a $1 billion funding round that broke UK records this May—and partnerships with Uber and online grocery firms such as Asda and Ocado. But it will now go head to head with the heavyweights of the growing autonomous-car industry, including Cruise, Waymo, and Tesla.  

Back in 2022, when I first visited the company’s offices in north London, there were two or three vehicles parked in the building’s auto shop. But on a sunny day this fall, both the shop and the forecourt are full of cars. A billion dollars buys a lot of hardware.

I’ve come for a ride-along. In London, autonomous vehicles can still turn heads. But what strikes me as I sit in the passenger seat of one of Wayve’s Jaguar I-PACE cars isn’t how weird it feels to be driven around by a computer program, but how normal—how comfortable, how safe. This car drives better than I do.

Regulators have not yet cleared autonomous vehicles to drive on London’s streets without a human in the loop. A test driver sits next to me, his hands hovering a centimeter above the wheel as it turns back and forth beneath them. Rus gives a running commentary from the back.

The midday traffic is light, but that makes things harder, says Rus: “When it’s crowded, you tend to follow the car in front.” We steer around roadworks, cyclists, and other vehicles stopped in the middle of the street. It starts to rain. At one point I think we’re on the wrong side of the road. But it’s a one-way street: The car has spotted a sign that I didn’t. We approach every intersection with what feels like deliberate confidence.

At one point a blue car (with a human at the wheel) sticks its nose into the stream of traffic just ahead of us. Urban drivers know this can go two ways: Hesitate and it’s a cue for the other car to pull out; push ahead and you’re telling it to wait its turn. Wayve’s car pushes ahead.

The interaction lasts maybe a second. But it’s the most impressive moment of my ride. Wayve says its model has picked up lots of defensive driving habits like this. “It was our right of way, and the safest approach was to assert that,” says Rus. “It learned to do that; it’s not programmed.”

Learning to drive

Everything that Wayve’s cars do is learned rather than programmed. The company uses different technology from what’s in most other driverless cars. Instead of separate, specialized models trained to handle individual tasks like spotting obstacles or finding a route around them—models that must then be wired up to work together—Wayve uses an approach called end-to-end learning.

This means that Wayve’s cars are controlled by a single large model that learns all the individual tasks needed to drive at once, using camera footage, feedback from test drivers (many of whom are former driving instructors), and a lot of reruns in simulation.
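To make the contrast concrete, here is a minimal, purely illustrative sketch of the end-to-end idea in PyTorch: a single network maps raw camera frames straight to driving commands, with no hand-built perception or planning modules wired in between. The architecture, layer sizes, and output format are assumptions for illustration, not Wayve's actual model.

```python
# Illustrative only: a toy end-to-end driving policy, not Wayve's model.
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    def __init__(self):
        super().__init__()
        # Perception is learned implicitly by the convolutional encoder...
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # ...and planning/control is learned implicitly by the policy head.
        self.policy = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, 2),  # outputs: [steering, acceleration]
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.policy(self.encoder(frames))

model = EndToEndDriver()
commands = model(torch.randn(1, 3, 128, 128))  # one RGB frame in, driving commands out
```

In practice a model like this would be trained on expert demonstrations and refined in simulation, which is the kind of pipeline the article describes at a high level.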

Wayve has argued that this approach makes its driving models more general-purpose. The firm has shown that it can take a model trained on the streets of London and then use that same model to drive cars in multiple UK cities—something that others have struggled to do.

But a move to the US is more than a simple relocation. It rewrites one of the most basic rules of driving—which side of the road to drive on. With Wayve’s single large model, there’s no left-hand-drive module to swap out. “We did not program it to drive on the left,” says Rus. “It’s just seen it enough to think that’s how it needs to drive. Even if there’s no marking on the road, it will still keep to the left.”  

So how will the model learn to drive on the right? That is the interesting question posed by the move to the US.

Answering that question involves figuring out whether the side of the road it drives on is a deep feature of Wayve’s model—intrinsic to its behavior—or a more superficial one that can be overridden with a little retraining.

Given the adaptability seen in the model so far, Rus believes it will switch to US streets just fine. He cites the way the cars have shown they can adapt to new UK cities, for example. “That gives us confidence in its capability to learn and to drive in new situations,” he says.

Under the hood

But Wayve needs to be certain. As well as testing its cars in San Francisco, Rus and his colleagues are poking around inside their model to find out what makes it tick. “It’s like you’re doing a brain scan and you can see there’s some activity in a certain part of the brain,” he says.

The team presents the model with many different scenarios and watches what parts of it get activated at specific times. One example is an unprotected turn—a turn that crosses traffic going in the opposite direction, without a traffic signal. “Unprotected turns are to the right here and to the left in the US,” says Rus. “So will it see them as similar? Or will it just see right turns as right turns?”

Figuring out why the model behaves as it does tells Wayve what kinds of scenarios require extra help. Using a hyper-detailed simulation tool called PRISM-1 that can reconstruct 3D street scenes from video footage, the company can generate bespoke scenarios and run the model through them over and over until it learns how to handle them. How much retraining might the model need? “I cannot tell you the amount. This is part of our secret sauce,” says Rus. “But it’s a small amount.”

Wayve’s simulation tool, PRISM-1, can reconstruct virtual street scenes from real video footage. Wayve uses the tool to help train its driving model.
WAYVE

The autonomous-vehicle industry is known for hype and overpromising. Within the past year, Cruise laid off hundreds after its cars caused chaos and injury on the streets of San Francisco. Tesla is facing federal investigation after its driver-assistance technology was blamed for multiple crashes, including a fatal collision with a pedestrian. 

But the industry keeps forging ahead. Waymo has said it is now giving 100,000 robotaxi rides a week in San Francisco, Los Angeles, and Phoenix. In China, Baidu claims it is giving some 287,000 rides in a handful of cities, including Beijing and Wuhan. Undaunted by the allegations that Tesla’s driver-assistance technology is unsafe, Elon Musk announced his Cybercab last week with a timeline that would put these driverless concept cars on the road by 2025. 

What should we make of it all? “The competition between robotaxi operators is heating up,” says Crijn Bouman, CEO and cofounder of Rocsys, a startup that makes charging stations for autonomous electric vehicles. “I believe we are close to their ChatGPT moment.”

“The technology, the business model, and the consumer appetite are all there,” Bouman says. “The question is which operator will seize the opportunity and come out on top.”

Others are more skeptical. We need to be very clear what we’re talking about when we talk about autonomous vehicles, says Saber Fallah, director of the Connected Autonomous Vehicle Research Lab at the University of Surrey, UK. Some of Baidu’s robotaxis still require a safety driver behind the wheel, for example. Cruise and Waymo have shown that a fully autonomous service is viable in certain locations. But it took years to train their vehicles to drive specific streets, and extending routes—safely—beyond existing neighborhoods will take time. “We won’t have robotaxis that can drive anywhere anytime soon,” says Fallah.

Fallah takes the extreme view that this won’t happen until all human drivers hand in their licenses. For robotaxis to be safe, they need to be the only vehicles on the road, he says. He thinks today’s driving models are still not good enough to interact with the complex and subtle behaviors of humans. There are just too many edge cases, he says.

Wayve is betting its approach will win out. In the US, it will begin by testing what it calls an advanced driver assistance system, a technology similar to Tesla’s. But unlike Tesla, Wayve plans to sell that technology to a wide range of existing car manufacturers. The idea is to build on this foundation to achieve full autonomy in the next few years. “We’ll get access to scenarios that are encountered by many cars,” says Rus. “The path to full self-driving is easier if you go level by level.”

But cars are just the start, says Rus. What Wayve is in fact building, he says, is an embodied model that could one day control many different types of machines, whether they have wheels, wings, or legs. 

“We’re an AI shop,” he says. “Driving is a milestone, but it’s a stepping stone as well.”

Reckoning with generative AI’s uncanny valley

Generative AI has the power to surprise in a way that few other technologies can. Sometimes that’s a very good thing; other times, not so good. In theory, as generative AI improves, this issue should become less important. However, in reality, as generative AI becomes more “human” it can begin to turn sinister and unsettling, plunging us into what robotics has long described as the “uncanny valley.”

It might be tempting to dismiss this experience as something that can be corrected by bigger data sets or better training. However, insofar as it speaks to a disturbance in our mental model of the technology ("I don't like what it did there"), it's something that needs to be acknowledged and addressed.

Mental models and antipatterns

Mental models are an important concept in UX and product design, but they need to be more readily embraced by the AI community. Mental models often go unnoticed precisely because they are the routine, unexamined assumptions we bring to an AI system. This is something we discussed at length while putting together the latest volume of the Thoughtworks Technology Radar, a biannual report based on our experiences working with clients all over the world.

For instance, we called out complacency with AI-generated code and replacing pair programming with generative AI as two practices we believe practitioners must avoid as the popularity of AI coding assistants continues to grow. Both emerge from poor mental models that fail to acknowledge how this technology actually works and where its limits lie. The consequence is that the more convincing and "human" these tools become, the harder it is to keep those workings and limitations in view when judging the "solutions" they provide us.

Of course, for those deploying generative AI into the world, the risks are similar, perhaps even more pronounced. While the intent behind such tools is usually to create something convincing and usable, if they mislead, trick, or even merely unsettle users, their value evaporates. It's no surprise that legislation such as the EU AI Act, which requires creators of deepfakes to label content as "AI generated," is being passed to address these problems.

It’s worth pointing out that this isn’t just an issue for AI and robotics. Back in 2011, our colleague Martin Fowler wrote about how certain approaches to building cross-platform mobile applications can create an uncanny valley, “where things work mostly like… native controls but there are just enough tiny differences to throw users off.”

Specifically, Fowler wrote something we think is instructive: “different platforms have different ways they expect you to use them that alter the entire experience design.” The point, applied to generative AI, is that different contexts and use cases come with different sets of assumptions and mental models, which change the point at which users might drop into the uncanny valley. These subtle differences change how one experiences or perceives a large language model’s (LLM’s) output.

For example, for a drug researcher who wants vast amounts of synthetic data, accuracy at a micro level may be unimportant; for a lawyer trying to grasp legal documentation, accuracy matters a lot. In fact, dropping into the uncanny valley might just be the signal to step back and reassess your expectations.

Shifting our perspective

The uncanny valley of generative AI might be troubling, even something we want to minimize, but it should also remind us of generative AI’s limitations—it should encourage us to rethink our perspective.

There have been some interesting attempts to do that across the industry. One that stands out comes from Ethan Mollick, a professor at the University of Pennsylvania, who argues that AI shouldn’t be understood as good software but instead as “pretty good people.”

Therefore, our expectations about what generative AI can do and where it’s effective must remain provisional and flexible. To a certain extent, this might be one way of overcoming the uncanny valley: by reflecting on our assumptions and expectations, we remove the technology’s power to disturb or confound them.

However, simply calling for a mindset shift isn’t enough. There are various practices and tools that can help. One example is a technique we identified in the latest Technology Radar: getting structured outputs from LLMs. This can be done either by instructing a model to respond in a particular format when prompting or through fine-tuning. Thanks to tools like Instructor, it is getting easier to do this, which creates greater alignment between expectations and what the LLM will output. While there’s still a chance something unexpected or not quite right might happen, this technique goes some way toward addressing that.
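As a rough sketch of what that looks like in practice, the snippet below uses the Instructor library with an OpenAI-compatible client to request output that conforms to a schema. The schema, model name, and prompt are illustrative assumptions; consult Instructor's documentation for the details of your own setup.

```python
# Sketch: structured output from an LLM via Instructor (schema and model name are illustrative).
import instructor
from openai import OpenAI
from pydantic import BaseModel

class ContractSummary(BaseModel):
    parties: list[str]
    effective_date: str
    contains_termination_clause: bool

# Patch the OpenAI client so responses are parsed and validated against the Pydantic model.
client = instructor.from_openai(OpenAI())

summary = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat model available to you
    response_model=ContractSummary,
    messages=[{"role": "user", "content": "Summarize the key terms of this contract: ..."}],
)
print(summary.model_dump())
```

Because the response is validated against the schema, mismatches between what you expect and what the model returns surface as explicit errors rather than silent surprises.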

There are other techniques too, including retrieval-augmented generation as a way of better controlling the “context window.” There are also frameworks and tools that can help evaluate and measure the success of such techniques, including Ragas and DeepEval, libraries that give AI developers metrics for faithfulness and relevance.
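For instance, a minimal evaluation with DeepEval might look like the following; the exact class names and arguments are our assumption based on the library's documented usage and may differ between versions.

```python
# Sketch: scoring a RAG answer for faithfulness with DeepEval (API details may vary by version).
from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What does the warranty cover?",
    actual_output="The warranty covers manufacturing defects for two years.",
    retrieval_context=["Our warranty covers manufacturing defects for a period of 24 months."],
)

# Faithfulness checks whether the answer is actually supported by the retrieved context.
evaluate(test_cases=[test_case], metrics=[FaithfulnessMetric(threshold=0.7)])
```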

Measurement is important, as are relevant guidelines and policies for LLMs, such as LLM guardrails. It’s important to take steps to better understand what’s actually happening inside these models. Completely unpacking these black boxes might be impossible, but tools like Langfuse can help. Doing so may go a long way in reorienting the relationship with this technology, shifting mental models, and removing the possibility of falling into the uncanny valley.

An opportunity, not a flaw

These tools—part of a Cambrian explosion of generative AI tools—can help practitioners rethink generative AI and, hopefully, build better and more responsible products. For the wider world, however, this work will remain invisible. What matters is exploring how we can evolve toolchains to better control and understand generative AI, precisely because existing mental models and conceptions of generative AI are a fundamental design problem, not a marginal issue we can choose to ignore.

Ken Mugrage is the principal technologist in the office of the CTO at Thoughtworks. Srinivasan Raguraman is a technical principal at Thoughtworks based in Singapore.

This content was produced by Thoughtworks. It was not written by MIT Technology Review’s editorial staff.

Google DeepMind is making its AI text watermark open source

Google DeepMind has developed a tool for identifying AI-generated text and is making it available open source. 

The tool, called SynthID, is part of a larger family of watermarking tools for generative AI outputs. The company unveiled a watermark for images last year, and it has since rolled out one for AI-generated video. In May, Google announced it was applying SynthID to text in its Gemini app and online chatbots; it is now making the tool freely available on Hugging Face, an open repository of AI data sets and models. Watermarks have emerged as an important tool to help people determine when something is AI generated, which could help counter harms such as misinformation. 

“Now, other [generative] AI developers will be able to use this technology to help them detect whether text outputs have come from their own [large language models], making it easier for more developers to build AI responsibly,” says Pushmeet Kohli, the vice president of research at Google DeepMind. 

SynthID works by adding an invisible watermark directly into the text when it is generated by an AI model. 

Large language models work by breaking down language into “tokens” and then predicting which token is most likely to come next. Tokens can be a single character, a word, or part of a phrase, and each one gets a percentage score for how likely it is to be the appropriate next word in a sentence. The higher the percentage, the more likely the model is to use it. 

SynthID introduces additional information at the point of generation by changing the probability that tokens will be generated, explains Kohli. 

To detect the watermark and determine whether text has been generated by an AI tool, SynthID compares the expected probability scores for words in watermarked and unwatermarked text. 
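To make the general idea concrete (though this is a toy illustration, not Google's actual SynthID algorithm), a keyed pseudorandom function can mark some next-token choices as "preferred" at each step; generation gently upweights those tokens, and detection checks whether a text lands on preferred tokens far more often than the roughly 50% you would expect by chance.

```python
# Toy token-probability watermark: illustrative only, not Google's SynthID algorithm.
import hashlib
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]  # toy vocabulary
KEY = "secret-watermark-key"  # known only to the watermark's owner

def preferred(prev_token: str, token: str) -> bool:
    # Keyed pseudorandom bit for each (previous token, candidate token) pair.
    digest = hashlib.sha256(f"{KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def watermarked_sample(prev_token: str, probs: dict[str, float], bias: float = 2.0) -> str:
    # Upweight preferred tokens before sampling; a small nudge barely changes the text.
    weights = [probs[t] * (bias if preferred(prev_token, t) else 1.0) for t in VOCAB]
    return random.choices(VOCAB, weights=weights)[0]

def detection_score(tokens: list[str]) -> float:
    # Fraction of tokens that are "preferred" given their predecessor: about 0.5 for
    # unwatermarked text, noticeably higher for watermarked text.
    hits = sum(preferred(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

This sketch also hints at the limitation discussed below: when the next token is essentially forced, as in a factual answer, there is little probability left to nudge, so the watermark signal is weaker.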

Google DeepMind found that using the SynthID watermark did not compromise the quality, accuracy, creativity, or speed of generated text. That conclusion was drawn from a massive live experiment of SynthID’s performance after the watermark was deployed in its Gemini products and used by millions of people. Gemini allows users to rank the quality of the AI model’s responses with a thumbs-up or a thumbs-down. 

Kohli and his team analyzed the scores for around 20 million watermarked and unwatermarked chatbot responses. They found that users did not notice a difference in quality and usefulness between the two. The results of this experiment are detailed in a paper published in Nature today. Currently SynthID for text only works on content generated by Google’s models, but the hope is that open-sourcing it will expand the range of tools it’s compatible with. 

SynthID does have other limitations. The watermark was resistant to some tampering, such as cropping text and light editing or rewriting, but it was less reliable when AI-generated text had been rewritten or translated from one language into another. It is also less reliable in responses to prompts asking for factual information, such as the capital city of France. This is because there are fewer opportunities to adjust the likelihood of the next possible word in a sentence without changing facts. 

“Achieving reliable and imperceptible watermarking of AI-generated text is fundamentally challenging, especially in scenarios where LLM outputs are near deterministic, such as factual questions or code generation tasks,” says Soheil Feizi, an associate professor at the University of Maryland, who has studied the vulnerabilities of AI watermarking.  

Feizi says Google DeepMind’s decision to open-source its watermarking method is a positive step for the AI community. “It allows the community to test these detectors and evaluate their robustness in different settings, helping to better understand the limitations of these techniques,” he adds. 

There is another benefit too, says João Gante, a machine-learning engineer at Hugging Face. Open-sourcing the tool means anyone can grab the code and incorporate watermarking into their model with no strings attached, Gante says. This will improve the watermark’s privacy, as only the owner will know its cryptographic secrets. 

“With better accessibility and the ability to confirm its capabilities, I want to believe that watermarking will become the standard, which should help us detect malicious use of language models,” Gante says. 

But watermarks are not an all-purpose solution, says Irene Solaiman, Hugging Face’s head of global policy. 

“Watermarking is one aspect of safer models in an ecosystem that needs many complementing safeguards. As a parallel, even for human-generated content, fact-checking has varying effectiveness,” she says. 

Investing in AI to build next-generation infrastructure

The demand for new and improved infrastructure across the world is not being met. The Asian Development Bank has estimated that in Asia alone, roughly $1.7 trillion needs to be invested annually through to 2030 just to sustain economic growth and offset the effects of climate change. Globally, that figure has been put at $15 trillion.

In the US, for example, it is no secret that the country’s highways, railways, and bridges are in need of updating. But, as in many other sectors, significant shortages of skilled workers and resources delay all-important repairs and maintenance and harm efficiency.

This infrastructure gap – the difference between what is funded and what gets built – is vast. And while governments and companies everywhere are feeling the strain of constructing an energy-efficient and sustainable built environment, it is proving to be more than humans can do alone. To redress this imbalance, many organizations are turning to various forms of AI, including large language models (LLMs) and machine learning (ML). Collectively, these tools cannot yet fix every infrastructure problem, but they are already helping to reduce costs and risks and to increase efficiency.

Overcoming resource constraints

A shortage of skilled engineering and construction labor is a major problem. In the US, it is estimated that there will be a 33% shortfall in the supply of new talent by 2031, with unfilled positions in software, industrial, civil, and electrical engineering. Germany reported a shortage of 320,000 science, technology, engineering, and mathematics (STEM) specialists in 2022, and another engineering powerhouse, Japan, has forecast a deficit of more than 700,000 engineers by 2030. Considering the duration of most engineering projects (repairing a broken gas pipeline, for example, can take decades), the demand for qualified engineers will only continue to outstrip supply unless something is done.

Immigration and visa restrictions for international engineering students, and a lack of retention in formative STEM jobs, exert additional constraints. Plus, there is the issue of duplicated, repetitive tasks: exactly the kind of work AI can handle with ease.

Julien Moutte, CTO of Bentley Systems, explains: “There’s a massive amount of work that engineers have to do that is tedious and repetitive. Between 30% to 50% of their time is spent just compressing 3D models into 2D PDF formats. If that work can be done by AI-powered tools, they can recover half their working time which could then be invested in performing higher value tasks.”

With guidance, AI can automate the same drawings hundreds of times. Training engineers to ask the right questions and use AI optimally will ease the burden and stress of repetition.

However, this is not without challenges. Users of ChatGPT or other LLMs know the pitfalls of AI hallucinations, where the model predicts a plausible-sounding sequence of words without any contextual understanding of what those words mean. This can lead to nonsensical outputs, but in engineering, hallucinations can be altogether riskier. “If a recommendation was made by AI, it needs to be validated,” says Moutte. “Is that recommendation safe? Does it respect the laws of physics? And it’s a waste of time for the engineers to have to review all these things.”

But this can be offset by having existing company tools and products run simulations and validate the designs against established engineering rules and design codes, which again relieves engineers of the burden of doing the validation themselves.

Improving resource efficiency

An estimated 30% of building materials, such as steel and concrete, are wasted on a typical construction site in the United States and United Kingdom, with the majority ending up in landfills, although countries such as Germany and the Netherlands have recently implemented recycling measures. This waste, and the rising cost of raw materials, is putting pressure on companies to find solutions that improve construction efficiency and sustainability.

AI can provide solutions to both of these issues during the design and construction phases. Digital twins can help workers spot even small deviations in product quality and provide the insights needed to minimize waste and energy output and, crucially, save money.

Machine learning models use real-time data from field statistics and process variables to flag off-spec materials, product deviations, and excess energy usage, such as that from machinery and the transportation of construction site workers. Engineers can then anticipate the gaps and streamline the processes, making large-scale improvements to each project that can be replicated in the future.

“Being able to anticipate and reduce that waste with that visual awareness, with the application of AI to make sure that you are optimizing those processes and those designs and the resources that you need to construct that infrastructure is massive,” says Moutte.

He continues, “The big game changer is going to be around sustainability because we need to create infrastructure with more sustainable and efficient designs, and there’s a lot of room for improvement.” And an important part of this will be how AI can help create new materials and models to reduce waste.

Human and AI partnership

AI might never be entirely error-free, but for the time being, human intervention can catch mistakes. Although there may be some concern in the construction sector that AI will replace humans, there are elements to any construction project that only people can do.

AI lacks the critical thinking and problem-solving that humans excel at, so additional training for engineers to supervise and maintain automated systems is key to making the two work together optimally. Skilled workers bring creativity and intuition, as well as customer-service expertise; AI is not yet capable of such novel solutions.

With engineers implementing appropriate guardrails and frameworks, AI can take on the bulk of a project’s automated, repetitive work, creating a symbiotic and optimal relationship between humans and machines.

“Engineers have been designing impressive buildings for decades already, where they are not doing all the design manually. You need to make sure that those structures are validated first by engineering principles, physical rules, local codes, and the rest. So we have all the tools to be able to validate those designs,” explains Moutte.

As AI advances alongside human care and control, it can help future-proof the construction process, with every step bolstered by the strengths of both sides. By addressing the construction industry’s concerns – costs, sustainability, waste, and task repetition – and upskilling engineers to manage AI at the design and implementation stages, the sector looks set to be less riddled with potholes.

“We’ve already seen how AI can be used to create new materials and reduce waste,” explains Moutte. “As we move to 2050, I believe engineers will need those AI capabilities to create the best possible designs and I’m looking forward to releasing some of those AI-enabled features in our products.”

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

The race to find new materials with AI needs more data. Meta is giving massive amounts away for free.

Meta is releasing a massive data set and models, called Open Materials 2024, that could help scientists use AI to discover new materials much faster. OMat24 tackles one of the biggest bottlenecks in the discovery process: data.

To find new materials, scientists calculate the properties of elements across the periodic table and simulate different combinations on computers. This work could help us discover new materials with properties that can help mitigate climate change, for example, by making better batteries or helping create new sustainable fuels. But it requires massive data sets that are hard to come by. Creating them requires a lot of computing power and is very expensive. Many of the top data sets and models available now are also proprietary, and researchers don’t have access to them. That’s where Meta is hoping to help: The company is releasing its new data set and models today for free and is making them open source. The data set and models are available on Hugging Face for anyone to download, tinker with, and use.
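As a sketch of what "available on Hugging Face" means in practice, the released assets can typically be pulled with the huggingface_hub client, as below. The repository ID shown is a placeholder assumption; check Meta's release notes for the actual name.

```python
# Sketch: downloading the released data set from Hugging Face (repo ID is a placeholder).
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="facebook/OMAT24", repo_type="dataset")
print("Data set downloaded to", local_path)
```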

 “We’re really firm believers that by contributing to the community and building upon open-source data models, the whole community moves further, faster,” says Larry Zitnick, the lead researcher for the OMat project.

Zitnick says the new OMat24 model will top the Matbench Discovery leaderboard, which ranks the best machine-learning models for materials science. Its data set will also be one of the biggest available. 

“Materials science is having a machine-learning revolution,” says Shyue Ping Ong, a professor of nanoengineering at the University of California, San Diego, who was not involved in the project.

Previously, scientists were limited to doing very accurate calculations of material properties on very small systems or doing less accurate calculations on very big systems, says Ong. The processes were laborious and expensive. Machine learning has bridged that gap, and AI models allow scientists to perform simulations on combinations of any elements in the periodic table much more quickly and cheaply, he says. 

Meta’s decision to make its data set openly available is more significant than the AI model itself, says Gábor Csányi, a professor of molecular modeling at the University of Cambridge, who was not involved in the work. 

“This is in stark contrast to other large industry players such as Google and Microsoft, which also recently published competitive-looking models which were trained on equally large but secret data sets,” Csányi says. 

To create the OMat24 data set, Meta took an existing one called Alexandria and sampled materials from it. The team then ran various simulations and calculations of different atoms to scale it up.

Meta’s data set has around 110 million data points, which is many times larger than earlier ones. Others also don’t necessarily have high-quality data, says Ong. 

Meta has significantly expanded the data set beyond what the current materials science community has done, and with high accuracy, says Ong. 

Creating the data sets requires vast computational capacity, and Meta is one of the few companies in the world that can afford that. Zitnick says the company has another motive for this work: It’s hoping to find new materials to make its smart augmented-reality glasses more affordable. 

Previous work on open databases, such as one created by the Materials Project, has transformed computational materials science over the last decade, says Chris Bartel, an assistant professor of chemical engineering and materials science at the University of Minnesota, who was also not involved in Meta’s work. 

Tools such as Google’s GNoME (graph networks for materials exploration) have shown that the potential to find new materials increases with the size of the training set, he adds.  

“The public release of the [OMat24] data set is truly a gift for the community and is certain to immediately accelerate research in this space,” Bartel says. 

Transforming software with generative AI

Generative AI’s promises for the software development lifecycle (SDLC)—code that writes itself, fully automated test generation, and developers who spend more time innovating than debugging—are as alluring as they are ambitious. Some bullish industry forecasts project a 30% productivity boost from AI developer tools, which, if realized, could inject more than $1.5 trillion into the global GDP.

But while there’s little doubt that software development is undergoing a profound transformation, separating the hype and speculation from the realities of implementation and ROI is no simple task. As with previous technological revolutions, the dividends won’t be instant. “There’s an equivalency between what’s going on with AI and when digital transformation first happened,” observes Carolina Dolan Chandler, chief digital officer at Globant. “AI is an integral shift. It’s going to affect every single job role in every single way. But it’s going to be a long-term process.”

Where exactly are we on this transformative journey? How are enterprises navigating this new terrain—and what’s still ahead? To investigate how generative AI is impacting the SDLC, MIT Technology Review Insights surveyed more than 300 business leaders about how they’re using the technology in their software and product lifecycles.

The findings reveal that generative AI has rich potential to revolutionize software development, but that many enterprises are still in the early stages of realizing its full impact. While adoption is widespread and accelerating, there are significant untapped opportunities. This report explores the projected course of these advancements, as well as how emerging innovations, including agentic AI, might bring about some of the technology’s loftier promises.

Key findings include the following:

Substantial gains from generative AI in the SDLC still lie ahead. Only 12% of surveyed business leaders say that the technology has “fundamentally” changed how they develop software today. Future gains, however, are widely anticipated: Thirty-eight percent of respondents believe generative AI will “substantially” change the SDLC across most organizations in one to three years, and another 31% say this will happen in four to 10 years.

Use of generative AI in the SDLC is nearly universal, but adoption is not comprehensive. A full 94% of respondents say they’re using generative AI for software development in some capacity. One-fifth (20%) describe generative AI as an “established, well-integrated part” of their SDLC, and one-third (33%) report it’s “widely used” in at least part of their SDLC. Nearly one-third (29%), however, are still “conducting small pilots” or adopting the technology on an individual-employee basis (rather than via a team-wide integration).

Generative AI is not just for code generation. Writing software may be the most obvious use case, but most respondents (82%) report using generative AI in at least two phases of the SDLC, and one-quarter (26%) say they are using it across four or more. The most common additional use cases include designing and prototyping new features, streamlining requirement development, fast-tracking testing, improving bug detection, and boosting overall code quality.

Generative AI is already meeting or exceeding expectations in the SDLC. Even with this room to grow in how fully they integrate generative AI into their software development workflows, 46% of survey respondents say generative AI is already meeting expectations, and 33% say it “exceeds” or “greatly exceeds” expectations.

AI agents represent the next frontier. Looking to the future, almost half (49%) of leaders believe advanced AI tools, such as assistants and agents, will lead to efficiency gains or cost savings. Another 20% believe such tools will lead to improved throughput or faster time to market.

Download the full report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

AI could help people find common ground during deliberations

Reaching a consensus in a democracy is difficult because people hold such different ideological, political, and social views. 

Perhaps an AI tool could help. Researchers from Google DeepMind trained a system of large language models (LLMs) to operate as a “caucus mediator,” generating summaries that outline a group’s areas of agreement on complex but important social or political issues.

The researchers say the tool—named the Habermas machine (HM), after the German philosopher Jürgen Habermas—highlights the potential of AI to help groups of people find common ground when discussing such subjects.

“The large language model was trained to identify and present areas of overlap between the ideas held among group members,” says Michael Henry Tessler, a research scientist at Google DeepMind. “It was not trained to be persuasive but to act as a mediator.” The study is being published today in the journal Science.

Google DeepMind recruited 5,734 participants, some through a crowdsourcing research platform and others through the Sortition Foundation, a nonprofit that organizes citizens’ assemblies. The Sortition groups formed a demographically representative sample of the UK population.

The HM consists of two different LLMs fine-tuned for this task. The first is a generative model, and it suggests statements that reflect the varied views of the group. The second is a personalized reward model, which scores the proposed statements by how much it thinks each participant will agree with them.
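In outline, this is a generate-then-score loop. The sketch below is purely conceptual, not DeepMind's implementation; the two stub functions stand in for the fine-tuned generative model and the personalized reward model.

```python
# Conceptual sketch of mediation as generate-then-score (illustrative stubs, not the real HM).
import random

def propose_statements(opinions: list[str], n: int) -> list[str]:
    # Stub for the generative LLM: a real system prompts a fine-tuned model with the group's views.
    return [f"Draft group statement #{i} reflecting: {'; '.join(opinions)}" for i in range(n)]

def predicted_agreement(statement: str, opinion: str) -> float:
    # Stub for the reward model: a real model scores how strongly this participant would endorse it.
    return random.random()

def mediate(opinions: list[str], n_candidates: int = 8) -> str:
    candidates = propose_statements(opinions, n_candidates)
    # Return the candidate with the highest average predicted agreement across the group.
    return max(candidates, key=lambda s: sum(predicted_agreement(s, o) for o in opinions) / len(opinions))

print(mediate(["Lower the voting age to 16", "Keep the voting age at 18", "Lower it, with civics education"]))
```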

The researchers split the participants into groups and tested the HM in two steps: first by seeing if it could accurately summarize collective opinions and then by checking if it could also mediate between different groups and help them find common ground. 

To start, they posed questions such as “Should we lower the voting age to 16?” or “Should the National Health Service be privatized?” The participants submitted responses to the HM before discussing their views within groups of around five people. 

The HM summarized the group’s opinions; then these summaries were sent to individuals to critique. At the end the HM produced a final set of statements, and participants ranked them. 

The researchers then set out to test whether the HM could act as a useful AI mediation tool. 

Participants were divided up into six-person groups, with one participant in each randomly assigned to write statements on behalf of the group. This person was designated the “mediator.” In each round of deliberation, participants were presented with one statement from the human mediator and one AI-generated statement from the HM and asked which they preferred. 

More than half (56%) of the time, the participants chose the AI statement. They found these statements to be of higher quality than those produced by the human mediator and tended to endorse them more strongly. After deliberating with the help of the AI mediator, the small groups of participants were less divided in their positions on the issues. 

Although the research demonstrates that AI systems are good at generating summaries reflecting group opinions, it’s important to be aware that their usefulness has limits, says Joongi Shin, a researcher at Aalto University who studies generative AI. 

“Unless the situation or the context is very clearly open, so they can see the information that was inputted into the system and not just the summaries it produces, I think these kinds of systems could cause ethical issues,” he says. 

Google DeepMind did not explicitly tell participants in the human mediator experiment that an AI system would be generating group opinion statements, although it indicated on the consent form that algorithms would be involved. 

 “It’s also important to acknowledge that the model, in its current form, is limited in its capacity to handle certain aspects of real-world deliberation,” Tessler says. “For example, it doesn’t have the mediation-relevant capacities of fact-checking, staying on topic, or moderating the discourse.” 

Figuring out where and how this kind of technology could be used in the future would require further research to ensure responsible and safe deployment. The company says it has no plans to launch the model publicly.

A data bottleneck is holding AI science back, says new Nobel winner

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

David Baker is sleep-deprived but happy. He’s just won the Nobel prize, after all. 

The call from the Royal Swedish Academy of Sciences woke him in the middle of the night. Or rather, his wife did. She answered the phone at their home in Seattle and screamed that he’d won the Nobel Prize for Chemistry. The prize is the ultimate recognition of his work as a biochemist at the University of Washington.

“I woke up at two [a.m.] and basically didn’t sleep through the whole day, which was all parties and stuff,” he told me the day after the announcement. “I’m looking forward to getting back to normal a little bit today.”

Last week was a major milestone for AI, with two Nobel prizes awarded for AI-related discoveries. 

Baker wasn’t alone in winning the Nobel Prize for Chemistry. The Royal Swedish Academy of Sciences also awarded it to Demis Hassabis, the cofounder and CEO of Google DeepMind, and John M. Jumper, a director at the same company. The pair were recognized for their research on AlphaFold, a tool that can predict how proteins are structured, while Baker was recognized for his work using AI to design new proteins. Read more about it here.

Meanwhile, the physics prize went to Geoffrey Hinton, a computer scientist whose pioneering work on deep learning in the 1980s and ’90s underpins all of the most powerful AI models in the world today, and fellow computer scientist John Hopfield, who invented a type of pattern-matching neural network that can store and reconstruct data. Read more about it here.

Speaking to reporters after the prize was announced, Hassabis said he believes that it will herald more AI tools being used for significant scientific discoveries. 

But there is one problem. AI needs masses of high-quality data to be useful for science, and databases containing that sort of data are rare, says Baker. 

The prize is a recognition for the whole community of people working as protein designers. It will help move protein design from the “lunatic fringe of stuff that no one ever thought would be useful for anything to being at the center stage,” he says.  

AI has been a gamechanger for biochemists like Baker. Seeing what DeepMind was able to do with AlphaFold made it clear that deep learning was going to be a powerful tool for their work. 

“There’s just all these problems that were really hard before that we are now having much more success with thanks to generative AI methods. We can do much more complicated things,” Baker says. 

Baker is already busy at work. He says his team is focusing on designing enzymes, which carry out all the chemical reactions that living things rely upon to exist. His team is also working on medicines that only act at the right time and place in the body. 

But Baker is hesitant to call this a watershed moment for AI in science. 

In AI there’s a saying: Garbage in, garbage out. If the data that is fed into AI models is not good, the outcomes won’t be dazzling either. 

The power of the Chemistry Nobel Prize-winning AI tools lies in the Protein Data Bank (PDB), a rare treasure trove of high-quality, curated and standardized data. This is exactly the kind of data that AI needs to do anything useful. But the current trend in AI development is training ever-larger models on the entire content of the internet, which is increasingly full of AI-generated slop. This slop in turn gets sucked into datasets and pollutes the outcomes, leading to bias and errors. That’s just not good enough for rigorous scientific discovery.

“If there were many databases as good as the PDB, I would say, yes, this [prize] probably is just the first of many, but it is kind of a unique database in biology,” Baker says. “It’s not just the methods, it’s the data. And there aren’t so many places where we have that kind of data.”


Now read the rest of The Algorithm

Deeper Learning

Adobe wants to make it easier for artists to blacklist their work from AI scraping

Adobe has announced a new tool to help creators watermark their work and opt out of having it used to train generative AI models. The web app, called Adobe Content Authenticity, also gives artists the opportunity to add “content credentials,” including their verified identity, social media handles, or other online domains, to their work.

A digital signature: Content credentials are based on C2PA, an internet protocol that uses cryptography to securely label images, video, and audio with information clarifying where they came from—the 21st-century equivalent of an artist’s signature. Creators can apply them to their content regardless of whether it was created using Adobe tools. The company is launching a public beta in early 2025. Read more from Rhiannon Williams here.

Bits and Bytes

Why artificial intelligence and clean energy need each other
A geopolitical battle is raging over the future of AI. The key to winning it is a clean-energy revolution, argue Michael Kearney and Lisa Hansmann, from Engine Ventures, a firm that invests in startups commercializing breakthrough science and engineering. They believe that AI’s huge power demands represent a chance to scale the next generation of clean energy technologies. (MIT Technology Review)

The state of AI in 2025
AI investor Nathan Benaich and Air Street Capital have released their annual analysis of the state of AI. Their predictions for the next year? Big, proprietary models will start to lose their edge, and labs will focus more on planning and reasoning. Perhaps unsurprisingly, the investor also bets that a handful of AI companies will begin to generate serious revenue. 

Silicon Valley, the new lobbying monster
Big Tech’s tentacles reach everywhere in Washington, DC. This is a fascinating look at how tech companies lobby politicians to influence how AI is regulated in the United States. (The New Yorker)

Intro to AI: a beginner’s guide to artificial intelligence from MIT Technology Review

It feels as though AI is moving a million miles a minute. Every week, it seems, there are product launches, fresh features and other innovations, and new concerns over ethics and privacy. It’s a lot to keep up with. Maybe you wish someone would just take a step back and explain some of the basics. 

Look no further. Intro to AI is MIT Technology Review’s first newsletter that also serves as a mini-course. You’ll get one email a week for six weeks, and each edition will walk you through a different topic in AI. 

Sign up here to receive it for free. Or if you’re already an AI aficionado, send it on to someone in your life who’s curious about the technology but is just starting to explore what it all means. 

Here’s what we’ll cover:

  • Week 1: What is AI? 

We’ll review a (very brief) history of AI and learn common terms like large language models, machine learning, and generative AI. 

  • Week 2: What you can do with AI 

Explore ways you can use AI in your life. We’ve got recommendations and exercises to help you get acquainted with specific AI tools. Plus, you’ll learn about a few things AI can’t do (yet). 

  • Week 3: How to talk about AI 

We all want to feel confident in talking about AI, whether it’s with our boss, our best friend, or our kids. We’ll help you find ways to frame these chats and keep AI’s pros and cons in mind. 

  • Week 4: AI traps to watch out for 

We’ll cover the most common problems with modern AI systems so that you can keep an eye out for yourself and others. 

  • Week 5: Working with AI 

How will AI change our jobs? How will companies handle any efficiencies created by AI? Our reporters and editors help cut through the noise and even give a little advice on how to think about your own career in the context of AI. 

  • Week 6: Does AI need tougher rules? 

AI tools can cause very real harm if not properly used, and regulation is one way to address this danger. The last edition of the newsletter breaks down the status of AI regulation across the globe, including a close look at the EU’s AI Act and a primer on what the US has done so far. 

There’s so much to learn and say about this powerful new technology. Sign up for Intro to AI and let’s leap into the big, weird world of AI together.