Cultivating the next generation of AI innovators in a global tech hub

A few years ago, I had to make one of the biggest decisions of my life: continue as a professor at the University of Melbourne or move to another part of the world to help build a brand new university focused entirely on artificial intelligence.

With the rapid development we have seen in AI over the past few years, I came to the realization that educating the next generation of AI innovators in an inclusive way and sharing the benefits of technology across the globe are more important than maintaining the status quo. I therefore packed my bags for the Mohammed bin Zayed University of Artificial Intelligence (MBZUAI) in Abu Dhabi.

The world in all its complexity

Today, the rewards of AI are mostly enjoyed by a few countries in what the Oxford Internet Institute dubs the “Compute North.” These countries, such as the US, the UK, France, Canada, and China, have dominated research and development and built state-of-the-art AI infrastructure capable of training foundational models. This should come as no surprise, as these countries are home to many of the world’s top universities and large tech corporations.

But this concentration of innovation comes at a cost for the billions of people who live outside these dominant countries and have different cultural backgrounds.

Large language models (LLMs) are illustrative of this disparity. Researchers have shown that many of the most popular multilingual LLMs perform poorly with languages other than English, Chinese, and a handful of other (mostly) European languages. Yet, there are approximately 6,000 languages spoken today, many of them in communities in Africa, Asia, and South America. Arabic alone is spoken by almost 400 million people and Hindi has 575 million speakers around the world.

For example, LLaMA 2 performs up to 50% better in English than in Arabic when measured using the LM-Evaluation-Harness framework. Meanwhile, Jais, an LLM co-developed by MBZUAI, exceeds LLaMA 2 in Arabic and is comparable to Meta’s model in English.

This disparity shows that the only way to develop AI applications that work for everyone is to create new institutions outside the Compute North that consistently and conscientiously invest in building tools designed for the thousands of language communities across the world.

Environments of innovation

One way to design new institutions is to study history and understand how today’s centers of gravity in AI research emerged decades ago. Before Silicon Valley earned its reputation as the center of global technological innovation, it was called Santa Clara Valley and was known for its prune farms. Its transformation had many contributors, but the main catalyst was Stanford University, which had built a reputation as one of the best places in the world to study electrical engineering. Over the years, through a combination of government grants and focused research, the university birthed countless inventions that advanced computing and created a culture of entrepreneurship. The results speak for themselves: Stanford alumni have founded companies such as Alphabet, NVIDIA, Netflix, and PayPal, to name a few.

Today, we have an opportunity to do for Abu Dhabi what Stanford did for Santa Clara Valley: build a new technology hub centered on a university.

And that’s why I chose to join MBZUAI, the world’s first research university focused entirely on AI. From MBZUAI’s position at the geographical crossroads of East and West, our goal is to attract the brightest minds from around the world and equip them with the tools they need to push the boundaries of AI research and development.

A community for inclusive AI

MBZUAI’s student body comes from more than 50 different countries around the globe. It has attracted top researchers such as Monojit Choudhury from Microsoft, Elizabeth Churchill from Google, Ted Briscoe from the University of Cambridge, Sami Haddadin from the Technical University of Munich, and Yoshihiko Nakamura from the University of Tokyo, just to name a few.

These scientists may be from different places, but they’ve found a common purpose at MBZUAI in our interdisciplinary nature, our relentless focus on making AI a force for global progress, and our emphasis on collaboration across disciplines such as robotics, NLP, machine learning, and computer vision.

In addition to traditional AI disciplines, MBZUAI has built departments in sibling areas that can both contribute to and benefit from AI, including human-computer interaction, statistics and data science, and computational biology.

Abu Dhabi’s commitment to MBZUAI is part of a broader vision for AI that extends beyond academia. MBZUAI’s scientists have collaborated with G42, an Abu Dhabi-based tech company, on Jais, the highest-performing open-weight Arabic-centric LLM, as well as NANDA, an advanced Hindi LLM. MBZUAI’s Institute of Foundational Models has created LLM360, an initiative designed to level the playing field of large-model research and development by publishing fully open-source models and datasets that are competitive with the closed-source or open-weight models available from tech companies in North America or China.

MBZUAI is also developing language models that specialize in Turkic languages, which have traditionally been underrepresented in NLP, yet are spoken by millions of people.

Another recent project has brought together native speakers of 26 languages from 28 different countries to compile a benchmark dataset that evaluates the performance of vision language models and their ability to understand cultural nuances in images.

These kinds of efforts to expand the capabilities of AI to broader communities are necessary if we want to maintain the world’s cultural diversity and provide everyone with AI tools that are useful to them. At MBZUAI, we have created a unique mix of students and faculty to drive globally inclusive AI innovation for the future. By building a broad community of scientists, entrepreneurs, and thinkers, the university is increasingly establishing itself as a driving force in AI innovation that extends far beyond Abu Dhabi, with the goal of developing technologies that are inclusive of the world’s diverse languages and cultures.

This content was produced by the Mohamed bin Zayed University of Artificial Intelligence. It was not written by MIT Technology Review’s editorial staff.

This AI system makes human tutors better at teaching children math

The US has a major problem with education inequality. Children from low-income families are less likely to receive high-quality education, partly because poorer districts struggle to retain experienced teachers. 

Artificial intelligence could help, by improving the one-on-one tutoring sometimes used to supplement class instruction in these schools. With help from an AI tool, tutors could tap into more experienced teachers’ expertise during virtual tutoring sessions. 

Researchers from Stanford University developed an AI system called Tutor CoPilot on top of OpenAI’s GPT-4 and integrated it into a platform called FEV Tutor, which connects students with tutors virtually. Tutors and students type messages to one another through a chat interface, and a tutor who needs help explaining how and why a student went wrong can press a button to generate suggestions from Tutor CoPilot.

The researchers created the model by training GPT-4 on a database of 700 real tutoring sessions in which experienced teachers worked one-on-one with first- to fifth-grade students on math lessons, identifying the students’ errors and then working with them to correct the errors in such a way that they learned to understand the broader concepts being taught. From this, the model generates responses that tutors can customize to help their online students.
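
As a rough illustration of what such a tutor-facing suggestion request could look like, here is a minimal sketch that calls a general-purpose LLM API. It is not the researchers’ actual Tutor CoPilot pipeline; the prompt wording, helper function, and model name are assumptions made for this example.

```python
# Illustrative sketch only: not the researchers' actual Tutor CoPilot pipeline.
# Assumes the OpenAI Python client and an API key in the environment; the
# prompt wording, helper function, and model name are placeholders.
from openai import OpenAI

client = OpenAI()

def suggest_tutor_move(problem: str, student_answer: str) -> str:
    """Ask the model for a pedagogical hint the tutor can adapt before sending."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system",
             "content": ("You are an experienced elementary math teacher. "
                         "Suggest how a tutor can guide the student toward the "
                         "underlying concept without giving away the answer.")},
            {"role": "user",
             "content": f"Problem: {problem}\nStudent's answer: {student_answer}"},
        ],
    )
    return response.choices[0].message.content

print(suggest_tutor_move("3/4 + 1/8", "4/12"))
```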

“I’m really excited about the future of human-AI collaboration systems,” says Rose Wang, a PhD student at Stanford University who worked on the project, which was published on arXiv and has not yet been peer-reviewed. “I think this technology is a huge enabler, but only if it’s designed well.”

The tool isn’t designed to actually teach the students math—instead, it offers tutors helpful advice on how to nudge students toward correct answers while encouraging deeper learning. 

For example, it can suggest that the tutor ask how the student came up with an answer, or propose questions that could point to a different way to solve a problem. 

To test its efficacy, the team examined the interactions of 900 tutors virtually teaching math to 1,787 students between five and 13 years old from historically underserved communities in the US South. Half the tutors had the option to activate Tutor CoPilot, while the other half did not. 

The students whose tutors had access to Tutor CoPilot were 4 percentage points more likely to pass their exit ticket—an assessment of whether a student has mastered a subject—than those whose tutors did not have access to it. (Pass rates were 66% and 62%, respectively.)

The tool works as well as it does because it’s being used to teach relatively basic mathematics, says Simon Frieder, a machine-learning researcher at the University of Oxford, who did not work on the project. “You couldn’t really do a study with much more advanced mathematics at this current point in time,” he says.

The team estimates that the tool could improve student learning at a cost of around $20 per tutor annually to the tutoring provider, which is significantly cheaper than the thousands of dollars it usually takes to train educators in person. 

It has the potential to improve the relationship between novice tutors and their students by training the tutors to approach problems the way experienced teachers do, says Mina Lee, an assistant professor of computer science at the University of Chicago, who was not involved in the project.

“This work demonstrates that the tool actually does work in real settings,” she says. “We want to facilitate human connection, and this really highlights how AI can augment human-to-human interaction.”

As a next step, Wang and her colleagues are interested in exploring how well novice tutors remember the teaching methods imparted by Tutor CoPilot. This could help them gain a sense of how long the effects of these kinds of AI interventions might last. They also plan to try to work out which other school subjects or age groups could benefit from such an approach.

“There’s a lot of substantial ways in which the underlying technology can get better,” Wang says. “But we’re not deploying an AI technology willy-nilly without pre-validating it—we want to be sure we’re able to rigorously evaluate it before we actually send it out into the wild. For me, the worst fear is that we’re wasting the students’ time.”

Palmer Luckey on the Pentagon’s future of mixed reality

Palmer Luckey has, in some ways, come full circle. 

His first experience with virtual-reality headsets was as a teenage lab technician at a defense research center in Southern California, studying their potential to curb PTSD symptoms in veterans. He then built Oculus, sold it to Facebook for $2 billion, left Facebook after a highly public ousting, and founded Anduril, which focuses on drones, cruise missiles, and other AI-enhanced technologies for the US Department of Defense. The company is now valued at $14 billion.

Now Luckey is redirecting his energy again, to headsets for the military. In September, Anduril announced it would partner with Microsoft on the US Army’s Integrated Visual Augmentation System (IVAS), arguably the military’s largest effort to develop a headset for use on the battlefield. Luckey says the IVAS project is his top priority at Anduril.

“There is going to be a heads-up display on every soldier within a pretty short period of time,” he told MIT Technology Review in an interview last week on his work with the IVAS goggles. “The stuff that we’re building—it’s going to be a big part of that.”

Though few would bet against Luckey’s expertise in the realm of mixed reality, few observers share his optimism for the IVAS program. They view it, thus far, as an avalanche of failures. 

IVAS was first approved in 2018 as an effort to build state-of-the-art mixed-reality headsets for soldiers. In March 2021, Microsoft was awarded nearly $22 billion over 10 years to lead the project, but it quickly became mired in delays. Just a year later, a Pentagon audit criticized the program for not properly testing the goggles, saying its choices “could result in wasting up to $21.88 billion in taxpayer funds to field a system that soldiers may not want to use or use as intended.” The first two variants of the goggles—of which the army purchased 10,000 units—gave soldiers nausea, neck pain, and eye strain, according to internal documents obtained by Bloomberg. 

Such reports have left IVAS on a short leash with members of the Senate Armed Services Committee, which helps determine how much money should be spent on the program. In a subcommittee meeting in May, Senator Tom Cotton, an Arkansas Republican and ranking member, expressed frustration at IVAS’s slow pace and high costs, and in July the committee suggested a $200 million cut to the program. 

Meanwhile, Microsoft has for years been cutting investments into its HoloLens headset—the hardware on which the IVAS program is based—for lack of adoption. In June, Microsoft announced layoffs to its HoloLens teams, suggesting the project is now focused solely on serving the Department of Defense. The company received a serious blow in August, when reports revealed that the Army is considering reopening bidding for the contract to oust Microsoft entirely. 

This is the catastrophe that Luckey’s stepped into. Anduril’s contribution to the project will be Lattice, an AI-powered system that connects everything from drones to radar jammers to surveil, detect objects, and aid in decision-making. Lattice is increasingly becoming Anduril’s flagship offering. It’s a tool that allows soldiers to receive instantaneous information not only from Anduril’s hardware, but also from radars, vehicles, sensors, and other equipment not made by Anduril. Now it will be built into the IVAS goggles. “It’s not quite a hive mind, but it’s certainly a hive eye” is how Luckey described it to me. 

Palmer Luckey holding an autonomous drone interceptor
Anvil, seen here held by Luckey in Anduril’s Costa Mesa Headquarters, integrates with the Lattice OS and can navigate autonomously to intercept hostile drones.
PHILIP CHEUNG

Boosted by Lattice, the IVAS program aims to produce a headset that can help soldiers “rapidly identify potential threats and take decisive action” on the battlefield, according to the Army. If designed well, the device will automatically sort through countless pieces of information—drone locations, vehicles, intelligence—and flag the most important ones to the wearer in real time. 

Luckey defends the IVAS program’s bumps in the road as exactly what one should expect when developing mixed reality for defense. “None of these problems are anything that you would consider insurmountable,” he says. “It’s just a matter of if it’s going to be this year or a few years from now.” He adds that delaying a product is far better than releasing an inferior product, quoting Shigeru Miyamoto, the game director of Nintendo: “A delayed game is delayed only once, but a bad game is bad forever.”

He’s increasingly convinced that the military, not consumers, will be the most important testing ground for mixed-reality hardware: “You’re going to see an AR headset on every soldier, long before you see it on every civilian,” he says. In the consumer world, any headset company is competing with the ubiquity and ease of the smartphone, but he sees entirely different trade-offs in defense.

“The gains are so different when we talk about life-or-death scenarios. You don’t have to worry about things like ‘Oh, this is kind of dorky looking,’ or ‘Oh, you know, this is slightly heavier than I would prefer,’” he says. “Because the alternatives of, you know, getting killed or failing your mission are a lot less desirable.”

Those in charge of the IVAS program remain steadfast in the expectation that it will pay off with huge gains for those on the battlefield. “If it works,” James Rainey, commanding general of the Army Futures Command, told the Armed Services Committee in May, “it is a legitimate 10x upgrade to our most important formations.” That’s a big “if,” and one that currently depends on Microsoft’s ability to deliver. Luckey didn’t get specific when I asked if Anduril was positioning itself to bid to become IVAS’s primary contractor should the opportunity arise. 

If that happens, US troops may, willingly or not, become the most important test subjects for augmented- and virtual-reality technology as it is developed in the coming decades. The commercial sector doesn’t have thousands of individuals within a single institution who can test hardware in physically and mentally demanding situations and provide their feedback on how to improve it. 

That’s one of the ways selling to the defense sector is very different from selling to consumers, Luckey says: “You don’t actually have to convince every single soldier that they personally want to use it. You need to convince the people in charge of him, his commanding officer, and the people in charge of him that this is a thing that is worth wearing.” The iterations that eventually come from IVAS—if it keeps its funding—could signal what’s coming next for the commercial market. 

When I asked Luckey if there were lessons from Oculus he had to unlearn when working with the Department of Defense, he said there’s one: worrying about budgets. “I prided myself for years, you know—I’m the guy who’s figured out how to make VR accessible to the masses by being absolutely brutal at every part of the design process, trying to get costs down. That isn’t what the DOD wants,” he says. “They don’t want the cheapest headset in a vacuum. They want to save money, and generally, spending a bit more money on a headset that is more durable or that has better vision—and therefore allows you to complete a mission faster—is definitely worth the extra few hundred dollars.”

I asked if he’s impressed by the progress that’s been made during his eight-year hiatus from mixed reality. Since he left Facebook in 2017, Apple, Magic Leap, Meta, Snap, and a cascade of startups have been racing to move the technology from the fringe to the mainstream. Everything in mixed reality is about trade-offs, he says. Would you like more computing power, or a lighter and more comfortable headset? 

With more time at Meta, “I would have made different trade-offs in a way that I think would have led to greater adoption,” he says. “But of course, everyone thinks that.” While he’s impressed with the gains, “having been on the inside, I also feel like things could be moving faster.”

Years after leaving, Luckey remains noticeably annoyed by one specific decision he thinks Meta got wrong: not offloading the battery. Dwelling on technical details is unsurprising from someone who spent his formative years living in a trailer in his parents’ driveway posting in obscure forums and obsessing over goggle prototypes. He pontificated on the benefits of packing the heavy batteries and chips in removable pucks that the user could put in a pocket, rather than in the headset itself. Doing so makes the headset lighter and more comfortable. He says he was pushing Facebook to go that route before he was ousted, but when he left, it abandoned the idea. Apple chose to have an external battery for its Vision Pro, which Luckey praised. 

“Anyway,” he told me. “I’m still sore about it eight years later.”

Speaking of soreness, Luckey’s most public professional wound, his ouster from Facebook in 2017, was partially healed last month. The story—involving countless Twitter threads, doxxing, retractions and corrections to news articles, suppressed statements, and a significant segment in Blake Harris’s 2020 book The History of the Future—is difficult to boil down. But here’s the short version: A donation by Luckey to a pro-Trump group called Nimble America in late 2016 led to turmoil within Facebook after it was reported by the Daily Beast. That turmoil grew, especially after Ars Technica wrote that his donation was funding racist memes (the founders of Nimble America were involved in the subreddit r/TheDonald, but the organization itself was focused on creating pro-Trump billboards). Luckey left in March 2017, but Meta has never disclosed why. 

This April, Oculus’s former CTO John Carmack posted on X that he regretted not supporting Luckey more. Meta’s CTO, Andrew Bosworth, argued with Carmack, largely siding with Meta. In response, Luckey said, “You publicly told everyone my departure had nothing to do with politics, which is absolutely insane and obviously contradicted by reams of internal communications.” The two argued. In the X argument, Bosworth cautioned that there are “limits on what can be said here,” to which Luckey responded, “I am down to throw it all out there. We can make everything public and let people judge for themselves. Just say the word.” 

Six months later, Bosworth apologized to Luckey for the comments. Luckey responded, writing that although he is “infamously good at holding grudges,” neither Bosworth nor current leadership at Meta was involved in the incident. 

By now Luckey has spent years mulling over how much of his remaining anger is irrational or misplaced, but one thing is clear. He has a grudge left, but it’s against people behind the scenes—PR agents, lawyers, reporters—who, from his perspective, created a situation that forced him to accept and react to an account he found totally flawed. He’s angry about the steps Facebook took to keep him from communicating his side (Luckey has said he wrote versions of a statement at the time but that Facebook threatened further escalation if he posted it).

“What am I actually angry at? Am I angry that my life went in that direction? Absolutely,” he says.

“I have a lot more anger for the people who lied in a way that ruined my entire life and that saw my own company ripped out from under me that I’d spent my entire adult life building,” he says. “I’ve got plenty of anger left, but it’s not at Meta, the corporate entity. It’s not at Zuck. It’s not at Boz. Those are not the people who wronged me.”

While various subcommittees within the Senate and House deliberate how many millions to spend on IVAS each year, what is not in question is that the Pentagon is investing to prepare for a potential conflict in the Pacific between China and Taiwan. The Pentagon requested nearly $10 billion for the Pacific Deterrence Initiative in its latest budget. The prospect of such a conflict is something Luckey considers often. 

He told the authors of Unit X: How the Pentagon and Silicon Valley Are Transforming the Future of War that Anduril’s “entire internal road map” has been organized around the question “How do you deter China? Not just in Taiwan, but Taiwan and beyond?”

At this point, nothing about IVAS is geared specifically toward use in the South Pacific as opposed to Ukraine or anywhere else. The design is in early stages. According to transcripts of a Senate Armed Services Subcommittee meeting in May, the military was scheduled to receive the third iteration of IVAS goggles earlier this summer. If they were on schedule, they’re currently in testing. That version is likely to change dramatically before it approaches Luckey’s vision for the future of mixed-reality warfare, in which “you have a little bit of an AI guardian angel on your shoulder, helping you out and doing all the stuff that is easy to miss in the midst of battle.”

Palmer Luckey sitting on yellow metal staircase
Designs for IVAS will have to adapt amid a shifting landscape of global conflict.
PHILIP CHEUNG

But will soldiers ever trust such a “guardian angel”? If the goggles of the future rely on AI-powered software like Lattice to identify threats—say, an enemy drone ahead or an autonomous vehicle racing toward you—Anduril is making the promise that it can sort through the false positives, recognize threats with impeccable accuracy, and surface critical information when it counts most. 

Luckey says the real test is how the technology compares with the current abilities of humans. “In a lot of cases, it’s already better,” he says, referring to Lattice, as measured by Anduril’s internal tests (it has not released these, and they have not been assessed by any independent external experts). “People are fallible in ways that machines aren’t necessarily,” he adds.

Still, Luckey admits he does worry about the threats Lattice will miss.

“One of the things that really worries me is there’s going to be people who die because Lattice misunderstood something, or missed a threat to a soldier that it should have seen,” he says. “At the same time, I can recognize that it’s still doing far better than people are doing today.”

When Lattice makes a significant mistake, it’s unlikely the public will know. Asked about the balance between transparency and national security in disclosing these errors, Luckey said that Anduril’s customer, the Pentagon, will receive complete information about what went wrong. That’s in line with the Pentagon’s policies on responsible AI adoption, which require that AI-driven systems be “developed with methodologies, data sources, design procedures, and documentation that are transparent to and auditable by their relevant defense personnel.” 

However, the policies promise nothing about disclosure to the public, a fact that’s led some progressive think tanks, like the Brennan Center for Justice, to call on federal agencies to modernize public transparency efforts for the age of AI. 

“It’s easy to say, Well, shouldn’t you be honest about this failure of your system to detect something?” Luckey says, regarding Anduril’s obligations. “Well, what if the failure was because the Chinese figured out a hole in the system and leveraged that to speed past our defenses of some military base? I’d say there’s not very much public good served in saying, ‘Attention, everyone—there is a way to get past all of the security on every US military base around the world.’ I would say that transparency would be the worst thing you could do.”

AI will add to the e-waste problem. Here’s what we can do about it.

Generative AI could account for up to 5 million metric tons of e-waste by 2030, according to a new study.

That’s a relatively small fraction of the current global total of over 60 million metric tons of e-waste each year. However, it’s still a significant part of a growing problem, experts warn. 

E-waste is the term used to describe things like air conditioners, televisions, and personal electronic devices such as cell phones and laptops when they are thrown away. These devices often contain hazardous or toxic materials that can harm human health or the environment if they’re not disposed of properly. Besides those potential harms, when appliances like washing machines and high-performance computers wind up in the trash, the valuable metals inside the devices are also wasted—taken out of the supply chain instead of being recycled.

Depending on the adoption rate of generative AI, the technology could add 1.2 million to 5 million metric tons of e-waste in total by 2030, according to the study, published today in Nature Computational Science.

“This increase would exacerbate the existing e-waste problem,” says Asaf Tzachor, a researcher at Reichman University in Israel and a co-author of the study, via email.

The study is novel in its attempts to quantify the effects of AI on e-waste, says Kees Baldé, a senior scientific specialist at the United Nations Institute for Training and Research and an author of the latest Global E-Waste Monitor, an annual report.

The primary contributor to e-waste from generative AI is high-performance computing hardware that’s used in data centers and server farms, including servers, GPUs, CPUs, memory modules, and storage devices. That equipment, like other e-waste, contains valuable metals like copper, gold, silver, aluminum, and rare earth elements, as well as hazardous materials such as lead, mercury, and chromium, Tzachor says.

One reason that AI companies generate so much waste is how quickly hardware technology is advancing. Computing devices typically have lifespans of two to five years, and they’re replaced frequently with the most up-to-date versions. 

While the e-waste problem goes far beyond AI, the rapidly growing technology represents an opportunity to take stock of how we deal with e-waste and lay the groundwork to address it. The good news is that there are strategies that can help reduce expected waste.

Extending the lifespan of technology by using equipment for longer is one of the most significant ways to cut down on e-waste, Tzachor says. Refurbishing and reusing components can also play a significant role, as can designing hardware in ways that make it easier to recycle and upgrade. Implementing these strategies could reduce e-waste generation by up to 86% in a best-case scenario, the study projected. 

Only about 22% of e-waste is being formally collected and recycled today, according to the 2024 Global E-Waste Monitor. Much more is collected and recovered through informal systems, including in low- and lower-middle-income countries that don’t have established e-waste management infrastructure in place. Those informal systems can recover valuable metals but often don’t include safe disposal of hazardous materials, Baldé says.

Another major barrier to reducing AI-related e-waste is concerns about data security. Destroying equipment ensures information doesn’t leak out, while reusing or recycling equipment will require using other means to secure data. Ensuring that sensitive information is erased from hardware before recycling is critical, especially for companies handling confidential data, Tzachor says.

More policies will likely be needed to ensure that e-waste, including from AI, is recycled or disposed of properly. Recovering valuable metals (including iron, gold, and silver) can help make the economic case. However, e-waste recycling will likely still come with a price, since it’s costly to safely handle the hazardous materials often found inside the devices, Baldé says. 

“For companies and manufacturers, taking responsibility for the environmental and social impacts of their products is crucial,” Tzachor says. “This way, we can make sure that the technology we rely on doesn’t come at the expense of human and planetary health.”

Kids are learning how to make their own little language models

“This new AI technology—it’s very interesting to learn how it works and understand it more,” says 10-year-old Luca, a young AI model maker.

Luca is one of the first kids to try Little Language Models, a new application from Manuj and Shruti Dhariwal, two PhD researchers at MIT’s Media Lab, that helps children understand how AI models work—by getting to build small-scale versions themselves. 

The program is a way to introduce the complex concepts that make modern AI models work without droning on about them in a theoretical lecture. Instead, kids can see and build a visualization of the concepts in practice, which helps them get to grips with the ideas.

“What does it mean to have children see themselves as being builders of AI technologies and not just users?” says Shruti.

The program starts out by using a pair of dice to demonstrate probabilistic thinking, a system of decision-making that accounts for uncertainty. Probabilistic thinking underlies the LLMs of today, which predict the most likely next word in a sentence. By teaching this concept, the program can help demystify the workings of LLMs for kids and help them understand that the model’s choices are sometimes not perfect but the result of a series of probabilities. 

Students can modify each side of the dice to whatever variable they want. And then they can change how likely each side is to come up when you roll them. Luca thinks it would be “really cool” to incorporate this feature into the design of a Pokémon-like game he is working on. But it can also demonstrate some crucial realities about AI.

Let’s say a teacher wanted to educate students about how bias comes up in AI models. The kids could be told to create a pair of dice and then set each side to a hand of a different skin color. At first, they could set the probability of a white hand at 100%, reflecting a hypothetical situation where there are only images of white people in the data set. When the AI is asked to generate a visual, it produces only white hands.

Then the teacher can have the kids increase the percentage of other skin colors, simulating a more diverse data set. The AI model now produces hands of varying skin colors.
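
To make the idea concrete, here is a minimal Python sketch of what the dice exercise boils down to: weighted random sampling, where changing the weights changes what the “model” produces. This is not the Little Language Models code; the labels and probabilities are invented for illustration.

```python
# Toy illustration of the dice exercise: weighted random sampling.
# Not the Little Language Models code; labels and weights are made up.
import random
from collections import Counter

def roll(sides, weights, n=1000):
    """Roll a weighted 'die' n times and count the outcomes."""
    return Counter(random.choices(sides, weights=weights, k=n))

skin_tones = ["white hand", "brown hand", "black hand"]

# A biased 'data set': only white hands ever appear in the output.
print(roll(skin_tones, weights=[1.0, 0.0, 0.0]))

# A more diverse 'data set': other skin tones now show up as well.
print(roll(skin_tones, weights=[0.4, 0.35, 0.25]))
```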

“It was interesting using Little Language Models, because it makes AI into something small [where the students] can grasp what’s going on,” says Helen Mastico, a middle school librarian in Quincy, Massachusetts, who taught a group of eighth graders to use the program.

“You start to see, ‘Oh, this is how bias creeps in,’” says Shruti. “It provides a rich context for educators to start talking about and for kids to imagine, basically, how these things scale to really big levels.”

They plan for the tool to be used around the world. Students will be able to upload their own data, monitored by their teacher. “[Students] can also add their own sounds, images, and backdrops that represent their culture,” says Manuj. 

The Dhariwals have also implemented a tool where kids can play around with more advanced concepts like Markov chains, in which a preceding variable influences what comes after it. For example, a child could build an AI that creates random houses made from Lego bricks. The child can dictate that if the AI uses a red brick first, the chance of a yellow brick coming next is much higher.
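
A minimal sketch of that Lego-brick example as a Markov chain might look like the following. The transition probabilities are invented for illustration, and this is not the actual Little Language Models implementation.

```python
# Toy Markov chain for the Lego-brick example: each brick's colour depends
# only on the colour of the brick placed before it. Probabilities are invented.
import random

transitions = {
    "red":    {"yellow": 0.7, "red": 0.2, "blue": 0.1},  # red strongly favours yellow next
    "yellow": {"blue": 0.5, "red": 0.3, "yellow": 0.2},
    "blue":   {"red": 0.6, "yellow": 0.3, "blue": 0.1},
}

def build_house(start="red", length=8):
    """Generate a sequence of bricks, one step of the chain at a time."""
    bricks = [start]
    for _ in range(length - 1):
        options = transitions[bricks[-1]]
        next_brick = random.choices(list(options), weights=list(options.values()))[0]
        bricks.append(next_brick)
    return bricks

print(build_house())
```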

“The best way to support young people as creative learners is through helping them work on projects based on their passions,” says the Dhariwals’ PhD advisor Mitch Resnick, co-creator of Scratch, the most famous program in the world for teaching kids to code. “And that’s what Little Language Models does. It lets children take these new ideas and put them to use in creative ways.”

Little Language Models may fill a hole in the current educational landscape. “There is a real lack of playful resources and tools that teach children about data literacy and about AI concepts creatively,” says Emma Callow, a learning experience designer who works with educators and schools on implementing new ways to teach kids about technology. “Schools are more worried about safety, rather than the potential to use AI. But it is progressing in schools, and people are starting to kind of use it,” she says. “There is a space for education to change.”

Little Language Models is rolling out on the Dhariwals’ online education platform, coco.build, in mid-November, and they’re trialing the program at various schools over the next month. 

Luca’s mom, Diana, hopes the chance to experiment with it will serve him well. “It’s experiences like this that will teach him about AI from a very young age and help him use it in a wiser way,” she says.

Introducing: The AI Hype Index

There’s no denying that the AI industry moves fast. Each week brings a bold new announcement, product release, or lofty claim that pushes the bounds of what we previously thought was possible. Separating AI fact from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry.

Our first index is a white-knuckle ride that ranges from the outright depressing—rising numbers of sexually explicit deepfakes; the complete lack of rules governing Elon Musk’s Grok AI model—to the bizarre, including AI-powered dating wingmen and startup Friend’s dorky intelligent-jewelry line. 

But it’s not all a horror show—at least not entirely. AI is being used for more wholesome endeavors, too, like simulating the classic video game Doom without a traditional gaming engine. Elsewhere, AI models have gotten so good at table tennis they can now beat beginner-level human opponents. They’re also giving us essential insight into the secret names monkeys use to communicate with one another. Because while AI may be a lot of things, it’s never boring. 

How Wayve’s driverless cars will meet one of their biggest challenges yet

The UK driverless-car startup Wayve is headed west. The firm’s cars learned to drive on the streets of London. But Wayve has announced that it will begin testing its tech in and around San Francisco as well. And that brings a new challenge: Its AI will need to switch from driving on the left to driving on the right.

As visitors to or from the UK will know, making that switch is harder than it sounds. Your view of the road, how the vehicle turns—it’s all different, says Wayve’s vice president of software, Silvius Rus. Rus himself learned to drive on the left for the first time last year after years in the US. “Even for a human who has driven a long time, it’s not trivial,” he says.

Wayve’s US fleet of Ford Mustang Mach-Es.
WAYVE

The move to the US will be a test of Wayve’s technology, which the company claims is more general-purpose than what many of its rivals are offering. Wayve’s approach has attracted massive investment—including a $1 billion funding round that broke UK records this May—and partnerships with Uber and online grocery firms such as Asda and Ocado. But it will now go head to head with the heavyweights of the growing autonomous-car industry, including Cruise, Waymo, and Tesla.  

Back in 2022, when I first visited the company’s offices in north London, there were two or three vehicles parked in the building’s auto shop. But on a sunny day this fall, both the shop and the forecourt are full of cars. A billion dollars buys a lot of hardware.

I’ve come for a ride-along. In London, autonomous vehicles can still turn heads. But what strikes me as I sit in the passenger seat of one of Wayve’s Jaguar I-PACE cars isn’t how weird it feels to be driven around by a computer program, but how normal—how comfortable, how safe. This car drives better than I do.

Regulators have not yet cleared autonomous vehicles to drive on London’s streets without a human in the loop. A test driver sits next to me, his hands hovering a centimeter above the wheel as it turns back and forth beneath them. Rus gives a running commentary from the back.

The midday traffic is light, but that makes things harder, says Rus: “When it’s crowded, you tend to follow the car in front.” We steer around roadworks, cyclists, and other vehicles stopped in the middle of the street. It starts to rain. At one point I think we’re on the wrong side of the road. But it’s a one-way street: The car has spotted a sign that I didn’t. We approach every intersection with what feels like deliberate confidence.

At one point a blue car (with a human at the wheel) sticks its nose into the stream of traffic just ahead of us. Urban drivers know this can go two ways: Hesitate and it’s a cue for the other car to pull out; push ahead and you’re telling it to wait its turn. Wayve’s car pushes ahead.

The interaction lasts maybe a second. But it’s the most impressive moment of my ride. Wayve says its model has picked up lots of defensive driving habits like this. “It was our right of way, and the safest approach was to assert that,” says Rus. “It learned to do that; it’s not programmed.”

Learning to drive

Everything that Wayve’s cars do is learned rather than programmed. The company uses different technology from what’s in most other driverless cars. Instead of separate, specialized models trained to handle individual tasks like spotting obstacles or finding a route around them—models that must then be wired up to work together—Wayve uses an approach called end-to-end learning.

This means that Wayve’s cars are controlled by a single large model that learns all the individual tasks needed to drive at once, using camera footage, feedback from test drivers (many of whom are former driving instructors), and a lot of reruns in simulation.

Wayve has argued that this approach makes its driving models more general-purpose. The firm has shown that it can take a model trained on the streets of London and then use that same model to drive cars in multiple UK cities—something that others have struggled to do.
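
As a rough illustration of what “end-to-end” means here, the sketch below shows the general shape of such a model in PyTorch: camera pixels go in, driving commands come out, and there are no hand-built perception or planning modules in between. This is not Wayve’s architecture; the layer sizes and the two-value output are arbitrary choices made for this example.

```python
# Toy sketch of the *shape* of an end-to-end driving model: raw camera pixels
# in, driving commands out, with no hand-built perception or planning modules
# in between. This is not Wayve's architecture; sizes are arbitrary.
import torch
import torch.nn as nn

class TinyEndToEndDriver(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # learns its own visual features
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(               # maps features to controls
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, 2),                    # e.g. [steering, speed]
        )

    def forward(self, frames):
        return self.head(self.encoder(frames))

model = TinyEndToEndDriver()
dummy_frames = torch.randn(4, 3, 96, 96)         # a batch of four fake camera frames
print(model(dummy_frames).shape)                 # torch.Size([4, 2])
```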

But a move to the US is more than a simple relocation. It rewrites one of the most basic rules of driving—which side of the road to drive on. With Wayve’s single large model, there’s no drive-on-the-left module to swap out. “We did not program it to drive on the left,” says Rus. “It’s just seen it enough to think that’s how it needs to drive. Even if there’s no marking on the road, it will still keep to the left.”  

“So how will the model learn to drive on the right? This will be an interesting question for the US.”

Answering that question involves figuring out whether the side of the road it drives on is a deep feature of Wayve’s model—intrinsic to its behavior—or a more superficial one that can be overridden with a little retraining.

Given the adaptability seen in the model so far, Rus believes it will switch to US streets just fine. He cites the way the cars have shown they can adapt to new UK cities, for example. “That gives us confidence in its capability to learn and to drive in new situations,” he says.

Under the hood

But Wayve needs to be certain. As well as testing its cars in San Francisco, Rus and his colleagues are poking around inside their model to find out what makes it tick. “It’s like you’re doing a brain scan and you can see there’s some activity in a certain part of the brain,” he says.

The team presents the model with many different scenarios and watches what parts of it get activated at specific times. One example is an unprotected turn—a turn that crosses traffic going in the opposite direction, without a traffic signal. “Unprotected turns are to the right here and to the left in the US,” says Rus. “So will it see them as similar? Or will it just see right turns as right turns?”

Figuring out why the model behaves as it does tells Wayve what kinds of scenarios require extra help. Using a hyper-detailed simulation tool called PRISM-1 that can reconstruct 3D street scenes from video footage, the company can generate bespoke scenarios and run the model through them over and over until it learns how to handle them. How much retraining might the model need? “I cannot tell you the amount. This is part of our secret sauce,” says Rus. “But it’s a small amount.”

Wayve’s simulation tool, PRISM-1, can reconstruct virtual street scenes from real video footage. Wayve uses the tool to help train its driving model.
WAYVE

The autonomous-vehicle industry is known for hype and overpromising. Within the past year, Cruise laid off hundreds after its cars caused chaos and injury on the streets of San Francisco. Tesla is facing federal investigation after its driver-assistance technology was blamed for multiple crashes, including a fatal collision with a pedestrian. 

But the industry keeps forging ahead. Waymo has said it is now giving 100,000 robotaxi rides a week in San Francisco, Los Angeles, and Phoenix. In China, Baidu claims it is giving some 287,000 rides in a handful of cities, including Beijing and Wuhan. Undaunted by the allegations that Tesla’s driver-assistance technology is unsafe, Elon Musk announced his Cybercab last week with a timeline that would put these driverless concept cars on the road by 2025. 

What should we make of it all? “The competition between robotaxi operators is heating up,” says Crijn Bouman, CEO and cofounder of Rocsys, a startup that makes charging stations for autonomous electric vehicles. “I believe we are close to their ChatGPT moment.”

“The technology, the business model, and the consumer appetite are all there,” Bouman says. “The question is which operator will seize the opportunity and come out on top.”

Others are more skeptical. We need to be very clear what we’re talking about when we talk about autonomous vehicles, says Saber Fallah, director of the Connected Autonomous Vehicle Research Lab at the University of Surrey, UK. Some of Baidu’s robotaxis still require a safety driver behind the wheel, for example. Cruise and Waymo have shown that a fully autonomous service is viable in certain locations. But it took years to train their vehicles to drive specific streets, and extending routes—safely—beyond existing neighborhoods will take time. “We won’t have robotaxis that can drive anywhere anytime soon,” says Fallah.

Fallah takes the extreme view that this won’t happen until all human drivers hand in their licenses. For robotaxis to be safe, they need to be the only vehicles on the road, he says. He thinks today’s driving models are still not good enough to interact with the complex and subtle behaviors of humans. There are just too many edge cases, he says.

Wayve is betting its approach will win out. In the US, it will begin by testing what it calls an advanced driver assistance system, a technology similar to Tesla’s. But unlike Tesla, Wayve plans to sell that technology to a wide range of existing car manufacturers. The idea is to build on this foundation to achieve full autonomy in the next few years. “We’ll get access to scenarios that are encountered by many cars,” says Rus. “The path to full self-driving is easier if you go level by level.”

But cars are just the start, says Rus. What Wayve is in fact building, he says, is an embodied model that could one day control many different types of machines, whether they have wheels, wings, or legs. 

“We’re an AI shop,” he says. “Driving is a milestone, but it’s a stepping stone as well.”

Reckoning with generative AI’s uncanny valley

Generative AI has the power to surprise in a way that few other technologies can. Sometimes that’s a very good thing; other times, not so good. In theory, as generative AI improves, this issue should become less important. In reality, however, as generative AI becomes more “human,” it can begin to turn sinister and unsettling, plunging us into what roboticists have long described as the “uncanny valley.”

It might be tempting to dismiss this experience as something that can be corrected by bigger data sets or better training. However, insofar as it speaks to a disturbance in our mental model of the technology (“I don’t like what it did there”), it’s something that needs to be acknowledged and addressed.

Mental models and antipatterns

Mental models are an important concept in UX and product design, but they need to be more readily embraced by the AI community. Mental models often go unnoticed precisely because they are routine: the patterns of assumptions we bring to an AI system. This is something we discussed at length in the process of putting together the latest volume of the Thoughtworks Technology Radar, a biannual report based on our experiences working with clients all over the world.

For instance, we called out complacency with AI-generated code and replacing pair programming with generative AI as two practices we believe practitioners must avoid as the popularity of AI coding assistants continues to grow. Both emerge from poor mental models that fail to acknowledge how this technology actually works and its limitations. The consequence is that the more convincing and “human” these tools become, the harder it is for us to keep those workings and limitations in view.

Of course, for those deploying generative AI into the world, the risks are similar, perhaps even more pronounced. While the intent behind such tools is usually to create something convincing and usable, if such tools mislead, trick, or even merely unsettle users, their value and worth evaporate. It’s no surprise that legislation such as the EU AI Act, which requires creators of deepfakes to label content as “AI generated,” is being passed to address these problems.

It’s worth pointing out that this isn’t just an issue for AI and robotics. Back in 2011, our colleague Martin Fowler wrote about how certain approaches to building cross-platform mobile applications can create an uncanny valley, “where things work mostly like… native controls but there are just enough tiny differences to throw users off.”

Specifically, Fowler wrote something we think is instructive: “different platforms have different ways they expect you to use them that alter the entire experience design.” The point here, applied to generative AI, is that different contexts and different use cases all come with different sets of assumptions and mental models that change at what point users might drop into the uncanny valley. These subtle differences change one’s experience or perception of a large language model’s (LLM) output.

For example, for the drug researcher that wants vast amounts of synthetic data, accuracy at a micro level may be unimportant; for the lawyer trying to grasp legal documentation, accuracy matters a lot. In fact, dropping into the uncanny valley might just be the signal to step back and reassess your expectations.

Shifting our perspective

The uncanny valley of generative AI might be troubling, even something we want to minimize, but it should also remind us of generative AI’s limitations—it should encourage us to rethink our perspective.

There have been some interesting attempts to do that across the industry. One that stands out is Ethan Mollick, a professor at the University of Pennsylvania, who argues that AI shouldn’t be understood as good software but instead as “pretty good people.”

Therefore, our expectations about what generative AI can do and where it’s effective must remain provisional and should be flexible. To a certain extent, this might be one way of overcoming the uncanny valley—by reflecting on our assumptions and expectations, we remove the technology’s power to disturb or confound them.

However, simply calling for a mindset shift isn’t enough. There are various practices and tools that can help. One example, which we identified in the latest Technology Radar, is the technique of getting structured outputs from LLMs. This can be done either by instructing a model to respond in a particular format when prompting or through fine-tuning. Thanks to tools like Instructor, this is getting easier to do, and it creates greater alignment between expectations and what the LLM will output. While there’s still a chance something unexpected or not quite right might happen, this technique goes some way toward addressing that.
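
As a minimal sketch of what that looks like in practice, the example below uses the Instructor library with a Pydantic schema, assuming an OpenAI-compatible backend. The response model, field names, and model name are illustrative assumptions, not a recommendation of any particular setup.

```python
# Minimal sketch of structured output from an LLM via the Instructor library.
# Assumes an OpenAI-compatible backend and an API key in the environment;
# the schema, prompt, and model name are placeholders for illustration.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class ContractSummary(BaseModel):
    parties: list[str]
    effective_date: str
    risk_flags: list[str]

client = instructor.from_openai(OpenAI())

summary = client.chat.completions.create(
    model="gpt-4o",                      # placeholder model name
    response_model=ContractSummary,      # Instructor validates output against this schema
    messages=[{"role": "user", "content": "Summarise this contract: ..."}],
)
print(summary.model_dump())
```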

There are other techniques too, including retrieval-augmented generation as a way of better controlling the “context window.” There are frameworks and tools that can help evaluate and measure the success of such techniques, including Ragas and DeepEval, libraries that provide AI developers with metrics for faithfulness and relevance.
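
For instance, a small evaluation loop with Ragas might look roughly like the sketch below. The exact APIs and column names vary between Ragas versions, the sample data is made up, and the metrics assume access to a judge LLM, so treat this as a shape rather than a drop-in script.

```python
# Rough sketch of scoring a RAG pipeline with Ragas faithfulness and relevance
# metrics. APIs differ across Ragas versions; the data below is invented and a
# judge LLM (e.g. an OpenAI key in the environment) is assumed.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = Dataset.from_dict({
    "question": ["What does clause 7 of the agreement cover?"],
    "answer":   ["Clause 7 covers termination for convenience with 30 days' notice."],
    "contexts": [["Clause 7: Either party may terminate for convenience on 30 days' written notice."]],
})

scores = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(scores)
```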

Measurement is important, as are relevant guidelines and policies for LLMs, such as LLM guardrails. It’s important to take steps to better understand what’s actually happening inside these models. Completely unpacking these black boxes might be impossible, but tools like Langfuse can help. Doing so may go a long way in reorienting the relationship with this technology, shifting mental models, and removing the possibility of falling into the uncanny valley.

An opportunity, not a flaw

These tools—part of a Cambrian explosion of generative AI tools—can help practitioners rethink generative AI and, hopefully, build better and more responsible products. For the wider world, however, this work will remain largely invisible. What matters is exploring how we can evolve toolchains to better control and understand generative AI, treating existing mental models and conceptions of generative AI as a fundamental design problem, not a marginal issue we can choose to ignore.

Ken Mugrage is the principal technologist in the office of the CTO at Thoughtworks. Srinivasan Raguraman is a technical principal at Thoughtworks based in Singapore.

This content was produced by Thoughtworks. It was not written by MIT Technology Review’s editorial staff.

Google DeepMind is making its AI text watermark open source

Google DeepMind has developed a tool for identifying AI-generated text and is making it available open source. 

The tool, called SynthID, is part of a larger family of watermarking tools for generative AI outputs. The company unveiled a watermark for images last year, and it has since rolled out one for AI-generated video. In May, Google announced it was applying SynthID in its Gemini app and online chatbots and made it freely available on Hugging Face, an open repository of AI data sets and models. Watermarks have emerged as an important tool to help people determine when something is AI generated, which could help counter harms such as misinformation. 

“Now, other [generative] AI developers will be able to use this technology to help them detect whether text outputs have come from their own [large language models], making it easier for more developers to build AI responsibly,” says Pushmeet Kohli, the vice president of research at Google DeepMind. 

SynthID works by adding an invisible watermark directly into the text when it is generated by an AI model. 

Large language models work by breaking down language into “tokens” and then predicting which token is most likely to come next. Tokens can be a single character, word, or part of a phrase, and each one gets a percentage score for how likely it is to be the appropriate next word in a sentence. The higher the percentage, the more likely the model is to use it. 

SynthID introduces additional information at the point of generation by changing the probability that tokens will be generated, explains Kohli. 

To detect the watermark and determine whether text has been generated by an AI tool, SynthID compares the expected probability scores for words in watermarked and unwatermarked text. 
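
The general idea can be sketched with a toy example: nudge generation toward a pseudorandom “favoured” subset of the vocabulary derived from the previous token, then detect the watermark by measuring how often generated tokens land in that subset. To be clear, this is not Google’s actual SynthID scheme; the vocabulary, keying, and numbers below are invented purely to illustrate the statistical principle.

```python
# Toy sketch of probability-based text watermarking. NOT Google's SynthID
# algorithm: it only illustrates the principle of tilting token choices toward
# a keyed, pseudorandom subset and then measuring that tilt at detection time.
import random
import zlib

VOCAB = [f"tok{i}" for i in range(1000)]

def favoured(prev_token, key=42, fraction=0.5):
    """Pseudorandomly pick half the vocabulary, seeded by the previous token and a key."""
    rng = random.Random(zlib.crc32(prev_token.encode()) ^ key)
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))

def generate(length=200, bias=0.9):
    """With probability `bias`, draw the next token from the favoured set."""
    tokens = [random.choice(VOCAB)]
    for _ in range(length - 1):
        fav = favoured(tokens[-1])
        pool = list(fav) if random.random() < bias else VOCAB
        tokens.append(random.choice(pool))
    return tokens

def watermark_score(tokens):
    """Fraction of tokens that fall in the favoured set; ~0.5 means no watermark."""
    hits = sum(tok in favoured(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)

print(watermark_score(generate(bias=0.9)))                          # well above 0.5: watermarked
print(watermark_score([random.choice(VOCAB) for _ in range(200)]))  # close to 0.5: unwatermarked
```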

Google DeepMind found that using the SynthID watermark did not compromise the quality, accuracy, creativity, or speed of generated text. That conclusion was drawn from a massive live experiment of SynthID’s performance after the watermark was deployed in its Gemini products and used by millions of people. Gemini allows users to rank the quality of the AI model’s responses with a thumbs-up or a thumbs-down. 

Kohli and his team analyzed the scores for around 20 million watermarked and unwatermarked chatbot responses. They found that users did not notice a difference in quality and usefulness between the two. The results of this experiment are detailed in a paper published in Nature today. Currently SynthID for text only works on content generated by Google’s models, but the hope is that open-sourcing it will expand the range of tools it’s compatible with. 

SynthID does have other limitations. The watermark was resistant to some tampering, such as cropping text and light editing or rewriting, but it was less reliable when AI-generated text had been rewritten or translated from one language into another. It is also less reliable in responses to prompts asking for factual information, such as the capital city of France. This is because there are fewer opportunities to adjust the likelihood of the next possible word in a sentence without changing facts. 

“Achieving reliable and imperceptible watermarking of AI-generated text is fundamentally challenging, especially in scenarios where LLM outputs are near deterministic, such as factual questions or code generation tasks,” says Soheil Feizi, an associate professor at the University of Maryland, who has studied the vulnerabilities of AI watermarking.  

Feizi says Google DeepMind’s decision to open-source its watermarking method is a positive step for the AI community. “It allows the community to test these detectors and evaluate their robustness in different settings, helping to better understand the limitations of these techniques,” he adds. 

There is another benefit too, says João Gante, a machine-learning engineer at Hugging Face. Open-sourcing the tool means anyone can grab the code and incorporate watermarking into their model with no strings attached, Gante says. This will improve the watermark’s privacy, as only the owner will know its cryptographic secrets. 

“With better accessibility and the ability to confirm its capabilities, I want to believe that watermarking will become the standard, which should help us detect malicious use of language models,” Gante says. 

But watermarks are not an all-purpose solution, says Irene Solaiman, Hugging Face’s head of global policy. 

“Watermarking is one aspect of safer models in an ecosystem that needs many complementing safeguards. As a parallel, even for human-generated content, fact-checking has varying effectiveness,” she says. 

Investing in AI to build next-generation infrastructure

The demand for new and improved infrastructure across the world is not being met. The Asian Development Bank has estimated that in Asia alone, roughly $1.7 trillion needs to be invested annually through 2030 just to sustain economic growth and offset the effects of climate change. Globally, the cumulative investment gap has been put at $15 trillion.

In the US, for example, it is no secret that the country’s highways, railways, and bridges are in need of updating. But, as in many other sectors, significant shortages of skilled workers and resources delay all-important repairs and maintenance and harm efficiency.

This infrastructure gap – the difference between what needs to be built and what is actually funded and constructed – is vast. And while governments and companies everywhere are feeling the strain of creating an energy-efficient and sustainable built environment, it is proving more than humans can manage alone. To redress the imbalance, many organizations are turning to forms of AI, including large language models (LLMs) and machine learning (ML). These tools cannot yet fix every infrastructure problem, but they are already helping to reduce costs and risks and to increase efficiency.

Overcoming resource constraints

A shortage of skilled engineering and construction labor is a major problem. In the US, it is estimated that there will be a 33% shortfall in the supply of new talent by 2031, with unfilled positions in software, industrial, civil, and electrical engineering. Germany reported a shortage of 320,000 science, technology, engineering, and mathematics (STEM) specialists in 2022, and another engineering powerhouse, Japan, has forecast a deficit of more than 700,000 engineers by 2030. Considering the duration of most engineering projects (repairing a broken gas pipeline, for example, can take decades), the demand for qualified engineers will only continue to outstrip supply unless something is done.

Immigration and visa restrictions for international engineering students, and poor retention in formative STEM jobs, add further constraints. Then there is the issue of task duplication: repetitive work that AI can handle with ease.

Julien Moutte, CTO of Bentley Systems, explains: “There’s a massive amount of work that engineers have to do that is tedious and repetitive. Between 30% to 50% of their time is spent just compressing 3D models into 2D PDF formats. If that work can be done by AI-powered tools, they can recover half their working time which could then be invested in performing higher value tasks.”

With guidance, AI can automate the production of the same drawings hundreds of times over. Training engineers to ask the right questions and to use AI effectively will ease the burden and stress of that repetition.

However, this is not without challenges. Users of ChatGPT or other LLMs know the pitfalls of AI hallucinations, where a model predicts a plausible sequence of words without any contextual understanding of what those words mean. This can lead to nonsensical outputs, but in engineering, hallucinations can be altogether riskier. “If a recommendation was made by AI, it needs to be validated,” says Moutte. “Is that recommendation safe? Does it respect the laws of physics? And it’s a waste of time for the engineers to have to review all these things.”

But this can be offset by having existing company tools and products run simulations and validate the designs against established engineering rules and design codes, which relieves engineers of having to do that validation themselves.
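Bentley’s products are proprietary, but the general pattern Moutte describes, in which an AI-proposed design is automatically checked against coded rules before an engineer reviews it, can be sketched generically. The beam class, rules, and limits below are hypothetical and purely illustrative.

```python
# Hypothetical illustration of automated design validation (not Bentley's tools).
# An AI-proposed beam design is checked against simple, made-up rules before
# being passed to an engineer for review.
from dataclasses import dataclass

@dataclass
class BeamDesign:
    span_m: float          # clear span in meters
    depth_m: float         # section depth in meters
    load_kn_per_m: float   # design load in kN per meter

def validate(design: BeamDesign) -> list[str]:
    """Return a list of rule violations; an empty list means these checks pass."""
    issues = []
    if design.span_m <= 0 or design.depth_m <= 0:
        issues.append("geometry must be positive")
    elif design.span_m / design.depth_m > 20:       # invented span-to-depth limit
        issues.append("span-to-depth ratio exceeds 20")
    if design.load_kn_per_m > 50:                   # invented load ceiling
        issues.append("design load exceeds 50 kN/m")
    return issues

proposed = BeamDesign(span_m=12.0, depth_m=0.45, load_kn_per_m=35.0)
problems = validate(proposed)
print("passes automated checks" if not problems else problems)
```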

Improving resource efficiency

An estimated 30% of building materials, such as steel and concrete, are wasted on a typical construction site in the United States and the United Kingdom, with the majority ending up in landfills (although countries such as Germany and the Netherlands have recently implemented recycling measures). This waste, together with the rising cost of raw materials, is putting pressure on companies to find ways to improve construction efficiency and sustainability.

AI can provide solutions to both of these issues during the design and construction phases. Digital twins can help workers spot deviations in product quality and provide the insights needed to minimize waste and energy use and, crucially, save money.

Machine learning models use real-time data from field statistics and process variables to flag off-spec materials, product deviations, and excess energy usage from machinery and the transportation of construction site workers. Engineers can then anticipate gaps and streamline processes, making large-scale improvements to each project that can be replicated in future ones.
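As a rough illustration of that kind of flagging, the sketch below uses a simple rolling statistical test in place of a trained machine learning model; the sensor readings and thresholds are invented.

```python
# Hypothetical sketch of flagging off-spec readings from site sensors.
# A simple rolling mean / standard-deviation test stands in for the machine
# learning models described above; the readings are invented.
from statistics import mean, stdev

def flag_outliers(readings: list[float], window: int = 5, z_limit: float = 2.0) -> list[int]:
    """Return indices of readings that deviate sharply from the recent trend."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma > 0 and abs(readings[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged

# e.g. hourly concrete-batch temperatures (°C); the spike at index 7 is off-spec
temps = [21.0, 21.5, 20.8, 21.2, 21.1, 21.3, 21.0, 27.9, 21.2]
print(flag_outliers(temps))  # -> [7]
```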

“Being able to anticipate and reduce that waste with that visual awareness, with the application of AI to make sure that you are optimizing those processes and those designs and the resources that you need to construct that infrastructure is massive,” says Moutte.

He continues, “The big game changer is going to be around sustainability because we need to create infrastructure with more sustainable and efficient designs, and there’s a lot of room for improvement.” And an important part of this will be how AI can help create new materials and models to reduce waste.

Human and AI partnership

AI might never be entirely error-free, but for the time being, human intervention can catch mistakes. Although there may be some concern in the construction sector that AI will replace humans, there are elements to any construction project that only people can do.

AI lacks the critical thinking and problem-solving skills that humans excel at, so additional training for engineers to supervise and maintain automated systems is key to making the two work together optimally. Skilled workers bring creativity and intuition, as well as customer service expertise; AI is not yet capable of such novel solutions.

With engineers implementing appropriate guardrails and frameworks, AI can take on the bulk of a project’s repetitive, automatable work, creating a symbiotic and productive relationship between humans and machines.

“Engineers have been designing impressive buildings for decades already, where they are not doing all the design manually. You need to make sure that those structures are validated first by engineering principles, physical rules, local codes, and the rest. So we have all the tools to be able to validate those designs,” explains Moutte.

As AI advances alongside human care and control, it can help futureproof the construction process, with every step bolstered by the strengths of both sides. By addressing the industry’s main concerns – costs, sustainability, waste, and task repetition – and upskilling engineers to manage AI at the design and implementation stages, the construction sector looks set to be less riddled with potholes.

“We’ve already seen how AI can be used to create new materials and reduce waste,” explains Moutte. “As we move to 2050, I believe engineers will need those AI capabilities to create the best possible designs and I’m looking forward to releasing some of those AI-enabled features in our products.”

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.