Responsible technology use in the AI age

The sudden appearance of application-ready generative AI tools over the last year has confronted us with challenging social and ethical questions. Visions of how this technology could deeply alter the ways we work, learn, and live have also accelerated conversations—and breathless media headlines—about how and whether these technologies can be responsibly used.

Responsible technology use, of course, is nothing new. The term encompasses a broad range of concerns, from the bias that might be hidden inside algorithms, to the data privacy rights of the users of an application, to the environmental impacts of a new way of work. Rebecca Parsons, CTO emerita at the technology consultancy Thoughtworks, collects all of these concerns under “building an equitable tech future,” where, as new technology is deployed, its benefits are equally shared. “As technology becomes more important in significant aspects of people’s lives,” she says, “we want to think of a future where the tech works right for everyone.”

Technology use often goes wrong, Parsons notes, “because we’re too focused on either our own ideas of what good looks like or on one particular audience as opposed to a broader audience.” That may look like an app developer building only for an imagined customer who shares his geography, education, and affluence, or a product team that doesn’t consider what damage a malicious actor could wreak in their ecosystem. “We think people are going to use my product the way I intend them to use my product, to solve the problem I intend for them to solve in the way I intend for them to solve it,” says Parsons. “But that’s not what happens when things get out in the real world.”

AI, of course, poses some distinct social and ethical challenges. Some of the technology’s unique challenges are inherent in the way that AI works: its statistical rather than deterministic nature, its identification and perpetuation of patterns from past data (thus reinforcing existing biases), and its lack of awareness about what it doesn’t know (resulting in hallucinations). And some of its challenges stem from what AI’s creators and users themselves don’t know: the unexamined bodies of data underlying AI models, the limited explainability of AI outputs, and the technology’s ability to deceive users into treating it as a reasoning human intelligence.

Parsons believes, however, that AI has not changed responsible tech so much as it has brought some of its problems into a new focus. Concepts of intellectual property, for example, date back hundreds of years, but the rise of large language models (LLMs) has posed new questions about what constitutes fair use when a machine can be trained to emulate a writer’s voice or an artist’s style. “It’s not responsible tech if you’re violating somebody’s intellectual property, but thinking about that was a whole lot more straightforward before we had LLMs,” she says.

The principles developed over many decades of responsible technology work remain relevant during this transition. Transparency, privacy and security, thoughtful regulation, attention to societal and environmental impacts, and enabling wider participation via diversity and accessibility initiatives remain the keys to making technology work toward human good.

MIT Technology Review Insights’ 2023 report with Thoughtworks, “The state of responsible technology,” found that executives are taking these considerations seriously. Seventy-three percent of business leaders surveyed, for example, agreed that responsible technology use will come to be as important as business and financial considerations when making technology decisions. 

This AI moment, however, may represent a unique opportunity to overcome barriers that have previously stalled responsible technology work. Lack of senior management awareness (cited by 52% of those surveyed as a top barrier to adopting responsible practices) is certainly less of a concern today: savvy executives are quickly becoming fluent in this new technology and are continually reminded of its potential consequences, failures, and societal harms.

The other top barriers cited were organizational resistance to change (46%) and internal competing priorities (46%). Organizations that have realigned themselves behind a clear AI strategy, and who understand its industry-altering potential, may be able to overcome this inertia and indecision as well. At this singular moment of disruption, when AI provides both the tools and motivation to redesign many of the ways in which we work and live, we can fold responsible technology principles into that transition—if we choose to.

For her part, Parsons is deeply optimistic about humans’ ability to harness AI for good, and to work around its limitations with common-sense guidelines and well-designed processes with human guardrails. “As technologists, we just get so focused on the problem we’re trying to solve and how we’re trying to solve it,” she says. “And all responsible tech is really about is lifting your head up, and looking around, and seeing who else might be in the world with me.”

To read more about Thoughtworks’ analysis and recommendations on responsible technology, visit its Looking Glass 2024.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

Google’s new version of Gemini can handle far bigger amounts of data

Google DeepMind today launched the next generation of its powerful artificial intelligence model Gemini, which has an enhanced ability to work with large amounts of video, text, and images.

It’s an advance on the three versions of Gemini 1.0 that Google announced back in December, ranging in size and complexity from Nano to Pro to Ultra. (It rolled out Gemini 1.0 Pro and 1.0 Ultra across many of its products last week.) Google is now releasing a preview of Gemini 1.5 Pro to select developers and business customers. The company says that the mid-tier Gemini 1.5 Pro matches its previous top-tier model, Gemini 1.0 Ultra, in performance, but uses less computing power (yes, the names are confusing!). 

Crucially, the 1.5 Pro model can handle much larger amounts of data from users, including far longer prompts. Every AI model has a ceiling on how much data it can digest, and the standard version of the new Gemini 1.5 Pro can handle inputs as large as 128,000 tokens (the words or parts of words that an AI model breaks its inputs into). That’s on a par with the best version of GPT-4 (GPT-4 Turbo). 

However, a limited group of developers will be able to submit up to 1 million tokens to Gemini 1.5 Pro, which equates to roughly 1 hour of video, 11 hours of audio, or 700,000 words of text. That’s a significant jump that makes it possible to do things that no other models are currently capable of.
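To make those numbers concrete, here is a minimal sketch of token counting using OpenAI’s open-source tiktoken library. Gemini uses its own tokenizer, so its exact counts will differ, but the principle is the same: text is split into words and word fragments, and English prose averages roughly three-quarters of a word per token, which is how a 1-million-token window works out to on the order of 700,000 words.

```python
# Minimal sketch of what "tokens" means in practice, using OpenAI's open-source
# tiktoken library. Gemini uses its own tokenizer, so its counts will differ
# slightly, but the idea is the same: text is split into word and subword pieces.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-class models

text = "Gemini 1.5 Pro can accept prompts hundreds of thousands of tokens long."
tokens = encoding.encode(text)

print(f"{len(text.split())} words became {len(tokens)} tokens")
# English text averages roughly 3/4 of a word per token, which is why a
# 1-million-token window corresponds to on the order of 700,000 words.
```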

In one demonstration video shown by Google, using the million-token version, researchers fed the model a 402-page transcript of the Apollo 11 moon landing mission. Then they showed Gemini a hand-drawn sketch of a boot and asked it to identify the moment in the transcript that the drawing represents.

“This is the moment Neil Armstrong landed on the moon,” the chatbot responded correctly. “He said, ‘One small step for man, one giant leap for mankind.’”

The model was also able to identify moments of humor. When asked by the researchers to find a funny moment in the Apollo transcript, it picked out when astronaut Mike Collins referred to Armstrong as “the Czar.” (Probably not the best line, but you get the point).  

In another demonstration, the team uploaded a 44-minute silent film featuring Buster Keaton and asked the AI to identify what information was on a piece of paper that, at some point in the movie, is removed from a character’s pocket. In less than a minute, the model found the scene and correctly recalled the text written on the paper. Researchers also repeated a similar task from the Apollo experiment, asking the model to find a scene in the film based on a drawing, which it completed. 

Google says it put Gemini 1.5 Pro through the usual battery of tests it uses when developing large language models, including evaluations that combine text, code, images, audio and video. It found that 1.5 Pro outperformed 1.0 Pro on 87% of the benchmarks and more or less matched 1.0 Ultra across all of them while using less computing power. 

The ability to handle larger inputs, Google says, is a result of progress in what’s called mixture-of-experts architecture. An AI using this design divides its neural network into chunks, only activating the parts that are relevant to the task at hand, rather than firing up the whole network at once. (Google is not alone in using this architecture; French AI firm Mistral released a model using it, and GPT-4 is rumored to employ the tech as well.)

“In one way it operates much like our brain does, where not the whole brain activates all the time,” says Oriol Vinyals, a deep learning team lead at DeepMind. This compartmentalizing saves the AI computing power and can generate responses faster.
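To illustrate the routing idea, here is a toy sketch of a mixture-of-experts layer in Python. It is a conceptual illustration only, not Gemini’s actual architecture; the layer sizes, the number of experts, and the top-k value are arbitrary.

```python
# Toy sketch of mixture-of-experts routing: a small "router" scores the experts
# for a given input, and only the top-scoring experts are actually run.
# This illustrates the concept only; it is not Gemini's real architecture, and
# the sizes, number of experts, and top_k below are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" here is just a random linear layer.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    scores = x @ router                      # one score per expert
    chosen = np.argsort(scores)[-top_k:]     # activate only the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()
    # Only the chosen experts do any work; the rest stay idle,
    # which is where the compute savings come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

output = moe_layer(rng.normal(size=d_model))
print(output.shape)  # (16,)
```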

“That kind of fluidity going back and forth across different modalities, and using that to search and understand, is very impressive,” says Oren Etzioni, former technical director of the Allen Institute for Artificial Intelligence, who was not involved in the work. “This is stuff I have not seen before.”

An AI that can operate across modalities would more closely resemble the way that human beings behave. “People are naturally multimodal,” Etzioni says, because we can effortlessly switch between speaking, writing, and drawing images or charts to convey ideas. 

Etzioni cautioned against taking too much meaning from the developments, however. “There’s a famous line,” he says. “Never trust an AI demo.” 

For one, it’s not clear how much the demonstration videos left out or cherry-picked from various tasks (Google was indeed criticized for its initial Gemini launch, when it failed to disclose that a demo video had been sped up). It’s also possible the model would not be able to replicate some of the demonstrations if the input wording were slightly tweaked. AI models in general, says Etzioni, are brittle. 

Today’s release of Gemini 1.5 Pro is limited to developers and enterprise customers. Google did not specify when it will be available for wider release. 

OpenAI teases an amazing new generative video model called Sora

OpenAI has built a striking new generative video model called Sora that can take a short text description and turn it into a detailed, high-definition film clip up to a minute long.

Based on four sample videos that OpenAI shared with MIT Technology Review ahead of today’s announcement, the San Francisco-based firm has pushed the envelope of what’s possible with text-to-video generation (a hot new research direction that we flagged as a trend to watch in 2024).

“We think building models that can understand video, and understand all these very complex interactions of our world, is an important step for all future AI systems,” says Tim Brooks, a scientist at OpenAI.

But there’s a disclaimer. OpenAI gave us a preview of Sora (which means sky in Japanese) under conditions of strict secrecy. In an unusual move, the firm would only share information about Sora if we agreed to wait until after news of the model was made public to seek the opinions of outside experts. [Editor’s note: we’ve updated this story with outside comment below.] OpenAI has not released a technical report or demonstrated the model actually working. And it says it won’t be releasing Sora anytime soon.

PROMPT: animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. the art style is 3d and realistic, with a focus on lighting and texture. the mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. the use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image. (Credit: OpenAI)
PROMPT: a gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures (Credit: OpenAI)

The first generative models that could produce video from snippets of text appeared in late 2022. But early examples from Meta, Google, and a startup called Runway were glitchy and grainy. Since then, the tech has been getting better fast. Runway’s Gen-2 model, released last year, can produce short clips that come close to matching big-studio animation in their quality. But most of these examples are still only a few seconds long.  

The sample videos from OpenAI’s Sora are high-definition and full of detail. OpenAI also says it can generate videos up to a minute long. One video of a Tokyo street scene shows that Sora has learned how objects fit together in 3D: the camera swoops into the scene to follow a couple as they walk past a row of shops.

OpenAI also claims that Sora handles occlusion well. One problem with existing models is that they can fail to keep track of objects when they drop out of view. For example, if a truck passes in front of a street sign, the sign might not reappear afterward.  

In a video of a papercraft underwater scene, Sora has added what look like cuts between different pieces of footage, and the model has maintained a consistent style between them.

It’s not perfect. In the Tokyo video, cars to the left look smaller than the people walking beside them. They also pop in and out between the tree branches. “There’s definitely some work to be done in terms of long-term coherence,” says Brooks. “For example, if someone goes out of view for a long time, they won’t come back. The model kind of forgets that they were supposed to be there.”

Tech tease

Impressive as they are, the sample videos shown here were no doubt cherry-picked to show Sora at its best. Without more information, it is hard to know how representative they are of the model’s typical output.   

It may be some time before we find out. OpenAI’s announcement of Sora today is a tech tease, and the company says it has no current plans to release it to the public. Instead, OpenAI will today begin sharing the model with third-party safety testers for the first time.

In particular, the firm is worried about the potential misuses of fake but photorealistic video. “We’re being careful about deployment here and making sure we have all our bases covered before we put this in the hands of the general public,” says Aditya Ramesh, a scientist at OpenAI, who created the firm’s text-to-image model DALL-E.

But OpenAI is eyeing a product launch sometime in the future. As well as safety testers, the company is also sharing the model with a select group of video makers and artists to get feedback on how to make Sora as useful as possible to creative professionals. “The other goal is to show everyone what is on the horizon, to give a preview of what these models will be capable of,” says Ramesh.

To build Sora, the team adapted the tech behind DALL-E 3, the latest version of OpenAI’s flagship text-to-image model. Like most text-to-image models, DALL-E 3 uses what’s known as a diffusion model. These are trained to turn a fuzz of random pixels into a picture.
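As a rough picture of what that training buys you at generation time, here is a conceptual sketch of a diffusion sampling loop. The denoising step below is a stand-in for a trained neural network, so the output is meaningless; it only shows the shape of the process of turning random noise into an image step by step.

```python
# Conceptual sketch of how a diffusion model generates an image: start from pure
# noise and repeatedly nudge it toward something less noisy. `denoise_step` is a
# placeholder for a trained network, so this produces nothing meaningful; it
# only shows the shape of the sampling loop.
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64, 3))   # "a fuzz of random pixels"

def denoise_step(x: np.ndarray, step: int) -> np.ndarray:
    # In a real model, a trained network predicts the noise to remove at this step.
    predicted_noise = rng.normal(size=x.shape) * 0.01  # placeholder prediction
    return x - predicted_noise

for step in reversed(range(50)):       # run the reverse (denoising) process
    image = denoise_step(image, step)

print(image.shape)  # (64, 64, 3) -- a real model would now hold a picture
```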

Sora takes this approach and applies it to videos rather than still images. But the researchers also added another technique to the mix. Unlike DALL-E or most other generative video models, Sora combines its diffusion model with a type of neural network called a transformer.

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models like OpenAI’s GPT-4 and Google DeepMind’s Gemini. But videos are not made of words. Instead, the researchers had to find a way to cut videos into chunks that could be treated as if they were. The approach they came up with was to dice videos up across both space and time. “It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Brooks.

The transformer inside Sora can then process these chunks of video data in much the same way that the transformer inside a large language model processes words in a block of text. The researchers say that this let them train Sora on many more types of video than other text-to-video models, including different resolutions, durations, aspect ratios, and orientations. “It really helps the model,” says Brooks. “That is something that we’re not aware of any existing work on.”
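Here is an illustrative sketch, in Python, of what dicing a video into spacetime “cubes” might look like. OpenAI has not published Sora’s actual patch sizes or resolutions, so every dimension below is made up for the example.

```python
# Illustrative sketch of dicing a video into spacetime "cubes" (patches) so a
# transformer can treat them like tokens. The frame count, resolution, and patch
# sizes are arbitrary; OpenAI has not published Sora's actual values.
import numpy as np

frames, height, width, channels = 16, 64, 64, 3
video = np.zeros((frames, height, width, channels), dtype=np.float32)

pt, ph, pw = 4, 16, 16   # patch size along time, height, and width

# Split the video into non-overlapping cubes, then flatten each cube into a vector.
patches = video.reshape(frames // pt, pt, height // ph, ph, width // pw, pw, channels)
patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)         # group the cube axes together
patches = patches.reshape(-1, pt * ph * pw * channels)   # one row per spacetime patch

print(patches.shape)  # (64, 3072): 64 "tokens", each a flattened 4x16x16x3 cube
```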

PROMPT: several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field (Credit: OpenAI)
PROMPT: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes (Credit: OpenAI)

“From a technical perspective it seems like a very significant leap forward,” says Sam Gregory, executive director at Witness, a human rights organization that specializes in the use and misuse of video technology. “But there are two sides to the coin,” he says. “The expressive capabilities offer the potential for many more people to be storytellers using video. And there are also real potential avenues for misuse.” 

OpenAI is well aware of the risks that come with a generative video model. We are already seeing the large-scale misuse of deepfake images. Photorealistic video takes this to another level.

Gregory notes that you could use technology like this to misinform people about conflict zones or protests. The range of styles is also interesting, he says. If you could generate shaky footage that looked like it was shot with a phone it would come across as more authentic.

The tech is not there yet, but generative video has gone from zero to Sora in just 18 months. “We’re going to be entering a universe where there will be fully synthetic content, human-generated content and a mix of the two,” says Gregory.

The OpenAI team plans to draw on the safety testing it did last year for DALL-E 3. Sora already includes a filter that runs on all prompts sent to the model that will block requests for violent, sexual, or hateful images, as well as images of known people. Another filter will look at frames of generated videos and block material that violates OpenAI’s safety policies.

OpenAI says it is also adapting a fake-image detector developed for DALL-E 3 to use with Sora. And the company will embed industry-standard C2PA tags, metadata that states how an image was generated, into all of Sora’s output. But these steps are far from foolproof. Fake-image detectors are hit-or-miss. Metadata is easy to remove, and most social media sites strip it from uploaded images by default.  
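A short sketch shows why metadata-based labels are so fragile: simply re-saving an image’s pixels with the Pillow library produces a file with none of the original metadata attached. The filenames here are placeholders.

```python
# Sketch of why metadata-based provenance tags are fragile: re-saving only an
# image's raw pixels produces a file with none of the original metadata.
# The filenames are placeholders for illustration.
from PIL import Image

original = Image.open("ai_generated.png")     # may carry provenance metadata

stripped = Image.new(original.mode, original.size)
stripped.putdata(list(original.getdata()))    # copy pixels only, not metadata
stripped.save("stripped.png")                 # saved file carries no provenance info

# Even simpler: a screenshot of the image has the same effect.
```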

“We’ll definitely need to get more feedback and learn more about the types of risks that need to be addressed with video before it would make sense for us to release this,” says Ramesh.

Brooks agrees. “Part of the reason that we’re talking about this research now is so that we can start getting the input that we need to do the work necessary to figure out how it could be safely deployed,” he says.

Update 2/15: Comments from Sam Gregory were added.

Providing the right products at the right time with machine learning

Whether your favorite condiment is Heinz ketchup or your preferred spread for your bagel is Philadelphia cream cheese, ensuring that all customers have access to their preferred products at the right place, at the right price, and at the right time requires careful supply chain organization and distribution. Amid the proliferation of e-commerce and shifting demand within the consumer-packaged goods (CPG) sector, AI and machine learning (ML) have become helpful tools in enabling efficiency and better business outcomes.

The journey toward successfully deployed machine learning operations (MLOps) starts with data, says Jorge Balestra, global head of machine learning operations and platforms at Kraft Heinz Company. Curating well-organized and accessible data means enterprises can leverage their data volumes to train and develop AI and machine learning models. A strong data strategy lays the foundation for these AI and machine learning tools to use data to detect supply chain disruptions, identify and address cost inefficiencies, and predict demand for products.

“Never forget that data is the fuel, and data, it takes effort, it is a journey, it never ends, because that’s what is really what I would call what differentiates a lot of successful efforts compared to unsuccessful ones,” says Balestra.

This is especially crucial, but challenging, within the CPG sector, where data is often incomplete given the inconsistent methods retailers use to track consumer habits.

He explains, “We don’t know exactly and we don’t even want to know exactly what people are doing in their daily lives. What we want is just to get enough of the data so we can provide the right product for our consumers.”

To deploy AI and machine learning tools at scale, Kraft Heinz has turned to the flexibility of the cloud. Using the cloud allows for much-needed data accessibility while easing compute power constraints. “The agility of the whole thing increases exponentially because what used to take months, now can be done in a matter of seconds via code. So, definitely, I see how all of this explosion around analytics, around AI, is possible, because of cloud really powering all of these initiatives that are popping up left, right, and center,” says Balestra.

While it may be challenging to predict future trends in a sector so prone to change, Balestra says that preparing for the road ahead means focusing on adaptability and agility.

“Our mission is to delight people via food. And the technology, AI or what have you, is our tool to excel at our mission. Being able to learn how to leverage existing and future [technology] to get the right product at the right price, at the right location is what we are all about.”

This episode of Business Lab is produced in partnership with Infosys Topaz and Infosys Cobalt.

Full Transcript

Laurel Ruma: From MIT Technology Review, I’m Laurel Ruma, and this is Business Lab, the show that helps business leaders make sense of new technologies coming out of the lab and into the marketplace.

Our topic is machine learning in the food and beverage industry. AI offers opportunities for innovation for customers and operational efficiencies for employees, but having a data strategy in place to capture these benefits is crucial.

Two words for you: global innovation.

My guest is Jorge Balestra, global head of machine learning operations and platforms at Kraft Heinz Company.

This episode of Business Lab is produced in partnership with Infosys Topaz and Infosys Cobalt.

Welcome, Jorge.

Jorge Balestra: Thank you very much. Glad to be here.

Laurel: Well, wonderful to have you. So people are likely familiar with Kraft Heinz since it is one of the world’s largest food and beverage companies. Could you talk to us about your role at Kraft Heinz and how machine learning can help consumers in the grocery aisle?

Jorge: Certainly. My role, I will call, has two major focuses in two areas. One of them is I lead the machine learning engineering operations of the company globally. And on the other hand, I provide all of the analytical platforms that the company is using also on a global basis. So in role number one in my machine learning engineering and operations, what my team does is we grab all of these models that our community of data scientists that are working globally are coming up with, and we grabbed them and we strengthened it. Our major mission here is the first thing we need to do is we need to make sure that we are applying engineering practices to make them production ready and they can scale, they can also run in a cost-effective manner, and from there we ensure that in my operations hat they are there when needed.

So a lot of these models, because they become part of our day-to-day operations, they’re going to come with certain specific service level commitments that we need to make, so my team makes sure that we are delivering on those with the right expectations. And on my other hand, which is the analytical platforms, is that we do a lot of descriptive, predictive, and prescriptive work in terms of analytics. The descriptive portion where you’re talking about just the regular dashboarding, summarization piece around our data and where the data lives, all of those analytical platforms that the company is using are also something that I take care of. And with that, you would think that I have a very broad base of customers in the company both in terms of geographies where they are from some of our businesses in Asia, all the way to North America, but also across the organization from marketing to HR and everything in between.

Going into your other question about how machine learning is helping our consumers in the grocery aisle, I’ll probably summarize that for a CPG it’s all about having the right product at the right price, at the right location for you. What that means is on the right product, their machine learning can help a lot of our marketing teams, for example, even when they are now with the latest generative AI capabilities are showing up like brainstorming and creating new content to R&D, what we’re trying to figure out what is the best formulas for our products, there’s definitely now ML is making inroads in that space, the right price, all about cost efficiencies throughout from our plans to our distribution centers, making sure that we are eliminating waste. Leveraging machine learning capabilities is something that we are doing across the board from our revenue management, which is the right price for people to buy our products.

And then last but not least is the right location. So we need to make sure that when our consumers are going into their stores or are buying our products online that the product is there for you and you’re going to find the product you like, the flavor you like immediately. And so there is a huge effort around predicting our demand, organizing our supply chain, our distribution, scheduling our plans to make sure that we are producing the right quantities and delivering them to the right places so our consumers can find our products.

Laurel: Well, that certainly makes sense since data does play such a crucial role in deploying advanced technologies, especially machine learning. So how does Kraft Heinz ensure the accessibility, quality and security of all of that data at the right place at the right time to drive effective machine learning operations or MLOps? Are there specific best practices that you’ve discovered?

Jorge: Well, the best practice that I can probably advise people on is definitely data is the fuel of machine learning. So without data, there is no modeling. And data, organizing your data, both the data that you have internally and externally takes time. Making sure that it’s not only accessible and you are organizing it in a way that you don’t have a gazillion technologies to deal with is important, but also I would say the curation of it. That is a long-term commitment. So I strongly advise anyone that is listening right now to understand that your data journey, as it is, is a journey, it doesn’t have an end destination, and also it’s going to take time.

And the more you are successful in terms of getting all the data that you need organized and making sure that is available, the more successful you’re going to be leveraging all of that with models in machine learning and great things that are there to actually then accomplish a specific business outcome. So a good metaphor that I like to say is there’s a lot of researchers, and MIT is known for its research, but the researchers cannot do anything without the librarians, with all the people that’s organizing the knowledge around so you can go and actually do what you need to do, which is in this case research. Never forget that data is the fuel, and data, it takes effort, it is a journey, it never ends, because that’s what is really what I would call what differentiates a lot of successful efforts compared to unsuccessful ones.

Laurel: Getting back to that right place at the right time mentality, within the last few years, the consumer packaged goods, or you mentioned earlier, the CPG sector, has seen such major shifts from changing customer demands to the proliferation of e-commerce channels. So how can AI and machine learning tools help influence business outcomes or improve operational efficiency?

Jorge: I’ve got two examples that I can say. One is, well, obviously we all want to forget about what happened during the pandemic, but for us it was a key, very challenging time, because out of nowhere all of our supply chains got disrupted, our consumers needed our products more than ever because there were more hunkered down at home. So one of the things that I tell you, at least for us, that was key was through our modeling, through the data that we’ve had, we’ve had some good early warning of certain disruptions in the supply chain and we were able to at least get… Especially when the outbreak started, a couple of weeks in advance, we were moving product, we were taking early actions in terms of ensuring that we were delivering an increased amount of product that was needed.

And that was because we had the data and we had some of those models that were alerting us about, “Hey, something is wrong here, something is happening with our supply chain, you need to take action.” And taking action at the right time, it’s key in terms of getting ahead of a lot of the things that can happen. And in our case, obviously we live in a competitive world, so taking actions before competition is important, that timing component. Another example I can give you and is more of something that is we’re doing more and more nowadays is this piece that I was referring to about the right location about product availability is key for CPG, and that is measured in something that is called the CFR, and is the customer fill rate, which means is when someone is ordering product from Kraft Heinz that we are able to fulfill that order to 100%, and we are expecting to be really high with high 90s in terms of how efficient we are filling those orders.

We have developed new technology that I think we are pretty proud of because I think it is unique within CPG that allows us to really predict what is going to happen with CFR in the future based on the specific actions we’re taking today, whereas it’s changing our production lines, whereas changes in distribution, et cetera, we’re able to see not only the immediate effect, but what’s going to happen in the future with that CFR so we can really act on it and deliver actions right now that are in the benefit of our distribution in the future. So those are, I would call it, say, two examples in terms of how we’re leveraging AI and machine learning tools in our day-to-day operations.

Laurel: Are those examples, the CFR as well as the supply chain and making sure consumers had everything on demand almost, is this unique to the food and beverage industry? And what are perhaps some other unique challenges that the food and beverage industry faces when you’re implementing AI and machine learning innovations? And how do you navigate challenges like that?

Jorge: Yeah, I think something that is very unique for us is that we always have to deal with an incomplete picture in terms of the data that we have in our consumers. So if you think about it, when you go into a grocery store, a couple of things, well, you are buying from that store, the Kroger’s, Walmart’s, et cetera, and some of those will have you identified in terms of what is your consumption patterns, some will not. But also, in our case, if you are going to go buy a Philadelphia [cream cheese], for example, you may choose to buy your Philadelphia in multiple outlets. Sometimes you want more and you go to Costco, sometimes you need less, in my case, I live in the Chicago land area, I go to a Jewel supermarket.

We always have to deal with incomplete data on our customers, and that is a challenge because what we are trying to figure out is how to better serve our consumers based on what product you like, where you’re buying them, what is the right price point for you, but we’re always dealing with data that is incomplete. So in this case, having a clear data strategy around what we have there and a clear understanding of the markets that we have out there so we can really grab that incomplete data that we have out there and still come up with the right actions in terms of what are the right products to put, just to give you an example, a clear example of it is… And I’m going back to Philadelphia because, by the way, that’s my favorite Kraft product ever…

Laurel: Philadelphia cream cheese, right?

Jorge: Yes, absolutely. It’s followed by a close second with our ketchup. I have a soft spot for Philadelphia, pun intended.

Laurel: – and the ketchup.

Jorge: Exactly. No, but you have different presentations. You have the spreadable, you have the brick of cream cheese, within the brick you have some flavors, and what we want to do is make sure that we are providing the flavors that people really want, not producing the ones that people don’t want, because that’s just waste, without knowing specifically who is buying on the other side and you want to buy it in a supermarket, one or two, or sometimes you are shifting. But those are the things that we are constantly on the lookout for, and obviously dealing with the reality about, hey, data is going to be incomplete. We don’t know exactly and we don’t even want to know exactly what people are doing in their daily lives. What we want is just to get enough of the data so we can provide the right product for our consumers.

Laurel: And an example like cream cheese and ketchup probably, especially if a kid is in the house, it’s one of those products that you use on a fairly daily basis. So knowing all of this, how does Kraft Heinz prepare data for AI projects, because that in itself is a project? So what are the first steps to get ready for AI?

Jorge: One thing that we have been pretty successful on is what I would call the potluck approach for data. Meaning that individual projects, individual groups are focused on delivering a very specific use case, and that is the right thing to do. When you are dealing with a project in supply chain and you’re trying just to, for example, say, “Hey, I want to optimize my CFR,” you are really not going to be caring that much about what sales wants to do. However, if you implement a potluck approach, meaning that, okay, you need data from somebody else, and it’s very likely that you have data to offer because that’s part of your use case. So the potluck approach means that if you want to try out the food of somebody else, you need to bring your own to the table. So if you do that, what starts happening is your data, your enterprise data, becomes little by little more accessible, and if you do it right eventually you pretty much have a lot and almost everything in there.

That is one thing that I will strongly advise people to do. Think big, think strategically, but act tactically, act knowing that individual projects, they’re going to have more limited scope, but if you establish certain practices around sharing around how data should be managed, then each individual projects are going to be contributing to the larger strategy without the largest strategy being a burden for the individual projects, if that makes sense.

Laurel: Sure.

Jorge: So at least for us that has been pretty successful over time. So we have data challenges absolutely as everybody else has, but at least from what I’ve been able to hear from other people, but Kraft Heinz is in a good place in terms of that availability. Because once you reach a certain critical mass, what ends up happening is there’s no need to bring additional data, you are always now reusing it because data is large but it’s finite. So it’s not infinite. It’s not something that’s going to grow forever. If you do it right, you should see that eventually, you don’t need to bring in more and more data. You just need to fine-tune and really leverage the data that you have, probably be more granular, and probably get it faster. That’s a good signal. I have the data, but I need it faster because I need to act on it. Great, you’re on the right track. And also your associated cost around data should reflect that. It should not grow to infinity. Data is large but is finite.

Laurel: So speaking of getting data quickly and making use of it, how does Kraft Heinz use compute power and the cloud scaling ability for AI projects? How do you see these two strategies coming together?

Jorge: Definitely the technology has come a long way in the last few years, because what cloud is offering is more of that flexibility, and it’s removing a lot of the limitations, both in terms of the scale and performance we used to have. So to give you an example, a few years back I had to worry about “Do I have enough storage in my servers to host all the data that we are getting in?” And then if I didn’t, how long is it going to take for me to add another server? With the cloud as an enabler, that’s no longer an issue. It’s a few lines of code and you get what you need. Also, especially on the data side, some of the more modern technologies, talking about Snowflake or BigQuery, enable you to separate your compute from your storage. What it basically means in practical terms is you don’t have people fighting over limited compute power.

So data can be the same for everyone and everybody can be accessing the data without having to overlap each other and then fighting about, oh, if you run this, I cannot run that, and then we have all sorts of problems so definitely what the cloud allowed us to do is get out of the way in terms of the technology as a limitation. And the great thing that happened down there now with all the AI projects is now you could focus on actually delivering on the use cases that you have without having to have limitations around “how am I going to scale?”. That is no longer the case. You have to worry about costs because it could cost you an arm and a leg, but not necessarily around how to scale and how long it’s going to take you to scale.

The agility of the whole thing increases exponentially because what used to take months, now can be done in a matter of seconds via code. So definitely I see how all of this explosion around analytics, around AI is possible, because of cloud really powering all of these initiatives that are popping up left, right, and center.

Laurel: And speaking about this, you can’t really go it alone, so how do partners like Infosys help bring in those new skills and industry know-how to help build the overall digital strategy for data, AI, cloud, and whatever comes next?

Jorge: Much in the same way that I think cloud has been an enabler in terms of this, I think companies and partners like Infosys are also that kind of enablers, because, in a way, they are part of what I would call an expertise ecosystem. I don’t think any company nowadays can do any of this on its own. You need partners. You need partners that both are bringing in new ideas, new technology, but also they are bringing in the right level of expertise in terms of people that you need, and in a global sense, at least for us, having someone that has a global footprint is important because we are a global company. So I will say that it’s the same thing that we talked about earlier about cloud being an enabler: that expert ecosystem represented by companies like Infosys is just another key enabler without which you will really struggle to deliver. So that’s what I’ll probably say to anyone that is listening right now, make sure that your ecosystem, your expert ecosystem is good and is thriving and you have the right partners for the right job.

Laurel: When you think about the future and also all these tough problems that you’re tackling at Kraft Heinz, how important will something like synthetic data be to your data strategy and business strategy as well? What is synthetic data? And then what are some of those challenges associated with using it to fill in the gaps for real-world data?

Jorge: In our case, we don’t use a lot of synthetic data nowadays because at least from the areas that we have holes to fill in terms of data is something that we’ve been dealing with for a while. So we are, let’s put it this way, already familiar on how to produce and fill in the gaps using some of the synthetic data techniques, but not really to the same extent as other organizations are. So we are still looking for opportunities when that is the case in terms of what we need to use and leverage synthetic data, but it’s not something that least for Kraft Heinz and CPG at all we use extensively in multiple places as other organizations are.

Laurel: And so, lastly, when you think ahead to the future, what will the digital operating model for an AI-first firm that’s focused on data look like? What do you see for the future?

Jorge: What I see for the future is, well, first of all, uncertainty, meaning that I don’t think we can predict exactly what’s going to happen because the area in particular is growing and evolving at a speed that I think is just honestly dazzling just because of the major things. I think at least what I would say is the real muscle that we need to be exercising and be ready for is adaptability. Meaning that we can learn, we can react, and apply all of the new things that are coming in hopefully at the same speed that they’re occurring and really leveraging new opportunities when they present themselves in an agile way. But at least from the how to prepare for it I think it’s more about preparing the organization, your team, to be ready for that, really act on it, and be ready also to understand the specific business challenges that are there, and look for opportunities where any of the new things or maybe existences that are happening can be applied to solve a specific problem.

We are a CPG company, and that means the right product, right price, right location, so anything boils down to how can I be better in those three dimensions leveraging whatever is available today, whatever’s going to be available tomorrow. But keep focusing on, at least for us, we are a CPG company, we manufacture Philadelphia [cream cheese], we manufacture ketchup, we feed people. Our mission is to delight people via food. And the technology, AI or what have you, is our tool to excel at our mission. Being able to learn how to leverage existing and future [technology] to get the right product at the right price at the right location is what we are all about.

Laurel: That’s fantastic. Thank you so much, Jorge. I appreciate you being with us today on the Business Lab.

Jorge: Thank you very much. Thank you for inviting me.

Laurel: That was Jorge Balestra, global head of machine learning operations and platforms at Kraft Heinz Company, who I spoke with from Cambridge, Massachusetts, the home of MIT and MIT Technology Review.

That’s it for this episode of Business Lab. I’m your host, Laurel Ruma, I’m the director of insights, the custom publishing division of MIT Technology Review. We were founded in 1899 at the Massachusetts Institute of Technology, and you can also find us in print, on the web, and at events each year around the world. For more information about us and the show, please check out our website at technologyreview.com.

 This show is available wherever you get your podcasts. If you enjoyed this episode, we hope you’ll take a moment to rate and review us. Business Lab is a production of MIT Technology Review. This episode was produced by Giro Studios. Thanks for listening.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

Why Big Tech’s watermarking plans are some welcome good news

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

This week I am happy to bring you some encouraging news from the world of AI. Following the depressing Taylor Swift deepfake porn scandal and the proliferation of political deepfakes, such as AI-generated robocalls of President Biden asking voters to stay home, tech companies are stepping up and putting into place measures to better detect AI-generated content. 

On February 6, Meta said it was going to label AI-generated images on Facebook, Instagram, and Threads. When someone uses Meta’s AI tools to create images, the company will add visible markers to the image, as well as invisible watermarks and metadata in the image file. The company says its standards are in line with best practices laid out by the Partnership on AI, an AI research nonprofit.

Big Tech is also throwing its weight behind a promising technical standard that could add a “nutrition label” to images, video, and audio. Called C2PA, it’s an open-source internet protocol that relies on cryptography to encode details about the origins of a piece of content, or what technologists refer to as “provenance” information. The developers of C2PA often compare the protocol to a nutrition label, but one that says where content came from and who—or what—created it. Read more about it here.
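Conceptually, the cryptography works something like the toy sketch below: a manifest describing a file’s origin is bound to a hash of its contents and signed, so any edit to the file breaks the match. The real C2PA specification is far richer and embeds this data in a standardized form inside the file itself; this is only an illustration of why such a label is tamper-evident.

```python
# Highly simplified sketch of the provenance idea behind C2PA: bind a signed
# "manifest" describing who or what made a file to a hash of its contents.
# The real C2PA spec is much richer; this toy only shows why cryptography
# makes the label tamper-evident.
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-a-real-credential"   # a real signer would use a certificate

def make_manifest(content: bytes, generator: str) -> dict:
    manifest = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "generator": generator,                    # e.g. which AI tool produced it
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

image_bytes = b"...raw image bytes..."
label = make_manifest(image_bytes, generator="example-image-model")
print(label["content_sha256"][:16], label["signature"][:16])
# Editing the image changes its hash, so the old signature no longer matches.
```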

On February 8, Google announced it is joining other tech giants such as Microsoft and Adobe in the steering committee of C2PA and will include its watermark SynthID in all AI-generated images in its new Gemini tools. Meta says it is also participating in C2PA. Having an industry-wide standard makes it easier for companies to detect AI-generated content, no matter which system it was created with.

OpenAI too announced new content provenance measures last week. It says it will add watermarks to the metadata of images generated with ChatGPT and DALL-E 3, its image-making AI. OpenAI says it will now include a visible label in images to signal they have been created with AI. 

These methods are a promising start, but they’re not foolproof. Watermarks in metadata are easy to circumvent by taking a screenshot of images and just using that, while visual labels can be cropped or edited out. There is perhaps more hope for invisible watermarks like Google’s SynthID, which subtly changes the pixels of an image so that computer programs can detect the watermark but the human eye cannot. These are harder to tamper with. What’s more, there aren’t reliable ways to label and detect AI-generated video, audio, or even text. 
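Google has not published how SynthID works, but the general idea of a pixel-level watermark can be illustrated with the classic least-significant-bit trick: hide a pattern in the lowest bits of pixel values, where a program can read it back but the change is invisible to the eye. The sketch below is a toy, not SynthID.

```python
# Toy illustration of an invisible, pixel-level watermark using the classic
# least-significant-bit trick. This is NOT how SynthID works (Google has not
# published its method); it only shows the idea of hiding a signal in pixel
# values rather than in metadata.
import numpy as np

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)   # fake grayscale image

watermark_bits = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)

def embed(img: np.ndarray, bits: np.ndarray) -> np.ndarray:
    out = img.copy().ravel()
    out[: bits.size] = (out[: bits.size] & 0xFE) | bits   # overwrite the lowest bit
    return out.reshape(img.shape)

def extract(img: np.ndarray, n_bits: int) -> np.ndarray:
    return img.ravel()[:n_bits] & 1                       # read the lowest bit back

marked = embed(image, watermark_bits)
print(extract(marked, watermark_bits.size))                   # [1 0 1 1 0 0 1 0]
print(np.abs(marked.astype(int) - image.astype(int)).max())   # at most 1: invisible to the eye
```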

But there is still value in creating these provenance tools. As Henry Ajder, a generative-AI expert, told me a couple of weeks ago when I interviewed him about how to prevent deepfake porn, the point is to create a “perverse customer journey.” In other words, add barriers and friction to the deepfake pipeline in order to slow down the creation and sharing of harmful content as much as possible. A determined person will likely still be able to override these protections, but every little bit helps. 

There are also many nontechnical fixes tech companies could introduce to prevent problems such as deepfake porn. Major cloud service providers and app stores, such as Google, Amazon, Microsoft, and Apple could move to ban services that can be used to create nonconsensual deepfake nudes. And watermarks should be included in all AI-generated content across the board, even by smaller startups developing the technology.

What gives me hope is that alongside these voluntary measures we’re starting to see binding regulations, such as the EU’s AI Act and the Digital Services Act, which require tech companies to disclose AI-generated content and take down harmful content faster. There’s also renewed interest among US lawmakers in passing some binding rules on deepfakes. And following the AI-generated robocalls of President Biden telling voters to stay home, the US Federal Communications Commission announced last week that it was banning the use of AI-generated voices in robocalls. 

In general I’m pretty skeptical about voluntary guidelines and rules, because there’s no real accountability mechanism and companies can choose to change these rules whenever they want. The tech sector has a really bad track record for regulating itself. In the cutthroat, growth-driven tech world, things like responsible AI are often the first to face cuts. 

But despite that, these announcements are extremely welcome. They’re also much better than the status quo, which is next to nothing. 

Deeper Learning

Google’s Gemini is now in everything. Here’s how you can try it out.

In the biggest mass-market AI launch yet, Google is rolling out Gemini, its family of large language models, across almost all its products, from Android to the iOS Google app to Gmail to Docs and more. You can now get your hands on Gemini Ultra, the most powerful version of the model, for the first time. 

Bard is dead; long live Gemini: Google is also sunsetting Bard, its ChatGPT rival. Bard, which has been powered by a version of Gemini since December, will now be known as Gemini too. By baking Gemini into its ubiquitous tools, Google is hoping to make up lost ground and even overtake its rival OpenAI. Read more from Will Douglas Heaven.

Bits and Bytes

A chatbot helped more people access mental-health services
An AI chatbot from a startup called Limbic helped increase the number of patients referred for mental-health services through England’s National Health Service (particularly among members of underrepresented groups, who are less likely to seek help), new research has found. (MIT Technology Review)

This robot can tidy a room without any help
A new system called OK-Robot could train robots to pick up and move objects in settings they haven’t encountered before. It’s an approach that might be able to plug the gap between rapidly improving AI models and actual robot capabilities, because it doesn’t require any additional costly, complex training. (MIT Technology Review)

Inside OpenAI’s plan to make AI more “democratic”
This feature looks at how computer scientists at OpenAI are trying to address the technical problem of how to align their AIs to human values. But a bigger question remains unanswered: Exactly whose values should AI reflect? And who should get to decide? (Time)

OpenAI’s Sam Altman wants trillions to build chips for AI
The CEO has often complained that the company does not have enough computing power to train and run its powerful AI models. Altman is reportedly talking with investors in the United Arab Emirates government to raise up to $7 trillion to boost the world’s chip-building capacity. (The Wall Street Journal)

A new app to “dignify” women
Ugh. In contrast to apps that sexualize images of women, some 4chan users are using generative AI to add clothes, erase their tattoos and piercings, and make them look more modest. How about … we just leave women alone. (404 Media)

Google’s Gemini is now in everything. Here’s how you can try it out.

In the biggest mass-market AI launch yet, Google is rolling out Gemini, its family of large language models, across almost all its products, from Android to the iOS Google app to Gmail to Docs and more. You can now get your hands on Gemini Ultra, the most powerful version of the model, for the first time.  

Google is also sunsetting Bard, its ChatGPT rival. Bard, which has been powered by a version of Gemini since December, will now be known as Gemini too.  

ChatGPT, released by Microsoft-backed OpenAI just 14 months ago, changed people’s expectations of what computers could do. Google, which has been racing to catch up ever since, unveiled its Gemini family of models in December. They are multimodal large language models that can interact with you via voice, image, and text. Google claimed that its own benchmarking showed Gemini outperforming GPT-4 on a range of standard tests. But the margins were slim. 

By baking Gemini into its ubiquitous tools, Google is hoping to make up lost ground and even overtake its rival.

“Every launch is big, but this one is the biggest yet,” Sissie Hsiao, Google vice president and general manager of Google Assistant and Bard (now Gemini), said in a press conference yesterday. “We think this is one of the most profound ways that we’re going to advance our company’s mission.”

But some will have to wait longer than others to play with Google’s new tools. The company has announced rollouts in the US and East Asia but said nothing about when the Android and iOS apps will come to the UK, the EU, and Switzerland. This may be because the company is waiting for the EU’s new AI Act to be set in stone, says Dragoș Tudorache, a Romanian politician and member of the European Parliament, who was a key negotiator on the law.

“We’re working with local regulators to make sure that we’re abiding by local regime requirements before we can expand,” Hsiao said. “Rest assured, we are absolutely working on it and I hope we’ll be able to announce expansion very, very soon.”

How can you get it? Gemini Pro, Google’s middle-tier model that has been available via Bard since December, will continue to be available for free on the web at gemini.google.com (instead of bard.google.com). But now there is a mobile app as well. If you have an Android device, you can either download the Gemini app or opt in to an upgrade in Google Assistant. And iOS users simply download the Google app, which will now include Gemini. This will let you call up Gemini in the same way that you use Google Assistant: by pressing the power button, swiping from the corner of the screen, or saying “Hey, Google!”

This brings up a Gemini overlay on your screen, where you can ask it questions or give it instructions about whatever’s on your phone at the time, such as summarizing an article or generating a caption for a photo.  

Finally, Google is launching a paid-for service called Gemini Advanced. This comes bundled in a subscription costing $19.99 a month that the company is calling the Google One AI Premium Plan. It combines the perks of the existing Google One Premium Plan, such as 2TB of extra storage, with access to Google’s most powerful model, Gemini Ultra, for the first time. This will compete with OpenAI’s offering, where for $20 a month ChatGPT Plus buys you access to GPT-4 rather than GPT-3.5.

At some point soon (Google didn’t say exactly when) this subscription will also unlock Gemini across Google’s Workspace apps like Docs, Sheets, and Slides, where it works as a smart assistant similar to the GPT-4-powered Copilot that Microsoft is trialing in Office 365.

When can you get it? The free Gemini app (powered by Gemini Pro) is available from today in English in the US. Starting next week, you’ll be able to access it across the Asia Pacific region in English and in Japanese and Korean. But there is no word on when the app will come to the UK, countries in the EU, or Switzerland.

Gemini Advanced (the paid-for service that gives access to Gemini Ultra) is available in English in more than 150 countries, including the UK and EU (but not France). Google says it is analyzing local requirements and fine-tuning Gemini for cultural nuance in different countries. But it claims that more languages and regions are coming.

What can you do with it? Google says it has developed its Gemini products with the help of more than 100 testers and power users. At the press conference yesterday, Google execs outlined a handful of use cases, such as getting Gemini to help write a cover letter for a job application. “This can help you come across as more professional and increase your relevance to recruiters,” said Google’s vice president for product management, Kristina Behr.

Or you could take a picture of your flat tire and ask Gemini how to fix it. A more elaborate example involved Gemini managing a snack rota for the parents of kids on a soccer team. Gemini would come up with a schedule for who should bring snacks and when, help you email other parents, and then field their replies. In future versions, Gemini will be able to draw on data in your Google Drive that could help manage carpooling around game schedules, Behr said.   

But we should expect users to find a lot more uses for these tools. “I’m really excited to see how people around the world are going to push the envelope on this AI,” Hsaio said.

Is it safe? Google has been working hard to make sure its slick products are safe to use. But no amount of testing can anticipate all the ways that tech will get used and misused once it is released. In the last few months, Meta saw people use its image-making app to produce pictures of Mickey Mouse with guns and SpongeBob SquarePants flying a jet into two towers. Others used Microsoft’s image-making software to create fake pornographic images of Taylor Swift.

The AI Act aims to mitigate some—but not all—of these problems. For example, it requires the makers of powerful AI like Gemini to build in safeguards, such as watermarking for generated images and steps to avoid reproducing copyrighted material. Google says that all images generated by its products will include its SynthID watermarks. 

Like most companies, Google was knocked onto the back foot when ChatGPT arrived. Microsoft’s partnership with OpenAI has given it a boost over its old rival. But with Gemini, Google has come back strong: this is the slickest packaging of this generation’s tech yet. 

Correction: we made it clearer that you will need a subscription to access Gemini in Docs and Gmail.

Google’s Gemini is now in everything. Here’s how you can try it out.

In the biggest mass-market AI launch yet, Google is rolling out Gemini, its family of large language models, across almost all its products, from Android to the iOS Google app to Gmail to Docs and more. You can now get your hands on Gemini Ultra, the most powerful version of the model, for the first time.  

Google is also sunsetting Bard, its ChatGPT rival. Bard, which has been powered by a version of Gemini since December, will now be known as Gemini too.  

ChatGPT, released by Microsoft-backed OpenAI just 14 months ago, changed people’s expectations of what computers could do. Google, which has been racing to catch up ever since, unveiled its Gemini family of models in December. They are multimodal large language models that can interact with you via voice, image, and text. Google claimed that its own benchmarking showed Gemini outperforming GPT-4 on a range of standard tests. But the margins were slim. 

By baking Gemini into its ubiquitous tools, Google is hoping to make up lost ground and even overtake its rival.

“Every launch is big, but this one is the biggest yet,” Sissie Hsiao, Google vice president and general manager of Google Assistant and Bard (now Gemini), said in a press conference yesterday. “We think this is one of the most profound ways that we’re going to advance our company’s mission.”

But some will have to wait longer than others to play with Google’s new tools. The company has announced rollouts in the US and East Asia but said nothing about when the Android and iOS apps will come to the UK, the EU, and Switzerland. This may be because the company is waiting for the EU’s new AI Act to be set in stone, says Dragoș Tudorache, a Romanian politician and member of the European Parliament, who was a key negotiator on the law.

“We’re working with local regulators to make sure that we’re abiding by local regime requirements before we can expand,” Hsiao said. “Rest assured, we are absolutely working on it and I hope we’ll be able to announce expansion very, very soon.”

How can you get it? Gemini Pro, Google’s middle-tier model that has been available via Bard since December, will continue to be available for free on the web at gemini.google.com (instead of bard.google.com). But now there is a mobile app as well. If you have an Android device, you can either download the Gemini app or opt in to an upgrade in Google Assistant. And iOS users simply download the Google app, which will now include Gemini. This will let you call up Gemini in the same way that you use Google Assistant: by pressing the power button, swiping from the corner of the screen, or saying “Hey, Google!”

This brings up a Gemini overlay on your screen, where you can ask it questions or give it instructions about whatever’s on your phone at the time, such as summarizing an article or generating a caption for a photo.  

Finally, Google is launching a paid-for service called Gemini Advanced. This comes bundled in a subscription costing $19.99 a month that the company is calling the Google One Premium AI Plan. It combines the perks of the existing Google One Premium Plan, such as 2TB of extra storage, with access to Google’s most powerful model, Gemini Ultra, for the first time. This will compete with OpenAI’s offering, where for $20 a month ChatGPT Plus buys you access to GPT-4 rather than GPT-3.5.

At some point soon (Google didn’t say exactly when) this subscription will also unlock Gemini across Google’s Workspace apps like Docs, Sheets, and Slides, where it works as a smart assistant similar to the GPT-4-powered Copilot that Microsoft is trialing in Office 365.

When can you get it? The free Gemini app (powered by Gemini Pro) is available from today in English in the US. Starting next week, you’ll be able to access it across the Asia Pacific region in English and in Japanese and Korean. But there is no word on when the app will come to the UK, countries in the EU, or Switzerland.

Gemini Advanced (the paid-for service that gives access to Gemini Ultra) is available in English in more than 150 countries, including the UK and EU (but not France). Google says it is analyzing local requirements and fine-tuning Gemini for cultural nuance in different countries. But it claims that more languages and regions are coming.

What can you do with it? Google says it has developed its Gemini products with the help of more than 100 testers and power users. At the press conference yesterday, Google execs outlined a handful of use cases, such as getting Gemini to help write a cover letter for a job application. “This can help you come across as more professional and increase your relevance to recruiters,” said Google’s vice president for product management, Kristina Behr.

Or you could take a picture of your flat tire and ask Gemini how to fix it. A more elaborate example involved Gemini managing a snack rota for the parents of kids on a soccer team. Gemini would come up with a schedule for who should bring snacks and when, help you email other parents, and then field their replies. In future versions, Gemini will be able to draw on data in your Google Drive that could help manage carpooling around game schedules, Behr said.   

But we should expect users to find a lot more uses for these tools. “I’m really excited to see how people around the world are going to push the envelope on this AI,” Hsiao said.

Is it safe? Google has been working hard to make sure its slick products are safe to use. But no amount of testing can anticipate all the ways that tech will get used and misused once it is released. In the last few months, Meta saw people use its image-making app to produce pictures of Mickey Mouse with guns and SpongeBob SquarePants flying a jet into two towers. Others used Microsoft’s image-making software to create fake pornographic images of Taylor Swift.

The AI Act aims to mitigate some—but not all—of these problems. For example, it requires the makers of powerful AI like Gemini to build in safeguards, such as watermarking for generated images and steps to avoid reproducing copyrighted material. Google says that all images generated by its products will include its SynthID watermarks. 

Like most companies, Google was knocked onto the back foot when ChatGPT arrived. Microsoft’s partnership with OpenAI has given it a boost over its old rival. But with Gemini, Google has come back strong: this is the slickest packaging of this generation’s tech yet. 

Correction: we made it clearer that you will need a subscription to access Gemini in Docs and Gmail.

A chatbot helped more people access mental-health services

An AI chatbot helped increase the number of patients referred for mental-health services through England’s National Health Service (NHS), particularly among underrepresented groups who are less likely to seek help, new research has found.

Demand for mental-health services in England is on the rise, particularly since the covid-19 pandemic. Mental-health services received 4.6 million patient referrals in 2022—the highest number on record—and the number of people in contact with such services is growing steadily. But neither the funding nor the number of mental-health professionals is adequate to meet this rising demand, according to the British Medical Association.  

The chatbot’s creators, from the AI company Limbic, set out to investigate whether AI could lower the barrier to care by helping patients access help more quickly and efficiently.

A new study, published today in Nature Medicine, evaluated the effect that the chatbot, called Limbic Access, had on referrals to the NHS Talking Therapies for Anxiety and Depression program, a series of evidence-based psychological therapies for adults experiencing anxiety disorders, depression, or both.  

It examined data from 129,400 people visiting websites to refer themselves to 28 different NHS Talking Therapies services across England, half of which used the chatbot on their website and half of which used other data-collecting methods such as web forms. The number of referrals from services using the Limbic chatbot rose by 15% during the study’s three-month time period, compared with a 6% rise in referrals for the services that weren’t using it.  

Referrals among minority groups, including ethnic and sexual minorities, grew significantly when the chatbot was available—rising 179% among people who identified as nonbinary, 39% for Asian patients, and 40% for Black patients. 

Crucially, the report’s authors said that the higher numbers of patients being referred for help from the services did not increase waiting times or cause a reduction in the number of clinical assessments being performed. That’s because the detailed information the chatbot collected reduced the amount of time human clinicians needed to spend assessing patients, while improving the quality of the assessments and freeing up other resources.

It’s worth bearing in mind that an interactive chatbot and a static web form are very different methods of gathering information, points out John Torous, director of the digital psychiatry division at Beth Israel Deaconess Medical Center in Massachusetts, who was not involved in the study.

“In some ways, this is showing us where the field may be going—that it’ll be easier to reach people to screen them, regardless of the technology,” he says. “But it does beg the question of what type of services are we going to be offering people, and how do we allocate those services?”

Overall, patients who’d used the chatbot and provided positive feedback to Limbic mentioned its ease and convenience. They also said that the referral made them feel more hopeful about getting better or helped them know they were not alone. Nonbinary respondents mentioned the non-human nature of the chatbot more frequently than patients who identified as male or female, which may suggest that interacting with the bot helped avoid feelings of judgment, stigma, or anxiety that can be triggered by speaking to a person.

“Seeing proportionally greater improvements from individuals in minority communities across gender, sexual, and ethnic minorities, who are typically hard-to-reach individuals, was a really exciting finding,” says Ross Harper, Limbic’s founder and CEO, who coauthored the research. “It shows that in the right hands, AI can be a powerful tool for equity and inclusion.”

Visitors to the chatbot-enabled websites were met with a pop-up explaining that Limbic is a robotic assistant designed to help them access psychological support. As part of an initial evidence-based screening process, the chatbot asks a series of questions, including whether the patient has any long-term medical conditions or prior diagnoses from mental-health professionals. It follows these with further questions designed to measure symptoms of common mental-health issues, such as depression and anxiety, tailoring its questioning to the symptoms most relevant to the patient’s problems.

The chatbot uses the data it collects to create a detailed referral, which it shares with the electronic record system the service uses. A human care professional can then access that referral and contact the patient within a couple of days to make an assessment and start treatment.

Limbic’s chatbot is a combination of different kinds of AI models. The first uses natural-language processing to analyze a patient’s typed responses and provide appropriate, empathetic answers. Probabilistic models take the data the patient has entered and use it to tailor the chatbot’s responses in line with the patient’s most likely mental-health problem. These models are capable of classifying eight common mental-health issues with 93% accuracy, the report’s authors said.
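Limbic hasn’t published its model internals in this story, so the following is only a toy illustration of what a probabilistic triage step could look like: a naive Bayes–style update over yes/no questionnaire answers. The conditions, symptoms, and probabilities are invented for the example and do not reflect Limbic’s actual models, categories, or clinical thresholds.

```python
# Toy illustration of a probabilistic triage step: given yes/no questionnaire
# answers, score a handful of candidate conditions with a naive Bayes update.
# Conditions, symptoms, and probabilities are invented for illustration and do
# not reflect Limbic's actual models or any clinical guidance.
import math

# P(symptom reported | condition); hypothetical values only.
SYMPTOM_LIKELIHOODS = {
    "generalized anxiety": {"restlessness": 0.8, "low mood": 0.3, "panic attacks": 0.4},
    "depression":          {"restlessness": 0.3, "low mood": 0.9, "panic attacks": 0.2},
    "panic disorder":      {"restlessness": 0.5, "low mood": 0.3, "panic attacks": 0.9},
}
PRIOR = 1.0 / len(SYMPTOM_LIKELIHOODS)  # uniform prior over conditions

def triage(reported: dict[str, bool]) -> dict[str, float]:
    """Return a normalized posterior over the candidate conditions."""
    log_scores = {}
    for condition, likelihoods in SYMPTOM_LIKELIHOODS.items():
        log_p = math.log(PRIOR)
        for symptom, present in reported.items():
            p = likelihoods.get(symptom, 0.5)  # ignore uninformative symptoms
            log_p += math.log(p if present else 1.0 - p)
        log_scores[condition] = log_p
    total = sum(math.exp(v) for v in log_scores.values())
    return {c: math.exp(v) / total for c, v in log_scores.items()}

# Example: a patient reporting restlessness and panic attacks but not low mood.
print(triage({"restlessness": True, "low mood": False, "panic attacks": True}))
```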

“There aren’t enough mental-health professionals, so we want to use AI to amplify what we do have,” adds Harper. “That collaboration between human specialists and an AI specialist—that’s where we’ll really solve the supply-demand imbalance in mental health.”

This robot can tidy a room without any help

Robots are good at certain tasks. They’re great at picking up and moving objects, for example, and they’re even getting better at cooking.

But while robots may easily complete tasks like these in a laboratory, getting them to work in an unfamiliar environment where there’s little data available is a real challenge.

Now, a new system called OK-Robot could train robots to pick up and move objects in settings they haven’t encountered before. It’s an approach that might be able to plug the gap between rapidly improving AI models and actual robot capabilities, as it doesn’t require any additional costly, complex training.

To develop the system, researchers from New York University and Meta tested Stretch, a commercially available robot made by Hello Robot that consists of a wheeled unit, a tall pole, and a retractable arm, in a total of 10 rooms in five homes. 

While in a room with the robot, a researcher would scan their surroundings using Record3D, an iPhone app that uses the phone’s lidar system to take a 3D video to share with the robot. 

The OK-Robot system then ran an open-source AI object detection model over the video’s frames. This, in combination with other open-source models, helped the robot identify objects in that room like a toy dragon, a tube of toothpaste, and a pack of playing cards, as well as locations around the room including a chair, a table, and a trash can.
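The story doesn’t specify which open-source detection models the team used, but the step is easy to approximate with off-the-shelf tools. The sketch below is a rough stand-in, assuming an open-vocabulary detector (OWL-ViT, served through Hugging Face’s transformers pipeline); the frame paths and candidate labels are hypothetical.

```python
# Rough sketch of the detection step: run an open-vocabulary object detector
# over frames extracted from a room scan, keeping the most confident hit for
# each object or location of interest. The model choice, labels, and file
# paths are illustrative assumptions, not the researchers' exact setup.
from transformers import pipeline
from PIL import Image

detector = pipeline(task="zero-shot-object-detection",
                    model="google/owlvit-base-patch32")

labels = ["toy dragon", "tube of toothpaste", "pack of playing cards",
          "chair", "table", "trash can"]

best = {}  # label -> (confidence, frame path, bounding box)
for frame_path in ["scan/frame_000.jpg", "scan/frame_001.jpg"]:  # hypothetical frames
    image = Image.open(frame_path)
    for hit in detector(image, candidate_labels=labels):
        score, label = hit["score"], hit["label"]
        if score > best.get(label, (0.0,))[0]:
            best[label] = (score, frame_path, hit["box"])

for label, (score, frame_path, box) in best.items():
    print(f"{label}: {score:.2f} in {frame_path} at {box}")
```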

The team then instructed the robot to pick up a specific item and move it to a new location. The robot’s pincer arm did this successfully in 58.5% of cases; the success rate rose to 82% in rooms that were less cluttered. (Their research has not yet been peer reviewed.)

The recent AI boom has led to enormous leaps in language and computer vision capabilities, allowing robotics researchers access to open-source AI models and tools that didn’t exist even three years ago, says Matthias Minderer, a senior computer vision research scientist at Google DeepMind, who was not involved in the project.

“I would say it’s quite unusual to be completely reliant on off-the-shelf models, and that it’s quite impressive to make them work,” he says.

“We’ve seen a revolution in machine learning that has made it possible to create models that work not just in laboratories, but in the open world,” he adds. “Seeing that this actually works in a real physical environment is very useful information.”

Because the researchers’ system used models that weren’t fine-tuned for this particular project, the robot simply stopped in its tracks when it couldn’t find the object it was instructed to look for, rather than trying to work out a solution. That significant limitation is one reason the robot was more likely to succeed in tidier environments—fewer objects meant fewer chances for confusion, and a clearer space for navigation.

Using ready-made open-source models was both a blessing and a curse, says Lerrel Pinto, an assistant professor of computer science at New York University, who co-led the project. 

“On the positive side, you don’t have to give the robot any additional training data in the environment, it just works,” he says. “On the con side, it can only pick an object up and drop it somewhere else. You can’t ask it to open a drawer, because it only knows how to do those two things.” 

Combining OK-Robot with voice recognition models could allow researchers to deliver instructions simply by speaking to the robot, making it easier for them to experiment with readily available datasets, says Mahi Shafiullah, a PhD student at New York University who co-led the research.

“There is a very pervasive feeling in the [robotics] community that homes are difficult, robots are difficult, and combining homes and robots is just completely impossible,” he says. “I think once people start believing home robots are possible, a lot more work will start happening in this space.”

The next generation of mRNA vaccines is on its way

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

Welcome back to The Checkup! Today I want to talk about … mRNA vaccines.

I can hear the collective groan from here, but wait—hear me out! I know you’ve heard a lot about mRNA vaccines, but Japan recently approved a new one for covid. And this one is pretty exciting. Just like the mRNA vaccines you know and love, it delivers the instructions for making the virus’s spike protein. But here’s what makes it novel: it also tells the body how to make more mRNA. Essentially, it provides instructions for making more instructions. It’s self-amplifying.

I’ll wait while your head explodes.

Self-amplifying RNA (saRNA) vaccines offer a couple of important advantages over conventional mRNA vaccines, at least in theory. First, because saRNA vaccines come with a built-in photocopier, the dose can be much lower. One team of researchers tested both an mRNA vaccine and an saRNA vaccine in mice and found that they could achieve equivalent levels of protection against influenza with just 1/64th the dose. Second, it’s possible that saRNA vaccines will induce a more durable immune response because the RNA keeps copying itself and sticks around longer. While mRNA might last a day or two, self-amplifying RNA can persist for a month.

Lest you think that this is just a tweaked version of conventional mRNA, it’s not. “saRNA is a totally different beast,” Anna Blakney, a bioengineer at the University of British Columbia, told Nature. (Blakney was one of our 35 Innovators Under 35 in 2023.)

What makes it a different beast? Conventional mRNA vaccines consist of messenger RNA that carries the genetic code for covid’s spike protein. Once that mRNA enters the body, it gets translated into proteins by the same cellular machinery that translates our own messenger RNA. 

Self-amplifying mRNA vaccines contain a gene that encodes the spike protein as well as viral genes that code for replicase, the enzyme that serves as a photocopier. So one self-amplifying mRNA molecule can produce many more. The idea of a vaccine that copies itself in the body might sound a little, well, unnerving. But there are a few things I should make clear. Although the genes that give these vaccines the ability to self-amplify come from viruses, they don’t encode the information needed to make the virus itself. So saRNA vaccines can’t produce new viruses. And just like mRNA, saRNA degrades quickly in the body. It lasts longer than mRNA, but it doesn’t amplify forever. 
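A back-of-the-envelope model helps show why that built-in photocopier can mean both a lower dose and a longer-lasting signal. In the sketch below, the replication rate r and degradation rate d are hypothetical parameters, not measured values, and the exponential growth only describes the early phase; in reality replication tails off and degradation eventually wins, which is why saRNA doesn’t amplify forever.

```latex
% Hypothetical rates, for illustration only.
% Conventional mRNA only degrades, at rate d:
%   dN/dt = -d N        =>  N(t) = N_0 e^{-d t}
% Self-amplifying RNA is also copied by replicase, at rate r:
%   dN/dt = (r - d) N   =>  N(t) = N_0 e^{(r - d) t}   (early phase, while r > d)
\[
\underbrace{N_{\mathrm{mRNA}}(t) = N_0\, e^{-d t}}_{\text{decay only}}
\qquad \text{vs.} \qquad
\underbrace{N_{\mathrm{saRNA}}(t) = N_0\, e^{(r-d)\, t}}_{\text{replication minus decay}}
\]
```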

Japan approved the new vaccine, called LUNAR-COV19, in late November on the basis of results from a 16,000-person trial in Vietnam. Last month researchers published results of a head-to-head comparison between LUNAR-COV19 and Comirnaty, the mRNA vaccine from Pfizer-BioNTech. In that 800-person study, vaccinated participants received either five micrograms of LUNAR-COV19 or 30 micrograms of Comirnaty as a fourth-dose booster. Reactions to both shots tended to be mild and to resolve quickly. But the self-amplifying mRNA shot did elicit antibodies in a greater percentage of people than Comirnaty. And a month out, antibody levels against Omicron BA.4/5 were higher in people who received LUNAR-COV19. That could be a signal of increased durability.

Arcturus, the company that co-developed LUNAR-COV19 with the global biotech CSL, has already filed for approval in Europe. It’s also working on a self-amplifying mRNA vaccine for flu, both seasonal and pandemic, and on self-amplifying messenger RNA to treat ornithine transcarbamylase deficiency, a rare and life-threatening genetic disease. Other companies are exploring the possibility that self-amplifying mRNA might be useful in rare genetic conditions to replace missing proteins. It’s an mRNA bonanza that will hopefully lead to better vaccines and new therapies.

Another thing

Babies and AI learn language in very different ways. The former rely on a relatively small set of experiences. The latter relies on data sets that encompass a trillion words. But this week I wrote about a new study that shows AI can learn language like a baby—at least some aspects of language. The researchers found that a neural network trained on things a single child saw and heard over the course of a year and a half could learn to match words to the objects they represent. Here’s the story. 

Read more from MIT Technology Review’s archive

mRNA vaccines helped tackle covid, but they can help with so much more—malaria, HIV, TB, Zika, even cancer. Jessica Hamzelou wrote about their potential in January, and I followed up with a story after two mRNA researchers won a Nobel Prize. 

Using self-amplifying RNA isn’t the only way to make mRNA vaccines more powerful. Researchers are tweaking them in other ways that might help boost the immune response, writes Anne Trafton.

From around the web

Elon Musk says his company Neuralink has implanted a brain chip in a person for the first time. The device is designed to allow people to control external devices like smartphones and computers with their thoughts. (Washington Post)

In August I wrote about Vertex’s quest to develop a non-opioid pain pill. This week the company announced positive results from phase 3 trials. The company expects to seek regulatory approval in the coming months, and if approved, the drug is likely to become a blockbuster. (Stat)

In some rare cases, it appears that Alzheimer’s can be transmitted from one person to another. That’s the conclusion of a new study: it found that eight people who received growth hormone from the brains of cadavers before the 1980s had sticky beta-amyloid plaques in their brains, a hallmark of the disease. The growth hormone they received also contained these proteins. And when researchers injected these proteins into mice, the mice also developed amyloid plaques. (Science)