App Artificial intelligence

Feb 1 2025

DeepSeek might not be such good news for energy after all

In the week since a Chinese AI model called DeepSeek became a household name, a dizzying number of narratives have gained steam, with varying degrees of accuracy: that the model is collecting your personal data (maybe); that it will upend AI as we know it (too soon to tell—but do read my colleague Will’s story on that!); and perhaps most notably, that DeepSeek’s new, more efficient approach means AI might not need to guzzle the massive amounts of energy that it currently does.

That last notion is misleading, and new numbers shared with MIT Technology Review help show why. These early figures—based on the performance of one of DeepSeek’s smaller models on a small number of prompts—suggest it could be more energy intensive when generating responses than the equivalent-size model from Meta. The issue might be that the energy it saves in training is offset by its more intensive techniques for answering questions, and by the long answers they produce.

Add the fact that other tech firms, inspired by DeepSeek’s approach, may now start building their own similar low-cost reasoning models, and the outlook for energy consumption is already looking a lot less rosy.

The life cycle of any AI model has two phases: training and inference. Training is the often months-long process in which the model learns from data. The model is then ready for inference, which happens each time anyone in the world asks it something. Both usually take place in data centers, where they require lots of energy to run chips and cool servers.

On the training side for its R1 model, DeepSeek’s team improved what’s called a “mixture of experts” technique, in which only a portion of a model’s billions of parameters—the “knobs” a model uses to form better answers—are turned on at a given time during training. More notably, they improved reinforcement learning, where a model’s outputs are scored and then used to make it better. This is often done by human annotators, but the DeepSeek team got good at automating it.
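
To make the “mixture of experts” idea concrete, here is a toy sketch in plain Python. It is not DeepSeek’s implementation: the expert functions, the router scores, and the choice of two active experts are all assumptions made for illustration. It shows only the core idea described above, that a router picks a few experts for each input and the rest stay switched off.

```python
import math

def softmax(scores):
    """Turn raw router scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_scores, top_k=2):
    """Run only the top_k highest-scoring experts on x.

    experts: list of callables (hypothetical stand-ins for sub-networks)
    router_scores: one raw score per expert for this input
    Returns (output, indices of the experts that actually ran).
    """
    ranked = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)
    active = sorted(ranked[:top_k])
    # Renormalize the weights over the active experts only.
    weights = softmax([router_scores[i] for i in active])
    output = sum(w * experts[i](x) for w, i in zip(weights, active))
    return output, active

# Four toy "experts"; a real model has far larger ones.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

# Illustrative router scores: experts 1 and 3 score highest.
output, active = moe_forward(3.0, experts, router_scores=[0.1, 2.0, -1.0, 2.0])
print(active)  # which experts were "turned on"
print(output)  # weighted mix of only those experts' outputs
```

In this toy run, only two of the four experts do any work for the input, which is where the savings come from: the skipped experts cost nothing for that step.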

The introduction of a way to make training more efficient might suggest that AI companies will use less energy to bring their AI models to a certain standard. That’s not really how it works, though.

“Because the value of having a more intelligent system is so high,” wrote Anthropic cofounder Dario Amodei on his blog, it “causes companies to spend more, not less, on training models.” If companies get more for their money, they will find it worthwhile to spend more, and therefore use more energy. “The gains in cost efficiency end up entirely devoted to training smarter models, limited only by the company’s financial resources,” he wrote. It’s an example of what’s known as the Jevons paradox.

But that’s been true on the training side as long as the AI race has been going. The energy required for inference is where things get more interesting.

DeepSeek is designed as a reasoning model, which means it’s meant to perform well on things like logic, pattern-finding, math, and other tasks that typical generative AI models struggle with. Reasoning models do this using something called “chain of thought.” It allows the AI model to break its task into parts and work through them in a logical order before coming to its conclusion.

You can see this with DeepSeek. Ask whether it’s okay to lie to protect someone’s feelings, and the model first tackles the question with utilitarianism, weighing the immediate good against the potential future harm. It then considers Kantian ethics, which propose that you should act according to maxims that could be universal laws. It considers these and other nuances before sharing its conclusion. (It finds that lying is “generally acceptable in situations where kindness and prevention of harm are paramount, yet nuanced with no universal solution,” if you’re curious.)

Chain-of-thought models tend to perform better on certain benchmarks such as MMLU, which tests both knowledge and problem-solving in 57 subjects. But, as is becoming clear with DeepSeek, they also require significantly more energy to come to their answers. We have some early clues about just how much more.

Scott Chamberlin spent years at Microsoft, and later Intel, building tools to help reveal the environmental costs of certain digital activities. Chamberlin did some initial tests to see how much energy a GPU uses as DeepSeek comes to its answer. The experiment comes with a bunch of caveats: He tested only a medium-size version of DeepSeek’s R1, using only a small number of prompts. It’s also difficult to make comparisons with other reasoning models.

DeepSeek is “really the first reasoning model that is fairly popular that any of us have access to,” he says. OpenAI’s o1 model is its closest competitor, but the company doesn’t make it open for testing. Instead, he tested it against a model from Meta with the same number of parameters: 70 billion.

The prompt asking whether it’s okay to lie generated a 1,000-word response from the DeepSeek model, which took 17,800 joules to generate—about what it takes to stream a 10-minute YouTube video. This was about 41% more energy than Meta’s model used to answer the prompt. Overall, when tested on 40 prompts, DeepSeek was found to have a similar energy efficiency to the Meta model, but DeepSeek tended to generate much longer responses and therefore was found to use 87% more energy.

How does this compare with models that use regular old-fashioned generative AI as opposed to chain-of-thought reasoning? Tests from a team at the University of Michigan in October found that the 70-billion-parameter version of Meta’s Llama 3.1 averaged just 512 joules per response.
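
The figures above can be put on a common footing with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check on the numbers quoted in this story. The variable names are illustrative, and the Meta figure is implied by the “41% more” comparison rather than reported directly. The last ratio mixes Chamberlin’s test with the separate University of Michigan test, so it should be read as a rough indication only.

```python
JOULES_PER_WATT_HOUR = 3600  # 1 Wh = 3,600 J by definition

deepseek_joules = 17_800   # reported for the "is it okay to lie" prompt
extra_vs_meta = 0.41       # DeepSeek used about 41% more than Meta's model
llama_avg_joules = 512     # Michigan average for Llama 3.1 70B

# Convert to watt-hours for easier comparison with everyday electricity use.
deepseek_wh = deepseek_joules / JOULES_PER_WATT_HOUR

# Implied (not reported) energy for Meta's answer to the same prompt.
implied_meta_joules = deepseek_joules / (1 + extra_vs_meta)

# Rough ratio between the reasoning response and the Michigan average.
ratio_vs_llama_avg = deepseek_joules / llama_avg_joules

print(f"DeepSeek response: {deepseek_wh:.2f} Wh")
print(f"Implied Meta response: {implied_meta_joules:,.0f} J")
print(f"Ratio to Llama 3.1 average: {ratio_vs_llama_avg:.1f}x")
```

On these numbers, the single DeepSeek response comes to a little under 5 watt-hours, Meta’s answer to the same prompt to roughly 12,600 joules, and the DeepSeek response to about 35 times the Michigan average.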

Neither DeepSeek nor Meta responded to requests for comment.

Again: uncertainties abound. These are different models, for different purposes, and a scientifically sound study of how much energy DeepSeek uses relative to competitors has not been done. But it’s clear, based on the architecture of the models alone, that chain-of-thought models use lots more energy as they arrive at sounder answers.

Sasha Luccioni, an AI researcher and climate lead at Hugging Face, worries that the excitement around DeepSeek could lead to a rush to insert this approach into everything, even where it’s not needed.

“If we started adopting this paradigm widely, inference energy usage would skyrocket,” she says. “If all of the models that are released are more compute intensive and become chain-of-thought, then it completely voids any efficiency gains.”

AI has been here before. Before ChatGPT launched in 2022, the name of the game in AI was extractive—basically finding information in lots of text, or categorizing images. But in 2022, the focus switched from extractive AI to generative AI, which is based on making better and better predictions. That requires more energy.

“That’s the first paradigm shift,” Luccioni says. According to her research, that shift has resulted in orders of magnitude more energy being used to accomplish similar tasks. If the fervor around DeepSeek continues, she says, companies might be pressured to put its chain-of-thought-style models into everything, the way generative AI has been added to everything from Google search to messaging apps.

We do seem to be heading in a direction of more chain-of-thought reasoning: OpenAI announced on January 31 that it would expand access to its own reasoning model, o3. But we won’t know more about the energy costs until DeepSeek and other models like it become better studied.

“It will depend on whether or not the trade-off is economically worthwhile for the business in question,” says Nathan Benaich, founder and general partner at Air Street Capital. “The energy costs would have to be off the charts for them to play a meaningful role in decision-making.”

Ecommerce MGMT 0 Comments

Jan 29 2025

AI’s energy obsession just got a reality check

This story originally appeared in The Algorithm, our weekly newsletter on AI.

Just a week in, the AI sector has already seen its first battle of wits under the new Trump administration. The clash stems from two key pieces of news: the announcement of the Stargate project, which would spend $500 billion—more than the Apollo space program—on new AI data centers, and the release of a powerful new model from China. Together, they raise important questions the industry needs to answer about the extent to which the race for more data centers—with their heavy environmental toll—is really necessary.

A reminder about the first piece: OpenAI, Oracle, SoftBank, and an Abu Dhabi–based investment fund called MGX plan to spend up to $500 billion opening massive data centers around the US to build better AI. Much of the groundwork for this project was laid in 2024, when OpenAI increased its lobbying spending sevenfold (which we were first to report last week) and AI companies started pushing for policies that were less about controlling problems like deepfakes and misinformation, and more about securing more energy.

Still, Trump received credit for it from tech leaders when he announced the effort on his second day in office. “I think this will be the most important project of this era,” OpenAI’s Sam Altman said at the launch event, adding, “We wouldn’t be able to do this without you, Mr. President.”

It’s an incredible sum, just slightly less than the inflation-adjusted cost of building the US highway system over the course of more than 30 years. However, not everyone sees Stargate as having the same public benefit. Environmental groups say it could strain local grids and further drive up the cost of energy for the rest of us, who aren’t guzzling it to train and deploy AI models. Previous research has also shown that data centers tend to be built in areas that use much more carbon-intensive sources of energy, like coal, than the national average. It’s not clear how much, if at all, Stargate will rely on renewable energy.

Even louder critics of Stargate, though, include Elon Musk. None of Musk’s companies are involved in the project, and he has attempted to publicly sow doubt that OpenAI and SoftBank have enough of the money needed for the plan anyway, claims that Altman disputed on X. Musk’s decision to publicly criticize the president’s initiative has irked people in Trump’s orbit, Politico reports, but it’s not clear if those people have expressed that to Musk directly.

On to the second piece. On the day Trump was inaugurated, a Chinese startup released an AI model that started making a whole bunch of important people in Silicon Valley very worried about their competition. (This close timing is almost certainly not an accident.)

The model, called DeepSeek R1, is a reasoning model. These types of models are designed to excel at math, logic, pattern-finding, and decision-making. DeepSeek proved it could “reason” through complicated problems as well as one of OpenAI’s reasoning models, o1—and more efficiently. What’s more, DeepSeek isn’t a super-secret project kept behind lock and key like OpenAI’s. It was released for all to see.

DeepSeek was released as the US has made outcompeting China in the AI race a top priority. This goal was a driving force behind the 2022 CHIPS Act to make more chips domestically. It’s influenced the position of tech companies like OpenAI, which has embraced lending its models to national security work and has partnered with the defense-tech company Anduril to help the military take down drones. It’s led to export controls that limit what types of chips Nvidia can sell to China.

The success of DeepSeek signals that these efforts aren’t working as well as AI leaders in the US would like (though it’s worth noting that the impact of export controls for chips isn’t felt for a few years, so the policy wouldn’t be expected to have prevented a model like DeepSeek).

Still, the model poses a threat to the bottom line of certain players in Big Tech. Why pay for an expensive model from OpenAI when you can get access to DeepSeek for free? Even other makers of open-source models, especially Meta, are panicking about the competition, according to The Information. The company has set up a number of “war rooms” to figure out how DeepSeek was made so efficient. (A couple of days after the Stargate announcement, Meta said it would increase its own capital investments by 70% to build more AI infrastructure.)

What does this all mean for the Stargate project? Let’s think about why OpenAI and its partners are willing to spend $500 billion on data centers to begin with. They believe that AI in its various forms—not just chatbots or generative video or even new AI agents, but also developments yet to be unveiled—will be the most lucrative tool humanity has ever built. They also believe that access to powerful chips inside massive data centers is the key to getting there.

DeepSeek poked some holes in that approach. It didn’t train on yet-unreleased chips that are light-years ahead. It didn’t, to our knowledge, require the eye-watering amounts of computing power and energy behind the models from US companies that have made headlines. Its designers made clever decisions in the name of efficiency.

In theory, it could make a project like Stargate seem less urgent and less necessary. If, in dissecting DeepSeek, AI companies discover some lessons about how to make models use existing resources more effectively, perhaps constructing more and more data centers won’t be the only winning formula for better AI. That would be welcome to the many people affected by the problems data centers can bring, like lots of emissions, the loss of fresh, drinkable water used to cool them, and the strain on local power grids.

Thus far, DeepSeek doesn’t seem to have sparked such a change in approach. OpenAI researcher Noam Brown wrote on X, “I have no doubt that with even more compute it would be an even more powerful model.”

If his logic wins out, the players with the most computing power will win, and getting it is apparently worth at least $500 billion to AI’s biggest companies. But let’s remember—announcing it is the easiest part.

Now read the rest of The Algorithm

Deeper Learning

What’s next for robots

Many of the big questions about AI—how it learns, how well it works, and where it should be deployed—are now applicable to robotics. In the year ahead, we will see humanoid robots being put to the test in warehouses and factories, robots learning in simulated worlds, and a rapid increase in the military’s adoption of autonomous drones, submarines, and more.

Why it matters: Jensen Huang, the highly influential CEO of the chipmaker Nvidia, stated last month that the next advancement in AI will mean giving the technology a “body” of sorts in the physical world. This will come in the form of advanced robotics. Even with the caveat that robotics is full of futuristic promises that usually aren’t fulfilled by their deadlines, the marrying of AI methods with new advancements in robots means the field is changing quickly.

Bits and Bytes

Leaked documents expose deep ties between Israeli army and Microsoft

Since the attacks of October 7, the Israeli military has relied heavily on cloud and AI services from Microsoft and its partner OpenAI, and the tech giant’s staff has embedded with different units to support rollout, a joint investigation reveals. (+972 Magazine)

The tech arsenal that could power Trump’s immigration crackdown

The effort by federal agencies to acquire powerful technology to identify and track migrants has been unfolding for years across multiple administrations. These technologies may be called upon more directly under President Trump. (The New York Times)

OpenAI launches Operator—an agent that can use a computer for you

Operator is a web app that can carry out simple online tasks in a browser, such as booking concert tickets or making an online grocery order. (MIT Technology Review)

The second wave of AI coding is here

A string of startups are racing to build models that can produce better and better software. But it’s not only AI’s increasingly powerful ability to write code that’s impressive. They claim it’s the shortest path to superintelligent AI. (MIT Technology Review)

Jan 25 2025

How a top Chinese AI model overcame US sanctions

The AI community is abuzz over DeepSeek R1, a new open-source reasoning model.

The model was developed by the Chinese AI startup DeepSeek, which claims that R1 matches or even surpasses OpenAI’s ChatGPT o1 on multiple key benchmarks but operates at a fraction of the cost.

“This could be a truly equalizing breakthrough that is great for researchers and developers with limited resources, especially those from the Global South,” says Hancheng Cao, an assistant professor in information systems at Emory University.

DeepSeek’s success is even more remarkable given the constraints facing Chinese AI companies in the form of increasing US export controls on cutting-edge chips. But early evidence shows that these measures are not working as intended. Rather than weakening China’s AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration.

To create R1, DeepSeek had to rework its training process to reduce the strain on its GPUs, a variety released by Nvidia for the Chinese market that have their performance capped at half the speed of its top products, according to Zihan Wang, a former DeepSeek employee and current PhD student in computer science at Northwestern University.

DeepSeek R1 has been praised by researchers for its ability to tackle complex reasoning tasks, particularly in mathematics and coding. The model employs a “chain of thought” approach similar to that used by ChatGPT o1, which lets it solve problems by processing queries step by step.

Dimitris Papailiopoulos, principal researcher at Microsoft’s AI Frontiers research lab, says what surprised him the most about R1 is its engineering simplicity. “DeepSeek aimed for accurate answers rather than detailing every logical step, significantly reducing computing time while maintaining a high level of effectiveness,” he says.

DeepSeek has also released six smaller versions of R1 that are small enough to run locally on laptops. It claims that one of them even outperforms OpenAI’s o1-mini on certain benchmarks. “DeepSeek has largely replicated o1-mini and has open sourced it,” tweeted Perplexity CEO Aravind Srinivas. DeepSeek did not reply to MIT Technology Review’s request for comment.

Despite the buzz around R1, DeepSeek remains relatively unknown. Based in Hangzhou, China, it was founded in July 2023 by Liang Wenfeng, an alumnus of Zhejiang University with a background in information and electronic engineering. It was incubated by High-Flyer, a hedge fund that Liang founded in 2015. Like Sam Altman of OpenAI, Liang aims to build artificial general intelligence (AGI), a form of AI that can match or even beat humans on a range of tasks.

Training large language models (LLMs) requires a team of highly trained researchers and substantial computing power. In a recent interview with the Chinese media outlet LatePost, Kai-Fu Lee, a veteran entrepreneur and former head of Google China, said that only “front-row players” typically engage in building foundation models such as ChatGPT, as it’s so resource-intensive. The situation is further complicated by the US export controls on advanced semiconductors.

High-Flyer’s decision to venture into AI is directly related to these constraints, however. Long before the anticipated sanctions, Liang acquired a substantial stockpile of Nvidia A100 chips, a type now banned from export to China. The Chinese media outlet 36Kr estimates that the company has over 10,000 units in stock, but Dylan Patel, founder of the AI research consultancy SemiAnalysis, estimates that it has at least 50,000. Recognizing the potential of this stockpile for AI training is what led Liang to establish DeepSeek, which was able to use them in combination with the lower-power chips to develop its models.

Tech giants like Alibaba and ByteDance, as well as a handful of startups with deep-pocketed investors, dominate the Chinese AI space, making it challenging for small or medium-sized enterprises to compete. A company like DeepSeek, which has no plans to raise funds, is rare.

Zihan Wang, the former DeepSeek employee, told MIT Technology Review that he had access to abundant computing resources and was given freedom to experiment when working at DeepSeek, “a luxury that few fresh graduates would get at any company.”

In an interview with the Chinese media outlet 36Kr in July 2024, Liang said that an additional challenge Chinese companies face, on top of chip sanctions, is that their AI engineering techniques tend to be less efficient. “We [most Chinese companies] have to consume twice the computing power to achieve the same results. Combined with data efficiency gaps, this could mean needing up to four times more computing power. Our goal is to continuously close these gaps,” he said.

But DeepSeek found ways to reduce memory usage and speed up calculation without significantly sacrificing accuracy. “The team loves turning a hardware challenge into an opportunity for innovation,” says Wang.

Liang himself remains deeply involved in DeepSeek’s research process, running experiments alongside his team. “The whole team shares a collaborative culture and dedication to hardcore research,” Wang says.

As well as prioritizing efficiency, Chinese companies are increasingly embracing open-source principles. Alibaba Cloud has released over 100 new open-source AI models, supporting 29 languages and catering to various applications, including coding and mathematics. Similarly, startups like Minimax and 01.AI have open-sourced their models.

According to a white paper released last year by the China Academy of Information and Communications Technology, a state-affiliated research institute, the number of AI large language models worldwide has reached 1,328, with 36% originating in China. This positions China as the second-largest contributor to AI, behind the United States.

“This generation of young Chinese researchers identify strongly with open-source culture because they benefit so much from it,” says Thomas Qitong Cao, an assistant professor of technology policy at Tufts University.

“The US export control has essentially backed Chinese companies into a corner where they have to be far more efficient with their limited computing resources,” says Matt Sheehan, an AI researcher at the Carnegie Endowment for International Peace. “We are probably going to see a lot of consolidation in the future related to the lack of compute.”

That might already have started to happen. Two weeks ago, Alibaba Cloud announced that it has partnered with the Beijing-based startup 01.AI, founded by Kai-Fu Lee, to merge research teams and establish an “industrial large model laboratory.”

“It is energy-efficient and natural for some kind of division of labor to emerge in the AI industry,” says Cao, the Tufts professor. “The rapid evolution of AI demands agility from Chinese firms to survive.”

Jan 24 2025

OpenAI launches Operator—an agent that can use a computer for you

After weeks of buzz, OpenAI has released Operator, its first AI agent. Operator is a web app that can carry out simple online tasks in a browser, such as booking concert tickets or filling an online grocery order. The app is powered by a new model called Computer-Using Agent—CUA (“coo-ah”), for short—built on top of OpenAI’s multimodal large language model GPT-4o.

Operator is available today at operator.chatgpt.com to people in the US signed up with ChatGPT Pro, OpenAI’s premium $200-a-month service. The company says it plans to roll the tool out to other users in the future.

OpenAI claims that Operator outperforms similar rival tools, including Anthropic’s Computer Use (a version of Claude 3.5 Sonnet that can carry out simple tasks on a computer) and Google DeepMind’s Mariner (a web-browsing agent built on top of Gemini 2.0).

The fact that three of the world’s top AI firms have converged on the same vision of what agent-based models could be makes one thing clear. The battle for AI supremacy has a new frontier—and it’s our computer screens.

“Moving from generating text and images to doing things is the right direction,” says Ali Farhadi, CEO of the Allen Institute for AI (AI2). “It unlocks business, solves new problems.”

Farhadi thinks that doing things on a computer screen is a natural first step for agents: “It is constrained enough that the current state of the technology can actually work,” he says. “At the same time, it’s impactful enough that people might use it.” (AI2 is working on its own computer-using agent, says Farhadi.)

Don’t believe the hype

OpenAI’s announcement also confirms one of two rumors that circled the internet this week. One predicted that OpenAI was about to reveal an agent-based app, after details about Operator were leaked on social media ahead of its release. The other predicted that OpenAI was about to reveal a new superintelligence—and that officials for newly inaugurated President Trump would be briefed on it.

Could the two rumors be linked? OpenAI superfans wanted to know.

Nope. OpenAI gave MIT Technology Review a preview of Operator in action yesterday. The tool is an exciting glimpse of large language models’ potential to do a lot more than answer questions. But Operator is an experimental work in progress. “It’s still early, it still makes mistakes,” says Yash Kumar, a researcher at OpenAI.

(As for the wild superintelligence rumors, let’s leave that to OpenAI CEO Sam Altman to address: “twitter hype is out of control again,” he posted on January 20. “pls chill and cut your expectations 100x!”)

Like Anthropic’s Computer Use and Google DeepMind’s Mariner, Operator takes screenshots of a computer screen and scans the pixels to figure out what actions it can take. CUA, the model behind it, is trained to interact with the same graphical user interfaces—buttons, text boxes, menus—that people use when they do things online. It scans the screen, takes an action, scans the screen again, takes another action, and so on. That lets the model carry out tasks on most websites that a person can use.
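
The scan-act-scan cycle described above can be sketched as a simple loop. This is a toy illustration, not OpenAI’s code: the FakeBrowser class, the choose_action policy, and the action names are all hypothetical, and a real agent would feed screenshot pixels to a model rather than read a dictionary.

```python
class FakeBrowser:
    """Hypothetical stand-in for a remote browser showing a one-field form."""

    def __init__(self):
        self.state = {"text_box": "", "submitted": False}

    def screenshot(self):
        # A real agent would get pixels; here we return a copy of the state.
        return dict(self.state)

    def perform(self, action):
        kind, value = action
        if kind == "type":
            self.state["text_box"] = value
        elif kind == "click_submit":
            self.state["submitted"] = True


def choose_action(observation, goal_text):
    """Hypothetical policy: decide the next step from what is on screen."""
    if observation["submitted"]:
        return None  # nothing left to do
    if observation["text_box"] != goal_text:
        return ("type", goal_text)
    return ("click_submit", None)


def run_agent(browser, goal_text, max_steps=10):
    """Scan the screen, act, and scan again until done or out of steps."""
    history = []
    for _ in range(max_steps):
        observation = browser.screenshot()
        action = choose_action(observation, goal_text)
        if action is None:
            break
        browser.perform(action)
        history.append(action[0])
    return history


browser = FakeBrowser()
steps = run_agent(browser, "table for two")
print(steps)
print(browser.state)
```

The step limit is a design choice in this sketch: because the agent decides each move only from the latest screen, a cap keeps it from looping forever if the page never reaches the expected state.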

“Traditionally the way models have used software is through specialized APIs,” says Reiichiro Nakano, a scientist at OpenAI. (An API, or application programming interface, is a piece of code that acts as a kind of connector, allowing different bits of software to be hooked up to one another.) That puts a lot of apps and most websites off limits, he says: “But if you create a model that can use the same interface that humans use on a daily basis, it opens up a whole new range of software that was previously inaccessible.”

CUA also breaks tasks down into smaller steps and tries to work through them one by one, backtracking when it gets stuck. OpenAI says CUA was trained with techniques similar to those used for its so-called reasoning models, o1 and o3.

Operator can be instructed to search for campsites in Yosemite with good picnic tables.

OpenAI has tested CUA against a number of industry benchmarks designed to assess the ability of an agent to carry out tasks on a computer. The company claims that its model beats Computer Use and Mariner in all of them.

For example, on OSWorld, which tests how well an agent performs tasks such as merging PDF files or manipulating an image, CUA scores 38.1% to Computer Use’s 22.0%. In comparison, humans score 72.4%. On a benchmark called WebVoyager, which tests how well an agent performs tasks in a browser, CUA scores 87%, Mariner 83.5%, and Computer Use 56%. (Mariner can only carry out tasks in a browser and therefore does not score on OSWorld.)

For now, Operator can also only carry out tasks in a browser. OpenAI plans to make CUA’s wider abilities available in the future via an API that other developers can use to build their own apps. This is how Anthropic released Computer Use in December.

OpenAI says it has tested CUA’s safety, using red teams to explore what happens when users ask it to do unacceptable tasks (such as research how to make a bioweapon), when websites contain hidden instructions designed to derail it, and when the model itself breaks down. “We’ve trained the model to stop and ask the user for information before doing anything with external side effects,” says Casey Chu, another researcher on the team.

Look! No hands

To use Operator, you simply type instructions into a text box. But instead of calling up the browser on your computer, Operator sends your instructions to a remote browser running on an OpenAI server. OpenAI claims that this makes the system more efficient. It’s another key difference that sets Operator apart from Computer Use and Mariner (the latter runs inside Google’s Chrome browser on your own computer).

Because it’s running in the cloud, Operator can carry out multiple tasks at once, says Kumar. In the live demo, he asked Operator to use OpenTable to book him a table for two at 6:30 p.m. at a restaurant called Octavia in San Francisco. Straight away, Operator opened up OpenTable and started clicking through options. “As you can see, my hands are off the keyboard,” he said.

OpenAI is collaborating with a number of businesses, including OpenTable, StubHub, Instacart, DoorDash, and Uber. The nature of those collaborations is not exactly clear, but Operator appears to suggest preset websites to use for certain tasks.

While the tool navigated dropdowns on OpenTable, Kumar sent Operator off to find four tickets for a Kendrick Lamar show on StubHub. While it did that, he pasted a photo of a handwritten shopping list and asked Operator to add the items to his Instacart.

He waited, flicking between Operator’s tabs. “If it needs help or if it needs confirmations, it’ll come back to you with questions and you can answer it,” he said.

Kumar says he has been using Operator at home. It helps him stay on top of grocery shopping: “I can just quickly click a photo of a list and send it to work,” he says.

It’s also become a sidekick in his personal life. “I have a date night every Thursday,” says Kumar. So every Thursday morning, he instructs Operator to send him a list of five restaurants that have a table for two that evening. “Of course, I could do that, but it takes me 10 minutes,” he says. “And I often forget to do it. With Operator, I can run the task with one click. There’s no burden of booking.”

Jan 24 2025

What’s next for robots

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future.

Jan Liphardt teaches bioengineering at Stanford, but to many strangers in Los Altos, California, he is a peculiar man they see walking a four-legged robotic dog down the street.

Liphardt has been experimenting with building and modifying robots for years, and when he brings his “dog” out in public, he generally gets one of three reactions. Young children want to have one, their parents are creeped out, and baby boomers try to ignore it. “They’ll quickly walk by,” he says, “like, ‘What kind of dumb new stuff is going on here?’”

In the many conversations I’ve had about robots, I’ve also found that most people tend to fall into these three camps, though I don’t see such a neat age division. Some are upbeat and vocally hopeful that a future is just around the corner in which machines can expertly handle much of what is currently done by humans, from cooking to surgery. Others are scared: of job losses, injuries, and whatever problems may come up as we try to live side by side.

The final camp, which I think is the largest, is just unimpressed. We’ve been sold lots of promises that robots will transform society ever since the first robotic arm was installed on an assembly line at a General Motors plant in New Jersey in 1961. Few of those promises have panned out so far.

But this year, there’s reason to think that even those staunchly in the “bored” camp will be intrigued by what’s happening in the robot races. Here’s a glimpse at what to keep an eye on.

Humanoids are put to the test

The race to build humanoid robots is motivated by the idea that the world is set up for the human form, and that automating that form could mean a seismic shift for robotics. It is led by some particularly outspoken and optimistic entrepreneurs, including Brett Adcock, the founder of Figure AI, a company making such robots that’s valued at more than $2.6 billion (it’s begun testing its robots with BMW). Adcock recently told Time, “Eventually, physical labor will be optional.” Elon Musk, whose company Tesla is building a version called Optimus, has said humanoid robots will create “a future where there is no poverty.” A robotics company called Eliza Wakes Up is taking preorders for a $420,000 humanoid called, yes, Eliza.

In June 2024, Agility Robotics sent a fleet of its Digit humanoid robots to GXO Logistics, which moves products for companies ranging from Nike to Nestlé. The humanoids can handle most tasks that involve picking things up and moving them somewhere else, like unloading pallets or putting boxes on a conveyor.

There have been hiccups: Highly polished concrete floors can cause robots to slip at first, and buildings need good Wi-Fi coverage for the robots to keep functioning. But charging is a bigger issue. Agility’s current version of Digit, with a 39-pound battery, can run for two to four hours before it needs to charge for one hour, so swapping out the robots for fresh ones is a common task on each shift. If there are a small number of charging docks installed, the robots can theoretically charge by shuffling among the docks themselves overnight when some facilities aren’t running, but moving around on their own can set off a building’s security system. “It’s a problem,” says CTO Melonee Wise.
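The charging figures above imply a simple duty cycle, which can be sketched in a few lines of Python. This is only back-of-the-envelope arithmetic using the two-to-four-hour runtime and one-hour charge quoted above; the function names are hypothetical, and the idea of keeping a work position continuously staffed is an illustrative assumption, not something Agility describes.

```python
def uptime_fraction(run_hours: float, charge_hours: float) -> float:
    """Share of each run-plus-charge cycle that a robot spends working."""
    return run_hours / (run_hours + charge_hours)


def robots_per_position(run_hours: float, charge_hours: float) -> float:
    """Robots needed, on average, to keep one work position always staffed."""
    return (run_hours + charge_hours) / run_hours


CHARGE_HOURS = 1.0  # charge time quoted in the article
for run_hours in (2.0, 4.0):  # the quoted runtime range
    print(
        f"{run_hours:.0f} h runtime: "
        f"{uptime_fraction(run_hours, CHARGE_HOURS):.0%} uptime, "
        f"{robots_per_position(run_hours, CHARGE_HOURS):.2f} robots per position"
    )
```

On those numbers a robot works between two-thirds and four-fifths of the time, which is why swapping in fresh units is a routine part of each shift.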

Wise is cautious about whether humanoids will be widely adopted in workplaces. “I’ve always been a pessimist,” she says. That’s because getting robots to work well in a lab is one thing, but integrating them into a bustling warehouse full of people and forklifts moving goods on tight deadlines is another task entirely.

If 2024 was the year of unsettling humanoid product launch videos, this year we will see those humanoids put to the test, and we’ll find out whether they’ll be as productive for paying customers as promised. Now that Agility’s robots have been deployed in fast-paced customer facilities, it’s clear that small problems can really add up.

Then there are issues with how robots and humans share spaces. In the GXO facility the two work in completely separate areas, Wise says, but there are cases where, for example, a human worker might accidentally leave something obstructing a charging station. That means Agility’s robots can’t return to the dock to charge, so they need to alert a human employee to move the obstruction out of the way, slowing operations down.

It’s often said that robots don’t call out sick or need health care. But this year, as fleets of humanoids arrive on the job, we’ll begin to find out the limitations they do have.

Learning from imagination

The way we teach robots how to do things is changing rapidly. It used to be necessary to break their tasks down into steps with specifically coded instructions, but now, thanks to AI, those instructions can be gleaned from observation. Just as ChatGPT was taught to write through exposure to trillions of sentences rather than by explicitly learning the rules of grammar, robots are learning through videos and demonstrations.

That poses a big question: Where do you get all these videos and demonstrations for robots to learn from?

Nvidia, the world’s most valuable company, has long aimed to meet that need with simulated worlds, drawing on its roots in the video-game industry. It creates worlds in which roboticists can expose digital replicas of their robots to new environments to learn. A self-driving car can drive millions of virtual miles, or a factory robot can learn how to navigate in different lighting conditions.

In December, the company went a step further, releasing what it’s calling a “world foundation model.” Called Cosmos, the model has learned from 20 million hours of video (the equivalent of watching YouTube nonstop since Rome was at war with Carthage) and can be used to generate synthetic training data.
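The Carthage comparison can be checked with quick arithmetic. The sketch below assumes a 365.25-day year and counts back from the start of 2025; the First Punic War dates (264 to 241 BC) are the historical reference point, and the variable names are purely illustrative.

```python
VIDEO_HOURS = 20_000_000        # figure quoted for Cosmos's training video
HOURS_PER_YEAR = 24 * 365.25    # assumption: average year length

years_of_video = VIDEO_HOURS / HOURS_PER_YEAR

# Counting back from the start of 2025: the first 2,024 years reach the
# start of 1 CE, and whatever remains falls before the common era.
years_before_common_era = years_of_video - 2024

print(f"{years_of_video:,.0f} years of nonstop viewing")
print(f"start date roughly {years_before_common_era:.0f} years before 1 CE")
```

That works out to roughly 2,280 years of continuous viewing, which places the start in the middle of the First Punic War.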

Here’s an example of how this model could help in practice. Imagine you run a robotics company that wants to build a humanoid that cleans up hospitals. You can start building this robot’s “brain” with a model from Nvidia, which will give it a basic understanding of physics and how the world works, but then you need to help it figure out the specifics of how hospitals work. You could go out and take videos and images of the insides of hospitals, or pay people to wear sensors and cameras while they go about their work there.

“But those are expensive to create and time consuming, so you can only do a limited number of them,” says Rev Lebaredian, vice president of simulation technologies at Nvidia. Cosmos can instead take a handful of those examples and create a three-dimensional simulation of a hospital. It will then start making changes—different floor colors, different sizes of hospital beds—and create slightly different environments. “You’ll multiply that data that you captured in the real world millions of times,” Lebaredian says. In the process, the model will be fine-tuned to work well in that specific hospital setting.
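The multiplication Lebaredian describes resembles what roboticists often call domain randomization. The Python sketch below is a toy illustration only: it is not Nvidia's Cosmos API, and the `Scene` class, its fields, and the `augment` function are all hypothetical. It shows only how a handful of captured scenes grows combinatorially when each one is varied along a few axes.

```python
import itertools
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scene:
    """Hypothetical stand-in for one captured hospital environment."""
    name: str
    floor_color: str
    bed_length_cm: int


def augment(scenes, floor_colors, bed_lengths_cm):
    """Return one variant of every scene per (floor color, bed size) pair."""
    return [
        replace(scene, floor_color=color, bed_length_cm=length)
        for scene, color, length in itertools.product(
            scenes, floor_colors, bed_lengths_cm
        )
    ]


captured = [Scene("ward_a", "grey", 200), Scene("ward_b", "white", 210)]
variants = augment(captured, ["grey", "blue", "green"], [190, 210])
print(len(captured), "captured scenes ->", len(variants), "synthetic variants")
```

Two captured scenes, three floor colors, and two bed sizes already give 12 variants; adding more axes of variation is what pushes the count toward the millions Lebaredian mentions.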

It’s sort of like learning both from your experiences in the real world and from your own imagination (stipulating that your imagination is still bound by the rules of physics).

Teaching robots through AI and simulations isn’t new, but it’s going to become much cheaper and more powerful in the years to come.

A smarter brain gets a smarter body

Plenty of progress in robotics has to do with improving the way a robot senses and plans what to do—its “brain,” in other words. Those advancements can often happen faster than those that improve a robot’s “body,” which determine how well a robot can move through the physical world, especially in environments that are more chaotic and unpredictable than controlled assembly lines.

The military has always been keen on changing that and expanding the boundaries of what’s physically possible. The US Navy has been testing machines from a company called Gecko Robotics that can navigate up vertical walls (using magnets) to do things like infrastructure inspections, checking for cracks, flaws, and bad welding on aircraft carriers.

There are also investments being made for the battlefield. While nimble and affordable drones have reshaped rural battlefields in Ukraine, new efforts are underway to bring those drone capabilities indoors. The defense manufacturer Xtend received an $8.8 million contract from the Pentagon in December 2024 for its drones, which can navigate in confined indoor spaces and urban environments. These so-called “loitering munitions” are one-way attack drones carrying explosives that detonate on impact.

“These systems are designed to overcome challenges like confined spaces, unpredictable layouts, and GPS-denied zones,” says Rubi Liani, cofounder and CTO at Xtend. Deliveries to the Pentagon should begin in the first few months of this year.

Another initiative—sparked in part by the Replicator project, the Pentagon’s plan to spend more than $1 billion on small unmanned vehicles—aims to develop more autonomously controlled submarines and surface vehicles. This is particularly of interest as the Department of Defense focuses increasingly on the possibility of a future conflict in the Pacific between China and Taiwan. In such a conflict, the drones that have dominated the war in Ukraine would serve little use because battles would be waged almost entirely at sea, where small aerial drones would be limited by their range. Instead, undersea drones would play a larger role.

All these changes, taken together, point toward a future where robots are more flexible in how they learn, where they work, and how they move.

Jan Liphardt from Stanford thinks the next frontier of this transformation will hinge on the ability to instruct robots through speech. Large language models’ ability to understand and generate text has already made them a sort of translator between Liphardt and his robot.

“We can take one of our quadrupeds and we can tell it, ‘Hey, you’re a dog,’ and the thing wants to sniff you and tries to bark,” he says. “Then we do one word change—‘You’re a cat.’ Then the thing meows and, you know, runs away from dogs. And we haven’t changed a single line of code.”

Correction: A previous version of this story incorrectly stated that the robotics company Eliza Wakes Up has ties to a16z.


Jan 23 2025

Implementing responsible AI in the generative age

Many organizations have experimented with AI, but they haven’t always gotten the full value from their investments. A host of issues standing in the way center on the accuracy, fairness, and security of AI systems. In response, organizations are actively exploring the principles of responsible AI: the idea that AI systems must be fair, transparent, and beneficial to society if they are to be widely adopted.

When responsible AI is done right, it unlocks trust and therefore customer adoption of enterprise AI. According to the US National Institute of Standards and Technology, the essential building blocks of AI trustworthiness include:

Validity and reliability
Safety
Security and resiliency
Accountability and transparency
Explainability and interpretability
Privacy
Fairness with mitigation of harmful bias


To investigate the current landscape of responsible AI across the enterprise, MIT Technology Review Insights surveyed 250 business leaders about how they’re implementing principles that ensure AI trustworthiness. The poll found that responsible AI is important to executives, with 87% of respondents rating it a high or medium priority for their organization.

A majority of respondents (76%) also say that responsible AI is a high or medium priority specifically for creating a competitive advantage. But relatively few have figured out how to turn these ideas into reality. We found that only 15% of those surveyed felt highly prepared to adopt effective responsible AI practices, despite the importance they placed on them.

Putting responsible AI into practice in the age of generative AI requires a series of best practices that leading companies are adopting. These practices can include cataloging AI models and data and implementing governance controls. Companies may benefit from conducting rigorous assessments, testing, and audits for risk, security, and regulatory compliance. At the same time, they should also empower employees with training at scale and ultimately make responsible AI a leadership priority to ensure their change efforts stick.

“We all know AI is the most influential change in technology that we’ve seen, but there’s a huge disconnect,” says Steven Hall, chief AI officer and president of EMEA at ISG, a global technology research and IT advisory firm. “Everybody understands how transformative AI is going to be and wants strong governance, but the operating model and the funding allocated to responsible AI are well below where they need to be given its criticality to the organization.”


This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.


App Artificial intelligence

Jan 22 2025

OpenAI has upped its lobbying efforts nearly sevenfold

OpenAI spent $1.76 million on government lobbying in 2024 and $510,000 in the last three months of the year alone, according to a new disclosure filed on Tuesday—a significant jump from 2023, when the company spent just $260,000 on Capitol Hill. The company also disclosed a new in-house lobbyist, Meghan Dorn, who worked for five years for Senator Lindsey Graham and started at OpenAI in October. The filing also shows activity related to two new pieces of legislation in the final months of the year: the House’s AI Advancement and Reliability Act, which would set up a government center for AI research, and the Senate’s Future of Artificial Intelligence Innovation Act, which would create shared benchmark tests for AI models.
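The "nearly sevenfold" figure in the headline follows directly from the two annual totals. A quick check in Python, using only the dollar amounts quoted above (the variable names are illustrative):

```python
SPEND_2023 = 260_000       # reported 2023 lobbying spend, in dollars
SPEND_2024 = 1_760_000     # reported 2024 lobbying spend
SPEND_Q4_2024 = 510_000    # portion spent in the last three months of 2024

growth = SPEND_2024 / SPEND_2023
q4_share = SPEND_Q4_2024 / SPEND_2024

print(f"2024 spend was {growth:.2f} times the 2023 figure")
print(f"the final quarter accounted for {q4_share:.0%} of the 2024 total")
```

The ratio comes out just under seven, and the final quarter alone accounts for a bit more than a quarter of the year's spending.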

OpenAI did not respond to questions about its lobbying efforts.

But perhaps more important, the disclosure is a clear signal of the company’s arrival as a political player, as its first year of serious lobbying ends and Republican control of Washington begins. While OpenAI’s lobbying spending is still dwarfed by its peers’—Meta tops the list of Big Tech spenders, with more than $24 million in 2024—the uptick comes as it and other AI companies have helped redraw the shape of AI policy.

For the past few years, AI policy has been something like a whack-a-mole response to the risks posed by deepfakes and misinformation. But over the last year, AI companies have started to position the success of the technology as pivotal to national security and American competitiveness, arguing that the government must therefore support the industry’s growth. As a result, OpenAI and others now seem poised to gain access to cheaper energy, lucrative national security contracts, and a more lax regulatory environment that’s unconcerned with the minutiae of AI safety.

While the big players seem more or less aligned on this grand narrative, messy divides on other issues are still threatening to break through the harmony on display at President Trump’s inauguration this week.

AI regulation really began in earnest after ChatGPT launched in November 2022. At that point, “a lot of the conversation was about responsibility,” says Liana Keesing, campaigns manager for technology reform at Issue One, a democracy nonprofit that tracks Big Tech’s influence.

Companies were asked what they’d do about sexually abusive deepfake images and election disinformation. “Sam Altman did a very good job coming in and painting himself early as a supporter of that process,” Keesing says.

OpenAI started its official lobbying effort around October 2023, hiring Chan Park—a onetime Senate Judiciary Committee counsel and Microsoft lobbyist—to lead the effort. Lawmakers, particularly then Senate majority leader Chuck Schumer, were vocal about wanting to curb these particular harms; OpenAI hired Schumer’s former legal counsel, Reginald Babin, as a lobbyist, according to data from OpenSecrets. This past summer, the company hired the veteran political operative Chris Lehane as its head of global policy.

OpenAI’s previous disclosures confirm that the company’s lobbyists subsequently focused much of last year on legislation like the No Fakes Act and the Protect Elections from Deceptive AI Act. The bills did not materialize into law. But as the year went on, the regulatory goals of AI companies began to change. “One of the biggest shifts that we’ve seen,” Keesing says, “is that they’ve really started to focus on energy.”

In September, Altman, along with leaders from Nvidia, Anthropic, and Google, visited the White House and pitched the vision that US competitiveness in AI will depend on subsidized energy infrastructure to train the best models. Altman proposed to the Biden administration the construction of multiple five-gigawatt data centers, which would each consume as much electricity as New York City.

Around the same time, companies like Meta and Microsoft started to say that nuclear energy will provide the path forward for AI, announcing deals aimed at firing up new nuclear power plants.

It seems likely OpenAI’s policy team was already planning for this particular shift. In April, the company hired lobbyist Matthew Rimkunas, who worked for Bill Gates’s sustainable energy effort Breakthrough Energy and, before that, spent 16 years working for Senator Graham; the South Carolina Republican serves on the Senate subcommittee that manages nuclear safety.

This new AI energy race is inseparable from the positioning of AI as essential for national security and US competitiveness with China. OpenAI laid out its position in a blog post in October, writing, “AI is a transformational technology that can be used to strengthen democratic values or to undermine them. That’s why we believe democracies should continue to take the lead in AI development.” Then in December, the company went a step further and reversed its policy against working with the military, announcing it would develop AI models with the defense-tech company Anduril to help take down drones around military bases.

That same month, Sam Altman said during an interview with The Free Press that the Biden administration was “not that effective” in shepherding AI: “The things that I think should have been the administration’s priorities, and I hope will be the next administration’s priorities, are building out massive AI infrastructure in the US, having a supply chain in the US, things like that.”

That characterization glosses over the CHIPS Act, a $52 billion stimulus to the domestic chips industry that is, at least on paper, aligned with Altman’s vision. (It also preceded an executive order Biden issued just last week, to lease federal land to host the type of gigawatt-scale data centers that Altman had been asking for.)

Intentionally or not, Altman’s posture aligned him with the growing camaraderie between President Trump and Silicon Valley. Mark Zuckerberg, Elon Musk, Jeff Bezos, and Sundar Pichai all sat directly behind Trump’s family at the inauguration on Monday, and Altman also attended. Many of them had also made sizable donations to Trump’s inaugural fund, with Altman personally throwing in $1 million.

It’s easy to view the inauguration as evidence that these tech leaders are aligned with each other, and with other players in Trump’s orbit. But there are still some key dividing lines that will be worth watching. Notably, there’s the clash over H-1B visas, which allow many noncitizen AI researchers to work in the US. Musk and Vivek Ramaswamy (who is, as of this week, no longer a part of the so-called Department of Government Efficiency) have been pushing for that visa program to be expanded. This sparked backlash from some allies of the Trump administration, perhaps most loudly Steve Bannon.

Another fault line is the battle between open- and closed-source AI. Google and OpenAI prevent anyone from knowing exactly what’s in their most powerful models, often arguing that this keeps them from being used improperly by bad actors. Musk has sued OpenAI and Microsoft over the issue, alleging that closed-source models are antithetical to OpenAI’s hybrid nonprofit structure. Meta, whose Llama model is open-source, recently sided with Musk in that lawsuit. Venture capitalist and Trump ally Marc Andreessen echoed these criticisms of OpenAI on X just hours after the inauguration. (Andreessen has also said that making AI models open-source “makes overbearing regulations unnecessary.”)

Finally, there are the battles over bias and free speech. The vastly different approaches that social media companies have taken to moderating content—including Meta’s recent announcement that it would end its US fact-checking program—raise questions about whether the way AI models are moderated will continue to splinter too. Musk has lamented what he calls the “wokeness” of many leading models, and Andreessen said on Tuesday that “Chinese LLMs are much less censored than American LLMs” (though that’s not quite true, given that many Chinese AI models have government-mandated censorship in place that forbids particular topics). Altman has been more equivocal: “No two people are ever going to agree that one system is perfectly unbiased,” he told The Free Press.

It’s only the start of a new era in Washington, but the White House has been busy. It has repealed many executive orders signed by President Biden, including the landmark order on AI that imposed rules for government use of the technology (while it appears to have kept Biden’s order on leasing land for more data centers). Altman is busy as well. OpenAI, Oracle, and SoftBank reportedly plan to spend up to $500 billion on a joint venture for new data centers; the project was announced by President Trump, with Altman standing alongside. And according to Axios, Altman will also be part of a closed-door briefing with government officials on January 30, reportedly about OpenAI’s development of a powerful new AI agent.


App Artificial intelligence

Jan 21 2025

The second wave of AI coding is here

Ask people building generative AI what generative AI is good for right now—what they’re really fired up about—and many will tell you: coding.

“That’s something that’s been very exciting for developers,” Jared Kaplan, chief scientist at Anthropic, told MIT Technology Review this month: “It’s really understanding what’s wrong with code, debugging it.”

Copilot, a tool built on top of OpenAI’s large language models and launched by Microsoft-backed GitHub in 2022, is now used by millions of developers around the world. Millions more turn to general-purpose chatbots like Anthropic’s Claude, OpenAI’s ChatGPT, and Google DeepMind’s Gemini for everyday help.

“Today, more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers,” Alphabet CEO Sundar Pichai claimed on an earnings call in October: “This helps our engineers do more and move faster.” Expect other tech companies to catch up, if they haven’t already.

It’s not just the big beasts rolling out AI coding tools. A bunch of new startups have entered this buzzy market too. Newcomers such as Zencoder, Merly, Cosine, Tessl (valued at $750 million within months of being set up), and Poolside (valued at $3 billion before it even released a product) are all jostling for their slice of the pie. “It actually looks like developers are willing to pay for copilots,” says Nathan Benaich, an analyst at investment firm Air Street Capital: “And so code is one of the easiest ways to monetize AI.”

Such companies promise to take generative coding assistants to the next level. Instead of providing developers with a kind of supercharged autocomplete, like most existing tools, this next generation can prototype, test, and debug code for you. The upshot is that developers could essentially turn into managers, who may spend more time reviewing and correcting code written by a model than writing it from scratch themselves.

But there’s more. Many of the people building generative coding assistants think that they could be a fast track to artificial general intelligence (AGI), the hypothetical superhuman technology that a number of top firms claim to have in their sights.

“The first time we will see a massively economically valuable activity to have reached human-level capabilities will be in software development,” says Eiso Kant, CEO and cofounder of Poolside. (OpenAI has already boasted that its latest o3 model beat the company’s own chief scientist in a competitive coding challenge.)

Welcome to the second wave of AI coding.

Correct code

Software engineers talk about two types of correctness. There’s the sense in which a program’s syntax (its grammar) is correct—meaning all the words, numbers, and mathematical operators are in the right place. This matters a lot more than grammatical correctness in natural language. Get one tiny thing wrong in thousands of lines of code and none of it will run.

First-generation coding assistants are now pretty good at producing code that’s correct in this sense. Trained on billions of pieces of code, they have assimilated the surface-level structures of many types of programs.

But there’s also the sense in which a program’s function is correct: Sure, it runs, but does it actually do what you wanted it to? It’s that second level of correctness that the new wave of generative coding assistants are aiming for—and this is what will really change the way software is made.

“Large language models can write code that compiles, but they may not always write the program that you wanted,” says Alistair Pullen, a cofounder of Cosine. “To do that, you need to re-create the thought processes that a human coder would have gone through to get that end result.”
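The gap between the two kinds of correctness is easy to show with a toy case. In the Python sketch below, a hypothetical `median` function parses cleanly yet returns the wrong answer for even-length lists. The candidate code, helper names, and test cases are all invented for illustration and are not taken from any of the tools discussed here.

```python
import ast

# A candidate program: valid syntax, but wrong for even-length lists.
CANDIDATE = '''
def median(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]
'''


def is_syntactically_correct(source: str) -> bool:
    """True if the source parses as Python, whatever it actually does."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def is_functionally_correct(source: str, func_name: str, cases) -> bool:
    """True if the named function returns the expected value for every case."""
    namespace = {}
    exec(source, namespace)  # acceptable for a trusted toy example only
    func = namespace[func_name]
    return all(func(argument) == expected for argument, expected in cases)


CASES = [([1, 3, 2], 2), ([1, 2, 3, 4], 2.5)]
print("parses:", is_syntactically_correct(CANDIDATE))
print("does what was wanted:", is_functionally_correct(CANDIDATE, "median", CASES))
```

The first check passes and the second fails, which is exactly the gap the newer tools are trying to close.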

The problem is that the data most coding assistants have been trained on—the billions of pieces of code taken from online repositories—doesn’t capture those thought processes. It represents a finished product, not what went into making it. “There’s a lot of code out there,” says Kant. “But that data doesn’t represent software development.”

What Pullen, Kant, and others are finding is that to build a model that does a lot more than autocomplete—one that can come up with useful programs, test them, and fix bugs—you need to show it a lot more than just code. You need to show it how that code was put together.

In short, companies like Cosine and Poolside are building models that don’t just mimic what good code looks like—whether it works well or not—but mimic the process that produces such code in the first place. Get it right and the models will come up with far better code and far better bug fixes.

Breadcrumbs

But you first need a data set that captures that process—the steps that a human developer might take when writing code. Think of these steps as a breadcrumb trail that a machine could follow to produce a similar piece of code itself.

Part of that is working out what materials to draw from: Which sections of the existing codebase are needed for a given programming task? “Context is critical,” says Zencoder founder Andrew Filev. “The first generation of tools did a very poor job on the context, they would basically just look at your open tabs. But your repo [code repository] might have 5000 files and they’d miss most of it.”

Zencoder has hired a bunch of search engine veterans to help it build a tool that can analyze large codebases and figure out what is and isn’t relevant. This detailed context reduces hallucinations and improves the quality of code that large language models can produce, says Filev: “We call it repo grokking.”
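Zencoder has not published how its repo analysis works, so the following is only a minimal sketch of the general idea: score every file in a repository against the task description, not just the ones open in the editor. It uses plain keyword overlap in Python; the file contents, stopword list, and function names are all hypothetical, and a real system would presumably use far richer signals.

```python
import re
from collections import Counter

STOPWORDS = {"a", "the", "in", "for", "to", "of", "and"}


def tokenize(text: str) -> list[str]:
    """Lowercase words, splitting identifiers such as apply_tax on underscores."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def rank_files(task: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the top_k file paths whose contents best match the task's terms."""
    task_terms = set(tokenize(task))
    scores = {}
    for path, source in files.items():
        counts = Counter(tokenize(source))
        scores[path] = sum(counts[term] for term in task_terms)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


REPO = {
    "billing/invoice.py": "def total_invoice(items): return sum(i.price for i in items)",
    "ui/theme.py": "DARK = '#000'",
    "billing/tax.py": "def apply_tax(invoice_total, rate): return invoice_total * (1 + rate)",
}
print(rank_files("fix rounding in apply_tax for invoice totals", REPO))
```

Here the tax module ranks first and the invoice module second, while the unrelated theme file is left out of the context.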

Cosine also thinks context is key. But it draws on that context to create a new kind of data set. The company has asked dozens of coders to record what they were doing as they worked through hundreds of different programming tasks. “We asked them to write down everything,” says Pullen: “Why did you open that file? Why did you scroll halfway through? Why did you close it?” They also asked coders to annotate finished pieces of code, marking up sections that would have required knowledge of other pieces of code or specific documentation to write.

Cosine then takes all that information and generates a large synthetic data set that maps the typical steps coders take, and the sources of information they draw on, to finished pieces of code. It uses this data set to train a model to figure out what breadcrumb trail it might need to follow to produce a particular program, and then how to follow it.

Poolside, based in San Francisco, is also creating a synthetic data set that captures the process of coding, but it leans more on a technique called RLCE—reinforcement learning from code execution. (Cosine uses this too, but to a lesser degree.)

RLCE is analogous to the technique used to make chatbots like ChatGPT slick conversationalists, known as RLHF—reinforcement learning from human feedback. With RLHF, a model is trained to produce text that’s more like the kind human testers say they favor. With RLCE, a model is trained to produce code that’s more like the kind that does what it is supposed to do when it is run (or executed).
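Neither company has published its training code, so the sketch below shows only the core signal such a method would need: a reward computed by running a candidate program instead of asking a human. It scores a candidate by the fraction of test cases it passes. The function name, the test cases, and the scoring rule are assumptions made for illustration, and the reinforcement learning step that would consume this reward is omitted.

```python
def execution_reward(source: str, func_name: str, cases) -> float:
    """Fraction of test cases the candidate passes when it is actually run."""
    namespace = {}
    try:
        exec(source, namespace)  # a real system would sandbox untrusted code
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to load earns no reward
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on this case counts as a failure
    return passed / len(cases)


CASES = [((1, 2), 3), ((0, 0), 0)]
CANDIDATES = {
    "correct": "def add(a, b):\n    return a + b",
    "buggy": "def add(a, b):\n    return a - b",
    "broken": "def add(a, b) return a + b",
}
for label, source in CANDIDATES.items():
    print(label, execution_reward(source, "add", CASES))
```

The correct candidate scores 1.0, the buggy one 0.5, and the one that does not parse 0.0. In training, a model would be nudged toward outputs that earn the higher rewards.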

Gaming the system

Cosine and Poolside both say they are inspired by the approach DeepMind took with its game-playing model AlphaZero. AlphaZero was given the steps it could take (the moves in a game) and then left to play against itself over and over again, figuring out via trial and error which sequences of moves were winning ones and which were not.

“They let it explore moves at every possible turn, simulate as many games as you can throw compute at. That led all the way to beating Lee Sedol,” says Pengming Wang, a founding scientist at Poolside, referring to the Korean Go grandmaster whom AlphaGo, AlphaZero’s predecessor, beat in 2016. Before Poolside, Wang worked at Google DeepMind on applications of AlphaZero beyond board games, including FunSearch, a version trained to solve advanced math problems.

When that AlphaZero approach is applied to coding, the steps involved in producing a piece of code—the breadcrumbs—become the available moves in a game, and a correct program becomes winning that game. Left to play by itself, a model can improve far faster than a human could. “A human coder tries and fails one failure at a time,” says Kant. “Models can try things 100 times at once.”
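The framing of coding as a game can be made concrete with a toy. In the Python sketch below, a "program" is a sequence of moves drawn from a tiny made-up instruction set, and a program wins if it maps every example input to the expected output. The search is plain brute-force enumeration, which is a deliberately simpler stand-in for the learned, self-play-style search the companies describe; the move set and examples are hypothetical.

```python
from itertools import product

# The available "moves": a tiny, made-up instruction set.
MOVES = {
    "add1": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}


def run(program, x):
    """Apply each move in the program to x, in order."""
    for move in program:
        x = MOVES[move](x)
    return x


def wins(program, examples) -> bool:
    """A program wins if it maps every example input to its expected output."""
    return all(run(program, x) == y for x, y in examples)


def search(examples, max_len: int = 3):
    """Try every move sequence, shortest first, and return the first winner."""
    for length in range(1, max_len + 1):
        for program in product(MOVES, repeat=length):
            if wins(program, examples):
                return program
    return None


EXAMPLES = [(1, 8), (2, 18), (0, 2)]  # target behavior: 2 * (x + 1) ** 2
print(search(EXAMPLES))
```

The search settles on add1, then square, then double. Exhaustive enumeration stops scaling almost immediately, which is why approaches like the ones described here rely on a trained model to decide which moves are worth trying.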

A key difference between Cosine and Poolside is that Cosine is using a custom version of GPT-4o provided by OpenAI, which makes it possible to train on a larger data set than the base model can cope with, but Poolside is building its own large language model from scratch.

Poolside’s Kant thinks that training a model on code from the start will give better results than adapting an existing model that has sucked up not only billions of pieces of code but most of the internet. “I’m perfectly fine with our model forgetting about butterfly anatomy,” he says.

Cosine claims that its generative coding assistant, called Genie, tops the leaderboard on SWE-Bench, a standard set of tests for coding models. Poolside is still building its model but claims that what it has so far already matches the performance of GitHub’s Copilot.

“I personally have a very strong belief that large language models will get us all the way to being as capable as a software developer,” says Kant.

Not everyone takes that view, however.

Illogical LLMs

To Justin Gottschlich, the CEO and founder of Merly, large language models are the wrong tool for the job—period. He invokes his dog: “No amount of training for my dog will ever get him to be able to code, it just won’t happen,” he says. “He can do all kinds of other things, but he’s just incapable of that deep level of cognition.”

Having worked on code generation for more than a decade, Gottschlich has a similar sticking point with large language models. Programming requires the ability to work through logical puzzles with unwavering precision. No matter how well large language models may learn to mimic what human programmers do, at their core they are still essentially statistical slot machines, he says: “I can’t train an illogical system to become logical.”

Instead of training a large language model to generate code by feeding it lots of examples, Merly does not show its system human-written code at all. That’s because to really build a model that can generate code, Gottschlich argues, you need to work at the level of the underlying logic that code represents, not the code itself. Merly’s system is therefore trained on an intermediate representation—something like the machine-readable notation that most programming languages get translated into before they are run.

Gottschlich won’t say exactly what this looks like or how the process works. But he throws out an analogy: There’s this idea in mathematics that the only numbers that have to exist are prime numbers, because you can calculate all other numbers using just the primes. “Take that concept and apply it to code,” he says.
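The mathematical idea behind the analogy is unique factorization: every whole number greater than 1 can be written as a product of primes. The Python sketch below illustrates only that arithmetic fact. It says nothing about Merly's undisclosed representation, and the function name is illustrative.

```python
import math


def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (with repetition) in ascending order."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:  # strip out each prime as often as it divides
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:  # whatever remains is itself prime
        factors.append(n)
    return factors


factors = prime_factors(360)
print(factors, "multiplies back to", math.prod(factors))
```

For 360 the factors are 2, 2, 2, 3, 3, and 5, and multiplying them together rebuilds the original number from primes alone.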

Not only does this approach get straight to the logic of programming; it’s also fast, because millions of lines of code are reduced to a few thousand lines of intermediate language before the system analyzes them.

Shifting mindsets

What you think of these rival approaches may depend on what you want generative coding assistants to be.

In November, Cosine banned its engineers from using tools other than its own products. It is now seeing the impact of Genie on its own engineers, who often find themselves watching the tool as it comes up with code for them. “You now give the model the outcome you would like, and it goes ahead and worries about the implementation for you,” says Yang Li, another Cosine cofounder.

Pullen admits that it can be baffling, requiring a switch of mindset. “We have engineers doing multiple tasks at once, flitting between windows,” he says. “While Genie is running code in one, they might be prompting it to do something else in another.”

These tools also make it possible to prototype multiple versions of a system at once. Say you’re developing software that needs a payment system built in. You can get a coding assistant to simultaneously try out several different options—Stripe, Mango, Checkout—instead of having to code them by hand one at a time.

Genie can be left to fix bugs around the clock. Most software teams use bug-reporting tools that let people upload descriptions of errors they have encountered. Genie can read these descriptions and come up with fixes. Then a human just needs to review them before updating the code base.

No single human understands the trillions of lines of code in today’s biggest software systems, says Li, “and as more and more software gets written by other software, the amount of code will only get bigger.”

This will make coding assistants that maintain that code for us essential. “The bottleneck will become how fast humans can review the machine-generated code,” says Li.

How do Cosine’s engineers feel about all this? According to Pullen, at least, just fine. “If I give you a hard problem, you’re still going to think about how you want to describe that problem to the model,” he says. “Instead of writing the code, you have to write it in natural language. But there’s still a lot of thinking that goes into that, so you’re not really taking the joy of engineering away. The itch is still scratched.”

Some may adapt faster than others. Cosine likes to invite potential hires to spend a few days coding with its team. A couple of months ago it asked one such candidate to build a widget that would let employees share cool bits of software they were working on to social media.

The task wasn’t straightforward, requiring working knowledge of multiple sections of Cosine’s millions of lines of code. But the candidate got it done in a matter of hours. “This person who had never seen our code base turned up on Monday and by Tuesday afternoon he’d shipped something,” says Li. “We thought it would take him all week.” (They hired him.)

But there’s another angle too. Many companies will use this technology to cut down on the number of programmers they hire. Li thinks we will soon see tiers of software engineers. At one end there will be elite developers with million-dollar salaries who can diagnose problems when the AI goes wrong. At the other end, smaller teams of 10 to 20 people will do a job that once required hundreds of coders. “It will be like how ATMs transformed banking,” says Li.

“Anything you want to do will be determined by compute and not head count,” he says. “I think it’s generally accepted that the era of adding another few thousand engineers to your organization is over.”

Warp drives

Indeed, for Gottschlich, machines that can code better than humans are going to be essential. For him, that’s the only way we will build the vast, complex software systems that he thinks we will eventually need. Like many in Silicon Valley, he anticipates a future in which humans move to other planets. That’s only going to be possible if we get AI to build the software required, he says: “Merly’s real goal is to get us to Mars.”

Gottschlich prefers to talk about “machine programming” rather than “coding assistants,” because he thinks that term frames the problem the wrong way. “I don’t think that these systems should be assisting humans—I think humans should be assisting them,” he says. “They can move at the speed of AI. Why restrict their potential?”

“There’s this cartoon called The Flintstones where they have these cars, but they only move when the drivers use their feet,” says Gottschlich. “This is sort of how I feel most people are doing AI for software systems.”

“But what Merly’s building is, essentially, spaceships,” he adds. He’s not joking. “And I don’t think spaceships should be powered by humans on a bicycle. Spaceships should be powered by a warp engine.”

If that sounds wild—it is. But there’s a serious point to be made about what the people building this technology think the end goal really is.

Gottschlich is not an outlier with his galaxy-brained take. Despite their focus on products that developers will want to use today, most of these companies have their sights on a far bigger payoff. Visit Cosine’s website and the company introduces itself as a “Human Reasoning Lab.” It sees coding as just the first step toward a more general-purpose model that can mimic human problem-solving in a number of domains.

Poolside has similar goals: The company states upfront that it is building AGI. “Code is a way of formalizing reasoning,” says Kant.

Wang invokes agents. Imagine a system that can spin up its own software to do any task on the fly, he says. “If you get to a point where your agent can really solve any computational task that you want through the means of software—that is a display of AGI, essentially.”

Down here on Earth, such systems may remain a pipe dream. And yet software engineering is changing faster than many at the cutting edge expected.

“We’re not at a point where everything’s just done by machines, but we’re definitely stepping away from the usual role of a software engineer,” says Cosine’s Pullen. “We’re seeing the sparks of that new workflow—what it means to be a software engineer going into the future.”



Jan 16 2025

Meta’s new AI model can translate speech from more than 100 languages

Meta has released a new AI model that can translate speech from 101 different languages. It represents a step toward real-time, simultaneous interpretation, where words are translated as soon as they come out of someone’s mouth.

Typically, translation models for speech use a multistep approach. First they translate speech into text. Then they translate that text into text in another language. Finally, that translated text is turned into speech in the new language. This method can be inefficient, and at each step, errors and mistranslations can creep in. But Meta’s new model, called SeamlessM4T, enables more direct translation from speech in one language to speech in another. The model is described in a paper published today in Nature.
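To make the point about errors creeping in at each step concrete, here is a minimal sketch of how mistakes compound across a multistep pipeline. The per-stage accuracy of 0.95 is a hypothetical figure chosen for illustration, not a number from Meta’s paper, and the sketch assumes that errors in each stage are independent of one another.

```python
from math import prod


def pipeline_accuracy(stage_accuracies):
    """Chance that every stage gets a segment right, assuming independent errors."""
    return prod(stage_accuracies)


# Hypothetical accuracies, for illustration only (not measured values).
# Cascade: speech -> text, text -> translated text, translated text -> speech.
cascaded = pipeline_accuracy([0.95, 0.95, 0.95])

# A single direct speech-to-speech stage at the same hypothetical accuracy.
direct = pipeline_accuracy([0.95])

print(f"three-stage cascade: {cascaded:.3f}")
print(f"single direct stage: {direct:.3f}")
```

Under these assumptions the cascade ends up less reliable than any one of its stages, which is the intuition behind translating speech to speech more directly.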

Seamless can translate text with 23% more accuracy than the top existing models. And although another model, Google’s AudioPaLM, can technically translate more languages—113 of them, versus 101 for Seamless—it can translate them only into English. SeamlessM4T can translate into 36 other languages.

The key is a process called parallel data mining, which finds instances when the sound in a video or audio matches a subtitle in another language from crawled web data. The model learned to associate those sounds in one language with the matching pieces of text in another. This opened up a whole new trove of examples of translations for their model.

“Meta has done a great job having a breadth of different things they support, like text-to-speech, speech-to-text, even automatic speech recognition,” says Chetan Jaiswal, a professor of computer science at Quinnipiac University, who was not involved in the research. “The mere number of languages they are supporting is a tremendous achievement.”

Human translators are still a vital part of the translation process, the researchers say in the paper, because they can grapple with diverse cultural contexts and make sure the same meaning is conveyed from one language into another. This step is important, says Lynne Bowker, Canada Research Chair in Translation, Technologies and Society at Université Laval in Quebec, who didn’t work on Seamless. “Languages are a reflection of cultures, and cultures have their own ways of knowing things,” she says.

When it comes to applications like medicine or law, machine translations need to be thoroughly checked by a human, she says. If not, misunderstandings can result. For example, when Google Translate was used to translate public health information about the covid-19 vaccine from the Virginia Department of Health in January 2021, it translated “not mandatory” in English into “not necessary” in Spanish, changing the whole meaning of the message.

AI models have many more examples to train on in some languages than in others. This means current speech-to-speech models may be able to translate a language like Greek into English, where there may be many examples, but cannot translate from Swahili to Greek. The team behind Seamless aimed to solve this problem by pre-training the model on millions of hours of spoken audio in different languages. This pre-training allowed it to recognize general patterns in language, making it easier to process less widely spoken languages because it already had some baseline for what spoken language is supposed to sound like.

The system is open-source, which the researchers hope will encourage others to build upon its current capabilities. But some are skeptical of how useful it may be compared with available alternatives. “Google’s translation model is not as open-source as Seamless, but it’s way more responsive and fast, and it doesn’t cost anything as an academic,” says Jaiswal.

The most exciting thing about Meta’s system is that it points to the possibility of instant interpretation across languages in the not-too-distant future—like the Babel fish in Douglas Adams’ cult novel The Hitchhiker’s Guide to the Galaxy. SeamlessM4T is faster than existing models but still not instant. That said, Meta claims to have a newer version of Seamless that’s as fast as human interpreters.

“While having this kind of delayed translation is okay and useful, I think simultaneous translation will be even more useful,” says Kenny Zhu, director of the Arlington Computational Linguistics Lab at the University of Texas at Arlington, who is not affiliated with the new research.



Jan 15 2025

Here’s our forecast for AI this year

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

In December, our small but mighty AI reporting team was asked by our editors to make a prediction: What’s coming next for AI?

In 2024, AI contributed both to Nobel Prize–winning chemistry breakthroughs and to a mountain of cheaply made content that few people asked for but that nonetheless flooded the internet. Take AI-generated Shrimp Jesus images, among other examples. There was also a spike in greenhouse-gas emissions last year that can be attributed partly to the surge in energy-intensive AI. Our team got to thinking about how all of this will shake out in the year to come.

As we look ahead, certain things are a given. We know that agents—AI models that do more than just converse with you and can actually go off and complete tasks for you—are the focus of many AI companies right now. Building them will raise lots of privacy questions about how much of our data and preferences we’re willing to give up in exchange for tools that will (allegedly) save us time. Similarly, the need to make AI faster and more energy efficient is putting so-called small language models in the spotlight.

We instead wanted to focus on less obvious predictions. Mine were about how AI companies that previously shunned work in defense and national security might be tempted this year by contracts from the Pentagon, and how Donald Trump’s attitudes toward China could escalate the global race for the best semiconductors. Read the full list.

What’s not evident in that story is that the other predictions were not so clear-cut. Arguments ensued about whether or not 2025 will be the year of intimate relationships with chatbots, AI throuples, or traumatic AI breakups. To witness the fallout from our team’s lively debates (and hear more about what didn’t make the list), you can join our upcoming LinkedIn Live this Thursday, January 16. I’ll be talking it all over with Will Douglas Heaven, our senior editor for AI, and our news editor, Charlotte Jee.

There are a couple other things I’ll be watching closely in 2025. One is how little the major AI players—namely OpenAI, Microsoft, and Google—are disclosing about the environmental burden of their models. Lots of evidence suggests that asking an AI model like ChatGPT about knowable facts, like the capital of Mexico, consumes much more energy (and releases far more emissions) than simply asking a search engine. Nonetheless, OpenAI’s Sam Altman in recent interviews has spoken positively about the idea of ChatGPT replacing the googling that we’ve all learned to do in the past two decades. It’s already happening, in fact.

The environmental cost of all this will be top of mind for me in 2025, as will the possible cultural cost. We will go from searching for information by clicking links and (hopefully) evaluating sources to simply reading the responses that AI search engines serve up for us. As our editor in chief, Mat Honan, said in his piece on the subject, “Who wants to have to learn when you can just know?”

Now read the rest of The Algorithm

Deeper Learning

What’s next for our privacy?

The US Federal Trade Commission has taken a number of enforcement actions against data brokers, some of which have tracked and sold geolocation data from users at sensitive locations like churches, hospitals, and military installations without explicit consent. Though limited in nature, these actions may offer some new and improved protections for Americans’ personal information.

Why it matters: A consensus is growing that Americans need better privacy protections—and that the best way to deliver them would be for Congress to pass comprehensive federal privacy legislation. Unfortunately, that’s not going to happen anytime soon. Enforcement actions from agencies like the FTC might be the next best thing in the meantime. Read more in Eileen Guo’s excellent story here.

Bits and Bytes

Meta trained its AI on a notorious piracy database

New court records, Wired reports, reveal that Meta used “a notorious so-called shadow library of pirated books that originated in Russia” to train its generative AI models. (Wired)

OpenAI’s top reasoning model struggles with the NYT Connections game

The game requires players to identify how groups of words are related. OpenAI’s o1 reasoning model had a hard time. (Mind Matters)

Anthropic’s chief scientist on 5 ways agents will be even better in 2025

The AI company Anthropic is now worth $60 billion. The company’s cofounder and chief scientist, Jared Kaplan, shared how AI agents will develop in the coming year. (MIT Technology Review)

A New York legislator attempts to regulate AI with a new bill

Last year, a high-profile bill in California to regulate the AI industry was vetoed by Governor Gavin Newsom. Now, a legislator in New York is trying to revive the effort in his own state. (MIT Technology Review)
