AI models are using material from retracted scientific papers

Some AI chatbots rely on flawed research from retracted scientific papers to answer questions, according to recent studies. The findings, confirmed by MIT Technology Review, raise questions about how reliable AI tools are at evaluating scientific research and could complicate efforts by countries and industries seeking to invest in AI tools for scientists.

AI search tools and chatbots are already known to fabricate links and references. But answers based on material from real papers can also mislead if those papers have been retracted. The chatbot is “using a real paper, real material, to tell you something,” says Weikuan Gu, a medical researcher at the University of Tennessee in Memphis and an author of one of the recent studies. But, he says, if people look only at the answer and don’t click through to the paper to see that it’s been retracted, that’s a real problem.

Gu and his team asked OpenAI’s ChatGPT, running on the GPT-4o model, questions based on information from 21 retracted papers about medical imaging. The chatbot’s answers referenced retracted papers in five cases but advised caution in only three. While it cited non-retracted papers for other questions, the authors note that it may not have recognized the retraction status of the articles. In a study from August, a different group of researchers used ChatGPT-4o mini to evaluate the quality of 217 retracted and low-quality papers from different scientific fields; they found that none of the chatbot’s responses mentioned retractions or other concerns. (No similar studies have been released on GPT-5, which came out in August.)

The public uses AI chatbots to ask for medical advice and diagnose health conditions. Students and scientists increasingly use science-focused AI tools to review existing scientific literature and summarize papers. That kind of usage is likely to increase. The US National Science Foundation, for instance, invested $75 million in building AI models for science research this August.

“If [a tool is] facing the general public, then using retraction as a kind of quality indicator is very important,” says Yuanxi Fu, an information science researcher at the University of Illinois Urbana-Champaign. There’s “kind of an agreement that retracted papers have been struck off the record of science,” she says, “and the people who are outside of science—they should be warned that these are retracted papers.” OpenAI did not provide a response to a request for comment about the paper results.

The problem is not limited to ChatGPT. In June, MIT Technology Review tested AI tools specifically advertised for research work, such as Elicit, Ai2 ScholarQA (now part of the Allen Institute for Artificial Intelligence’s Asta tool), Perplexity, and Consensus, using questions based on the 21 retracted papers in Gu’s study. Elicit referenced five of the retracted papers in its answers, while Ai2 ScholarQA referenced 17, Perplexity 11, and Consensus 18—all without noting the retractions.

Some companies have since made moves to correct the issue. “Until recently, we didn’t have great retraction data in our search engine,” says Christian Salem, cofounder of Consensus. His company has now started using retraction data from a combination of sources, including publishers and data aggregators, independent web crawling, and Retraction Watch, which manually curates and maintains a database of retractions. In a test of the same papers in August, Consensus cited only five retracted papers. 

Elicit told MIT Technology Review that it removes retracted papers flagged by the scholarly research catalogue OpenAlex from its database and is “still working on aggregating sources of retractions.” Ai2 told us that its tool does not automatically detect or remove retracted papers currently. Perplexity said that it “[does] not ever claim to be 100% accurate.” 

However, relying on retraction databases may not be enough. Ivan Oransky, the cofounder of Retraction Watch, is careful not to describe it as a comprehensive database, saying that creating one would require more resources than anyone has: “The reason it’s resource intensive is because someone has to do it all by hand if you want it to be accurate.”

Further complicating the matter is that publishers don’t share a uniform approach to retraction notices. “Where things are retracted, they can be marked as such in very different ways,” says Caitlin Bakker of the University of Regina in Canada, an expert in research and discovery tools. “Correction,” “expression of concern,” “erratum,” and “retracted” are among the labels publishers may add to research papers—and these labels can be added for many reasons, including concerns about the content, methodology, or data, or the presence of conflicts of interest.

Some researchers distribute their papers on preprint servers, paper repositories, and other websites, causing copies to be scattered around the web. Moreover, the data used to train AI models may not be up to date. If a paper is retracted after a model’s training cutoff date, the model’s responses might not reflect the change right away, says Fu. Most academic search engines don’t do a real-time check against retraction data, so you are at the mercy of how accurate their corpus is, says Aaron Tay, a librarian at Singapore Management University.
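A real-time lookup of this kind is straightforward to sketch. Below is a minimal illustration—not any vendor’s actual pipeline—that queries OpenAlex, the scholarly catalogue mentioned above, which (as I understand its public API) exposes an is_retracted flag on work records. The coverage caveats the experts raise still apply.

```python
# Minimal sketch of a real-time retraction check against OpenAlex's public API,
# which exposes an is_retracted flag on work records (field name per my reading
# of the public docs). Illustrative only; coverage is only as good as the catalogue.
import requests

def is_retracted(doi: str) -> bool | None:
    """Return True/False if OpenAlex knows the work, or None if the lookup fails."""
    resp = requests.get(f"https://api.openalex.org/works/doi:{doi}", timeout=10)
    if resp.status_code != 200:
        return None
    return bool(resp.json().get("is_retracted", False))

# Example: check a DOI before letting a chatbot cite it.
print(is_retracted("10.1038/s41586-020-2649-2"))
```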

Oransky and other experts advocate making more context available for models to use when creating a response. This could mean publishing information that already exists, like peer reviews commissioned by journals and critiques from the review site PubPeer, alongside the published paper.  

Many publishers, such as Nature and the BMJ, publish retraction notices as separate articles linked to the paper, outside paywalls. Fu says companies need to effectively make use of such information, as well as any news articles in a model’s training data that mention a paper’s retraction. 

The users and creators of AI tools need to do their due diligence. “We are at the very, very early stages, and essentially you have to be skeptical,” says Tay.

Ananya is a freelance science and technology journalist based in Bengaluru, India.

This medical startup uses LLMs to run appointments and make diagnoses

Imagine this: You’ve been feeling unwell, so you call up your doctor’s office to make an appointment. To your surprise, they schedule you in for the next day. At the appointment, you aren’t rushed through describing your health concerns; instead, you have a full half hour to share your symptoms and worries and the exhaustive details of your health history with someone who listens attentively and asks thoughtful follow-up questions. You leave with a diagnosis, a treatment plan, and the sense that, for once, you’ve been able to discuss your health with the care that it merits.

The catch? You might not have spoken to a doctor, or other licensed medical practitioner, at all.

This is the new reality for patients at a small number of clinics in Southern California that are run by the medical startup Akido Labs. These patients—some of whom are on Medicaid—can access specialist appointments on short notice, a privilege typically only afforded to the wealthy few who patronize concierge clinics.

The key difference is that Akido patients spend relatively little time, or even no time at all, with their doctors. Instead, they see a medical assistant, who can lend a sympathetic ear but has limited clinical training. The job of formulating diagnoses and devising a treatment plan is done by a proprietary, LLM-based system called ScopeAI that transcribes and analyzes the dialogue between patient and assistant. A doctor then approves, or corrects, the AI system’s recommendations.

“Our focus is really on what we can do to pull the doctor out of the visit,” says Jared Goodner, Akido’s CTO. 

According to Prashant Samant, Akido’s CEO, this approach allows doctors to see four to five times as many patients as they could previously. There’s good reason to want doctors to be much more productive. Americans are getting older and sicker, and many struggle to access adequate health care. The pending 15% reduction in federal funding for Medicaid will only make the situation worse.

But experts aren’t convinced that displacing so much of the cognitive work of medicine onto AI is the right way to remedy the doctor shortage. There’s a big gap in expertise between doctors and AI-enhanced medical assistants, says Emma Pierson, a computer scientist at UC Berkeley. Asking AI to bridge that gap may introduce risks. “I am broadly excited about the potential of AI to expand access to medical expertise,” she says. “It’s just not obvious to me that this particular way is the way to do it.”

AI is already everywhere in medicine. Computer vision tools identify cancers during preventive scans, automated research systems allow doctors to quickly sort through the medical literature, and LLM-powered medical scribes can take appointment notes on a clinician’s behalf. But these systems are designed to support doctors as they go about their typical medical routines.

What distinguishes ScopeAI, Goodner says, is its ability to independently complete the cognitive tasks that constitute a medical visit, from eliciting a patient’s medical history to coming up with a list of potential diagnoses to identifying the most likely diagnosis and proposing appropriate next steps.

Under the hood, ScopeAI is a set of large language models, each of which can perform a specific step in the visit—from generating appropriate follow-up questions based on what a patient has said to populating a list of likely conditions. For the most part, these LLMs are fine-tuned versions of Meta’s open-access Llama models, though Goodner says that the system also makes use of Anthropic’s Claude models.

During the appointment, assistants read off questions from the ScopeAI interface, and ScopeAI produces new questions as it analyzes what the patient says. For the doctors who will review its outputs later, ScopeAI produces a concise note that includes a summary of the patient’s visit, the most likely diagnosis, two or three alternative diagnoses, and recommended next steps, such as referrals or prescriptions. It also lists a justification for each diagnosis and recommendation.
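Akido has not published ScopeAI’s internals, but the workflow described above maps onto a familiar pattern: a chain of model calls, one per step, producing a structured note for a physician to review. The sketch below is purely illustrative—the call_llm() helper, the prompts, and the note fields are hypothetical stand-ins, not Akido’s code.

```python
# Illustrative sketch of a multi-step clinical LLM pipeline of the general shape
# described above. NOT Akido's ScopeAI: call_llm(), the prompts, and the note
# structure are hypothetical stand-ins.
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a fine-tuned model (e.g., a Llama or Claude endpoint)."""
    raise NotImplementedError("Wire this up to a model provider of your choice.")

@dataclass
class VisitNote:
    summary: str
    likely_diagnosis: str
    alternatives: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)
    justification: str = ""

def next_question(transcript: str) -> str:
    # Step 1: propose the next follow-up question for the assistant to read aloud.
    return call_llm(f"Given this visit transcript, ask one useful follow-up question:\n{transcript}")

def draft_note(transcript: str) -> VisitNote:
    # Step 2: summarize the visit and rank candidate diagnoses with justifications.
    summary = call_llm(f"Summarize this visit in a few sentences:\n{transcript}")
    diagnoses = call_llm(f"List the three most likely diagnoses, most likely first, one per line:\n{transcript}")
    steps = call_llm(f"Suggest next steps (referrals, tests, prescriptions), one per line:\n{transcript}")
    justification = call_llm(f"Briefly justify the top diagnosis and the next steps:\n{transcript}")
    top, *alternatives = [d.strip() for d in diagnoses.splitlines() if d.strip()]
    return VisitNote(summary, top, alternatives, steps.splitlines(), justification)

def physician_review(note: VisitNote, approve: bool, correction: str = "") -> VisitNote:
    # Step 3: a human doctor approves or corrects the draft before anything is acted on.
    if not approve:
        note.justification += f"\n[Physician correction] {correction}"
    return note
```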

ScopeAI is currently being used in cardiology, endocrinology, and primary care clinics and by Akido’s street medicine team, which serves the Los Angeles homeless population. That team—which is led by Steven Hochman, a doctor who specializes in addiction medicine—meets patients out in the community to help them access medical care, including treatment for substance use disorders. 

Previously, in order to prescribe a drug to treat an opioid addiction, Hochman would have to meet the patient in person; now, caseworkers armed with ScopeAI can interview patients on their own, and Hochman can approve or reject the system’s recommendations later. “It allows me to be in 10 places at once,” he says.

Since it started using ScopeAI, the team has been able to get patients access to medications to help treat their substance use within 24 hours—something that Hochman calls “unheard of.”

This arrangement is only possible because homeless patients typically get their health insurance from Medicaid, the public insurance system for low-income Americans. While Medicaid allows doctors to approve ScopeAI prescriptions and treatment plans asynchronously, both for street medicine and clinic visits, many other insurance providers require that doctors speak directly with patients before approving those recommendations. Pierson says that discrepancy raises concerns. “You worry about that exacerbating health disparities,” she says.

Samant is aware of the appearance of inequity, and he says the discrepancy isn’t intentional—it’s just a feature of how the insurance plans currently work. He also notes that being seen quickly by an AI-enhanced medical assistant may be better than dealing with long wait times and limited provider availability, which is the status quo for Medicaid patients. And all Akido patients can opt for traditional doctor’s appointments, if they are willing to wait for them, he says.

Part of the challenge of deploying a tool like ScopeAI is navigating a regulatory and insurance landscape that wasn’t designed for AI systems that can independently direct medical appointments. Glenn Cohen, a professor at Harvard Law School, says that any AI system that effectively acts as a “doctor in a box” would likely need to be approved by the FDA and could run afoul of medical licensure laws, which dictate that only doctors and other licensed professionals can practice medicine.

The California Medical Practice Act says that AI can’t replace a doctor’s responsibility to diagnose and treat a patient, but doctors are allowed to use AI in their work, and they don’t need to see patients in person or in real time before diagnosing them. Neither the FDA nor the Medical Board of California was able to say whether ScopeAI is on solid legal footing based only on a written description of the system.

But Samant is confident that Akido is in compliance, as ScopeAI was intentionally designed to fall short of being a “doctor in a box.” Because the system requires a human doctor to review and approve all of its diagnostic and treatment recommendations, he says, it doesn’t require FDA approval.

At the clinic, this delicate balance between AI and doctor decision making happens entirely behind the scenes. Patients don’t ever see the ScopeAI interface directly—instead, they speak with a medical assistant who asks questions in the way that a doctor might in a typical appointment. That arrangement might make patients feel more comfortable. But Zeke Emanuel, a professor of medical ethics and health policy at the University of Pennsylvania who served in the Obama and Biden administrations, worries that this comfort could obscure from patients the extent to which an algorithm is influencing their care.

Pierson agrees. “That certainly isn’t really what was traditionally meant by the human touch in medicine,” she says.

DeAndre Siringoringo, a medical assistant who works at Akido’s cardiology office in Rancho Cucamonga, says he tells the patients he works with that an AI system will be listening to the appointment in order to gather information for their doctor. But he doesn’t inform them about the specifics of how ScopeAI works, including the fact that it makes diagnostic recommendations to doctors.

Because all ScopeAI recommendations are reviewed by a doctor, that might not seem like such a big deal—it’s the doctor who makes the final diagnosis, not the AI. But it’s been widely documented that doctors using AI systems tend to go along with the system’s recommendations more often than they should, a phenomenon known as automation bias. 

At this point, it’s impossible to know whether automation bias is affecting doctors’ decisions at Akido clinics, though Pierson says it’s a risk—especially when doctors aren’t physically present for appointments. “I worry that it might predispose you to sort of nodding along in a way that you might not if you were actually in the room watching this happen,” she says.

An Akido spokesperson says that automation bias is a valid concern for any AI tool that assists a doctor’s decision-making and that the company has made efforts to mitigate that bias. “We designed ScopeAI specifically to reduce bias by proactively countering blind spots that can influence medical decisions, which historically lean heavily on physician intuition and personal experience,” she says. “We also train physicians explicitly on how to use ScopeAI thoughtfully, so they retain accountability and avoid over-reliance.”

Akido evaluates ScopeAI’s performance by testing it on historical data and monitoring how often doctors correct its recommendations; those corrections are also used to further train the underlying models. Before deploying ScopeAI in a given specialty, Akido ensures that when tested on historical data sets, the system includes the correct diagnosis in its top three recommendations at least 92% of the time.
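The 92% figure describes what machine-learning practitioners would call top-3 accuracy on historical cases. A minimal sketch of that kind of check might look like the following; the data format and the deployment gate are assumptions for illustration, not Akido’s actual evaluation code.

```python
# Illustrative top-3 accuracy check on historical cases (hypothetical data format).
def top3_accuracy(cases: list[dict]) -> float:
    """Each case: {'true_diagnosis': str, 'ranked_predictions': [str, ...]}."""
    hits = sum(
        case["true_diagnosis"] in case["ranked_predictions"][:3]
        for case in cases
    )
    return hits / len(cases)

historical_cases = [
    {"true_diagnosis": "atrial fibrillation",
     "ranked_predictions": ["atrial flutter", "atrial fibrillation", "SVT"]},
    {"true_diagnosis": "hypothyroidism",
     "ranked_predictions": ["anemia", "depression", "sleep apnea"]},
]

accuracy = top3_accuracy(historical_cases)
print(f"Top-3 accuracy: {accuracy:.0%}")
deploy_in_specialty = accuracy >= 0.92  # the threshold reported above
```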

But Akido hasn’t undertaken more rigorous testing, such as studies that compare ScopeAI appointments with traditional in-person or telehealth appointments, in order to determine whether the system improves—or at least maintains—patient outcomes. Such a study could help indicate whether automation bias is a meaningful concern.

“Making medical care cheaper and more accessible is a laudable goal,” Pierson says. “But I just think it’s important to conduct strong evaluations comparing to that baseline.”

An oil and gas giant signed a $1 billion deal with Commonwealth Fusion Systems

Eni, one of the world’s largest oil and gas companies, just agreed to buy $1 billion in electricity from a power plant being built by Commonwealth Fusion Systems. The deal is the latest to illustrate just how much investment Commonwealth and other fusion companies are courting as they attempt to take fusion power from the lab to the power grid. 

“This is showing in concrete terms that people that use large amounts of energy, that know the energy market—they want fusion power, and they’re willing to contract for it and to pay for it,” said Bob Mumgaard, cofounder and CEO of Commonwealth, on a press call about the deal.   

The agreement will see Eni purchase electricity from Commonwealth’s first commercial fusion power plant, in Virginia. The facility is still in the planning stages but is scheduled to come online in the early 2030s.

The news comes a few weeks after Commonwealth announced an $863 million funding round, bringing its total funding raised to date to nearly $3 billion. The fusion company also announced earlier this year that Google would be its first commercial power customer for the Virginia plant.

Commonwealth, a spinout from MIT’s Plasma Science and Fusion Center, is widely considered one of the leading companies in fusion power. Investment in the company represents nearly one-third of the total global investment in private fusion companies. (MIT Technology Review is owned by MIT but is editorially independent.)

Eni has invested in Commonwealth since 2018 and participated in the latest fundraising round. The vast majority of the company’s business is in oil and gas, but in recent years it’s made investments in technologies like biofuels and renewables.

“A company like us—we cannot stay and wait for things to happen,” says Lorenzo Fiorillo, Eni’s director of technology, research and development, and digital. 

One open question is what, exactly, Eni plans to do with this electricity. When asked about it on the press call, Fiorillo referenced wind and solar plants that Eni owns and said the plan “is not different from what we do in other areas in the US and the world.” (Eni sells electricity from power plants that it owns, including renewable and fossil-fuel plants.)

Commonwealth is building tokamak fusion reactors that use superconducting magnets to hold plasma in place. The plasma is where fusion reactions happen: hydrogen nuclei are forced together, releasing large amounts of energy.

The company’s first demonstration reactor, which it calls Sparc, is over 65% complete, and the team is testing components and assembling them. The plan is for the reactor, which is located outside Boston, to make plasma within two years and then demonstrate that it can generate more energy than is required to run it.

While Sparc is still under construction, Commonwealth is working on plans for Arc, its first commercial power plant. That facility should begin construction in 2027 or 2028 and generate electricity for the grid in the early 2030s, Mumgaard says.

Despite the billions of dollars Commonwealth has already raised, the company still needs more money to build its Arc power plant—that will be a multibillion-dollar project, Mumgaard said on a press call in August about the company’s latest fundraising round. 

The latest commitment from Eni could help Commonwealth secure the funding it needs to get Arc built. “These agreements are a really good way to create the right environment for building up more investment,” says Paul Wilson, chair of the department of nuclear engineering and engineering physics at the University of Wisconsin, Madison.

Even though commercial fusion energy is still years away at a minimum, investors and big tech companies have pumped money into the industry and signed agreements to buy power from plants once they’re operational. 

Helion, another leading fusion startup, has plans to produce electricity from its first reactor in 2028 (an aggressive timeline that has some experts expressing skepticism). That facility will have a full generating capacity of 50 megawatts, and in 2023 Microsoft signed an agreement to purchase energy from the facility in order to help power its data centers.

As billions of dollars pour into the fusion industry, there are still many milestones ahead. To date, only the National Ignition Facility at Lawrence Livermore National Laboratory has demonstrated a fusion reaction that produced more energy than was put into it. No commercial project has achieved that yet.

“There’s a lot of capital going out now to these startup companies,” says Ed Morse, a professor of nuclear engineering at the University of California, Berkeley. “What I’m not seeing is a peer-reviewed scientific article that makes me feel like, boy, we really turned the corner with the physics.”

But others are taking major commercial deals from Commonwealth and others as reasons to be optimistic. “Fusion is moving from the lab to be a proper industry,” says Sehila Gonzalez de Vicente, global director of fusion energy at the nonprofit Clean Air Task Force. “This is very good for the whole sector to be perceived as a real source of energy.”

Clean hydrogen is facing a big reality check

Hydrogen is sometimes held up as a master key for the energy transition. It can be made using several low-emissions methods and could play a role in cleaning up industries ranging from agriculture and chemicals to aviation and long-distance shipping.

This moment is a complicated one for the green fuel, though, as a new report from the International Energy Agency lays out. A number of major projects face cancellations and delays, especially in the US and Europe. The US in particular is seeing a slowdown after changes to key tax credits and cuts in support for renewable energy. Still, there are bright spots for the industry, including in China, and new markets could soon become crucial for growth.

Here are three things to know about the state of hydrogen in 2025.

1. Expectations for annual clean hydrogen production by 2030 are shrinking, for the first time.

    While hydrogen has the potential to serve as a clean fuel, today most is made with processes that use fossil fuels. As of 2025, about a million metric tons of low-emissions hydrogen are produced annually. That’s less than 1% of total hydrogen production.

    In last year’s Global Hydrogen Report, the IEA projected that global production of low-emissions hydrogen would grow to as high as 49 million metric tons annually by 2030. That prediction has been steadily climbing since 2021, as more places around the world sink money into developing and scaling up the technology.

    In the 2025 edition, though, the IEA’s production prediction shrank to 37 million metric tons annually by 2030.

    That’s still a major expansion from today’s numbers, but it’s the first time the agency has cut its predictions for the end of the decade. The report cited the cancellations of both electrolysis projects (those that use electricity to generate hydrogen) and carbon capture projects as reasons for the pullback. The cancelled and delayed projects included sites across Africa, the Americas, Europe, and Australia. 

    2. China is dominating production today and could produce competitively cheap green hydrogen by the end of the decade.

      Speaking of electrolysis projects, China is the driving force in manufacturing and development of electrolyzers, the devices that use electricity to generate green hydrogen, according to the new IEA report. As of July 2025, the country accounted for 65% of the installed or almost installed electrolyzer capacity in the world. It also manufactures nearly 60% of the world’s electrolyzers.

      A major barrier for clean hydrogen today is that dirty methods based on fossil fuels are just so much cheaper than cleaner ones.

      But China is well on its way to narrowing that gap. Today, it’s roughly three times more expensive to make and install an electrolyzer anywhere else in the world than in China. The country could produce green hydrogen that’s cost-competitive with fossil hydrogen by the end of the decade, according to the IEA report. That could make the fuel an obvious choice for both new and existing uses of hydrogen.

      3. Southeast Asia could be a major emerging market for low-emissions hydrogen.

        One region that could become a major player in the green hydrogen market is Southeast Asia. The economy is growing fast, and so is energy demand.

        There’s already a market for hydrogen in Southeast Asia. Today, the region uses about 4 million metric tons of hydrogen annually, largely in the oil refining industry and the chemical business, where it is used to make ammonia and methanol.

        International shipping is also concentrated in the region—the port of Singapore supplied about one-sixth of all the fuel used in global shipping in 2024, more than any other single location. Today, that total consists almost exclusively of fossil fuels. But there’s been work to test cleaner fuels, including methanol and ammonia, and interest in shifting to hydrogen in the longer term.

        Clean hydrogen could slot into these existing industries and help cut emissions. There are 25 projects under development right now in the region, though additional support for renewables will be crucial to getting significant capacity up and running.

        Overall, hydrogen is getting a reality check, with real-world problems cutting through the hype we’ve seen in recent years. The next five years will tell whether the fuel can live up to the still-lofty hopes.

        This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

        A pivotal meeting on vaccine guidance is underway—and former CDC leaders are alarmed

        This week has been an eventful one for America’s public health agency. Two former leaders of the US Centers for Disease Control and Prevention explained the reasons for their sudden departures from the agency in a Senate hearing. And they described how CDC employees are being instructed to turn their backs on scientific evidence.

        The CDC’s former director Susan Monarez and former chief medical officer Debra Houry took questions from a Senate committee on Wednesday. They painted a picture of a health agency in turmoil—and at risk of harming the people it is meant to serve.

        On Thursday, an advisory CDC panel that develops vaccine guidance began a two-day meeting on multiple childhood vaccines. During the meeting, which was underway as The Checkup went to press, members of the panel were set to discuss those vaccines and propose recommendations on their use.

        Monarez worries that access to childhood vaccines is under threat—and that the public health consequences could be dire. “If vaccine protections are weakened, preventable diseases will return,” she said.

        As the current secretary of health and human services, Robert F. Kennedy Jr. oversees federal health and science agencies that include the CDC, which monitors and responds to threats to public health. Part of that role involves developing vaccine recommendations.

        As we’ve noted before, RFK Jr. has long been a prominent critic of vaccines. He has incorrectly linked commonly used ingredients to autism and made other incorrect statements about risks associated with various vaccines.

        Still, he oversaw the recruitment of Monarez—who does not share those beliefs—to lead the agency. When she was sworn in on July 31, Monarez, who is a microbiologist and immunologist, had already been serving as acting director of the agency. She had held prominent positions at other federal agencies and departments too, including the Advanced Research Projects Agency for Health (ARPA-H) and the Biomedical Advanced Research and Development Authority (BARDA). Kennedy described her as “a public health expert with unimpeachable scientific credentials.”

        His opinion seems to have changed somewhat since then. Just 29 days after Monarez took on her position, she was turfed out of the agency. And in yesterday’s hearing, she explained why.

        On August 25, Kennedy asked Monarez to do two things, she said. First, he wanted her to commit to firing scientists at the agency. And second, he wanted her to “pre-commit” to approving vaccine recommendations made by the agency’s Advisory Committee on Immunization Practices (ACIP), regardless of whether there was any scientific evidence to support those recommendations, she said. “He just wanted blanket approval,” she said during her testimony.

        She refused both requests.

        Monarez testified that she didn’t want to get rid of hardworking scientists who played an important role in keeping Americans safe. And she said she could not commit to approving vaccine recommendations without reviewing the scientific evidence behind them and still maintain her integrity. She was sacked.

        Those vaccine recommendations are currently under discussion, and scientists like Monarez are worried about how they might change. Kennedy fired all 17 members of the previous committee in June. (Monarez said she was not consulted on the firings and found out about them through media reports.)

        “A clean sweep is needed to reestablish public confidence in vaccine science,” Kennedy wrote in a piece for the Wall Street Journal at the time. He went on to replace those individuals with eight new members, some of whom have been prominent vaccine critics and have spread misinformation about vaccines. One later withdrew.

        That new panel met two weeks later. The meeting included a presentation about thimerosal—a chemical that Kennedy has incorrectly linked to autism, and which is no longer included in vaccines in the US—and a proposal to recommend that the MMRV vaccine (for measles, mumps, rubella, and varicella) not be offered to children under the age of four.

        Earlier this week, five new committee members were named. They include individuals who have advocated against vaccine mandates and who have argued that mRNA-based covid vaccines should be removed from the market.

        All 12 members are convening for a meeting that runs today and tomorrow. At that meeting, members will propose recommendations for the MMRV vaccine and vaccines for covid-19 and hepatitis B, according to an agenda published on the CDC website.

        Those are the recommendations for which Monarez says she was asked to provide “blanket approval.” “My worst fear is that I would then be in a position of approving something that reduces access [to] lifesaving vaccines to children and others who need them,” she said.

        That job now goes to Jim O’Neill, the deputy health secretary and acting CDC director (also a longevity enthusiast), who holds the authority to approve those recommendations.

        We don’t yet know what those recommendations will be. But if they are approved, they could reshape access to vaccines for children and vulnerable people in the US. As six former chairs of the committee wrote for STAT: “ACIP is directly linked to the Vaccines for Children program, which provides vaccines without cost to approximately 50% of children in the US, and the Affordable Care Act that requires insurance coverage for ACIP-recommended vaccines to approximately 150 million people in the US.”

        Drops in vaccine uptake have already contributed to this year’s measles outbreak in the US, which is the biggest in decades. Two children have died. We are seeing the impact of undermined trust in childhood vaccines. As Monarez put it: “The stakes are not theoretical.”

        This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

        How to measure the returns on R&D spending

        MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

        Given the draconian cuts to US federal funding for science, including the administration’s proposal to reduce the 2026 budgets of the National Institutes of Health by 40% and the National Science Foundation by 57%, it’s worth asking some hard-nosed money questions: How much should we be spending on R&D? How much value do we get out of such investments, anyway? To answer that, it’s important to look at both successful returns and investments that went nowhere.

        Sure, it’s easy to argue for the importance of spending on science by pointing out that many of today’s most useful technologies had their origins in government-funded R&D. The internet, CRISPR, GPS—the list goes on and on. All true. But this argument ignores all the technologies that received millions in government funding and haven’t gone anywhere—at least not yet. We still don’t have DNA computers or molecular electronics. Never mind contrarian politicians’ favorite examples of seemingly silly or frivolous science projects (think shrimp on treadmills).

        While cherry-picking success stories helps illustrate the glories of innovation and the role of science in creating technologies that have changed our lives, it provides little guidance on how much we should spend in the future—and where the money should go.

        A far more useful approach to quantifying the value of R&D is to look at its return on investment (ROI). A favorite metric for stock pickers and PowerPoint-wielding venture capitalists, ROI weighs benefits versus costs. If applied broadly to the nation’s R&D funding, the same kind of thinking could help account for both the big wins and all the money spent on research that never got out of the lab.

        The problem is that it’s notoriously difficult to calculate returns for science funding—the payoffs can take years to appear and often take a circuitous route, so the eventual rewards are distant from the original funding. (Who could have predicted Uber as an outcome of GPS? For that matter, who could have predicted that the invention of ultra-precise atomic clocks in the late 1940s and 1950s would eventually make GPS possible?) And forget trying to track the costs of countless failures or apparent dead ends.

        But in several recent papers, economists have approached the problem in clever new ways, and though they ask slightly different questions, their conclusions share a bottom line: R&D is, in fact, one of the better long-term investments that the government can make.

        This story is part of MIT Technology Review’s “America Undone” series, examining how the foundations of US success in science and innovation are currently under threat. You can read the rest here.

        That might not seem very surprising. We’ve long thought that innovation and scientific advances are key to our prosperity. But the new studies provide much-needed details, supplying systematic and rigorous evidence for the impact that R&D funding, including public investment in basic science, has on overall economic growth.

        And the magnitude of the benefits is surprising.

        Bang for your buck

        In “A Calculation of the Social Returns to Innovation,” Benjamin Jones, an economist at Northwestern University, and Lawrence Summers, a Harvard economist and former US Treasury secretary, calculate the effects of the nation’s total R&D spending on gross domestic product and our overall standard of living. They’re taking on the big picture, and it’s ambitious because there are so many variables. But they are able to come up with a convincing range of estimates for the returns, all of them impressive.

        On the conservative end of their estimates, says Jones, investing $1 in R&D yields about $5 in returns—defined in this case as additional GDP per person (basically, how much richer we become). Change some of the assumptions—for example, by attempting to account for the value of better medicines and improved health care, which aren’t fully captured in GDP—and you get even larger payoffs.

        While the $5 return is at the low end of their estimates, it’s still “a remarkably good investment,” Jones says. “There aren’t many where you put in $1 and get $5 back.”

        That’s the return for the nation’s overall R&D funding. But what do we get for government-funded R&D in particular? Andrew Fieldhouse, an economist at Texas A&M, and Karel Mertens at the Federal Reserve Bank of Dallas looked specifically at how changes in public R&D spending affect the total factor productivity (TFP) of businesses. A favorite metric of economists, TFP is driven by new technologies and innovative business know-how—not by adding more workers or machines—and is the main driver of the nation’s prosperity over the long term.

        The economists tracked changes in R&D spending at five major US science funding agencies over many decades to see how the shifts eventually affected private-sector productivity. They found that the government was getting a huge bang for its nondefense R&D buck.

        The benefits begin kicking in after around five to 10 years and often have a long-lasting impact on the economy. Nondefense public R&D funding has been responsible for 20% to 25% of all private-sector productivity growth in the country since World War II, according to the economists. It’s an astonishing number, given that the government invests relatively little in nondefense R&D. For example, its spending on infrastructure, another contributor to productivity growth, has been far greater over those years.

        The large impact of public R&D investments also provides insight into one of America’s most troubling economic mysteries: the slowdown in productivity growth that began in the 1970s, which has roiled the country’s politics as many people face stunted living standards and limited financial prospects. Their research, says Fieldhouse, suggests that as much as a quarter of that slowdown was caused by a decline in public R&D funding that happened roughly over the same time.

        After reaching a high of 1.86% of GDP in 1964, federal R&D spending began dropping. Starting in the early 1970s, TFP growth also began to decline, from above 2% a year in the late 1960s to somewhere around 1% since the 1970s (with the exception of a rise during the late 1990s), roughly tracking the spending declines with a lag of a few years.

        If in fact the productivity slowdown was at least partially caused by a drop in public R&D spending, it’s evidence that we would be far richer today if we had kept up a higher level of science investment. And it also flags the dangers of today’s proposed cuts. “Based on our research,” says Fieldhouse, “I think it’s unambiguously clear that if you actually slash the budget of the NIH by 40%, if you slash the NSF budget by 50%, there’s going to be a deceleration in US productivity growth over the next seven to 10 years that will be measurable.”

        Out of whack

        Though the Trump administration’s proposed 2026 budget would slash science budgets to an unusual degree, public funding of R&D has actually been in slow decline for decades. Federal funding of science is at its lowest rate in the last 70 years, accounting for only around 0.6% of GDP.

        Even as public funding has dropped, business R&D investments have steadily risen. Today businesses spend far more than the government; in 2023, companies invested about $700 billion in R&D while the US government spent $172 billion, according to data from the NSF’s statistical agency. You might think, Good—let companies do research. It’s more efficient. It’s more focused. Keep the government out of it.

        But there is a big problem with that argument. Publicly funded research, it turns out, tends to lead to relatively more productivity growth over time because it skews more toward fundamental science than the applied work typically done by companies.

        In a new working paper called “Public R&D Spillovers and Productivity Growth,” Arnaud Dyèvre, an assistant professor of economics at HEC Paris, documents the broad and often large impacts of so-called knowledge spillovers—the benefits that flow to others from work done by the original research group. Dyèvre found that the spillovers of publicly funded R&D have three times more impact on productivity growth across businesses and industries than those from private R&D funding.

        The findings are preliminary, and Dyèvre is still updating the research—much of which he did as a postdoc at MIT—but he says it does suggest that the US “is underinvesting in fundamental R&D,” which is heavily funded by the government. “I wouldn’t be able to tell you exactly which percentage of R&D in the US needs to be funded by the government or what percent needs to be funded by the private sector. We need both,” he says. But, he adds, “the empirical evidence” suggests that “we’re out of balance.”

        The big question

        Getting the balance of funding for fundamental science and applied research right is just one of the big questions that remain around R&D funding. In mid-July, Open Philanthropy and the Alfred P. Sloan Foundation, both nonprofit organizations, jointly announced that they planned to fund a five-year “pop-up journal” that would attempt to answer many of the questions still swirling around how to define and optimize the ROI of research funding.

        “There is a lot of evidence consistent with a really high return to R&D, which suggests we should do more of it,” says Matt Clancy, a senior program officer at Open Philanthropy. “But when you ask me how much more, I don’t have a good answer. And when you ask me what types of R&D should get more funding, we don’t have a good answer.”

        Pondering such questions should keep innovation economists busy for the next several years. But there is another mystifying piece of the puzzle, says Northwestern’s Jones. If the returns on R&D investments are so high—the kind that most venture capitalists or investors would gladly take—why isn’t the government spending more?

        Jones, who served as a senior economic advisor in the Obama administration, says discussions over R&D budgets in Washington are often “a war of anecdotes.” Science advocates cite the great breakthroughs that resulted from earlier government funding, while budget hawks point to seemingly ludicrous projects or spectacular failures. Both have plenty of ammunition. “People go back and forth,” says Jones, “and it doesn’t really lead to anywhere.”

        The policy gridlock is rooted in the very nature of fundamental research. Some of today’s science will lead to great advances, and there will be countless failures; a lot of money will be wasted on fruitless experiments. The problem, of course, is that when you’re deciding to fund new projects, it’s impossible to predict which outcome you’ll get, even in the case of odd, seemingly silly science. Guessing just what research will or will not lead to the next great breakthrough is a fool’s errand.

        Take the cuts in the administration’s proposed fiscal 2026 budget for the NSF, a leading funder of basic science. The administration’s summary begins with the assertion that its NSF budget “is prioritizing investments that complement private-sector R&D and offer strong potential to drive economic growth and strengthen U.S. technological leadership.” So far, so good. It cites the government’s commitment to AI and quantum information science. But dig deeper and you will see the contradictions in the numbers.

        Not only is NSF’s overall budget cut by 57%, but funding for physical sciences like chemistry and materials research—fields critical to advancing AI and quantum computers—has also been blown apart. Funding for the NSF’s mathematical and physical sciences program was reduced by 67%. The directorate for computer and information science and engineering fared little better; its research funding was cut by 66%.

        There is a great deal of hope among many in the science community that Congress, when it passes the actual 2026 budget, will at least partially reverse these cuts. We’ll see. But even if it does, why attack R&D funding in the first place? It’s impossible to answer that without plunging into the messy depths of today’s chaotic politics. And it is equally hard to know whether the recent evidence gathered by academic economists on the strong returns to R&D investments will matter when it comes to partisan policymaking.

        But at least those defending the value of public funding now have a far more productive way to make their argument, rather than simply touting past breakthroughs. Even for fiscal hawks and those voicing concerns about budget deficits, the recent work provides a compelling and simple conclusion: More public funding for basic science is a sound investment that makes us more prosperous.

        AI-designed viruses are here and already killing bacteria

        Artificial intelligence can draw cat pictures and write emails. Now the same technology can compose a working genome.

        A research team in California says it used AI to propose new genetic codes for viruses—and managed to get several of these viruses to replicate and kill bacteria.

        The scientists, based at Stanford University and the nonprofit Arc Institute, both in Palo Alto, say the germs with AI-written DNA represent “the first generative design of complete genomes.”

        The work, described in a preprint paper, has the potential to create new treatments and accelerate research into artificially engineered cells. It is also an “impressive first step” toward AI-designed life forms, says Jef Boeke, a biologist at NYU Langone Health, who was provided an advance copy of the paper by MIT Technology Review.  

        Boeke says the AI’s performance was surprisingly good and that its ideas were unexpected. “They saw viruses with new genes, with truncated genes, and even different gene orders and arrangements,” he says.

        This is not yet AI-designed life, however. That’s because viruses are not alive. They’re more like renegade bits of genetic code with relatively puny, simple genomes. 

        In the new work, researchers at the Arc Institute sought to develop variants of a bacteriophage—a virus that infects bacteria—called phiX174, which has only 11 genes and about 5,000 DNA letters.

        To do so, they used two versions of an AI called Evo, which works on the same principles as large language models like ChatGPT. Instead of feeding them textbooks and blog posts to learn from, the scientists trained the models on the genomes of about 2 million other bacteriophage viruses.
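        To make the “same principles as large language models” point concrete, here is a toy sketch that trains a tiny next-token transformer on nucleotide characters instead of words. It is illustrative only—a stand-in for the idea, not Evo itself—and the model size, tokenization, and made-up sequences bear no relation to the real system.

```python
# Toy sketch: a tiny causal language model over DNA letters, to illustrate the idea
# of training an LLM-style model on genomes instead of text. NOT Evo; the
# architecture, scale, and sequences here are made up for illustration.
import torch
import torch.nn as nn

VOCAB = ["A", "C", "G", "T"]
stoi = {ch: i for i, ch in enumerate(VOCAB)}

def encode(seq: str) -> torch.Tensor:
    return torch.tensor([stoi[ch] for ch in seq], dtype=torch.long)

class TinyGenomeLM(nn.Module):
    def __init__(self, vocab_size=4, d_model=64, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        # x: (batch, seq_len) of nucleotide ids
        seq_len = x.size(1)
        h = self.embed(x) + self.pos(torch.arange(seq_len, device=x.device))
        # Causal mask: each position may only attend to earlier nucleotides.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(x.device)
        return self.head(self.encoder(h, mask=mask))  # logits over the next base

# Tiny fake "phage genomes" standing in for the ~2 million real training genomes.
genomes = ["ATGCGTACGTTAGC", "ATGCCCGTTAGCAA", "ATGAAAGTTCGCTA"]
model = TinyGenomeLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    seq = encode(genomes[step % len(genomes)]).unsqueeze(0)  # (1, L)
    logits = model(seq[:, :-1])                              # predict base t+1 from bases <= t
    loss = loss_fn(logits.reshape(-1, len(VOCAB)), seq[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```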

        But would the genomes proposed by the AI make any sense? To find out, the California researchers chemically printed 302 of the genome designs as DNA strands and then mixed those with E. coli bacteria.

        That led to a profound “AI is here” moment when, one night, the scientists saw plaques of dead bacteria in their petri dishes. They later took microscope pictures of the tiny viral particles, which look like fuzzy dots.

        “That was pretty striking, just actually seeing, like, this AI-generated sphere,” says Brian Hie, who leads the lab at the Arc Institute where the work was carried out.

        Overall, 16 of the 302 designs ended up working—that is, the computer-designed phage started to replicate, eventually bursting through the bacteria and killing them.

        J. Craig Venter, who created some of the first organisms with lab-made DNA nearly two decades ago, says the AI methods look to him like “just a faster version of trial-and-error experiments.”

        For instance, when a team he led managed to create a bacterium with a lab-printed genome in 2008, it was after a long hit-or-miss process of testing out different genes. “We did the manual AI version—combing through the literature, taking what was known,” he says. 

        But speed is exactly why people are betting AI will transform biology. The new methods already claimed a Nobel Prize in 2024 for predicting protein shapes. And investors are staking billions that AI can find new drugs. This week a Boston company, Lila, raised $235 million to build automated labs run by artificial intelligence.

        Computer-designed viruses could also find commercial uses. For instance, doctors have sometimes tried “phage therapy” to treat patients with serious bacterial infections. Similar tests are underway to cure cabbage of black rot, also caused by bacteria.

        “There is definitely a lot of potential for this technology,” says Samuel King, the student who spearheaded the project in Hie’s lab. He notes that most gene therapy uses viruses to shuttle genes into patients’ bodies, and AI might develop more effective ones.

        The Stanford researchers say they purposely haven’t taught their AI about viruses that can infect people. But this type of technology does create the risk that other scientists—out of curiosity, good intentions, or malice—could turn the methods on human pathogens, exploring new dimensions of lethality.

        “One area where I urge extreme caution is any viral enhancement research, especially when it’s random so you don’t know what you are getting,” says Venter. “If someone did this with smallpox or anthrax, I would have grave concerns.”

        Whether an AI can generate a bona fide genome for a larger organism remains an open question. For instance, E. coli has about a thousand times more DNA code than phiX174 does. “The complexity would rocket from staggering to … way way more than the number of subatomic particles in the universe,” says Boeke.

        Also, there’s still no easy way to test AI designs for larger genomes. While some viruses can “boot up” from just a DNA strand, that’s not the case with a bacterium, a mammoth, or a human. Scientists would instead have to gradually change an existing cell with genetic engineering—a still laborious process.

        Despite that, Jason Kelly, the CEO of Ginkgo Bioworks, a cell-engineering company in Boston, says exactly such an effort is needed. He believes it could be carried out in “automated” laboratories where genomes get proposed and tested and the results are fed back to AI for further improvement.

         “This would be a nation-scale scientific milestone, as cells are the building blocks of all life,” says Kelly. “The US should make sure we get to it first.”

        The looming crackdown on AI companionship

        As long as there has been AI, there have been people sounding alarms about what it might do to us: rogue superintelligence, mass unemployment, or environmental ruin from data center sprawl. But this week showed that another threat entirely—that of kids forming unhealthy bonds with AI—is the one pulling AI safety out of the academic fringe and into regulators’ crosshairs.

        This has been bubbling for a while. Two high-profile lawsuits filed in the last year, against Character.AI and OpenAI, allege that companion-like behavior in their models contributed to the suicides of two teenagers. A study by US nonprofit Common Sense Media, published in July, found that 72% of teenagers have used AI for companionship. Stories in reputable outlets about “AI psychosis” have highlighted how endless conversations with chatbots can lead people down delusional spirals.

        It’s hard to overstate the impact of these stories. To the public, they are proof that AI is not merely imperfect, but a technology that’s more harmful than helpful. If you doubted that this outrage would be taken seriously by regulators and companies, three things happened this week that might change your mind.

        A California bill passes the legislature

        On Thursday, the California state legislature passed a first-of-its-kind bill. It would require AI companies to remind users they know to be minors that responses are AI generated. Companies would also need to have a protocol for addressing suicide and self-harm and provide annual reports on instances of suicidal ideation in users’ conversations with their chatbots. The bill, led by Democratic state senator Steve Padilla, passed with heavy bipartisan support and now awaits Governor Gavin Newsom’s signature.

        There are reasons to be skeptical of the bill’s impact. It doesn’t specify efforts companies should take to identify which users are minors, and lots of AI companies already include referrals to crisis providers when someone is talking about suicide. (In the case of Adam Raine, one of the teenagers whose survivors are suing, his conversations with ChatGPT before his death included this type of information, but the chatbot allegedly went on to give advice related to suicide anyway.)

        Still, it is undoubtedly the most significant of the efforts to rein in companion-like behaviors in AI models, which are in the works in other states too. If the bill becomes law, it would strike a blow to the position OpenAI has taken, which is that “America leads best with clear, nationwide rules, not a patchwork of state or local regulations,” as the company’s chief global affairs officer, Chris Lehane, wrote on LinkedIn last week.

        The Federal Trade Commission takes aim

        The very same day, the Federal Trade Commission announced an inquiry into seven companies, seeking information about how they develop companion-like characters, monetize engagement, measure and test the impact of their chatbots, and more. The companies are Google, Instagram, Meta, OpenAI, Snap, X, and Character Technologies, the maker of Character.AI.

        The White House now wields immense, and potentially illegal, political influence over the agency. In March, President Trump fired its lone Democratic commissioner, Rebecca Slaughter. In July, a federal judge ruled the firing illegal, but last week the US Supreme Court temporarily permitted it.

        “Protecting kids online is a top priority for the Trump-Vance FTC, and so is fostering innovation in critical sectors of our economy,” said FTC chairman Andrew Ferguson in a press release about the inquiry. 

        Right now, it’s just that—an inquiry—but the process might (depending on how public the FTC makes its findings) reveal the inner workings of how the companies build their AI companions to keep users coming back again and again. 

        Sam Altman on suicide cases

        Also on the same day (a busy day for AI news), Tucker Carlson published an hour-long interview with OpenAI’s CEO, Sam Altman. It covers a lot of ground—Altman’s battle with Elon Musk, OpenAI’s military customers, conspiracy theories about the death of a former employee—but it also includes the most candid comments Altman’s made so far about the cases of suicide following conversations with AI. 

        Altman talked about “the tension between user freedom and privacy and protecting vulnerable users” in cases like these. But then he offered up something I hadn’t heard before.

        “I think it’d be very reasonable for us to say that in cases of young people talking about suicide seriously, where we cannot get in touch with parents, we do call the authorities,” he said. “That would be a change.”

        So where does all this go next? For now, it’s clear that—at least in the case of children harmed by AI companionship—companies’ familiar playbook won’t hold. They can no longer deflect responsibility by leaning on privacy, personalization, or “user choice.” Pressure to take a harder line is mounting from state laws, regulators, and an outraged public.

        But what will that look like? Politically, the left and right are now paying attention to AI’s harm to children, but their solutions differ. On the right, the proposed solution aligns with the wave of internet age-verification laws that have now been passed in over 20 states. These are meant to shield kids from adult content while defending “family values.” On the left, it’s the revival of stalled ambitions to hold Big Tech accountable through antitrust and consumer-protection powers. 

        Consensus on the problem is easier than agreement on the cure. As it stands, it looks likely we’ll end up with exactly the patchwork of state and local regulations that OpenAI (and plenty of others) have lobbied against. 

        For now, it’s down to companies to decide where to draw the lines. They’re having to decide things like: Should chatbots cut off conversations when users spiral toward self-harm, or would that leave some people worse off? Should they be licensed and regulated like therapists, or treated as entertainment products with warnings? The uncertainty stems from a basic contradiction: Companies have built chatbots to act like caring humans, but they’ve postponed developing the standards and accountability we demand of real caregivers. The clock is now running out.

        This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

        How do AI models generate videos?

        MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

        It’s been a big year for video generation. In the last nine months OpenAI made Sora public, Google DeepMind launched Veo 3, and the video startup Runway launched Gen-4. All can produce video clips that are (almost) impossible to distinguish from actual filmed footage or CGI animation. This year also saw Netflix debut an AI visual effect in its show The Eternaut, the first time video generation has been used to make mass-market TV.

        Sure, the clips you see in demo reels are cherry-picked to showcase a company’s models at the top of their game. But with the technology in the hands of more users than ever before—Sora and Veo 3 are available in the ChatGPT and Gemini apps for paying subscribers—even the most casual filmmaker can now knock out something remarkable. 

        The downside is that creators are competing with AI slop, and social media feeds are filling up with faked news footage. Video generation also uses up a huge amount of energy, many times more than text or image generation. 

        With AI-generated videos everywhere, let’s take a moment to talk about the tech that makes them work.

        How do you generate a video?

        Let’s assume you’re a casual user. There’s now a range of high-end tools that let pro video makers insert video generation models into their workflows, but most people will use this technology in an app or via a website. You know the drill: “Hey, Gemini, make me a video of a unicorn eating spaghetti. Now make its horn take off like a rocket.” What you get back will be hit or miss, and you’ll typically need to ask the model to take another pass or 10 before you get more or less what you wanted. 

        So what’s going on under the hood? Why is it hit or miss—and why does it take so much energy? The latest wave of video generation models are what’s known as latent diffusion transformers. Yes, that’s quite a mouthful. Let’s unpack each part in turn, starting with diffusion. 

        What’s a diffusion model?

        Imagine taking an image and adding a random spattering of pixels to it. Take that pixel-spattered image and spatter it again and then again. Do that enough times and you will have turned the initial image into a random mess of pixels, like static on an old TV set. 

        A diffusion model is a neural network trained to reverse that process, turning random static into images. During training, it gets shown millions of images in various stages of pixelation. It learns how those images change each time new pixels are thrown at them and, thus, how to undo those changes. 

        The upshot is that when you ask a diffusion model to generate an image, it will start off with a random mess of pixels and step by step turn that mess into an image that is more or less similar to images in its training set. 
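        To make that concrete, here is a minimal toy sketch in Python of the two halves of the process: adding noise (what happens to training images) and removing it step by step (what happens at generation time). The function names and the placeholder "model" are made up for illustration; a real system uses a large trained neural network in place of the stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, t, num_steps):
    """Forward process: blend the image with random static.
    By t = num_steps, almost nothing of the original image is left."""
    alpha = 1.0 - t / num_steps              # how much of the image survives
    return alpha * image + (1 - alpha) * rng.normal(size=image.shape)

def denoise_step(noisy, t, num_steps, model):
    """Reverse process: the trained network predicts the noise,
    and a small amount of it is removed."""
    predicted_noise = model(noisy, t)
    return noisy - predicted_noise / num_steps

# Generation: start from pure static and clean it up step by step.
num_steps = 50
frame = rng.normal(size=(64, 64, 3))                        # random "TV static"
placeholder_model = lambda x, t: rng.normal(size=x.shape)   # stands in for the trained network
for t in reversed(range(num_steps)):
    frame = denoise_step(frame, t, num_steps, placeholder_model)
```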

        But you don’t want any image—you want the image you specified, typically with a text prompt. And so the diffusion model is paired with a second model—such as a large language model (LLM) trained to match images with text descriptions—that guides each step of the cleanup process, pushing the diffusion model toward images that the large language model considers a good match to the prompt. 

        An aside: This LLM isn’t pulling the links between text and images out of thin air. Most text-to-image and text-to-video models today are trained on large data sets that contain billions of pairings of text and images or text and video scraped from the internet (a practice many creators are very unhappy about). This means that what you get from such models is a distillation of the world as it’s represented online, distorted by prejudice (and pornography).
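        In practice, one common way to implement that push is a trick called classifier-free guidance: at every cleanup step the network makes two noise predictions, one with the text prompt and one without, and the gap between them is amplified so the result drifts toward the prompt. The sketch below is a hedged illustration of that idea, with dummy stand-ins for the trained network and the text embedding; it is not the exact recipe any particular product uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def guided_noise_estimate(denoiser, noisy, t, prompt_embedding, guidance_scale=7.5):
    """Classifier-free guidance: compare the network's noise prediction
    with and without the prompt, then exaggerate the difference so each
    cleanup step leans toward images that match the text."""
    eps_uncond = denoiser(noisy, t, None)               # prediction for "any image"
    eps_cond = denoiser(noisy, t, prompt_embedding)     # prediction given the prompt
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Dummy stand-ins so the sketch runs; a real system uses trained models.
placeholder_denoiser = lambda x, t, prompt: rng.normal(size=x.shape) * 0.1
noisy_frame = rng.normal(size=(64, 64, 3))
text_embedding = rng.normal(size=(512,))
step_estimate = guided_noise_estimate(placeholder_denoiser, noisy_frame, 10, text_embedding)
```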

        It’s easiest to imagine diffusion models working with images. But the technique can be used with many kinds of data, including audio and video. To generate movie clips, a diffusion model must clean up sequences of images—the consecutive frames of a video—instead of just one image. 

        What’s a latent diffusion model? 

        All this takes a huge amount of compute (read: energy). That’s why most diffusion models used for video generation use a technique called latent diffusion. Instead of processing raw data—the millions of pixels in each video frame—the model works in what’s known as a latent space, in which the video frames (and text prompt) are compressed into a mathematical code that captures just the essential features of the data and throws out the rest. 

        A similar thing happens whenever you stream a video over the internet: the video is sent from a server to your screen in a compressed format so that it reaches you faster, and when it arrives, your computer or TV converts it back into a watchable video. 

        And so the final step is to decompress what the latent diffusion process has come up with. Once the compressed frames of random static have been turned into the compressed frames of a video that the LLM guide considers a good match for the user’s prompt, the compressed video gets converted into something you can watch.  

        With latent diffusion, the diffusion process works more or less the way it would for an image. The difference is that the pixelated video frames are now mathematical encodings of those frames rather than the frames themselves. This makes latent diffusion far more efficient than a typical diffusion model. (Even so, video generation still uses more energy than image or text generation. There’s just an eye-popping amount of computation involved.) 
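        Put together, a latent diffusion pipeline looks roughly like this: compress into a latent space, do all the denoising there, and only decompress to pixels at the very end. The toy code below sketches that flow with made-up shapes and placeholder functions standing in for the learned encoder, decoder, and denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "codec": in real systems a learned autoencoder compresses
# frames into latents (used when training on real videos) and decompresses
# latents back into frames.
def encode(frames):                        # (num_frames, 256, 256, 3) -> latents
    return frames.reshape(frames.shape[0], -1)[:, ::64]

def decode(latents):                       # latents -> full-resolution frames
    return np.repeat(latents, 64, axis=1).reshape(-1, 256, 256, 3)

def denoise_in_latent_space(latents, prompt, num_steps=50):
    """All the expensive iterative cleanup happens on the small latents,
    not on millions of raw pixels. The update below is a placeholder."""
    for t in reversed(range(num_steps)):
        latents = latents - rng.normal(size=latents.shape) * 0.01
    return latents

# Generation: random latent "static" in, clean compressed video out,
# then a single decode step back to watchable frames.
latent_static = rng.normal(size=(16, 3072))          # 16 frames' worth of latents
clean_latents = denoise_in_latent_space(latent_static, "a unicorn eating spaghetti")
video_frames = decode(clean_latents)                 # shape (16, 256, 256, 3)
```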

        What’s a latent diffusion transformer?

        Still with me? There’s one more piece to the puzzle—and that’s how to make sure the diffusion process produces a sequence of frames that are consistent, maintaining objects and lighting and so on from one frame to the next. OpenAI did this with Sora by combining its diffusion model with another kind of model called a transformer. This has now become standard in generative video. 

        Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models such as OpenAI’s GPT-5 and Google DeepMind’s Gemini, which can generate long sequences of words that make sense, maintaining consistency across many dozens of sentences. 

        But videos are not made of words. Instead, videos get cut into chunks that can be treated as if they were. The approach that OpenAI came up with was to dice videos up across both space and time. “It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Tim Brooks, a lead researcher on Sora.
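        Here is a rough Python illustration of that dicing step, using a toy video and made-up patch sizes. Each "cube" of pixels, spanning a few frames in time and a small square of each frame, is flattened into one token-like vector for the transformer to process, much as an LLM processes a word.

```python
import numpy as np

# A toy video: 16 frames of 64x64 RGB pixels.
video = np.zeros((16, 64, 64, 3))

# Dice it into "spacetime cubes": here, 4 frames deep and 16x16 pixels wide.
t_size, h_size, w_size = 4, 16, 16
T, H, W, C = video.shape
patches = (
    video.reshape(T // t_size, t_size, H // h_size, h_size, W // w_size, w_size, C)
         .transpose(0, 2, 4, 1, 3, 5, 6)
         .reshape(-1, t_size * h_size * w_size * C)
)
print(patches.shape)   # (64, 3072): 64 cubes, each flattened into one vector
```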

        A selection of videos generated with Veo 3 and Midjourney. The clips have been enhanced in postproduction with Topaz, an AI video-editing tool. Credit: VaigueMan

        Using transformers alongside diffusion models brings several advantages. Because they are designed to process sequences of data, transformers also help the diffusion model maintain consistency across frames as it generates them. This makes it possible to produce videos in which objects don’t pop in and out of existence, for example. 

        And because the videos are diced up, their size and orientation do not matter. This means that the latest wave of video generation models can be trained on a wide range of example videos, from short vertical clips shot with a phone to wide-screen cinematic films. The greater variety of training data has made video generation far better than it was just two years ago. It also means that video generation models can now be asked to produce videos in a variety of formats. 

        What about the audio? 

        A big advance with Veo 3 is that it generates video with audio, from lip-synched dialogue to sound effects to background noise. That’s a first for video generation models. As Google DeepMind CEO Demis Hassabis put it at this year’s Google I/O: “We’re emerging from the silent era of video generation.” 

        The challenge was to find a way to line up video and audio data so that the diffusion process would work on both at the same time. Google DeepMind’s breakthrough was a new way to compress audio and video into a single piece of data inside the diffusion model. When Veo 3 generates a video, its diffusion model produces audio and video together in a lockstep process, ensuring that the sound and images are synched.  
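        Google DeepMind has not published the exact recipe, but the general idea can be sketched as packing the compressed audio and compressed video for each slice of time into one joint latent, so that a single denoising step updates both at once. Everything below (the shapes, the names, the update step) is a hypothetical illustration of that idea, not Veo 3’s actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical compressed representations for 16 matching chunks of time.
video_latents = rng.normal(size=(16, 3072))   # compressed video
audio_latents = rng.normal(size=(16, 256))    # compressed audio

# Pack both into a single piece of data per time chunk.
joint = np.concatenate([video_latents, audio_latents], axis=1)   # (16, 3328)

# One (placeholder) denoising pass now updates sound and images together,
# which is what keeps them in sync.
joint = joint - rng.normal(size=joint.shape) * 0.01

# Afterward, split the joint latent back into its audio and video halves
# before decoding each into something you can watch and hear.
video_part, audio_part = joint[:, :3072], joint[:, 3072:]
```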

        You said that diffusion models can generate different kinds of data. Is this how LLMs work too? 

        No—or at least not yet. Diffusion models are most often used to generate images, video, and audio. Large language models—which generate text (including computer code)—are built using transformers. But the lines are blurring. We’ve seen how transformers are now being combined with diffusion models to generate videos. And this summer Google DeepMind revealed that it was building an experimental large language model that used a diffusion model instead of a transformer to generate text. 

        Here’s where things start to get confusing: Though video generation (which uses diffusion models) consumes a lot of energy, diffusion models themselves are in fact more efficient than transformers. Thus, by using a diffusion model instead of a transformer to generate text, Google DeepMind’s new LLM could be a lot more efficient than existing LLMs. Expect to see more from diffusion models in the near future!
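        To picture what text diffusion means, imagine starting from a sentence that is entirely noise (every word masked) and filling it in over a handful of refinement passes, rather than writing strictly left to right. The toy loop below only illustrates that shape of the process; the details of Google DeepMind’s experimental model are not public, and a real model would choose words with a trained network rather than at random.

```python
import random

random.seed(0)

# A fully "noised" sentence: every position is masked.
vocab = ["the", "unicorn", "eats", "spaghetti", "happily"]
length, num_passes = 5, 3
tokens = ["[MASK]"] * length

# Each pass "denoises" some of the remaining masked positions. A real
# diffusion LLM would pick words with a trained network, not at random,
# and could refine many positions in parallel.
for step in range(num_passes):
    for i in range(length):
        if tokens[i] == "[MASK]" and random.random() < (step + 1) / num_passes:
            tokens[i] = random.choice(vocab)

print(" ".join(tokens))
```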

        Texas banned lab-grown meat. What’s next for the industry?

        Last week, a legal battle over lab-grown meat kicked off in Texas. On September 1, a two-year ban on the technology went into effect across the state; the following day, two companies filed a lawsuit against state officials.

        The two companies, Wildtype Foods and Upside Foods, are part of a growing industry that aims to bring new types of food to people’s plates. These products, often called cultivated meat by the industry, take live animal cells and grow them in the lab to make food products without the need to slaughter animals.

        Texas joins six other US states and the country of Italy in banning these products. The bans add barriers for an industry that’s still in its infancy and already faces plenty of challenges before it can reach consumers in a meaningful way.

        The agriculture sector makes up a hefty chunk of global greenhouse-gas emissions, with livestock alone accounting for somewhere between 10% and 20% of climate pollution. Alternative meat products, including those grown in a lab, could help cut the greenhouse gases from agriculture.

        The industry is still in its early days, though. In the US, just a handful of companies can legally sell products including cultivated chicken, pork fat, and salmon. Australia, Singapore, and Israel also allow a few companies to sell within their borders.

        Upside Foods, which makes cultivated chicken, was one of the first to receive the legal go-ahead to sell its products in the US, in 2022. Wildtype Foods, one of the latest additions to the US market, was able to start selling its cultivated salmon in June.

        Upside, Wildtype, and other cultivated-meat companies are still working to scale up production. Products are generally available at pop-up events or on special menus at high-end restaurants. (I visited San Francisco to try Upside’s cultivated chicken at a Michelin-starred restaurant a few years ago.)

        Until recently, the only place you could reliably find lab-grown meat in Texas was a sushi restaurant in Austin. Otoko featured Wildtype’s cultivated salmon on a special tasting menu starting in July. (The chef told local publication Culture Map Austin that the cultivated fish tastes like wild salmon; it was served in a dish with grilled yellowtail to showcase it side by side with another type of fish.)

        The as-yet-limited reach of lab-grown meat didn’t stop state officials from moving to ban the technology, effective from now until September 2027.

        The office of state senator Charles Perry, the author of the bill, didn’t respond to requests for comment. Neither did the Texas and Southwestern Cattle Raisers Association, whose president, Carl Ray Polk Jr., testified in support of the bill in a March committee hearing.

        “The introduction of lab-grown meat could disrupt traditional livestock markets, affecting rural communities and family farms,” Perry said during the meeting.

        In an interview with the Texas Tribune, Polk said the two-year moratorium would help the industry put checks and balances in place before the products could be sold. He also expressed concern about how clearly cultivated-meat companies will label their products.

        “The purpose of these bans is to try to kill the cultivated-meat industry before it gets off the ground,” says Myra Pasek, general counsel of Upside Foods, via email. The company is working to scale up its manufacturing and get the product on the market, she says, “but that can’t happen if we’re not allowed to compete in the marketplace.”

        Others in the industry have similar worries. “Moratoriums on sale like this not only deny Texans new choices and economic growth, but they also send chilling signals to researchers and entrepreneurs across the country,” said Pepin Andrew Tuma, the vice president of policy and government relations for the Good Food Institute, a nonprofit think tank focused on alternative proteins, in a statement. (The group isn’t involved in the lawsuit.) 

        One day after the moratorium took effect on September 1, Wildtype Foods and Upside Foods filed a lawsuit challenging the ban, naming Jennifer Shuford, commissioner of the Texas Department of State Health Services, among other state officials.

        A lawsuit wasn’t necessarily part of the scale-up plan. “This was really a last resort for us,” says Justin Kolbeck, cofounder and CEO of Wildtype.

        Growing cells to make meat in the lab isn’t easy—some companies have spent a decade or more trying to make significant amounts of a product that people want to eat. These legal battles certainly aren’t going to help. 

        This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.