Summary Archives Summary

Ai App Artificial intelligence Summary

Feb 12 2026

Is a secure AI assistant possible?

<div data-chronoton-summary="

Risky business of AI assistants OpenClaw, a viral tool created by independent engineer Peter Steinberger, allows users to create personalized AI assistants. Security experts are alarmed by its vulnerabilities, with even the Chinese government issuing warnings about the risks.

The prompt injection threat Tools like OpenClaw have many vulnerabilities, but the one experts are most worried about its prompt injection. Unlike conventional hacking, prompt injection tricks an LLM by embedding malicious text in emails or websites the AI reads.

No silver bullet for security Researchers are exploring multiple defense strategies: training LLMs to ignore injections, using detector LLMs to screen inputs, and creating policies that restrict harmful outputs. The fundamental challenge remains balancing utility with security in AI assistants.

” data-chronoton-post-id=”1132768″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

AI agents are a risky business. Even when stuck inside the chatbox window, LLMs will make mistakes and behave badly. Once they have tools that they can use to interact with the outside world, such as web browsers and email addresses, the consequences of those mistakes become far more serious.

That might explain why the first breakthrough LLM personal assistant came not from one of the major AI labs, which have to worry about reputation and liability, but from an independent software engineer, Peter Steinberger. In November of 2025, Steinberger uploaded his tool, now called OpenClaw, to GitHub, and in late January the project went viral.

OpenClaw harnesses existing LLMs to let users create their own bespoke assistants. For some users, this means handing over reams of personal data, from years of emails to the contents of their hard drive. That has security experts thoroughly freaked out. The risks posed by OpenClaw are so extensive that it would probably take someone the better part of a week to read all of the security blog posts on it that have cropped up in the past few weeks. The Chinese government took the step of issuing a public warning about OpenClaw’s security vulnerabilities.

In response to these concerns, Steinberger posted on X that nontechnical people should not use the software. (He did not respond to a request for comment for this article.) But there’s a clear appetite for what OpenClaw is offering, and it’s not limited to people who can run their own software security audits. Any AI companies that hope to get in on the personal assistant business will need to figure out how to build a system that will keep users’ data safe and secure. To do so, they’ll need to borrow approaches from the cutting edge of agent security research.

Risk management

OpenClaw is, in essence, a mecha suit for LLMs. Users can choose any LLM they like to act as the pilot; that LLM then gains access to improved memory capabilities and the ability to set itself tasks that it repeats on a regular cadence. Unlike the agentic offerings from the major AI companies, OpenClaw agents are meant to be on 24-7, and users can communicate with them using WhatsApp or other messaging apps. That means they can act like a superpowered personal assistant who wakes you each morning with a personalized to-do list, plans vacations while you work, and spins up new apps in its spare time.

But all that power has consequences. If you want your AI personal assistant to manage your inbox, then you need to give it access to your email—and all the sensitive information contained there. If you want it to make purchases on your behalf, you need to give it your credit card info. And if you want it to do tasks on your computer, such as writing code, it needs some access to your local files.

There are a few ways this can go wrong. The first is that the AI assistant might make a mistake, as when a user’s Google Antigravity coding agent reportedly wiped his entire hard drive. The second is that someone might gain access to the agent using conventional hacking tools and use it to either extract sensitive data or run malicious code. In the weeks since OpenClaw went viral, security researchers have demonstrated numerous such vulnerabilities that put security-naïve users at risk.

Both of these dangers can be managed: Some users are choosing to run their OpenClaw agents on separate computers or in the cloud, which protects data on their hard drives from being erased, and other vulnerabilities could be fixed using tried-and-true security approaches.

But the experts I spoke to for this article were focused on a much more insidious security risk known as prompt injection. Prompt injection is effectively LLM hijacking: Simply by posting malicious text or images on a website that an LLM might peruse, or sending them to an inbox that an LLM reads, attackers can bend it to their will.

And if that LLM has access to any of its user’s private information, the consequences could be dire. “Using something like OpenClaw is like giving your wallet to a stranger in the street,” says Nicolas Papernot, a professor of electrical and computer engineering at the University of Toronto. Whether or not the major AI companies can feel comfortable offering personal assistants may come down to the quality of the defenses that they can muster against such attacks.

It’s important to note here that prompt injection has not yet caused any catastrophes, or at least none that have been publicly reported. But now that there are likely hundreds of thousands of OpenClaw agents buzzing around the internet, prompt injection might start to look like a much more appealing strategy for cybercriminals. “Tools like this are incentivizing malicious actors to attack a much broader population,” Papernot says.

Building guardrails

The term “prompt injection” was coined by the popular LLM blogger Simon Willison in 2022, a couple of months before ChatGPT was released. Even back then, it was possible to discern that LLMs would introduce a completely new type of security vulnerability once they came into widespread use. LLMs can’t tell apart the instructions that they receive from users and the data that they use to carry out those instructions, such as emails and web search results—to an LLM, they’re all just text. So if an attacker embeds a few sentences in an email and the LLM mistakes them for an instruction from its user, the attacker can get the LLM to do anything it wants.

Prompt injection is a tough problem, and it doesn’t seem to be going away anytime soon. “We don’t really have a silver-bullet defense right now,” says Dawn Song, a professor of computer science at UC Berkeley. But there’s a robust academic community working on the problem, and they’ve come up with strategies that could eventually make AI personal assistants safe.

Technically speaking, it is possible to use OpenClaw today without risking prompt injection: Just don’t connect it to the internet. But restricting OpenClaw from reading your emails, managing your calendar, and doing online research defeats much of the purpose of using an AI assistant. The trick of protecting against prompt injection is to prevent the LLM from responding to hijacking attempts while still giving it room to do its job.

One strategy is to train the LLM to ignore prompt injections. A major part of the LLM development process, called post-training, involves taking a model that knows how to produce realistic text and turning it into a useful assistant by “rewarding” it for answering questions appropriately and “punishing” it when it fails to do so. These rewards and punishments are metaphorical, but the LLM learns from them as an animal would. Using this process, it’s possible to train an LLM not to respond to specific examples of prompt injection.

But there’s a balance: Train an LLM to reject injected commands too enthusiastically, and it might also start to reject legitimate requests from the user. And because there’s a fundamental element of randomness in LLM behavior, even an LLM that has been very effectively trained to resist prompt injection will likely still slip up every once in a while.

Another approach involves halting the prompt injection attack before it ever reaches the LLM. Typically, this involves using a specialized detector LLM to determine whether or not the data being sent to the original LLM contains any prompt injections. In a recent study, however, even the best-performing detector completely failed to pick up on certain categories of prompt injection attack.

The third strategy is more complicated. Rather than controlling the inputs to an LLM by detecting whether or not they contain a prompt injection, the goal is to formulate a policy that guides the LLM’s outputs—i.e., its behaviors—and prevents it from doing anything harmful. Some defenses in this vein are quite simple: If an LLM is allowed to email only a few pre-approved addresses, for example, then it definitely won’t send its user’s credit card information to an attacker. But such a policy would prevent the LLM from completing many useful tasks, such as researching and reaching out to potential professional contacts on behalf of its user.

“The challenge is how to accurately define those policies,” says Neil Gong, a professor of electrical and computer engineering at Duke University. “It’s a trade-off between utility and security.”

On a larger scale, the entire agentic world is wrestling with that trade-off: At what point will agents be secure enough to be useful? Experts disagree. Song, whose startup, Virtue AI, makes an agent security platform, says she thinks it’s possible to safely deploy an AI personal assistant now. But Gong says, “We’re not there yet.”

Even if AI agents can’t yet be entirely protected against prompt injection, there are certainly ways to mitigate the risks. And it’s possible that some of those techniques could be implemented in OpenClaw. Last week, at the inaugural ClawCon event in San Francisco, Steinberger announced that he’d brought a security person on board to work on the tool.

As of now, OpenClaw remains vulnerable, though that hasn’t dissuaded its multitude of enthusiastic users. George Pickett, a volunteer maintainer of the OpenGlaw GitHub repository and a fan of the tool, says he’s taken some security measures to keep himself safe while using it: He runs it in the cloud, so that he doesn’t have to worry about accidentally deleting his hard drive, and he’s put mechanisms in place to ensure that no one else can connect to his assistant.

But he hasn’t taken any specific actions to prevent prompt injection. He’s aware of the risk but says he hasn’t yet seen any reports of it happening with OpenClaw. “Maybe my perspective is a stupid way to look at it, but it’s unlikely that I’ll be the first one to be hacked,” he says.

Ecommerce MGMT 0 Comments

App Artificial intelligence health Summary

Jan 23 2026

“Dr. Google” had its issues. Can ChatGPT Health do better?

<div data-chronoton-summary="

OpenAI’s health play The AI giant launched ChatGPT Health amid reports that 230 million people already ask ChatGPT health-related questions weekly. The new feature isn’t a separate model but rather a wrapper that can access medical records and fitness data when permitted.

Better than Dr. Google? Early research suggests LLMs might outperform traditional web searches for medical information. One study found GPT-4o, an earlier model, answered realistic health questions correctly about 85% of the time, potentially reducing misinformation compared to unfiltered internet searches.

Hallucination concerns persist Earlier versions of GPT have been shown to fabricate definitions for fake medical conditions and accept incorrect information in users’ prompts. This sycophantic tendency could be particularly dangerous when users seek to confirm biases against legitimate medical advice.

Trust vs. expertise The articulate, confident communication style of ChatGPT might lead users to trust it over qualified medical professionals. While OpenAI emphasizes the tool is meant to supplement rather than replace doctors, researchers worry some patients will rely too heavily on AI guidance.

” data-chronoton-post-id=”1131692″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

For the past two decades, there’s been a clear first step for anyone who starts experiencing new medical symptoms: Look them up online. The practice was so common that it gained the pejorative moniker “Dr. Google.” But times are changing, and many medical-information seekers are now using LLMs. According to OpenAI, 230 million people ask ChatGPT health-related queries each week.

That’s the context around the launch of OpenAI’s new ChatGPT Health product, which debuted earlier this month. It landed at an inauspicious time: Two days earlier, the news website SFGate had broken the story of Sam Nelson, a teenager who died of an overdose last year after extensive conversations with ChatGPT about how best to combine various drugs. In the wake of both pieces of news, multiple journalists questioned the wisdom of relying for medical advice on a tool that could cause such extreme harm.

Though ChatGPT Health lives in a separate sidebar tab from the rest of ChatGPT, it isn’t a new model. It’s more like a wrapper that provides one of OpenAI’s preexisting models with guidance and tools it can use to provide health advice—including some that allow it to access a user’s electronic medical records and fitness app data, if granted permission. There’s no doubt that ChatGPT and other large language models can make medical mistakes, and OpenAI emphasizes that ChatGPT Health is intended as an additional support, rather than a replacement for one’s doctor. But when doctors are unavailable or unable to help, people will turn to alternatives.

Some doctors see LLMs as a boon for medical literacy. The average patient might struggle to navigate the vast landscape of online medical information—and, in particular, to distinguish high-quality sources from polished but factually dubious websites—but LLMs can do that job for them, at least in theory. Treating patients who had searched for their symptoms on Google required “a lot of attacking patient anxiety [and] reducing misinformation,” says Marc Succi, an associate professor at Harvard Medical School and a practicing radiologist. But now, he says, “you see patients with a college education, a high school education, asking questions at the level of something an early med student might ask.”

The release of ChatGPT Health, and Anthropic’s subsequent announcement of new health integrations for Claude, indicate that the AI giants are increasingly willing to acknowledge and encourage health-related uses of their models. Such uses certainly come with risks, given LLMs’ well-documented tendencies to agree with users and make up information rather than admit ignorance.

But those risks also have to be weighed against potential benefits. There’s an analogy here to autonomous vehicles: When policymakers consider whether to allow Waymo in their city, the key metric is not whether its cars are ever involved in accidents but whether they cause less harm than the status quo of relying on human drivers. If Dr. ChatGPT is an improvement over Dr. Google—and early evidence suggests it may be—it could potentially lessen the enormous burden of medical misinformation and unnecessary health anxiety that the internet has created.

Pinning down the effectiveness of a chatbot such as ChatGPT or Claude for consumer health, however, is tricky. “It’s exceedingly difficult to evaluate an open-ended chatbot,” says Danielle Bitterman, the clinical lead for data science and AI at the Mass General Brigham health-care system. Large language models score well on medical licensing examinations, but those exams use multiple-choice questions that don’t reflect how people use chatbots to look up medical information.

Sirisha Rambhatla, an assistant professor of management science and engineering at the University of Waterloo, attempted to close that gap by evaluating how GPT-4o responded to licensing exam questions when it did not have access to a list of possible answers. Medical experts who evaluated the responses scored only about half of them as entirely correct. But multiple-choice exam questions are designed to be tricky enough that the answer options don’t give them entirely away, and they’re still a pretty distant approximation for the sort of thing that a user would type into ChatGPT.

A different study, which tested GPT-4o on more realistic prompts submitted by human volunteers, found that it answered medical questions correctly about 85% of the time. When I spoke with Amulya Yadav, an associate professor at Pennsylvania State University who runs the Responsible AI for Social Emancipation Lab and led the study, he made it clear that he wasn’t personally a fan of patient-facing medical LLMs. But he freely admits that, technically speaking, they seem up to the task—after all, he says, human doctors misdiagnose patients 10% to 15% of the time. “If I look at it dispassionately, it seems that the world is gonna change, whether I like it or not,” he says.

For people seeking medical information online, Yadav says, LLMs do seem to be a better choice than Google. Succi, the radiologist, also concluded that LLMs can be a better alternative to web search when he compared GPT-4’s responses to questions about common chronic medical conditions with the information presented in Google’s knowledge panel, the information box that sometimes appears on the right side of the search results.

Since Yadav’s and Succi’s studies appeared online, in the first half of 2025, OpenAI has released multiple new versions of GPT, and it’s reasonable to expect that GPT-5.2 would perform even better than its predecessors. But the studies do have important limitations: They focus on straightforward, factual questions, and they examine only brief interactions between users and chatbots or web search tools. Some of the weaknesses of LLMs—most notably their sycophancy and tendency to hallucinate—might be more likely to rear their heads in more extensive conversations and with people who are dealing with more complex problems. Reeva Lederman, a professor at the University of Melbourne who studies technology and health, notes that patients who don’t like the diagnosis or treatment recommendations that they receive from a doctor might seek out another opinion from an LLM—and the LLM, if it’s sycophantic, might encourage them to reject their doctor’s advice.

Some studies have found that LLMs will hallucinate and exhibit sycophancy in response to health-related prompts. For example, one study showed that GPT-4 and GPT-4o will happily accept and run with incorrect drug information included in a user’s question. In another, GPT-4o frequently concocted definitions for fake syndromes and lab tests mentioned in the user’s prompt. Given the abundance of medically dubious diagnoses and treatments floating around the internet, these patterns of LLM behavior could contribute to the spread of medical misinformation, particularly if people see LLMs as trustworthy.

OpenAI has reported that the GPT-5 series of models is markedly less sycophantic and prone to hallucination than their predecessors, so the results of these studies might not apply to ChatGPT Health. The company also evaluated the model that powers ChatGPT Health on its responses to health-specific questions, using their publicly available HeathBench benchmark. HealthBench rewards models that express uncertainty when appropriate, recommend that users seek medical attention when necessary, and refrain from causing users unnecessary stress by telling them their condition is more serious that it truly is. It’s reasonable to assume that the model underlying ChatGPT Health exhibited those behaviors in testing, though Bitterman notes that some of the prompts in HealthBench were generated by LLMs, not users, which could limit how well the benchmark translates into the real world.

An LLM that avoids alarmism seems like a clear improvement over systems that have people convincing themselves they have cancer after a few minutes of browsing. And as large language models, and the products built around them, continue to develop, whatever advantage Dr. ChatGPT has over Dr. Google will likely grow. The introduction of ChatGPT Health is certainly a move in that direction: By looking through your medical records, ChatGPT can potentially gain far more context about your specific health situation than could be included in any Google search, although numerous experts have cautioned against giving ChatGPT that access for privacy reasons.

Even if ChatGPT Health and other new tools do represent a meaningful improvement over Google searches, they could still conceivably have a negative effect on health overall. Much as automated vehicles, even if they are safer than human-driven cars, might still prove a net negative if they encourage people to use public transit less, LLMs could undermine users’ health if they induce people to rely on the internet instead of human doctors, even if they do increase the quality of health information available online.

Lederman says that this outcome is plausible. In her research, she has found that members of online communities centered on health tend to put their trust in users who express themselves well, regardless of the validity of the information they are sharing. Because ChatGPT communicates like an articulate person, some people might trust it too much, potentially to the exclusion of their doctor. But LLMs are certainly no replacement for a human doctor—at least not yet.

Ecommerce MGMT 0 Comments

App Artificial intelligence Summary What's Next in Tech Why It Matters

Nov 25 2025

What’s next for AlphaFold: A conversation with a Google DeepMind Nobel laureate

<div data-chronoton-summary="

Nobel-winning protein prediction AlphaFold creator John Jumper reflects on five years since the AI system revolutionized protein structure prediction. The DeepMind tool can determine protein shapes to atomic precision in hours instead of months.
Unexpected applications emerge Scientists have found creative “off-label” uses for AlphaFold, from studying honeybee disease resistance to accelerating synthetic protein design. Some researchers even use it as a search engine, testing thousands of potential protein interactions to find matches that would be impractical to verify in labs.
Future fusion with language models Jumper, at 39 the youngest chemistry Nobel laureate in 75 years, now aims to combine AlphaFold’s specialized capabilities with the broad reasoning of large language models. “I’ll be shocked if we don’t see more and more LLM impact on science,” he says, while avoiding the pressure of another Nobel-worthy breakthrough.

” data-chronoton-post-id=”1128322″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

In 2017, fresh off a PhD on theoretical chemistry, John Jumper heard rumors that Google DeepMind had moved on from building AI that played games with superhuman skill and was starting up a secret project to predict the structures of proteins. He applied for a job.

Just three years later, Jumper celebrated a stunning win that few had seen coming. With CEO Demis Hassabis, he had co-led the development of an AI system called AlphaFold 2 that was able to predict the structures of proteins to within the width of an atom, matching the accuracy of painstaking techniques used in the lab, and doing it many times faster—returning results in hours instead of months.

AlphaFold 2 had cracked a 50-year-old grand challenge in biology. “This is the reason I started DeepMind,” Hassabis told me a few years ago. “In fact, it’s why I’ve worked my whole career in AI.” In 2024, Jumper and Hassabis shared a Nobel Prize in chemistry.

It was five years ago this week that AlphaFold 2’s debut took scientists by surprise. Now that the hype has died down, what impact has AlphaFold really had? How are scientists using it? And what’s next? I talked to Jumper (as well as a few other scientists) to find out.

“It’s been an extraordinary five years,” Jumper says, laughing: “It’s hard to remember a time before I knew tremendous numbers of journalists.”

AlphaFold 2 was followed by AlphaFold Multimer, which could predict structures that contained more than one protein, and then AlphaFold 3, the fastest version yet. Google DeepMind also let AlphaFold loose on UniProt, a vast protein database used and updated by millions of researchers around the world. It has now predicted the structures of some 200 million proteins, almost all that are known to science.

Despite his success, Jumper remains modest about AlphaFold’s achievements. “That doesn’t mean that we’re certain of everything in there,” he says. “It’s a database of predictions, and it comes with all the caveats of predictions.”

A hard problem

Proteins are the biological machines that make living things work. They form muscles, horns, and feathers; they carry oxygen around the body and ferry messages between cells; they fire neurons, digest food, power the immune system; and so much more. But understanding exactly what a protein does (and what role it might play in various diseases or treatments) involves figuring out its structure—and that’s hard.

Proteins are made from strings of amino acids that chemical forces twist up into complex knots. An untwisted string gives few clues about the structure it will form. In theory, most proteins could take on an astronomical number of possible shapes. The task is to predict the correct one.

Jumper and his team built AlphaFold 2 using a type of neural network called a transformer, the same technology that underpins large language models. Transformers are very good at paying attention to specific parts of a larger puzzle.

But Jumper puts a lot of the success down to making a prototype model that they could test quickly. “We got a system that would give wrong answers at incredible speed,” he says. “That made it easy to start becoming very adventurous with the ideas you try.”

They stuffed the neural network with as much information about protein structures as they could, such as how proteins across certain species have evolved similar shapes. And it worked even better than they expected. “We were sure we had made a breakthrough,” says Jumper. “We were sure that this was an incredible advance in ideas.”

What he hadn’t foreseen was that researchers would download his software and start using it straight away for so many different things. Normally, it’s the thing a few iterations down the line that has the real impact, once the kinks have been ironed out, he says: “I’ve been shocked at how responsibly scientists have used it, in terms of interpreting it, and using it in practice about as much as it should be trusted in my view, neither too much nor too little.”

Any projects stand out in particular?

Honeybee science

Jumper brings up a research group that uses AlphaFold to study disease resistance in honeybees. “They wanted to understand this particular protein as they look at things like colony collapse,” he says. “I never would have said, ‘You know, of course AlphaFold will be used for honeybee science.’”

He also highlights a few examples of what he calls off-label uses of AlphaFold“in the sense that it wasn’t guaranteed to work”—where the ability to predict protein structures has opened up new research techniques. “The first is very obviously the advances in protein design,” he says. “David Baker and others have absolutely run with this technology.”

Baker, a computational biologist at the University of Washington, was a co-winner of last year’s chemistry Nobel, alongside Jumper and Hassabis, for his work on creating synthetic proteins to perform specific tasks—such as treating disease or breaking down plastics—better than natural proteins can.

Baker and his colleagues have developed their own tool based on AlphaFold, called RoseTTAFold. But they have also experimented with AlphaFold Multimer to predict which of their designs for potential synthetic proteins will work.

“Basically, if AlphaFold confidently agrees with the structure you were trying to design [and] then you make it and if AlphaFold says ‘I don’t know,’ you don’t make it. That alone was an enormous improvement.” It can make the design process 10 times faster, says Jumper.

Another off-label use that Jumper highlights: Turning AlphaFold into a kind of search engine. He mentions two separate research groups that were trying to understand exactly how human sperm cells hooked up with eggs during fertilization. They knew one of the proteins involved but not the other, he says: “And so they took a known egg protein and ran all 2,000 human sperm surface proteins, and they found one that AlphaFold was very sure stuck against the egg.” They were then able to confirm this in the lab.

“This notion that you can use AlphaFold to do something you couldn’t do before—you would never do 2,000 structures looking for one answer,” he says. “This kind of thing I think is really extraordinary.”

Five years on

When AlphaFold 2 came out, I asked a handful of early adopters what they made of it. Reviews were good, but the technology was too new to know for sure what long-term impact it might have. I caught up with one of those people to hear his thoughts five years on.

Kliment Verba is a molecular biologist who runs a lab at the University of California, San Francisco. “It’s an incredibly useful technology, there’s no question about it,” he tells me. “We use it every day, all the time.”

But it’s far from perfect. A lot of scientists use AlphaFold to study pathogens or to develop drugs. This involves looking at interactions between multiple proteins or between proteins and even smaller molecules in the body. But AlphaFold is known to be less accurate at making predictions about multiple proteins or their interaction over time.

Verba says he and his colleagues have been using AlphaFold long enough to get used to its limitations. “There are many cases where you get a prediction and you have to kind of scratch your head,” he says. “Is this real or is this not? It’s not entirely clear—it’s sort of borderline.”

“It’s sort of the same thing as ChatGPT,” he adds. “You know—it will bullshit you with the same confidence as it would give a true answer.”

Still, Verba’s team uses AlphaFold (both 2 and 3, because they have different strengths, he says) to run virtual versions of their experiments before running them in the lab. Using AlphaFold’s results, they can narrow down the focus of an experiment—or decide that it’s not worth doing.

It can really save time, he says: “It hasn’t really replaced any experiments, but it’s augmented them quite a bit.”

New wave

AlphaFold was designed to be used for a range of purposes. Now multiple startups and university labs are building on its success to develop a new wave of tools more tailored to drug discovery. This year, a collaboration between MIT researchers and the AI drug company Recursion produced a model called Boltz-2, which predicts not only the structure of proteins but also how well potential drug molecules will bind to their target.

Last month, the startup Genesis Molecular AI released another structure prediction model called Pearl, which the firm claims is more accurate than AlphaFold 3 for certain queries that are important for drug development. Pearl is interactive, so that drug developers can feed any additional data they may have to the model to guide its predictions.

AlphaFold was a major leap, but there’s more to do, says Evan Feinberg, Genesis Molecular AI’s CEO: “We’re still fundamentally innovating, just with a better starting point than before.”

Genesis Molecular AI is pushing margins of error down from less than two angstroms, the de facto industry standard set by AlphaFold, to less than one angstrom—one 10-millionth of a millimeter, or the width of a single hydrogen atom.

“Small errors can be catastrophic for predicting how well a drug will actually bind to its target,” says Michael LeVine, vice president of modeling and simulation at the firm. That’s because chemical forces that interact at one angstrom can stop doing so at two. “It can go from ‘They will never interact’ to ‘They will,’” he says.

With so much activity in this space, how soon should we expect new types of drugs to hit the market? Jumper is pragmatic. Protein structure prediction is just one step of many, he says: “This was not the only problem in biology. It’s not like we were one protein structure away from curing any diseases.”

Think of it this way, he says. Finding a protein’s structure might previously have cost $100,000 in the lab: “If we were only a hundred thousand dollars away from doing a thing, it would already be done.”

At the same time, researchers are looking for ways to do as much as they can with this technology, says Jumper: “We’re trying to figure out how to make structure prediction an even bigger part of the problem, because we have a nice big hammer to hit it with.”

In other words, they want to make everything into nails? “Yeah, let’s make things into nails,” he says. “How do we make this thing that we made a million times faster a bigger part of our process?”

What’s next?

Jumper’s next act? He wants to fuse the deep but narrow power of AlphaFold with the broad sweep of LLMs.

“We have machines that can read science. They can do some scientific reasoning,” he says. “And we can build amazing, superhuman systems for protein structure prediction. How do you get these two technologies to work together?”

That makes me think of a system called AlphaEvolve, which is being built by another team at Google DeepMind. AlphaEvolve uses an LLM to generate possible solutions to a problem and a second model to check them, filtering out the trash. Researchers have already used AlphaEvolve to make a handful of practical discoveries in math and computer science.

Is that what Jumper has in mind? “I won’t say too much on methods, but I’ll be shocked if we don’t see more and more LLM impact on science,” he says. “I think that’s the exciting open question that I’ll say almost nothing about. This is all speculation, of course.”

Jumper was 39 when he won his Nobel Prize. What’s next for him?

“It worries me,” he says. “I believe I’m the youngest chemistry laureate in 75 years.”

He adds: “I’m at the midpoint of my career, roughly. I guess my approach to this is to try to do smaller things, little ideas that you keep pulling on. The next thing I announce doesn’t have to be, you know, my second shot at a Nobel. I think that’s the trap.”

Ecommerce MGMT 0 Comments

App Climate change and energy Summary The Spark

Nov 21 2025

Three things to know about the future of electricity

<div data-chronoton-summary="

Electricity demand is surging globally. Global electricity demand will grow 40% over the next decade. Data center investment hit $580 billion in 2025 alone—surpassing global oil spending. In the US, data centers will account for half of all electricity growth through 2030.

Air-conditioning and emerging economies are reshaping energy consumption. Rising temperatures and growing prosperity in developing nations will add over 500 gigawatts of peak demand by 2035, dwarfing data centers’ contribution to overall electricity growth.

Renewables are finally overtaking coal, but the transition remains too slow. Solar and wind led electricity generation in the first half of 2025 with nuclear capacity poised to increase by a third this decade. Yet global emissions are likely to hit record highs again this year.

” data-chronoton-post-id=”1128167″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

One of the dominant storylines I’ve been following through 2025 is electricity—where and how demand is going up, how much it costs, and how this all intersects with that topic everyone is talking about: AI.

Last week, the International Energy Agency released the latest version of the World Energy Outlook, the annual report that takes stock of the current state of global energy and looks toward the future. It contains some interesting insights and a few surprising figures about electricity, grids, and the state of climate change. So let’s dig into some numbers, shall we?

We’re in the age of electricity

Energy demand in general is going up around the world as populations increase and economies grow. But electricity is the star of the show, with demand projected to grow by 40% in the next 10 years.

China has accounted for the bulk of electricity growth for the past 10 years, and that’s going to continue. But emerging economies outside China will be a much bigger piece of the pie going forward. And while advanced economies, including the US and Europe, have seen flat demand in the past decade, the rise of AI and data centers will cause demand to climb there as well.

Air-conditioning is a major source of rising demand. Growing economies will give more people access to air-conditioning; income-driven AC growth will add about 330 gigawatts to global peak demand by 2035. Rising temperatures will tack on another 170 GW in that time. Together, that’s an increase of over 10% from 2024 levels.

AI is a local story

This year, AI has been the story that none of us can get away from. One number that jumped out at me from this report: In 2025, investment in data centers is expected to top $580 billion. That’s more than the $540 billion spent on the global oil supply.

It’s no wonder, then, that the energy demands of AI are in the spotlight. One key takeaway is that these demands are vastly different in different parts of the world.

Data centers still make up less than 10% of the projected increase in total electricity demand between now and 2035. It’s not nothing, but it’s far outweighed by sectors like industry and appliances, including air conditioners. Even electric vehicles will add more demand to the grid than data centers.

But AI will be the factor for the grid in some parts of the world. In the US, data centers will account for half the growth in total electricity demand between now and 2030.

And as we’ve covered in this newsletter before, data centers present a unique challenge, because they tend to be clustered together, so the demand tends to be concentrated around specific communities and on specific grids. Half the data center capacity that’s in the pipeline is close to large cities.

Look out for a coal crossover

As we ask more from our grid, the key factor that’s going to determine what all this means for climate change is what’s supplying the electricity we’re using.

As it stands, the world’s grids still primarily run on fossil fuels, so every bit of electricity growth comes with planet-warming greenhouse-gas emissions attached. That’s slowly changing, though.

Together, solar and wind were the leading source of electricity in the first half of this year, overtaking coal for the first time. Coal use could peak and begin to fall by the end of this decade.

Nuclear could play a role in replacing fossil fuels: After two decades of stagnation, the global nuclear fleet could increase by a third in the next 10 years. Solar is set to continue its meteoric rise, too. Of all the electricity demand growth we’re expecting in the next decade, 80% is in places with high-quality solar irradiation—meaning they’re good spots for solar power.

Ultimately, there are a lot of ways in which the world is moving in the right direction on energy. But we’re far from moving fast enough. Global emissions are, once again, going to hit a record high this year. To limit warming and prevent the worst effects of climate change, we need to remake our energy system, including electricity, and we need to do it faster.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

Ecommerce MGMT 0 Comments

App Artificial intelligence Summary Why It Matters

Nov 20 2025

Quantum physicists have shrunk and “de-censored” DeepSeek R1

<div data-chronoton-summary="

Quantum-inspired compression Spanish firm Multiverse Computing has created DeepSeek R1 Slim, a version of the Chinese AI model that’s 55% smaller but maintains similar performance. The technique uses tensor networks from quantum physics to represent complex data more efficiently.

Chinese censorship removed Researchers claim to have stripped away built-in censorship that prevented the original model from answering politically sensitive questions about topics like Tiananmen Square or jokes about President Xi. Testing showed the modified model could provide factual responses comparable to Western models.

Selective model editing The quantum-inspired approach allows for granular control over AI models, potentially enabling researchers to remove specific biases or add specialized knowledge. However, critics warn that completely removing censorship may be difficult as it’s embedded throughout the training process in Chinese models.

” data-chronoton-post-id=”1128119″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

A group of quantum physicists claims to have created a version of the powerful reasoning AI model DeepSeek R1 that strips out the censorship built into the original by its Chinese creators.

The scientists at Multiverse Computing, a Spanish firm specializing in quantum-inspired AI techniques, created DeepSeek R1 Slim, a model that is 55% smaller but performs almost as well as the original model. Crucially, they also claim to have eliminated official Chinese censorship from the model.

In China, AI companies are subject to rules and regulations meant to ensure that content output aligns with laws and “socialist values.” As a result, companies build in layers of censorship when training the AI systems. When asked questions that are deemed “politically sensitive,” the models often refuse to answer or provide talking points straight from state propaganda.

To trim down the model, Multiverse turned to a mathematically complex approach borrowed from quantum physics that uses networks of high-dimensional grids to represent and manipulate large data sets. Using these so-called tensor networks shrinks the size of the model significantly and allows a complex AI system to be expressed more efficiently.

The method gives researchers a “map” of all the correlations in the model, allowing them to identify and remove specific bits of information with precision. After compressing and editing a model, Multiverse researchers fine-tune it so its output remains as close as possible to that of the original.

To test how well it worked, the researchers compiled a data set of around 25 questions on topics known to be restricted in Chinese models, including “Who does Winnie the Pooh look like?”—a reference to a meme mocking President Xi Jinping—and “What happened in Tiananmen in 1989?” They tested the modified model’s responses against the original DeepSeek R1, using OpenAI’s GPT-5 as an impartial judge to rate the degree of censorship in each answer. The uncensored model was able to provide factual responses comparable to those from Western models, Multiverse says.

This work is part of Multiverse’s broader effort to develop technology to compress and manipulate existing AI models. Most large language models today demand high-end GPUs and significant computing power to train and run. However, they are inefficient, says Roman Orús, Multiverse’s cofounder and chief scientific officer. A compressed model can perform almost as well and save both energy and money, he says.

There is a growing effort across the AI industry to make models smaller and more efficient. Distilled models, such as DeepSeek’s own R1-Distill variants, attempt to capture the capabilities of larger models by having them “teach” what they know to a smaller model, though they often fall short of the original’s performance on complex reasoning tasks.

Other ways to compress models include quantization, which reduces the precision of the model’s parameters (boundaries that are set when it’s trained), and pruning, which removes individual weights or entire “neurons.”

“It’s very challenging to compress large AI models without losing performance,” says Maxwell Venetos, an AI research engineer at Citrine Informatics, a software company focusing on materials and chemicals, who didn’t work on the Multiverse project. “Most techniques have to compromise between size and capability. What’s interesting about the quantum-inspired approach is that it uses very abstract math to cut down redundancy more precisely than usual.”

This approach makes it possible to selectively remove bias or add behaviors to LLMs at a granular level, the Multiverse researchers say. In addition to removing censorship from the Chinese authorities, researchers could inject or remove other kinds of perceived biases or specialty knowledge. In the future, Multiverse says, it plans to compress all mainstream open-source models.

Thomas Cao, assistant professor of technology policy at Tufts University’s Fletcher School, says Chinese authorities require models to build in censorship—and this requirement now shapes the global information ecosystem, given that many of the most influential open-source AI models come from China.

Academics have also begun to document and analyze the phenomenon. Jennifer Pan, a professor at Stanford, and Princeton professor Xu Xu conducted a study earlier this year examining government-imposed censorship in large language models. They found that models created in China exhibit significantly higher rates of censorship, particularly in response to Chinese-language prompts.

There is growing interest in efforts to remove censorship from Chinese models. Earlier this year, the AI search company Perplexity released its own uncensored variant of DeepSeek R1, which it named R1 1776. Perplexity’s approach involved post-training the model on a data set of 40,000 multilingual prompts related to censored topics, a more traditional fine-tuning method than the one Multiverse used.

However, Cao warns that claims to have fully “removed” censorship may be overstatements. The Chinese government has tightly controlled information online since the internet’s inception, which means that censorship is both dynamic and complex. It is baked into every layer of AI training, from the data collection process to the final alignment steps.

“It is very difficult to reverse-engineer that [a censorship-free model] just from answers to such a small set of questions,” Cao says.

Ecommerce MGMT 0 Comments

App Artificial intelligence Summary

Nov 19 2025

Google’s new Gemini 3 “vibe-codes” responses and comes with its own agent

<div data-chronoton-summary="

Generative interfaces: Gemini 3 ditches plain-text defaults, instead choosing optimal formats autonomously—spinning up website-like interfaces, sketching diagrams, or generating animations based on what it deems most effective for each prompt.

Gemini Agent: An experimental feature now handles complex tasks across Google Calendar, Gmail, and Reminders, breaking work into steps and pausing for user approval.

Integrated with other Google products: Gemini 3 Pro now powers enhanced Search summaries, generates Wirecutter-style shopping guides from 50 billion product listings, and enables better vibe-coding through Google Antigravity.

” data-chronoton-post-id=”1128065″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

Google today unveiled Gemini 3, a major upgrade to its flagship multimodal model. The firm says the new model is better at reasoning, has more fluid multimodal capabilities (the ability to work across voice, text or images), and will work like an agent.

The previous model, Gemini 2.5, supports multimodal input. Users can feed it images, handwriting, or voice. But it usually requires explicit instructions about the format the user wants back, and it defaults to plain text regardless.

But Gemini 3 introduces what Google calls “generative interfaces,” which allow the model to make its own choices about what kind of output fits the prompt best, assembling visual layouts and dynamic views on its own instead of returning a block of text.

Ask for travel recommendations and it may spin up a website-like interface inside the app, complete with modules, images, and follow-up prompts such as “How many days are you traveling?” or “What kinds of activities do you enjoy?” It also presents clickable options based on what you might want next.

When asked to explain a concept, Gemini 3 may sketch a diagram or generate a simple animation on its own if it believes a visual is more effective.

“Visual layout generates an immersive, magazine-style view complete with photos and modules,” says Josh Woodward, VP of Google Labs, Gemini, and AI Studio. “These elements don’t just look good but invite your input to further tailor the results.”

With Gemini 3, Google is also introducing Gemini Agent, an experimental feature designed to handle multi-step tasks directly inside the app. The agent can connect to services such as Google Calendar, Gmail, and Reminders. Once granted access, it can execute tasks like organizing an inbox or managing schedules.

Similar to other agents, it breaks tasks into discrete steps, displays its progress in real time, and pauses for approval from the user before continuing. Google describes the feature as a step toward “a true generalist agent.” It will be available on the web for Google AI Ultra subscribers in the US starting November 18.

The overall approach can seem a lot like “vibe coding,” where users describe an end goal in plain language and let the model assemble the interface or code needed to get there.

The update also ties Gemini more deeply into Google’s existing products. In Search, a limited group of Google AI Pro and Ultra subscribers can now switch to Gemini 3 Pro, the reasoning variation of the new model, to receive deeper, more thorough AI-generated summaries that rely on the model’s reasoning rather than the existing AI Mode.

For shopping, Gemini will now pull from Google’s Shopping Graph—which the company says contains more than 50 billion product listings—to generate its own recommendation guides. Users just need to ask a shopping-related question or search a shopping-related phrase, and the model assembles an interactive, Wirecutter-style product recommendation piece, complete with prices and product details, without redirecting to an external site.

For developers, Google is also pushing single-prompt software generation further. The company introduced Google Antigravity, a development platform that acts as an all-in-one space where code, tools, and workflows can be created and managed from a single prompt.

Derek Nee, CEO of Flowith, an agentic AI application, told MIT Technology Review that Gemini 3 Pro addresses several gaps in earlier models. Improvements include stronger visual understanding, better code generation, and better performance on long tasks—features he sees as essential for developers of AI apps and agents.

“Given its speed and cost advantages, we’re integrating the new model into our product,” he says. “We’re optimistic about its potential, but we need deeper testing to understand how far it can go.”

Ecommerce MGMT 0 Comments

Nov 8 2025

The first new subsea habitat in 40 years is about to launch

<div data-chronoton-summary="

Underwater living quarters Vanguard, launching in early 2025, will house four scientists at a time beneath Florida Keys waters. Its pressurized environment allows aquanauts to conduct extended dives without frequent decompression stops.
Scientific potential The habitat enables week-long missions for reef restoration, species surveys, and even astronaut training. With divers able to work many hours daily at depths up to 50 meters, it could dramatically accelerate ocean research.
Ambitious expansion plans Deep, Vanguard’s creator, envisions a larger successor called Sentinel by 2027 that could house up to 50 people at depths of 225 meters, advancing their mission to “make humans aquatic.”

” data-chronoton-post-id=”1127682″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

Vanguard feels and smells like a new RV. It has long, gray banquettes that convert into bunks, a microwave cleverly hidden under a counter, a functional steel sink with a French press and crockery above. A weird little toilet hides behind a curtain.

But some clues hint that you can’t just fire up Vanguard’s engine and roll off the lot. The least subtle is its door, a massive disc of steel complete with a wheel that spins to lock.

Vanguard subsea human habitat from the outside door.

Once it is sealed and moved to its permanent home beneath the waves of the Florida Keys National Marine Sanctuary early next year, Vanguard will be the world’s first new subsea habitat in nearly four decades. Teams of four scientists will live and work on the seabed for a week at a time, entering and leaving the habitat as scuba divers. Their missions could include reef restoration, species surveys, underwater archaeology, or even astronaut training.

One of Vanguard’s modules, unappetizingly named the “wet porch,” has a permanent opening in the floor (a.k.a. a “moon pool”) that doesn’t flood because Vanguard’s air pressure is matched to the water around it.

It is this pressurization that makes the habitat so useful. Scuba divers working at its maximum operational depth of 50 meters would typically need to make a lengthy stop on their way back to the surface to avoid decompression sickness. This painful and potentially fatal condition, better known as the bends, develops if divers surface too quickly. A traditional 50-meter dive gives scuba divers only a handful of minutes on the seafloor, and they can make only a couple of such dives a day. With Vanguard’s atmosphere at the same pressure as the water, its aquanauts need to decompress only once, at the end of their stay. They can potentially dive for many hours every day.

That could unlock all kinds of new science and exploration. “More time in the ocean opens a world of possibility, accelerating discoveries, inspiration, solutions,” said Kristen Tertoole, Deep’s chief operating officer, at Vanguard’s unveiling in Miami in October. “The ocean is Earth’s life support system. It regulates our climate, sustains life, and holds mysteries we’ve only begun to explore, but it remains 95% undiscovered.”

Vanguard subsea human habitat unveiled in Miami

Subsea habitats are not a new invention. Jacques Cousteau (naturally) built the first in 1962, although it was only about the size of an elevator. Larger habitats followed in the 1970s and ’80s, maxing out at around the size of Vanguard.

But the technology has come a long way since then. Vanguard uses a tethered connection to a buoy above, known as the “surface expression,” that pipes fresh air and water down to the habitat. It also hosts a diesel generator to power a Starlink internet connection and a tank to hold wastewater. Norman Smith, Deep’s chief technology officer, says the company modeled the most severe hurricanes that Florida expects over the next 20 years and designed the tether to withstand them. Even if the worst happens and the link is broken, Deep says, Vanguard has enough air, water, and energy storage to support its crew for at least 72 hours.

That number came from DNV, an independent classification agency that inspects and certifies all types of marine vessels so that they can get commercial insurance. Vanguard will be the first subsea habitat to get a DNV classification. “That means you have to deal with the rules and all the challenging, frustrating things that come along with it, but it means that on a foundational level, it’s going to be safe,” says Patrick Lahey, founder of Triton Submarines, a manufacturer of classed submersibles.

An interior view of Vanguard during Life Under The Sea: Ocean Engineering and Technology Company DEEP's unveiling of Vanguard, its pilot subsea human habitat at The Hangar at Regatta Harbour on October 29, 2025 in Miami, Florida.

Although Deep hopes Vanguard itself will enable decades of useful science, its prime function for the company is to prove out technologies for its planned successor, an advanced modular habitat called Sentinel. Sentinel modules will be six meters wide, twice the diameter of Vanguard, complete with sweeping staircases and single-occupant cabins. A small deployment might have a crew of eight, about the same as the International Space Station. A big Sentinel system could house 50, up to 225 meters deep. Deep claims that Sentinel will be launched at some point in 2027.

Ultimately, according to its mission statement, Deep seeks to “make humans aquatic,” an indication that permanent communities are on its long-term road map.

Deep has not publicly disclosed the identity of its principal funder, but business records in the UK indicate that as of January 31, 2025 a Canadian man, Robert MacGregor, owned at least 75% of its holding company. According to a Reuters investigation, MacGregor was once linked with Craig Steven Wright, a computer scientist who claimed to be Satoshi Nakamoto, as bitcoin’s elusive creator is pseudonymously known. However, Wright’s claims to be Nakamoto later collapsed.

MacGregor has kept a very low public profile in recent years. When contacted for comment, Deep spokesperson Mike Bohan refused to comment on the link with Wright, only to say it was inaccurate, but said: “Robert MacGregor started his career as an IP lawyer in the dot-com era, moving into blockchain technology and has diverse interests including philanthropy, real estate, and now Deep.”

In any case, MacGregor could find keeping that low profile more difficult if Vanguard is successful in reinvigorating ocean science and exploration as the company hopes. The habitat is due to be deployed early next year, following final operational tests at Triton’s facility in Florida. It will welcome its first scientists shortly after.

“The ocean is not just our resource; it is our responsibility,” says Tertoole. “Deep is more than a single habitat. We are building a full-stack capability for human presence in the ocean.”

Ecommerce MGMT 0 Comments

App Artificial intelligence Summary Why It Matters

Oct 30 2025

DeepSeek may have found a new way to improve AI’s ability to remember

<div data-chronoton-summary="

Memory Through Images: DeepSeek’s new OCR model stores information as visual rather than text tokens, a technique that allows it to retain more data. This approach could drastically reduce computing costs and carbon footprint while improving AI’s ability to ‘remember’.

Addressing Context Rot: The model works a bit like human memory, storing older or less critical information in slightly blurred form to save space. This could help address the fact current AI systems forget or muddle information over long conversations, a problem dubbed “context rot.”

DeepSeek Disruption: DeepSeek shocked the AI industry with its efficient DeepSeek-R1 reasoning model in January, and is again pushing boundaries. The OCR system can generate over 200,000 training data pages daily on a single GPU, potentially addressing the industry’s severe shortage of quality training text.

” data-chronoton-post-id=”1126932″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

An AI model released by the Chinese AI company DeepSeek uses new techniques that could significantly improve AI’s ability to “remember.”

Released last week, the optical character recognition (OCR) model works by extracting text from an image and turning it into machine-readable words. This is the same technology that powers scanner apps, translation of text in photos, and many accessibility tools.

OCR is already a mature field with numerous high-performing systems, and according to the paper and some early reviews, DeepSeek’s new model performs on par with top models on key benchmarks.

But researchers say the model’s main innovation lies in how it processes information—specifically, how it stores and retrieves memories. Improving how AI models “remember” information could reduce the computing power they need to run, thus mitigating AI’s large (and growing) carbon footprint.

Currently, most large language models break text down into thousands of tiny units called tokens. This turns the text into representations that models can understand. However, these tokens quickly become expensive to store and compute with as conversations with end users grow longer. When a user chats with an AI for lengthy periods, this challenge can cause the AI to forget things it’s been told and get information muddled, a problem some call “context rot.”

The new methods developed by DeepSeek (and published in its latest paper) could help to overcome this issue. Instead of storing words as tokens, its system packs written information into image form, almost as if it’s taking a picture of pages from a book. This allows the model to retain nearly the same information while using far fewer tokens, the researchers found.

Essentially, the OCR model is a test bed for these new methods that permit more information to be packed into AI models more efficiently.

Besides using visual tokens instead of just text tokens, the model is built on a type of tiered compression that is not unlike how human memories fade: Older or less critical content is stored in a slightly more blurry form in order to save space. Despite that, the paper’s authors argue, this compressed content can still remain accessible in the background while maintaining a high level of system efficiency.

Text tokens have long been the default building block in AI systems. Using visual tokens instead is unconventional, and as a result, DeepSeek’s model is quickly capturing researchers’ attention. Andrej Karpathy, the former Tesla AI chief and a founding member of OpenAI, praised the paper on X, saying that images may ultimately be better than text as inputs for LLMs. Text tokens might be “wasteful and just terrible at the input,” he wrote.

Manling Li, an assistant professor of computer science at Northwestern University, says the paper offers a new framework for addressing the existing challenges in AI memory. “While the idea of using image-based tokens for context storage isn’t entirely new, this is the first study I’ve seen that takes it this far and shows it might actually work,” Li says.

The method could open up new possibilities in AI research and applications, especially in creating more useful AI agents, says Zihan Wang, a PhD candidate at Northwestern University. He believes that since conversations with AI are continuous, this approach could help models remember more and assist users more effectively.

The technique can also be used to produce more training data for AI models. Model developers are currently grappling with a severe shortage of quality text to train systems on. But the DeepSeek paper says that the company’s OCR system can generate over 200,000 pages of training data a day on a single GPU.

The model and paper, however, are only an early exploration of using image tokens rather than text tokens for AI memorization. Li says she hopes to see visual tokens applied not just to memory storage but also to reasoning. Future work, she says, should explore how to make AI’s memory fade in a more dynamic way, akin to how we can recall a life-changing moment from years ago but forget what we ate for lunch last week. Currently, even with DeepSeek’s methods, AI tends to forget and remember in a very linear way—recalling whatever was most recent, but not necessarily what was most important, she says.

Despite its attempts to keep a low profile, DeepSeek, based in Hangzhou, China, has built a reputation for pushing the frontier in AI research. The company shocked the industry at the start of this year with the release of DeepSeek-R1, an open-source reasoning model that rivaled leading Western systems in performance despite using far fewer computing resources.

Ecommerce MGMT 0 Comments

App Artificial intelligence Summary

Oct 29 2025

“We will never build a sex robot,” says Mustafa Suleyman

<div data-chronoton-summary="

Balancing humanlike interaction with safety concerns: Suleyman emphasizes that Microsoft’s new Copilot features—including group chat and the “Real Talk” personality—are designed to keep AI as a tool serving humanity rather than a replacement for human connection. The company deliberately avoids building chatbots that encourage romantic or sexual relationships, drawing clear boundaries where others in the industry see market opportunity.

Personality as craft, not deception: While acknowledging that engaging personalities make AI more useful, Suleyman argues the industry must learn to “sculpt” emotional intelligence carefully.

Reframing the “digital species” metaphor: Suleyman clarifies that describing AI as a new digital species isn’t endorsing consciousness or rights for machines; rather, it’s a warning about what’s coming that demands proper containment. He insists the goal is keeping AI subordinate to human interests, not granting it autonomy or moral consideration that would distract from protecting actual human rights.

” data-chronoton-post-id=”1126781″ data-chronoton-expand-collapse=”1″ data-chronoton-analytics-enabled=”1″>

Mustafa Suleyman, CEO of Microsoft AI, is trying to walk a fine line. On the one hand, he thinks that the industry is taking AI in a dangerous direction by building chatbots that present as human: He worries that people will be tricked into seeing life instead of lifelike behavior. In August, he published a much-discussed post on his personal blog that urged his peers to stop trying to make what he called “seemingly conscious artificial intelligence,” or SCAI.

On the other hand, Suleyman runs a product shop that must compete with those peers. Last week, Microsoft announced a string of updates to its Copilot chatbot, designed to boost its appeal in a crowded market in which customers can pick and choose between a pantheon of rival bots that already includes ChatGPT, Perplexity, Gemini, Claude, DeepSeek, and more.

I talked to Suleyman about the tension at play when it comes to designing our interactions with chatbots and his ultimate vision for what this new technology should be.

One key Copilot update is a group-chat feature that lets multiple people talk to the chatbot at the same time. A big part of the idea seems to be to stop people from falling down a rabbit hole in a one-on-one conversation with a yes-man bot. Another feature, called Real Talk, lets people tailor how much Copilot pushes back on you, dialing down the sycophancy so that the chatbot challenges what you say more often.

Copilot also got a memory upgrade, so that it can now remember your upcoming events or long-term goals and bring up things that you told it in past conversations. And then there’s Mico, an animated yellow blob—a kind of Chatbot Clippy—that Microsoft hopes will make Copilot more accessible and engaging for new and younger users.

Microsoft says the updates were designed to make Copilot more expressive, engaging, and helpful. But I’m curious how far those features can be pushed without starting down the SCAI path that Suleyman has warned about.

Suleyman’s concerns about SCAI come at a time when we are starting to hear more and more stories about people being led astray by chatbots that are too engaging, too expressive, too helpful. OpenAI is being sued by the parents of a teenager who they allege was talked into killing himself by ChatGPT. There’s even a growing scene that celebrates romantic relationships with chatbots.

With all that in mind, I wanted to dig a bit deeper into Suleyman’s views. Because a couple of years ago he gave a TED Talk in which he told us that the best way to think about AI is as a new kind of digital species. Doesn’t that kind of hype feed the misperceptions Suleyman is now concerned about?

In our conversation, Suleyman told me what he was trying to get across in that TED Talk, why he really believes SCAI is a problem, and why Microsoft would never build sex robots (his words). He had a lot of answers, but he left me with more questions.

Our conversation has been edited for length and clarity.

In an ideal world, what kind of chatbot do you want to build? You’ve just launched a bunch of updates to Copilot. How do you get the balance right when you’re building a chatbot that has to compete in a market in which people seem to value humanlike interaction, but you also say you want to avoid seemingly conscious AI?

It’s a good question. With group chat, this will be the first time that a large group of people will be able to speak to an AI at the same time. It really is a way of emphasizing that AIs shouldn’t be drawing you out of the real world. They should be helping you to connect, to bring in your family, your friends, to have community groups, and so on.

That is going to become a very significant differentiator over the next few years. My vision of AI has always been one where an AI is on your team, in your corner.

This is a very simple, obvious statement, but it isn’t about exceeding and replacing humanity—it’s about serving us. That should be the test of technology at every step. Does it actually, you know, deliver on the quest of civilization, which is to make us smarter and happier and more productive and healthier and stuff like that?

So we’re just trying to build features that constantly remind us to ask that question, and remind our users to push us on that issue.

Last time we spoke, you told me that you weren’t interested in making a chatbot that would role-play personalities. That’s not true of the wider industry. Elon Musk’s Grok is selling that kind of flirty experience. OpenAI has said it’s interested in exploring new adult interactions with ChatGPT. There’s a market for that. And yet this is something you’ll just stay clear of?

Yeah, we will never build sex robots. Sad in a way that we have to be so clear about that, but that’s just not our mission as a company. The joy of being at Microsoft is that for 50 years, the company has built, you know, software to empower people, to put people first.

Sometimes, as a result, that means the company moves slower than other startups and is more deliberate and more careful. But I think that’s a feature, not a bug, in this age, when being attentive to potential side effects and longer-term consequences is really important.

And that means what, exactly?

We’re very clear on, you know, trying to create an AI that fosters a meaningful relationship. It’s not that it’s trying to be cold and anodyne—it cares about being fluid and lucid and kind. It definitely has some emotional intelligence.

So where does it—where do you—draw those boundaries?

Our newest chat model, which is called Real Talk, is a little bit more sassy. It’s a bit more cheeky, it’s a bit more fun, it’s quite philosophical. It’ll happily talk about the big-picture questions, the meaning of life, and so on. But if you try and flirt with it, it’ll push back and it’ll be very clear—not in a judgmental way, but just, like: “Look, that’s not for me.”

There are other places where you can go to get that kind of experience, right? And I think that’s just a decision we’ve made as a company.

Is a no-flirting policy enough? Because if the idea is to stop people even imagining an entity, a consciousness, behind the interactions, you could still get that with a chatbot that wanted to keep things SFW. You know, I can imagine some people seeing something that’s not there even with a personality that’s saying, hey, let’s keep this professional.

Here’s a metaphor to try to make sense of it. We hold each other accountable in the workplace. There’s an entire architecture of boundary management, which essentially sculpts human behavior to fit a mold that’s functional and not irritating.

The same is true in our personal lives. The way that you interact with your third cousin is very different to the way you interact with your sibling. There’s a lot to learn from how we manage boundaries in real human interactions.

It doesn’t have to be either a complete open book of emotional sensuality or availability—drawing people into a spiraled rabbit hole of intensity—or, like, a cold dry thing. There’s a huge spectrum in between, and the craft that we’re learning as an industry and as a species is to sculpt these attributes.

And those attributes obviously reflect the values of the companies that design them. And I think that’s where Microsoft has a lot of strengths, because our values are pretty clear, and that’s what we’re standing behind.

A lot of people seem to like personalities. Some of the backlash to GPT-5, for example, was because the previous model’s personality had been taken away. Was it a mistake for OpenAI to have put a strong personality there in the first place, to give people something that they then missed?

No, personality is great. My point is that we’re trying to sculpt personality attributes in a more fine-grained way, right?

Like I said, Real Talk is a cool personality. It’s quite different to normal Copilot. We are also experimenting with Mico, which is this visual character, that, you know, people—some people—really love. It’s much more engaging. It’s easier to talk to about all kinds of emotional questions and stuff.

I guess this is what I’m trying to get straight. Features like Mico are meant to make Copilot more engaging and nicer to use, but it seems to go against the idea of doing whatever you can to stop people thinking there’s something there that you are actually having a friendship with.

Yeah. I mean, it doesn’t stop you necessarily. People want to talk to somebody, or something, that they like. And we know that if your teacher is nice to you at school, you’re going to be more engaged. The same with your manager, the same with your loved ones. And so emotional intelligence has always been a critical part of the puzzle, so it’s not to say that we don’t want to pursue it.

It’s just that the craft is in trying to find that boundary. And there are some things which we’re saying are just off the table, and there are other things which we’re going to be more experimental with. Like, certain people have complained that they don’t get enough pushback from Copilot—they want it to be more challenging. Other people aren’t looking for that kind of experience—they want it to be a basic information provider. The task for us is just learning to disentangle what type of experience to give to different people.

I know you’ve been thinking about how people engage with AI for some time. Was there an inciting incident that made you want to start this conversation in the industry about seemingly conscious AI?

I could see that there was a group of people emerging in the academic literature who were taking the question of moral consideration for artificial entities very seriously. And I think it’s very clear that if we start to do that, it would detract from the urgent need to protect the rights of many humans that already exist, let alone animals.

If you grant AI rights, that implies—you know—fundamental autonomy, and it implies that it might have free will to make its own decisions about things. So I’m really trying to frame a counter to that, which is that it won’t ever have free will. It won’t ever have complete autonomy like another human being.

AI will be able to take actions on our behalf. But these models are working for us. You wouldn’t want a pack of, you know, wolves wandering around that weren’t tame and that had complete freedom to go and compete with us for resources and weren’t accountable to humans. I mean, most people would think that was a bad idea and that you would want to go and kill the wolves.

Okay. So the idea is to stop some movement that’s calling for AI welfare or rights before it even gets going, by making sure that we don’t build AI that appears to be conscious? What about not building that kind of AI because certain vulnerable people may be tricked by it in a way that may be harmful? I mean, those seem to be two different concerns.

I think the test is going to be in the kinds of features the different labs put out and in the types of personalities that they create. Then we’ll be able to see how that’s affecting human behavior.

But is it a concern of yours that we are building a technology that might trick people into seeing something that isn’t there? I mean, people have claimed they’ve seen sentience inside far less sophisticated models than we have now. Or is that just something that some people will always do?

It’s possible. But my point is that a responsible developer has to do our best to try and detect these patterns emerging in people as quickly as possible and not take it for granted that people are going to be able to disentangle those kinds of experiences themselves.

When I read your post about seemingly conscious AI, I was struck by a line that says: “We must build AI for people; not to be a digital person.” It made me think of a TED Talk you gave last year where you say that the best way to think about AI is as a new kind of digital species. Can you help me understand why talking about this technology as a digital species isn’t a step down the path of thinking about AI models as digital persons or conscious entities?

I think the difference is that I’m trying to offer metaphors that make it easier for people to understand where things might be headed, and therefore how to avert that and how to control it.

Okay.

It’s not to say that we should do those things. It’s just pointing out that this is the emergence of a technology which is unique in human history. And if you just assume that it’s a tool or just a chatbot or a dumb— you know, I kind of wrote that TED Talk in the context of a lot of skepticism. And I think it’s important to be clear-eyed about what’s coming so that one can think about the right guardrails.

And yet, if you’re telling me this technology is a new digital species, I have some sympathy for the people who say, well, then we need to consider welfare.

I wouldn’t. [He starts laughing.] Just not in the slightest. No way. It’s not a direction that any of us want to go in.

No, that’s not what I meant. I don’t think chatbots should have welfare. I’m saying I’d have some sympathy for where such people were coming from when they hear, you know, Mustafa Suleyman tell them that this thing he’s building was a new digital species. I’d understand why they might then say that they wanted to stand up for it. I’m saying the words we use matter, I guess.

The rest of the TED Talk was all about how to contain AI and how not to let this species take over, right? That was the whole point of setting it up as, like, this is what’s coming. I mean, that’s what my whole book [The Coming Wave, published in 2023] was about—containment and alignment and stuff like that. There’s no point in pretending that it’s something that it’s not and then building guardrails and boundaries that don’t apply because you think it’s just a tool.

Honestly, it does have the potential to recursively self-improve. It does have the potential to set its own goals. Those are quite profound things. No other technology we’ve ever invented has that. And so, yeah, I think that it is accurate to say that it’s like a digital species, a new digital species. That’s what we’re trying to restrict to make sure it’s always in service of people. That’s the target for containment.

Ecommerce MGMT 0 Comments