DeepSeek may have found a new way to improve AI’s ability to remember

An AI model released by the Chinese AI company DeepSeek uses new techniques that could significantly improve AI’s ability to “remember.”

Released last week, the optical character recognition (OCR) model works by extracting text from an image and turning it into machine-readable words. This is the same technology that powers scanner apps, translation of text in photos, and many accessibility tools. 
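For reference, a bare-bones OCR call in a conventional pipeline looks something like the sketch below. It uses the open-source pytesseract library rather than DeepSeek's model, and the input file name is a hypothetical stand-in.

```python
# A minimal, generic OCR example (not DeepSeek's model): extract machine-readable
# text from a page image using pytesseract, a wrapper around the Tesseract engine.
# "page.png" is a hypothetical input file.
from PIL import Image
import pytesseract

page = Image.open("page.png")              # load the scanned page
text = pytesseract.image_to_string(page)   # run OCR on the image
print(text)                                # the extracted, machine-readable text
```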

OCR is already a mature field with numerous high-performing systems, and according to the paper and some early reviews, DeepSeek’s new model performs on par with top models on key benchmarks.

But researchers say the model’s main innovation lies in how it processes information—specifically, how it stores and retrieves memories. Improving how AI models “remember” information could reduce the computing power they need to run, thus mitigating AI’s large (and growing) carbon footprint. 

Currently, most large language models break text down into thousands of tiny units called tokens. This turns the text into representations that models can understand. However, these tokens quickly become expensive to store and compute with as conversations with end users grow longer. When a user chats with an AI for lengthy periods, this challenge can cause the AI to forget things it’s been told and get information muddled, a problem some call “context rot.”
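To make the bookkeeping concrete, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer (chosen purely for illustration; DeepSeek's models use their own tokenizer). It shows how a short passage expands into tokens, and why the running total balloons over a long conversation.

```python
# Sketch: counting text tokens with the open-source tiktoken library.
# Any BPE-style tokenizer behaves similarly; tiktoken is just an example.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

passage = ("Optical character recognition turns images of text "
           "into machine-readable words.")
tokens = enc.encode(passage)
print(len(passage.split()), "words ->", len(tokens), "tokens")

# A chat history is re-fed to the model on every turn, so compute and memory
# costs grow with the running total of tokens, not just the newest message.
history = passage * 200   # crude stand-in for a long conversation
print("long history:", len(enc.encode(history)), "tokens")
```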

The new methods developed by DeepSeek (and published in its latest paper) could help overcome this issue. Instead of storing words as tokens, its system packs written information into image form, almost as if it's taking a picture of pages from a book. This allows the model to retain nearly the same information while using far fewer tokens, the researchers found. 
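The potential savings are easiest to see with rough patch arithmetic. Every number in the sketch below is an illustrative assumption (the page rendering size, ViT-style 16x16 patches, an 8x encoder compression factor), not a figure from DeepSeek's paper.

```python
# Back-of-the-envelope comparison: tokens needed to store one page as text
# versus as a compressed image. Every constant here is an assumption made
# for illustration, not a number taken from DeepSeek's paper.

WORDS_PER_PAGE = 500
TOKENS_PER_WORD = 1.3                      # rough BPE average for English
text_tokens = int(WORDS_PER_PAGE * TOKENS_PER_WORD)

IMAGE_SIZE = 448                           # render the page at 448x448 pixels
PATCH_SIZE = 16                            # each 16x16 patch is one visual token
patch_tokens = (IMAGE_SIZE // PATCH_SIZE) ** 2

COMPRESSION = 8                            # assume the encoder compresses ~8x
visual_tokens = patch_tokens // COMPRESSION

print("text tokens:  ", text_tokens)       # ~650
print("visual tokens:", visual_tokens)     # ~98
```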

Essentially, the OCR model is a test bed for these new methods that permit more information to be packed into AI models more efficiently. 

Besides using visual tokens instead of just text tokens, the model relies on a kind of tiered compression that is not unlike the way human memories fade: older or less critical content is stored in a slightly blurrier form to save space. Even so, the paper's authors argue, that compressed content remains accessible in the background without dragging down the system's efficiency.
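One way to picture that tiered scheme is to keep recent pages at full resolution and store older ones at progressively coarser resolution. The toy sketch below illustrates the idea with simple pixel striding in numpy; it is not DeepSeek's actual compression pipeline.

```python
# Toy sketch of age-tiered compression: older "page images" are kept at
# coarser resolution, loosely mimicking how memories fade. Illustrative only.
import numpy as np

def compress_by_age(page: np.ndarray, age: int) -> np.ndarray:
    """Keep recent pages sharp; downsample older ones by striding pixels."""
    factor = 2 ** min(age, 3)      # age 0 -> full res, age 3+ -> 1/8 resolution
    return page[::factor, ::factor]

pages = [np.random.rand(448, 448) for _ in range(5)]   # oldest first, newest last
memory = [compress_by_age(p, age=len(pages) - 1 - i) for i, p in enumerate(pages)]

for i, stored in enumerate(memory):
    print(f"page {i}: stored at {stored.shape[0]}x{stored.shape[1]}")
```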

Text tokens have long been the default building block in AI systems. Using visual tokens instead is unconventional, and as a result, DeepSeek’s model is quickly capturing researchers’ attention. Andrej Karpathy, the former Tesla AI chief and a founding member of OpenAI, praised the paper on X, saying that images may ultimately be better than text as inputs for LLMs. Text tokens might be “wasteful and just terrible at the input,” he wrote. 

Manling Li, an assistant professor of computer science at Northwestern University, says the paper offers a new framework for addressing the existing challenges in AI memory. “While the idea of using image-based tokens for context storage isn’t entirely new, this is the first study I’ve seen that takes it this far and shows it might actually work,” Li says.

The method could open up new possibilities in AI research and applications, especially in creating more useful AI agents, says Zihan Wang, a PhD candidate at Northwestern University. He believes that since conversations with AI are continuous, this approach could help models remember more and assist users more effectively.

The technique can also be used to produce more training data for AI models. Model developers are currently grappling with a severe shortage of quality text to train systems on. But the DeepSeek paper says that the company’s OCR system can generate over 200,000 pages of training data a day on a single GPU.

The model and paper, however, are only an early exploration of using image tokens rather than text tokens for AI memorization. Li says she hopes to see visual tokens applied not just to memory storage but also to reasoning. Future work, she says, should explore how to make AI’s memory fade in a more dynamic way, akin to how we can recall a life-changing moment from years ago but forget what we ate for lunch last week. Currently, even with DeepSeek’s methods, AI tends to forget and remember in a very linear way—recalling whatever was most recent, but not necessarily what was most important, she says. 

Despite its attempts to keep a low profile, DeepSeek, based in Hangzhou, China, has built a reputation for pushing the frontier in AI research. The company shocked the industry at the start of this year with the release of DeepSeek-R1, an open-source reasoning model that rivaled leading Western systems in performance despite using far fewer computing resources. 

An AI adoption riddle

A few weeks ago, I set out on what I thought would be a straightforward reporting journey. 

After years of momentum for AI—even if you didn’t think it would be good for the world, you probably thought it was powerful enough to take seriously—hype for the technology had been slightly punctured. First there was the underwhelming release of GPT-5 in August. Then a report released two weeks later found that 95% of generative AI pilots were failing, which caused a brief stock market panic. I wanted to know: Which companies are spooked enough to scale back their AI spending?

I searched and searched for them. As I did, more news fueled the idea of an AI bubble that, if popped, would spell doom economy-wide. Stories spread about the circular nature of AI spending, about layoffs, and about companies' inability to articulate what exactly AI will do for them. Even the smartest people building modern AI systems were saying the tech had not progressed as much as its evangelists promised. 

But after all my searching, companies that took these developments as a sign to think twice about going all in on AI were nowhere to be found. Or at least, none that would admit it. What gives? 

There are several interpretations of this one reporter's quest (which, for the record, I'm presenting as an anecdote and not a representation of the economy), but let's start with the easy ones. The first is that this is a huge score for the "AI is a bubble" believers: what is a bubble, if not a situation where companies keep spending relentlessly even in the face of worrying news? The other is that, underneath the bad headlines, there isn't enough genuinely troubling news about AI to convince companies they should change course.

But it could also be that the unbelievable speed of AI progress and adoption has led me to expect industries to react to the news faster than they actually do. I spoke with Martha Gimbel, who leads the Yale Budget Lab and coauthored a report finding that AI has not yet changed anyone's jobs. What I gathered is that Gimbel, like many economists, thinks on a longer time scale than anyone in the AI world is used to. 

“It would be historically shocking if a technology had had an impact as quickly as people thought that this one was going to,” she says. In other words, perhaps most of the economy is still figuring out what the hell AI even does, not deciding whether to abandon it. 

The other reaction I heard—particularly from the consultant crowd—is that when executives hear that so many AI pilots are failing, they indeed take it very seriously. They’re just not reading it as a failure of the technology itself. They instead point to pilots not moving quickly enough, companies lacking the right data to build better AI, or a host of other strategic reasons.

Even if there is incredible pressure, especially on public companies, to invest heavily in AI, a few have taken big swings on the technology only to pull back. The buy now, pay later company Klarna laid off staff and paused hiring in 2024, claiming it could use AI instead. Less than a year later it was hiring again, explaining that “AI gives us speed. Talent gives us empathy.” 

Fast-food chains from McDonald's to Taco Bell have ended pilots testing AI voice assistants in their drive-throughs. And the vast majority of Coca-Cola advertisements, according to experts I spoke with, are not made with generative AI, despite the company's $1 billion promise. 

So for now, the question remains unanswered: Are there companies out there rethinking how much their bets on AI will pay off, or when? And if there are, what’s keeping them from talking out loud about it? (If you’re out there, email me!)