Google Files Patent On Personal History-Based Search via @sejournal, @martinibuster

Google recently filed a patent for a way to provide search results based on a user’s browsing and email history. The patent outlines a new way to search within the context of a search engine, within an email interface, and through a voice-based assistant (referred to in the patent as a voice-based dialog system).

A problem that many people have is that they can remember what they saw but they can’t remember where they saw it or how they found it. The new patent, titled Generating Query Answers From A User’s History, solves that problem by helping people find information they’ve previously seen within a webpage or an email by enabling them to ask for what they’re looking for using everyday language such as “What was that article I read last week about chess?”

The problem the invention solves is that traditional search engines don’t enable users to easily search their own browsing or email history using natural language. The invention works by taking a user’s spoken or typed question, recognizing that the question is asking for previously viewed content, and then retrieving search results from the user’s personal history (such as their browser history or emails). To accomplish this, it uses filters like date, topic, or device used.

What’s novel about the invention is the system’s ability to understand vague or fuzzy natural language queries and match them to a user’s specific past interactions, including showing the version of a page as it looked when the user originally saw it (a cached version of the web page).

Query Classification (Intent) And Filtering

Query Classification

The system first determines whether the intent of the user’s spoken or typed query is to retrieve previously accessed information. This process is called query classification and involves analyzing the phrasing of the query to detect the intent. The system compares parts of the query to known patterns associated with history-seeking questions and uses techniques like semantic analysis and similarity thresholds to identify if the user’s intent is to seek something they’d seen before, even when the wording is vague or conversational.

The similarity threshold is an interesting part of the invention because it compares what the user is saying or typing to known history-seeking phrases to see if they are similar. It’s not looking for an exact match but rather a close match.
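The patent doesn’t publish an algorithm, but the idea can be sketched. In this illustrative Python sketch, a query is compared against a small list of known history-seeking phrases using token-set (Jaccard) similarity; the phrase list, similarity measure, and threshold value are all assumptions for illustration, not details from the patent:

```python
# Hypothetical sketch of query classification: compare the query against
# known history-seeking phrases and flag a match above a similarity threshold.
# The phrases and Jaccard similarity are illustrative stand-ins for whatever
# semantic analysis the patented system actually uses.

HISTORY_PATTERNS = [
    "what was that article i read",
    "i'm looking for a page i saw",
    "find the email i got",
    "show me that recipe i read",
]

def jaccard(a: set, b: set) -> float:
    """Token-set overlap: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def is_history_seeking(query: str, threshold: float = 0.4) -> bool:
    """Return True if the query resembles any known history-seeking phrasing."""
    q_tokens = set(query.lower().split())
    return any(jaccard(q_tokens, set(p.split())) >= threshold
               for p in HISTORY_PATTERNS)
```

Because the match is threshold-based rather than exact, a conversational query like “What was that article I read last week about chess?” still clears the bar even though no pattern matches it word-for-word.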

Filtering

The next part is filtering, and it happens after the system has identified the history-seeking intent. It then applies filters such as the topic, time, or device to limit the search to content from the user’s personal history that matches those criteria.

The time filter is a way to constrain the search to within a specific time frame that’s mentioned or implied in the search query. This helps the system narrow down the search results to what the user is trying to find. So if a user speaks phrases like “last week” or “a few days ago” then it knows to restrict the query to those respective time frames.

An interesting quality of the time filter is that it’s applied with a level of fuzziness, which means it’s not exact. So when a person asks the voice assistant to find something from the past week it won’t do a literal search of the past seven days but will expand it to a longer period of time.

The patent describes the fuzzy quality of the time filter:

“For example, the browser history collection… may include a list of web pages that were accessed by the user. The search engine… may obtain documents from the index… based on the filters from the formatted query.

For example, if the formatted query… includes a date filter (e.g., “last week”) and a topic filter (e.g., “chess story”), the search engine… may retrieve only documents from the collection… that satisfy these filters, i.e., documents that the user accessed in the previous week that relate to a “chess story.”

In this example, the search engine… may apply fuzzy time ranges to the “last week” filter to account for inaccuracies in human memory. In particular, while “last week” literally refers to the seven calendar days of the previous week, the search engine… may search for documents over a wider range, e.g., anytime in the past two weeks.”
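The fuzzy widening described in that excerpt can be sketched as a simple lookup from a spoken time phrase to a broader date window. The patent gives “last week” → “past two weeks” as one example but does not specify a general rule, so the other widening factors here are assumptions:

```python
from datetime import date, timedelta

# Hypothetical sketch of the fuzzy time filter: a literal phrase like
# "last week" is widened to a broader window to account for inaccuracies
# in human memory. Only the "last week" -> two weeks mapping comes from
# the patent's example; the rest are illustrative.

FUZZY_WINDOWS = {
    "yesterday": 3,     # literal 1 day  -> search the past 3 days
    "last week": 14,    # literal 7 days -> search the past two weeks
    "last month": 45,   # literal ~30 days -> search the past 45 days
}

def fuzzy_date_range(phrase, today):
    """Return an (earliest, latest) date window for a spoken time phrase."""
    days = FUZZY_WINDOWS.get(phrase, 30)  # default to a month if unknown
    return today - timedelta(days=days), today
```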

Once a query is classified as asking for something that was previously seen, the system identifies details in the user’s phrasing that are indicative of topic, date or time, source, device, sender, or location and uses them as filters to search the user’s personal history.

Each filter helps narrow the scope of the search to match what the user is trying to recall:

  • A topic filter (“turkey recipe”) targets the subject of the content.
  • A time filter (“last week”) restricts results to when the content was accessed.
  • A source filter (“WhiteHouse.gov”) limits the search to specific websites.
  • A device filter (“on my phone”) restricts results to those accessed from a certain device.
  • A sender filter (“from grandma”) helps locate emails or shared content.
  • A location filter (“at work”) restricts results to those accessed in a particular physical place.

By combining these context-sensitive filters, the system mimics the way people naturally remember content in order to help users retrieve exactly what they’re looking for, even when their query is vague or incomplete.
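As a rough illustration of how such filters might combine, here is a hypothetical sketch in which each supplied filter prunes a toy list of personal-history records. The record fields and matching rules are invented for the example, not taken from the patent:

```python
# Hypothetical sketch: every supplied filter must match for a record to
# survive, so adding filters progressively narrows the candidate set.

history = [
    {"title": "Grandma's turkey meatballs", "topic": "turkey recipe",
     "device": "phone", "sender": "grandma", "days_ago": 5},
    {"title": "Sicilian Defense explained", "topic": "chess",
     "device": "laptop", "sender": None, "days_ago": 6},
]

def search_history(records, topic=None, device=None, sender=None, max_days_ago=None):
    """Keep only records that satisfy every supplied filter."""
    results = []
    for r in records:
        if topic and topic not in r["topic"]:
            continue
        if device and r["device"] != device:
            continue
        if sender and r["sender"] != sender:
            continue
        if max_days_ago is not None and r["days_ago"] > max_days_ago:
            continue
        results.append(r)
    return results
```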

Scope of Search: What Is Searched

The next part of the patent is about figuring out the scope of what is going to be searched, which is limited to predefined sources such as browser history, cached versions of web pages, or emails. So, rather than searching the entire web, the system focuses only on the user’s personal history, making the results more relevant to what the user is trying to recall.

Cached Versions of Previously Viewed Content

Another interesting feature described in the patent is web page caching. Caching refers to saving a copy of a web page as it appeared when the user originally viewed it. This enables the system to show the user that specific version of the page in search results, rather than the current version, which may have changed or been removed.

The cached version acts like a snapshot in time, making it easier for the user to recognize or remember the content they are looking for. This is especially useful when the user doesn’t remember precise details like the name of the page or where they found it, but would recognize it if they saw it again. By showing the version that the user actually saw, the system makes the search experience more aligned with how people remember things.
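A minimal sketch of the caching idea: each visit stores a snapshot keyed by URL and visit time, so a later history search can surface the version the user actually saw rather than the live page. The storage scheme here is an assumption for illustration, not the patent’s design:

```python
# Hypothetical page-snapshot cache: keyed by (url, visit timestamp) so each
# visit preserves the page exactly as it looked at that moment.

page_cache = {}

def cache_page(url, visited_at, html):
    """Store a snapshot of the page keyed by URL and visit time."""
    page_cache[(url, visited_at)] = html

def snapshot_as_seen(url, visited_at):
    """Return the page as it looked at the recorded visit, if cached."""
    return page_cache.get((url, visited_at))
```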

Potential Applications Of The Patent Invention

The system described in the patent can be applied in several real-world contexts where users may want to retrieve content they’ve previously seen:

Search Engines

The patent refers multiple times to the use of this technique in the context of a search engine that retrieves results not from the public web, but from the user’s personal history, such as previously visited web pages and emails. While the system is designed to search only content the user has previously accessed, the patent notes that some implementations may also include additional documents relevant to the query, even if the user hasn’t viewed them before.

Email Clients

The system treats previously accessed emails as part of the searchable history. For example, it can return an old email like “Grandma’s turkey meatballs” based on vague, natural language queries.

Voice Assistants

The patent includes examples of “a voice-based search” where users speak conversational queries like “I’m looking for a turkey recipe I read on my phone.” The system handles speech recognition and interprets intent to retrieve relevant results from personal history.

Read the entire patent here:

Generating query answers from a user’s history

Google’s New Infini-Attention And SEO via @sejournal, @martinibuster

Google has published a research paper on a new technology called Infini-attention that can process massively large amounts of data with “infinitely long contexts” and can also be easily inserted into other models to vastly improve their capabilities.

That last part should be of interest to those who follow Google’s algorithm. Infini-Attention is plug-and-play, which means it’s relatively easy to insert into other models, including those used by Google’s core algorithm. The part about “infinitely long contexts” may have implications for how some of Google’s search systems work.

The name of the research paper is: Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Memory Is Computationally Expensive For LLMs

Large language models (LLMs) are limited in how much data they can process at one time because computational complexity and memory usage can spiral upward significantly. Infini-Attention gives the LLM the ability to handle longer contexts while keeping down the memory and processing power needed.

The research paper explains:

“Memory serves as a cornerstone of intelligence, as it enables efficient computations tailored to specific contexts. However, Transformers …and Transformer-based LLMs …have a constrained context-dependent memory, due to the nature of the attention mechanism.

Indeed, scaling LLMs to longer sequences (i.e. 1M tokens) is challenging with the standard Transformer architectures and serving longer and longer context models becomes costly financially.”

And elsewhere the research paper explains:

“Current transformer models are limited in their ability to process long sequences due to quadratic increases in computational and memory costs. Infini-attention aims to address this scalability issue.”

The researchers hypothesized that Infini-attention can scale to handle extremely long sequences with Transformers without the usual increases in computational and memory resources.

Three Important Features

Google’s Infini-Attention solves the shortcomings of transformer models by incorporating three features that enable transformer-based LLMs to handle longer sequences without memory issues and use context from earlier data in the sequence, not just data near the current point being processed.

The features of Infini-Attention

  • Compressive Memory System
  • Long-term Linear Attention
  • Local Masked Attention

Compressive Memory System

Infini-Attention uses what’s called a compressive memory system. As more data is input (as part of a long sequence of data), the compressive memory system compresses some of the older information in order to reduce the amount of space needed to store the data.
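The paper formalizes this with a linear-attention style memory update. Below is a toy sketch of the idea in pure Python: each key/value pair is folded into a fixed-size matrix via an outer-product update, so memory cost stays constant no matter how long the sequence grows. The σ activation (ELU + 1) follows common linear-attention practice; the dimensions and details are simplified, not the paper’s exact formulation:

```python
import math

D = 4  # toy head dimension

def elu1(x):
    """ELU(x) + 1, which keeps the activation positive."""
    return x + 1.0 if x >= 0 else math.exp(x)

def update_memory(memory, norm, key, value):
    """Fold one (key, value) pair into the fixed-size memory: M += σ(k)ᵀ v."""
    sk = [elu1(k) for k in key]
    for i in range(D):
        for j in range(D):
            memory[i][j] += sk[i] * value[j]
        norm[i] += sk[i]
    return memory, norm

def retrieve(memory, norm, query):
    """Read from memory with a query: A = σ(q) M / (σ(q) · z)."""
    sq = [elu1(q) for q in query]
    denom = sum(sq[i] * norm[i] for i in range(D)) or 1.0
    return [sum(sq[i] * memory[i][j] for i in range(D)) / denom
            for j in range(D)]
```

The point of the sketch is the shape of the data: `memory` is D×D regardless of how many pairs have been stored, which is what keeps memory bounded over long sequences.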

Long-term Linear Attention

Infini-attention also uses what’s called a “long-term linear attention mechanism,” which enables the LLM to process data that exists earlier in the sequence being processed, allowing it to retain context. That’s a departure from standard transformer-based LLMs.

This is important for tasks where the context exists on a larger plane of data. It’s like being able to discuss an entire book and all of its chapters and explain how the first chapter relates to another chapter closer to the end of the book.

Local Masked Attention

In addition to the long-term attention, Infini-attention also uses what’s called local masked attention. This kind of attention processes nearby (localized) parts of the input data, which is useful for responses that depend on the closer parts of the data.

Combining long-term and local attention helps solve the problem of transformers being limited in how much input data they can remember and use for context.
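The paper blends the two attention streams with a learned gating scalar. A toy sketch of that combination, assuming a sigmoid-gated blend between the local-attention output and the memory readout (vectors and the gate value are illustrative, and the real model gates per head with learned parameters):

```python
import math

# Hypothetical gated blend: output = g * memory_out + (1 - g) * local_out,
# where g = sigmoid(gate_param). A gate near 0 favors local attention;
# a gate near 1 favors the long-term memory readout.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combine(local_out, memory_out, gate_param):
    """Blend local-attention output with the memory readout via a sigmoid gate."""
    g = sigmoid(gate_param)
    return [g * m + (1.0 - g) * l for l, m in zip(local_out, memory_out)]
```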

The researchers explain:

“The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block.”

Results Of Experiments And Testing

Infini-attention was tested against other models for comparison across multiple benchmarks involving long input sequences, such as long-context language modeling, passkey retrieval, and book summarization tasks. Passkey retrieval is a test where the language model has to retrieve specific data from within an extremely long text sequence.

List of the three tests:

  1. Long-context Language Modeling
  2. Passkey Test
  3. Book Summary

Long-Context Language Modeling And The Perplexity Score

The researchers write that the Infini-attention outperformed the baseline models and that increasing the training sequence length brought even further improvements in the Perplexity score. The Perplexity score is a metric that measures language model performance with lower scores indicating better performance.
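As a quick illustration of the metric itself, perplexity is the exponential of the average negative log-probability the model assigns to each token, so a model that is more confident about the right tokens scores lower:

```python
import math

def perplexity(token_probs):
    """exp(-(1/N) * Σ log p_i) over the probabilities assigned to each token."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)
```

For example, a model that assigns probability 0.25 to each correct token has a perplexity of 4, as if it were choosing uniformly among four options at every step.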

The researchers shared their findings:

“Infini-Transformer outperforms both Transformer-XL …and Memorizing Transformers baselines while maintaining 114x less memory parameters than the Memorizing Transformer model with a vector retrieval-based KV memory with length of 65K at its 9th layer. Infini-Transformer outperforms memorizing transformers with memory length of 65K and achieves 114x compression ratio.

We further increased the training sequence length to 100K from 32K and trained the models on Arxiv-math dataset. 100K training further decreased the perplexity score to 2.21 and 2.20 for Linear and Linear + Delta models.”

Passkey Test

The passkey test is where a random number is hidden within a long text sequence, with the task being that the model must fetch the hidden text. The passkey is hidden near the beginning, middle, or end of the long text. The model was able to solve the passkey test at lengths up to 1 million tokens.

“A 1B LLM naturally scales to 1M sequence length and solves the passkey retrieval task when injected with Infini-attention. Infini-Transformers solved the passkey task with up to 1M context length when fine-tuned on 5K length inputs. We report token-level retrieval accuracy for passkeys hidden in a different part (start/middle/end) of long inputs with lengths 32K to 1M.”

Book Summary Test

Infini-attention also excelled at the book summary test, outperforming top benchmarks and achieving new state-of-the-art (SOTA) performance levels.

The results are described:

“Finally, we show that a 8B model with Infini-attention reaches a new SOTA result on a 500K length book summarization task after continual pre-training and task fine-tuning.

…We further scaled our approach by continuously pre-training a 8B LLM model with 8K input length for 30K steps. We then fine-tuned on a book summarization task, BookSum (Kryściński et al., 2021) where the goal is to generate a summary of an entire book text.

Our model outperforms the previous best results and achieves a new SOTA on BookSum by processing the entire text from book. …There is a clear trend showing that with more text provided as input from books, our Infini-Transformers improves its summarization performance metric.”

Implications Of Infini-Attention For SEO

Infini-attention is a breakthrough in modeling long- and short-range attention with greater efficiency than previous models. It also supports “plug-and-play continual pre-training and long-context adaptation by design,” which means it can easily be integrated into existing models.

Lastly, the “continual pre-training and long-context adaptation” makes it exceptionally useful for scenarios where it’s necessary to constantly train the model on new data. This last part is super interesting because it may make it useful for applications on the back end of Google’s search systems, particularly where it is necessary to be able to analyze long sequences of information and understand the relevance from one part near the beginning of the sequence and another part that’s closer to the end.

Other articles have focused on the “infinitely long inputs” this model is capable of. What makes it relevant to SEO is that the ability to handle huge inputs and “Leave No Context Behind” hints at how some of Google’s systems might work if Google adapted Infini-attention to its core algorithm.

Read the research paper:

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
