SEO In The Age Of AI: Build AI Best Practices via @sejournal, @BennyJamminS

For better or worse, AI has become a dominant force in SEO.

SEO professionals have been grappling with AI for years in Google’s algorithms, but the technology has moved to the forefront of digital marketing. The largest tech companies are developing the technology quickly and pushing products out to customers, trying to stay ahead of the curve.

This has resulted in several AI and generative AI releases, including LLM chatbots, chatbot integrations into search platforms, and AI-based search and research products.

AI threatens to be one of the most disruptive forces in SEO and digital marketing.

SEJ’s latest ebook explores the recent history of AI and developments in the search and marketing industries. It also provides guides and expert advice on building AI into your strategy and workflows.

Download SEO In The Age Of AI to expand on the learning in this post.

SEO Professionals Must Develop AI Best Practices

To compete in search environments built on AI algorithms and with user-facing generative AI features, SEO professionals must learn how the technology works. You need to know how to interact with AI on several fronts:

  • Optimizing for AI-powered search algorithms.
  • Building keyword and search strategies that take generative AI search features into account.
  • Employing AI tools to help improve productivity.
  • Understanding where AI needs human guidance and what tasks should not be delegated to it.
  • Differentiating your brand and content from competitors where AI tools have lowered the cost and barriers to marketing at scale.

Creating best practices that define your stance on and relationship to AI and generative AI will position you to succeed as the technology continues to develop and as user trends continue to change.

Look Back On AI Development To Predict Future Trends

Google spent many months rolling out AI products gradually, testing as it went. To understand how AI development will continue impacting SEO, study recent developments and releases, such as how Google has been changing SERP features and algorithms.

See where AI fits into these developments to predict how search might change.

The ebook collects almost a year of SEJ’s coverage of events in the industry and updates from Google, from product testing and releases to public reactions and studies about impact.

One key point is Google’s development of on-SERP features that give users answers without clicking through to a website. These features, including generative AI answers, can make it much more difficult to acquire traffic from certain queries.

That doesn’t mean you can’t make use of these queries, but it’s imperative that you correctly identify user intent for your target queries and build strategies specifically for acquiring SERP features.

Explore the history of AI in SEO and predict what’s coming next by downloading the report.

SEO Professionals Must Focus On Authority, Brand, And Trust

While disruptive, new user interactions with AI present opportunities. Becoming a cited source can be a great way to power brand awareness. However, trust is also at a premium if you want to keep users’ attention and earn conversions.

Building your content and information architecture with AI in mind can help you stand out in multiple touchpoints of a user’s journey.

Understanding where you must differentiate yourself from automated marketing and build humanity into your brand is now a powerful way to stand out in the minds of users. Building content with AI-friendly organization but human-focused insights helps you serve the right audiences at the right time.

The ebook collects insights from SEJ contributors focused on how building AI into your content strategy goes beyond using it to create for you.

Learn how to build trust and authority in the age of AI with insights from top experts.

Get Familiar With What Generative AI Is Best At

Effectively incorporating generative AI into your workflows requires that you understand how it works and what it’s good at.

You can use generative AI tools to build connections between ideas and words quickly, to parse a lot of data to find commonalities, and to draft and expand ideas, among many other things.

Generative AI can make some tasks much faster, but accuracy will always be an issue, so it’s best when the tasks involve redundancy or human checks.

For example, you could use generative AI to assist with internal linking. It’s ideal for quickly evaluating the pages of a website and suggesting semantic connections between pages. Then, a human can review for accuracy and execute the links that make sense.

We collected some of the best examples of how generative AI tools can improve human workflows in the SEO In The Age Of AI Ebook.

To learn about all this and more, download your copy of SEO In The Age Of AI.

Google’s AI Overviews Slammed By News Publishers via @sejournal, @MattGSouthern

Since its U.S. launch in May, Google’s AI Overviews feature has created controversy among news publishers.

The generative search tool attempts to directly answer queries by synthesizing information from web sources into AI-generated overviews.

While offering users a new level of convenience, AI Overviews has been criticized for factual inaccuracies, lack of transparency in sourcing content, and disincentivizing clicks to original articles.

Despite an initial scale-back, Google has doubled down – releasing Overviews in six more countries and additional languages in August.

Background on AI Overviews

Google introduced AI Overviews as an experimental opt-in feature that has since been rolled out to general search results.

Instead of listing links to webpages, AI Overviews aim to provide a complete answer using natural language.

Many publishers are concerned that AI Overviews could cannibalize their organic search traffic by satisfying user queries without requiring a click-through.

There are also complaints that Google is repackaging and republishing content without attribution or revenue sharing.

Audience Directors Speak Out

In interviews with the Nieman Journalism Lab at Harvard, seven leading audience strategy experts shared their perspectives on adapting to the AI Overviews disruption.

Veronica de Souza of New York Public Radio emphasized reducing reliance on Google by building direct audience relationships through owned channels like apps and newsletters.

Souza states:

“We’ve doubled down on converting people to our O&O (owned-and-operated) platforms like our app and newsletters…More transparency about which categories of search queries surface AI Overviews would be a good start.”

Washington Post’s Bryan Flaherty raised concerns about misinformation risks and lack of performance data insights from Google.

Flaherty states:

“If Google loses users due to the quality issues in its results and AI Overviews, users could continue to turn to non-traditional search platforms that don’t have as direct a tie back to sites, like YouTube and TikTok, which will have an impact on traffic.”

Vermont Public’s Mike Dougherty pointed out the lack of clear citations to original sources in Overviews.

Dougherty states:

“This product could so easily put clickable citations into or above the text. It could even write, ‘According to [publisher],…’ the way one news outlet might credit another.”

Scott Brodbeck of Local News Now remained optimistic that quality journalism can outcompete brief AI summaries.

Brodbeck states:

“If you as a news publisher cannot out-compete a brief AI-written summary, I think you have a big problem that’s not just being caused by Google and AI.”

Marat Gaziev of IGN advocated for deeper symbiosis between Google and reputable information providers to uphold accuracy standards.

Gaziev states:

“RAG requires a deep and symbiotic relationship with content publishers and the media industry to ensure that only credible sources are utilized during retrieval and augmentation.”

YESEO founder Ryan Restivo warned about potential carbon impacts from the heavy computing power required at scale.

Restivo states:

“The biggest problem, in my opinion, is the competition entering this space…The amount of compute needed to produce these at scale is hurting our environment.”

LA Times’ Seth Liss speculated Google may eventually prioritize generating answers over linking to external sites.

Liss states:

“If Google decides its best way forward is to keep all of those readers on its own site, there will be a lot of sites that have to figure out other ways to find new audiences.”

Measured Optimism

While most publishers interviewed by Nieman Journalism Lab expressed reservations, some took a more optimistic view.

The consensus is that high-quality, in-depth journalism will draw readers to visit publisher websites for full context beyond a brief AI summary.

There’s also hope that Google will find mutually beneficial ways to incorporate publisher content without usurping it entirely.

The Path Forward

As the search evolves, publishers are exploring strategies to adapt – from re-investing in email newsletters and mobile apps to developing AI-focused SEO best practices.

The debate highlights a challenge all publishers share – how to remain discoverable and generate traffic/revenue when search engines can directly answer queries themselves.


Featured Image: Marco Lazzarini/Shutterstock

AI for Maximum ROI: Expert Insights for Agencies and Small Businesses via @sejournal, @hethr_campbell

The rise of generative AI has opened up a world of possibilities for agencies and small businesses, but with so many tools available, it can be challenging to determine which ones will truly drive results. 

How do you choose the right AI products to elevate your business and ensure a strong return on investment?

On September 11th, join us for an expert panel discussion where we’ll cut through the noise and highlight the AI tools that can genuinely make a difference in your performance. 

Whether you’re looking to enhance your SEO, boost your paid channels, or streamline your overall marketing efforts, this session is designed to provide you with actionable insights and practical strategies.

Register for this webinar, where you’ll hear from Zac Elbel, Senior Product Marketing Manager at CallRail, and Sean Whitmore, Director of Digital at Snapshot Interactive. Together, they’ll break down the reasons that AI is essential for your business’s success. 

They’ll share real-life examples from Snapshot Interactive, demonstrating how they’ve integrated AI into their daily operations to optimize both organic and paid channels, improve client outcomes, and ultimately increase ROI. By adopting similar approaches, you can reach new levels of efficiency and prove your agency’s value to clients.

One of the highlights of the session will be a detailed look at CallRail’s innovative AI products. You’ll learn how these tools can be utilized to simplify workflows, drive revenue, and position your business for long-term success.

From AI-driven insights to automation, we’ll explore how to implement these technologies to get real results.

What You’ll Learn:

  • Why AI is critical for your business and how to implement it effectively.
  • Real-world examples of AI in action including its impact on organic and paid channels.
  • How to utilize CallRail’s AI products to deliver superior results.

Following the presentation, there will be a LIVE Q&A session where you can ask all your AI-related questions. This is your opportunity to gain personalized advice from industry experts who have successfully integrated AI into their operations, so save your seat!

If you’re serious about staying ahead of the curve and driving meaningful results for your clients, this is one webinar you can’t afford to miss.

Can’t attend the live event? No problem. Register now, and we’ll send you a recording so you can catch up at your convenience.

Take the first step toward mastering AI in your business and maximizing your ROI. Sign up today!

New Google Gemini AI Experts Called Gems Might Be Good For SEO via @sejournal, @martinibuster

Google announced a new feature for Gemini AI called Gems that are pre-defined specialized experts to help users code, coach, create content, brainstorm and handle other tasks. Gems will soon roll out with premade experts and the ability for user to create their own experts to handle specific tasks.

What Is Gemini Gems?

Gemini Gems is a feature of Google’s Gemini AI platform that are created for specific narrowly defined tasks. Users can create their custom AI experts by providing specific instructions that will make the Gems an expert that can offer help in a highly defined role.

Real-World Practical Uses

I haven’t seen Gems yet but I wonder what would happen if you feed it Google’s quality raters guidelines, their SEO starter guide, and other documentation then set it loose on content to see if it could identify where it could be improved and why.

Google offered examples of how Gems can be used in business and professional settings.

  • Coding Assistance:
    Gems can be a coding assistant that can focus on a specific need like debugging code or making improvement suggestions.
  • Career Planning:
    A career planning professional can create a Gem to behave like a career coach that can offer advice and personalized career plans.
  • Content
    Gem can provide writers ideas, improve content and offer feedback like a writing expert.

An analogy of Gemini Gems, for example, can be like a bag of tools. Each tool specializes in something different like a drill, screwdriver and a hammer.

Impact Of Gems

Gems is a useful feature for Gemini users because they may no longer need to subscribe to a service that provides AI assistance in any given task. This may be bad news for SaaS businesses that offer AI content creation and other services but it’s good news for businesses because it will make users able to do more and do it better.

According to Google’s announcement:

“With Gems, you can create a team of experts to help you think through a challenging project, brainstorm ideas for an upcoming event, or write the perfect caption for a social media post. Your Gem can also remember a detailed set of instructions to help you save time on tedious, repetitive or difficult tasks.”

This new feature may very well make a subscription to Google Gemini something to give a try because it has the potential to make an impact in business and personal settings.

Read Google’s announcement

New in Gemini: Custom Gems and improved image generation with Imagen 3

Featured Image by Shutterstock/Cast Of Thousands

ChatGPT Outage Crashes Service For OpenAI via @sejournal, @martinibuster

ChatGPT is experiencing a noticeable outage that is apparently reaching a critical point where it’s become highly noticeable. The current outage is a part of a series of outages that began on August 26th, becoming progressively serious with time.

Timeline Of ChatGPT Outage

August has seen numerous ChatGPT incidents, more than in July but so far the equaling the entire month of June. Some of the the incidents documented in July were related to the new GPT-4o language model.

In comparison, August has experienced elevated error rates, reaching a peak on August 28th where the amount of errors, Bad Gateway errors, were enough to cause a large blip on the Downdetector website.

ChatGPT Bad Gateway Error August 28, 2024

Most of the reported problems involved ChatGPT and the website, while 4% of reported outages were on the ChatGPT app.

OpenAI Incident Reports

The official OpenAI status page has a notation indicating severe outage levels.

Elevated error rates for ChatGPT
A fix has been implemented and we are monitoring the results.
Aug 28, 08:19 PDT

But that’s a part of a multi-day series of incidents:

Elevated error rates for gpt-4o-mini-2024-07-18 fine-tuned models
This issue has now been resolved. Thank you for your patience.
Aug 27, 12:27 – 14:14 PDT

Increased conversation latency in ChatGPT
Today, between 12:51AM – 12:51PM PT, conversations on ChatGPT experienced increased latency. This issue is now resolved.
Aug 26, 20:27 – 20:27 PD

A fix has apparently been deployed. If the outage is still ongoing for you then it may be something that has to propagate through datacenters or some other issue, perhaps related to the cloud gateway.

Data Confirms Disruptive Potential Of SearchGPT via @sejournal, @martinibuster

Researchers analyzed SearchGPT’s responses to queries and identified how it may impact publishers, B2B websites, and e-commerce, discovering key differences between SearchGPT, AI Overviews, and Perplexity.

What is SearchGPT?

SearchGPT is a prototype natural language search engine created by OpenAI that combines a generative AI model with the most current web data to provide contextually relevant answers in a natural language interface that includes citations to relevant online sources.

OpenAI has not offered detailed information about how SearchGPT accesses web information. But the fact that it uses generative AI models means that it likely uses Retrieval Augmented Generation (RAG), a technology that connects an AI language model to indexed web data to give it access to information that it wasn’t trained on. This enables AI search to provide contextually relevant answers that are up to date and grounded with authoritative and trustworthy web sources.

How BrightEdge Analyzed SearchGPT

BrightEdge used a pair of search marketing research tools developed for enterprise users to help identify search and content opportunities, emerging trends and conduct deep competitor analysis.

They used their proprietary DataCube X and the BrightEdge Generative Parser™ to extract data points from SearchGPT, AI Overviews and Perplexity.

Here’s how it was done:

“BrightEdge compared SearchGPT, Google’s AI Overviews, and Perplexity.

To evaluate SearchGPT against Google’s AI Overviews and Perplexity, BrightEdge utilized DataCube X alongside BrightEdge Generative Parser ™ to identify a high-volume term and question based on exact match volumes. These queries were then input into all three engines to evaluate their approach, intent interpretation, and answer-sourcing methods.

This comparative study employs real, popular searches within each sector to accurately reflect the performance of these engines for typical users.”

DataCube X was used for identifying high-volume keywords and questions, all volumes were based on exact matches.

Each search engine was analyzed for:

  1. Approach to the query
  2. Ability to interpret intent
  3. Method of sourcing answers

SearchGPT Versus Google AI Overviews

Research conducted by BrightEdge indicates that SearchGPT offers comprehensive answers while Google AI Overviews (AIO) provides answers that are more concise but also has an edge with surfacing current trends.

The difference found is that SearchGPT in its current state is better for deep research and Google AIO excels at giving quick answers that are also aware of current trends.

Strength: BrightEdge’s report indicates that SearchGPT answers rely on a diverse set of authoritative web resources that reflect academic, industry-specific, and government sources.

Weakness: The results of the report imply that SearchGPT’s weakness in a comparison with AIO is in the area of trends, where Google AIO was found to be more articulate.

SearchGPT Versus Perplexity

The researchers concluded that Perplexity offers concise answers that are tightly focused on topicality. This suggests that Perplexity, which styles itself as an “answer engine” shares the strengths with Google’s AIO in terms of providing concise answers. If I were to speculate I would say that this might reflect a focus on satisfaction metrics that are biased toward more immediate answers.

Strength: Because SearchGPT seems to be tuned more for research and on high quality information sources, it could be said to have an edge over Perplexity as a more comprehensive and potentially more trustworthy tool for research than Perplexity.

Weakness: Perplexity was found to be a more concise source of answers, excelling at summarizing online sources of information for answers to questions.

SearchGPT’s focus on facilitating research makes sense because the eventual context of SearchGPT is as a complement to ChatGPT.

Is SearchGPT A Competitor To Google?

SearchGPT is not a competitor to Google because OpenAI’s stated plans are to incorporate it into ChatGPT and not as a standalone search engine. SearchGPT’s official purpose is not as a standalone search engine but to be integrated into ChatGPT.

This is how OpenAI explains it:

“We also plan to get feedback on the prototype and bring the best of the experience into ChatGPT.

…Please note that we plan to integrate the SearchGPT experience into ChatGPT in the future. SearchGPT combines the strength of our AI models with information from the web to give you fast and timely answers with clear, relevant sources.”

Is SearchGPT then a competitor to Google? The more appropriate question is if ChatGPT is building toward disrupting the entire concept of organic search.

Google has done a fair job of exhausting and disenchanting users with ads, tracking and data mining their personal lives.  So it’s not implausible that a more capable version of ChatGPT could redefine how people get answers.

BrightEdge’s research discovered that SearchGPT’s strength was in facilitating trustworthy research. That makes even more sense with the understanding that SearchGPT is currently planned to be integrated into ChatGPT, not as a competitor to Google but as a competitor to the concept of organic search.

Takeaways: What SEOs And Marketers Need To Know

The major takeaways from the research can be broken down into five ways SearchGPT is better than Google AIO and Perplexity.

1. Diverse Authoritative Sources.
The research shows that SearchGPT consistently surfaces answers from authoritative and trustworthy sources.

“Its knowledge base spans academic resources, specialized industry platforms, official government pages, and reputable commercial websites.”

2. Comprehensive Answers
BrightEdge’s analysis showed that SearchGPT delivers comprehensive answers on any topic, simplifying them into clear, understandable responses.

3. Proactive Query Interpretation
This is really interesting because the researchers discovered that SearchGPT was not only able to understand the user’s immediate information need, it answered questions with an expanded breadth of coverage.

BrightEdge explained it like this:

“Its initial response often incorporates additional relevant information, illustrative examples, and real-world applications.”

4. Pragmatic And Practical
SearchGPT tended to provide practical answers that were good for ecommerce search queries. BrightEdge noted:

“It frequently offers specific product suggestions and recommendations.”

5. Wide-Ranging Topic Expertise
The research paper noted that SearchGPT correctly used industry jargon, even for esoteric B2B search queries. The researchers explained:

“This approach caters to both general users and industry professionals alike.”

Read the research results on SearchGPT here.

Featured Image by Shutterstock/Khosro

Google’s “Information Gain” Patent For Ranking Web Pages via @sejournal, @martinibuster

Google was recently granted a patent on ranking web pages, which may offer insights into how AI Overviews ranks content. The patent describes a method for ranking pages based on what a user might be interested in next.

Contextual Estimation Of Link Information Gain

The name of the patent is Contextual Estimation Of Link Information Gain, it was filed in 2018 and granted in June 2024. It’s about calculating a ranking score called Information Gain that is used to rank a second set of web pages that are likely to be of interest to a user as a slightly different follow-up topic related to a previous question.

The patent starts with general descriptions then adds layers of specifics over the course of paragraphs.  An analogy can be that it’s like a pizza. It starts out as a mozzarella pizza, then they add mushrooms, so now it’s a mushroom pizza. Then they add onions, so now it’s a mushroom and onion pizza. There are layers of specifics that build up to the entire context.

So if you read just one section of it, it’s easy to say, “It’s clearly a mushroom pizza” and be completely mistaken about what it really is.

There are layers of context but what it’s building up to is:

  • Ranking a web page that is relevant for what a user might be interested in next.
  • The context of the invention is an automated assistant or chatbot
  • A search engine plays a role in a way that seems similar to Google’s AI Overviews

Information Gain And SEO: What’s Really Going On?

A couple of months ago I read a comment on social media asserting that “Information Gain” was a significant factor in a recent Google core algorithm update.  That mention surprised me because I’d never heard of information gain before. I asked some SEO friends about it and they’d never heard of it either.

What the person on social media had asserted was something like Google was using an “Information Gain” score to boost the ranking of web pages that had more information than other web pages. So the idea was that it was important to create pages that have more information than other pages, something along those lines.

So I read the patent and discovered that “Information Gain” is not about ranking pages with more information than other pages. It’s really about something that is more profound for SEO because it might help to understand one dimension of how AI Overviews might rank web pages.

TL/DR Of The Information Gain Patent

What the information gain patent is really about is even more interesting because it may give an indication of how AI Overviews (AIO) ranks web pages that a user might be interested next.  It’s sort of like introducing personalization by anticipating what a user will be interested in next.

The patent describes a scenario where a user makes a search query and the automated assistant or chatbot provides an answer that’s relevant to the question. The information gain scoring system works in the background to rank a second set of web pages that are relevant to a what the user might be interested in next. It’s a new dimension in how web pages are ranked.

The Patent’s Emphasis on Automated Assistants

There are multiple versions of the Information Gain patent dating from 2018 to 2024. The first version is similar to the last version with the most significant difference being the addition of chatbots as a context for where the information gain invention is used.

The patent uses the phrase “automated assistant” 69 times and uses the phrase “search engine” only 25 times.  Like with AI Overviews, search engines do play a role in this patent but it’s generally in the context of automated assistants.

As will become evident, there is nothing to suggest that a web page containing more information than the competition is likelier to be ranked higher in the organic search results. That’s not what this patent talks about.

General Description Of Context

All versions of the patent describe the presentation of search results within the context of an automated assistant and natural language question answering. The patent starts with a general description and progressively becomes more specific. This is a feature of patents in that they apply for protection for the widest contexts in which the invention can be used and become progressively specific.

The entire first section (the Abstract) doesn’t even mention web pages or links. It’s just about the information gain score within a very general context:

“An information gain score for a given document is indicative of additional information that is included in the document beyond information contained in documents that were previously viewed by the user.”

That is a nutshell description of the patent, with the key insight being that the information gain scoring happens on pages after the user has seen the first search results.

More Specific Context: Automated Assistants

The second paragraph in the section titled “Background” is slightly more specific and adds an additional layer of context for the invention because it mentions  links. Specifically, it’s about a user that makes a search query and receives links to search results – no information gain score calculated yet.

The Background section says:

“For example, a user may submit a search request and be provided with a set of documents and/or links to documents that are responsive to the submitted search request.”

The next part builds on top of a user having made a search query:

“Also, for example, a user may be provided with a document based on identified interests of the user, previously viewed documents of the user, and/or other criteria that may be utilized to identify and provide a document of interest. Information from the documents may be provided via, for example, an automated assistant and/or as results to a search engine. Further, information from the documents may be provided to the user in response to a search request and/or may be automatically served to the user based on continued searching after the user has ended a search session.”

That last sentence is poorly worded.

Here’s the original sentence:

“Further, information from the documents may be provided to the user in response to a search request and/or may be automatically served to the user based on continued searching after the user has ended a search session.”

Here’s how it makes more sense:

“Further, information from the documents may be provided to the user… based on continued searching after the user has ended a search session.”

The information provided to the user is “in response to a search request and/or may be automatically served to the user”

It’s a little clearer if you put parentheses around it:

Further, information from the documents may be provided to the user (in response to a search request and/or may be automatically served to the user) based on continued searching after the user has ended a search session.

Takeaways:

  • The patent describes identifying documents that are relevant to the “interests of the user” based on “previously viewed documents” “and/or other criteria.”
  • It sets a general context of an automated assistant “and/or” a search engine
  • Information from the documents that are based on “previously viewed documents” “and/or other criteria” may be shown after the user continues searching.

More Specific Context: Chatbot

The patent next adds an additional layer of context and specificity by mentioning how chatbots can “extract” an answer from a web page (“document”) and show that as an answer. This is about showing a summary that contains the answer, kind of like featured snippets, but within the context of a chatbot.

The patent explains:

“In some cases, a subset of information may be extracted from the document for presentation to the user. For example, when a user engages in a spoken human-to-computer dialog with an automated assistant software process (also referred to as “chatbots,” “interactive personal assistants,” “intelligent personal assistants,” “personal voice assistants,” “conversational agents,” “virtual assistants,” etc.), the automated assistant may perform various types of processing to extract salient information from a document, so that the automated assistant can present the information in an abbreviated form.

As another example, some search engines will provide summary information from one or more responsive and/or relevant documents, in addition to or instead of links to responsive and/or relevant documents, in response to a user’s search query.”

The last sentence sounds like it’s describing something that’s like a featured snippet or like AI Overviews where it provides a summary. The sentence is very general and ambiguous because it uses “and/or” and “in addition to or instead of” and isn’t as specific as the preceding sentences. It’s an example of a patent being general for legal reasons.

Ranking The Next Set Of Search Results

The next section is called the Summary and it goes into more details about how the Information Gain score represents how likely the user will be interested in the next set of documents. It’s not about ranking search results, it’s about ranking the next set of search results (based on a related topic).

It states:

“An information gain score for a given document is indicative of additional information that is included in the given document beyond information contained in other documents that were already presented to the user.”

Ranking Based On Topic Of Web Pages

It then talks about presenting the web page in a browser, audibly reading the relevant part of the document or audibly/visually presenting a summary of the document (“audibly/visually presenting salient information extracted from the document to the user, etc.”)

But the part that’s really interesting is when it next explains using a topic of the web page as a representation of the the content, which is used to calculate the information gain score.

It describes many different ways of extracting the representation of what the page is about. But what’s important is that it’s describes calculating the Information Gain score based on a representation of what the content is about, like the topic.

“In some implementations, information gain scores may be determined for one or more documents by applying data indicative of the documents, such as their entire contents, salient extracted information, a semantic representation (e.g., an embedding, a feature vector, a bag-of-words representation, a histogram generated from words/phrases in the document, etc.) across a machine learning model to generate an information gain score.”

The patent goes on to describe ranking a first set of documents and using the Information Gain scores to rank additional sets of documents that anticipate follow up questions or a progression within a dialog of what the user is interested in.

The automated assistant can in some implementations query a search engine and then apply the Information Gain rankings to the multiple sets of search results (that are relevant to related search queries).

There are multiple variations of doing the same thing but in general terms this is what it describes:

“Based on the information gain scores, information contained in one or more of the new documents may be selectively provided to the user in a manner that reflects the likely information gain that can be attained by the user if the user were to be presented information from the selected documents.”

What All Versions Of The Patent Have In Common

All versions of the patent share general similarities over which more specifics are layered in over time (like adding onions to a mushroom pizza). The following are the baseline of what all the versions have in common.

Application Of Information Gain Score

All versions of the patent describe applying the information gain score to a second set of documents that have additional information beyond the first set of documents. Obviously, there is no criteria or information to guess what the user is going search for when they start a search session. So information gain scores are not applied to the first search results.

Examples of passages that are the same for all versions:

  • A second set of documents is identified that is also related to the topic of the first set of documents but that have not yet been viewed by the user.
  • For each new document in the second set of documents, an information gain score is determined that is indicative of, for the new document, whether the new document includes information that was not contained in the documents of the first set of documents…

Automated Assistants

All four versions of the patent refer to automated assistants that show search results in response to natural language queries.

The 2018 and 2023 versions of the patent both mention search engines 25 times. The 2o18 version mentions “automated assistant” 74 times and the latest version mentions it 69 times.

They all make references to “conversational agents,” “interactive personal assistants,” “intelligent personal assistants,” “personal voice assistants,” and “virtual assistants.”

It’s clear that the emphasis of the patent is on automated assistants, not the organic search results.

Dialog Turns

Note: In everyday language we use the word dialogue. In computing they the spell it dialog.

All versions of the patents refer to a way of interacting with the system in the form of a dialog, specifically a dialog turn. A dialog turn is the back and forth that happens when a user asks a question using natural language, receives an answer and then asks a follow up question or another question altogether. This can be natural language in text, text to speech (TTS), or audible.

The main aspect the patents have in common is the back and forth in what is called a “dialog turn.” All versions of the patent have this as a context.

Here’s an example of how the dialog turn works:

“Automated assistant client 106 and remote automated assistant 115 can process natural language input of a user and provide responses in the form of a dialog that includes one or more dialog turns. A dialog turn may include, for instance, user-provided natural language input and a response to natural language input by the automated assistant.

Thus, a dialog between the user and the automated assistant can be generated that allows the user to interact with the automated assistant …in a conversational manner.”

Problems That Information Gain Scores Solve

The main feature of the patent is to improve the user experience by understanding the additional value that a new document provides compared to documents that a user has already seen. This additional value is what is meant by the phrase Information Gain.

There are multiple ways that information gain is useful and one of the ways that all versions of the patent describes is in the context of an audio response and how a long-winded audio response is not good, including in a TTS (text to speech) context).

The patent explains the problem of a long-winded response:

“…and so the user may wait for substantially all of the response to be output before proceeding. In comparison with reading, the user is able to receive the audio information passively, however, the time taken to output is longer and there is a reduced ability to scan or scroll/skip through the information.”

The patent then explains how information gain can speed up answers by eliminating redundant (repetitive) answers or if the answer isn’t enough and forces the user into another dialog turn.

This part of the patent refers to the information density of a section in a web page, a section that answers the question with the least amount of words. Information density is about how “accurate,” “concise,” and “relevant”‘ the answer is for relevance and avoiding repetitiveness. Information density is important for audio/spoken answers.

This is what the patent says:

“As such, it is important in the context of an audio output that the output information is relevant, accurate and concise, in order to avoid an unnecessarily long output, a redundant output, or an extra dialog turn.

The information density of the output information becomes particularly important in improving the efficiency of a dialog session. Techniques described herein address these issues by reducing and/or eliminating presentation of information a user has already been provided, including in the audio human-to-computer dialog context.”

The idea of “information density” is important in a general sense because it communicates better for users but it’s probably extra important in the context of being shown in chatbot search results, whether it’s spoken or not. Google AI Overviews shows snippets from a web page but maybe more importantly, communicating in a concise manner is the best way to be on topic and make it easy for a search engine to understand content.

Search Results Interface

All versions of the Information Gain patent are clear that the invention is not in the context of organic search results. It’s explicitly within the context of ranking web pages within a natural language interface of an automated assistant and an AI chatbot.

However, there is a part of the patent that describes a way of showing users with the second set of results within a “search results interface.” The scenario is that the user sees an answer and then is interested in a related topic. The second set of ranked web pages are shown in a “search results interface.”

The patent explains:

“In some implementations, one or more of the new documents of the second set may be presented in a manner that is selected based on the information gain stores. For example, one or more of the new documents can be rendered as part of a search results interface that is presented to the user in response to a query that includes the topic of the documents, such as references to one or more documents. In some implementations, these search results may be ranked at least in part based on their respective information gain scores.”

…The user can then select one of the references and information contained in the particular document can be presented to the user. Subsequently, the user may return to the search results and the references to the document may again be provided to the user but updated based on new information gain scores for the documents that are referenced.

In some implementations, the references may be reranked and/or one or more documents may be excluded (or significantly demoted) from the search results based on the new information gain scores that were determined based on the document that was already viewed by the user.”

What is a search results interface? I think it’s just an interface that shows search results.

Let’s pause here to underline that it should be clear at this point that the patent is not about ranking web pages that are comprehensive about a topic. The overall context of the invention is showing documents within an automated assistant.

A search results interface is just an interface, it’s never described as being organic search results, it’s just an interface.

There’s more that is the same across all versions of the patent but the above are the important general outlines and context of it.

Claims Of The Patent

The claims section is where the scope of the actual invention is described and for which they are seeking legal protection over. It is mainly focused on the invention and less so on the context. Thus, there is no mention of a search engines, automated assistants, audible responses, or TTS (text to speech) within the Claims section. What remains is the context of search results interface which presumably covers all of the contexts.

Context: First Set Of Documents

It starts out by outlining the context of the invention. This context is receiving a query, identifying the topic, and ranking a first group of relevant web pages (documents) and selecting at least one of them as being relevant and either showing the document or communicating the information from the document (like a summary).

“1. A method implemented using one or more processors, comprising: receiving a query from a user, wherein the query includes a topic; identifying a first set of documents that are responsive to the query, wherein the documents of the set of documents are ranked, and wherein a ranking of a given document of the first set of documents is indicative of relevancy of information included in the given document to the topic; selecting, based on the rankings and from the documents of the first set of documents, a most relevant document providing at least a portion of the information from the most relevant document to the user;”

Context: Second Set Of Documents

Then what immediately follows is the part about ranking a second set of documents that contain additional information. This second set of documents is ranked using the information gain scores to show more information after showing a relevant document from the first group.

This is how it explains it:

“…in response to providing the most relevant document to the user, receiving a request from the user for additional information related to the topic; identifying a second set of documents, wherein the second set of documents includes at one or more of the documents of the first set of documents and does not include the most relevant document; determining, for each document of the second set, an information gain score, wherein the information gain score for a respective document of the second set is based on a quantity of new information included in the respective document of the second set that differs from information included in the most relevant document; ranking the second set of documents based on the information gain scores; and causing at least a portion of the information from one or more of the documents of the second set of documents to be presented to the user, wherein the information is presented based on the information gain scores.”

Granular Details

The rest of the claims section contains granular details about the concept of Information Gain, which is a ranking of documents based on what the user already has seen and represents a related topic that the user may be interested in. The purpose of these details is to lock them in for legal protection as part of the invention.

Here’s an example:

The method of claim 1, wherein identifying the first set comprises:
causing to be rendered, as part of a search results interface that is presented to the user in response to a previous query that includes the topic, references to one or more documents of the first set;
receiving user input that that indicates selection of one of the references to a particular document of the first set from the search results interface, wherein at least part of the particular document is provided to the user in response to the selection;

To make an analogy, it’s describing how to make the pizza dough, clean and cut the mushrooms, etc. It’s not important for our purposes to understand it as much as the general view of what the patent is about.

Information Gain Patent

An opinion was shared on social media that this patent has something to do with ranking web pages in the organic search results, I saw it, read the patent and discovered that’s not how the patent works. It’s a good patent and it’s important to correctly understand it. I analyzed multiple versions of the patent to see what they  had in common and what was different.

A careful reading of the patent shows that it is clearly focused on anticipating what the user may want to see based on what they have already seen. To accomplish this the patent describes the use of an Information Gain score for ranking web pages that are on topics that are related to the first search query but not specifically relevant to that first query.

The context of the invention is generally automated assistants, including chatbots. A search engine could be used as part of finding relevant documents but the context is not solely an organic search engine.

This patent could be applicable to the context of AI Overviews. I would not limit the context to AI Overviews as there are additional contexts such as spoken language in which Information Gain scoring could apply. Could it apply in additional contexts like Featured Snippets? The patent itself is not explicit about that.

Read the latest version of Information Gain patent:

Contextual estimation of link information gain

Featured Image by Shutterstock/Khosro

Anthropic Announces Prompt Caching with Claude via @sejournal, @martinibuster

Anthropic announced announced a new Prompt Caching with Claude feature that boosts Claude’s capabilities for repetitive tasks with large amounts of detailed contextual information. The new feature makes it faster, cheaper and more powerful, available today in Beta through the Anthropic API.

Prompt Caching

This new feature provides a powerful boosts for users that consistently use highly detailed instructions that use example responses and contain a large amount of background information in the prompt, enabling Claude to re-use the data with the cache. This improves the consistency of output, speeds up Claude responses by to 50% (lower latency), and it also makes it up to 90% cheaper to use.

Prompt Caching with Claude is especially useful for complex projects that rely on the same data and is useful for businesses of all sizes, not just enterprise level organizations. This feature is available in a public Beta via the Anthropic API for use with Claude 3.5 Sonnet and Claude 3 Haiku.

The announcement lists the following ways Prompt Caching improves performance:

  • “Conversational agents: Reduce cost and latency for extended conversations, especially those with long instructions or uploaded documents.
  • Large document processing: Incorporate complete long-form material in your prompt without increasing response latency.
  • Detailed instruction sets: Share extensive lists of instructions, procedures, and examples to fine-tune Claude’s responses without incurring repeated costs.
  • Coding assistants: Improve autocomplete and codebase Q&A by keeping a summarized version of the codebase in the prompt.
  • Agentic tool use: Enhance performance for scenarios involving multiple tool calls and iterative code changes, where each step typically requires a new API call.”

More information about the Anthropic API here:

Build with Claude

Explore latest models – Pricing

Featured Image by Shutterstock/gguy