Sentence-Level Semantic Internal Links For SEO via @sejournal, @martinibuster

Internal in-content linking practices have remained the same for the past twenty years, which is strange because Google has gone through dramatic changes within the last ten years and even more so in the past five. It may be time to consider freshening up internal linking strategies so that they more closely align with how Google understands and ranks webpages.

Standard Internal Linking Practices

When considering a new way of doing something, it’s important to keep an open mind because what follows may be almost startling, like a slap in the face.

Raise your hand if this is you:

An SEO is writing or updating content and comes across a keyword phrase that’s a match for the keywords targeted by an inner page, so those words get turned into anchor text.

Okay, you can put your hand down. 🙂

I expect that there will be a lot of hands raised and that’s okay because it’s how everybody does it.

As an example, I visited a so-called “white hat” website that offers an SEO-related service and in an article about a sub-topic of “internal linking” they link to another page about What Is Internal Linking using the anchor text “internal linking.”

The anchor text is an exact match for the two-word phrase targeted by the second page. The standard practice is that if you find a keyword match for another internal page, you turn it into anchor text pointing to that page, right?

But it’s not right.

The sentence containing that anchor text and the paragraph that contains it are about the importance of internal linking for getting internal pages indexed and ranked. The target page is a general explainer about What Is Internal Linking.

If you think like an SEO then there’s nothing wrong with that link because the anchor text matches the target keyword of the second page.

But if you think like a site visitor who is reading the first page then what is the chance that the reader will stop reading and click the link to learn about What Is Internal Linking?

Quite likely, zero percent of readers would click on the link because it is not contextually relevant.

What Does A Machine Think About It?

To see what a machine thought about that sentence, I copied it and asked ChatGPT:

ChatGPT replied:

“The sentence highlights the critical role of internal linking in SEO strategies.”

I then asked ChatGPT to summarize the paragraph in fifteen words or less and it responded:

“Internal linking is crucial for website indexing and ranking, with link context being particularly important.”

The context of both the sentence and the paragraph is the importance of internal links but not What Is Internal Linking.
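If you want to run that same sanity check at scale, the summarization step can be scripted. Below is a minimal sketch, assuming the official OpenAI Python client and an API key in the environment; the model name and the helper function are illustrative assumptions, not part of the original example.

```python
# Minimal sketch: ask a model what the paragraph and sentence are actually about,
# then compare that against the topic of the page you intend to link to.
# Assumes: `pip install openai`, OPENAI_API_KEY set; the model name is an assumption.
from openai import OpenAI

client = OpenAI()

def summarize_link_context(sentence: str, paragraph: str) -> str:
    prompt = (
        "Summarize the following paragraph in fifteen words or less, "
        "then state in one sentence what the highlighted sentence is about.\n\n"
        f"Paragraph:\n{paragraph}\n\n"
        f"Highlighted sentence:\n{sentence}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model will do
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# If the summary is about "why internal linking matters" and the target page is
# "what internal linking is," the anchor text match alone isn't enough context.
```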

The irony of the above example is that I pulled it from a webpage about the importance of context for internal linking, which shows how deeply ingrained the idea is that the only context an internal link needs is the anchor text.

But that’s not how Google understands context.

The takeaway is that for an internal link to be contextual, it’s important to consider the meaning of the sentence and the paragraph in which it exists.
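One way to make that consideration concrete is to compare what the linking paragraph is about with what the target page is about. Here is a minimal sketch, assuming the sentence-transformers package; the model name, the example strings, and any cutoff you pick are illustrative assumptions, not a Google metric.

```python
# Minimal sketch: score how semantically close the linking paragraph is to the
# target page using sentence embeddings and cosine similarity.
# Assumes: `pip install sentence-transformers`; the model name is an assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

linking_paragraph = (
    "Internal linking is crucial for getting inner pages indexed and ranked, "
    "and the context surrounding each link matters."
)
target_page_summary = "A general explainer answering the question: what is internal linking?"

paragraph_vec = model.encode(linking_paragraph, convert_to_tensor=True)
target_vec = model.encode(target_page_summary, convert_to_tensor=True)

similarity = util.cos_sim(paragraph_vec, target_vec).item()
print(f"Contextual similarity: {similarity:.2f}")

# A low score suggests the paragraph and the target page discuss different things,
# so the link is a keyword match rather than a contextually relevant one.
```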

What Internal Linking Is Not

There are decades-old precepts about internal linking that are commonly accepted as canonical without sufficient critical examination.

Here are a few examples:

  • Put your internal links closer to the top of the webpage.
  • Internal links are for helping other pages rank well.
  • Internal links are for helping other pages get indexed.
  • Use keyword-rich anchor text but make them look natural.
  • Internal linking is important for Google.
  • Add internal links to your most important webpage on a topic from all of the subtopic pages.

What’s missing from the above commonly accepted ideas about internal linking is that none of them has anything to do with the site visitors who are reading the content.

Those ideas aren’t even connected to how Google analyzes and understands webpages, and as a consequence they’re not really what internal linking should be about. So before identifying a modern way to link internally, one that’s in line with the modern search engine, it’s useful to understand how Google understands webpages.

Taxonomy Of Topics In Webpage Content

A taxonomy is a way of classifying something. Every well-organized webpage can be subdivided into an overall topic and the subtopics beneath it, one flowing into the other, so that the overall topic describes what all the subtopics as a group are about, and each subtopic describes an aspect of the main topic. This is what can be called a Taxonomy of Topics, the hidden structure within the content.

A webpage is unstructured data. In order to make sense of it, Google has to impose some structure on it, so a webpage is divided into sections like the header, navigation, main content, sidebar, and footer.

Google’s Martin Splitt went further and said that the main content is analyzed for the Centerpiece Annotation, a description of what the topic is about, explaining:

“That’s just us analyzing the content and… we have a thing called the Centerpiece Annotation, for instance, and there’s a few other annotations that we have where we look at the semantic content, as well as potentially the layout tree.

But fundamentally we can read that from the content structure in HTML already and figure out so “Oh! This looks like from all the natural language processing that we did on this entire text content here that we got, it looks like this is primarily about topic A, dog food.”

The Centerpiece Annotation is Google’s estimation of what the content is about, and Google identifies it by reading the content structure.

It is that content structure that can be called the Taxonomy of Topics, where a page of content is planned and created according to a main topic and its subtopics.
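As a rough analogue of that first structural pass, separating the main content from the header, navigation, sidebar, and footer, content extraction libraries do something similar. Here is a minimal sketch assuming the trafilatura package; it approximates, rather than reproduces, Google’s own segmentation, and the URL is a placeholder.

```python
# Minimal sketch: strip boilerplate (header, navigation, sidebar, footer) and keep
# only the main content before doing any topic analysis on it.
# Assumes: `pip install trafilatura`; the URL is a placeholder.
import trafilatura

downloaded = trafilatura.fetch_url("https://example.com/what-is-internal-linking")
main_content = trafilatura.extract(downloaded)  # returns the extracted main text, or None

if main_content:
    print(main_content[:500])
```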

Semantic Content Structure And Internal Links

Content has a hidden semantic structure that can be referred to as the Taxonomy of Topics.

A well-constructed webpage has an overall structure that generally looks like this:

Introductory paragraph that introduces the main topic
 -Subtopic 1 (a content block)
 -Subtopic 2 (a content block)
 -Subtopic 3 (a content block)
Ending paragraph that wraps everything up

Subtopics actually have their own hierarchy as well, like this:

Subtopic 1
 -Paragraph A
 -Paragraph B
 -Paragraph C

And each paragraph also has its own hierarchy, like this:

Paragraph A
 -Sentence 1
 -Sentence 2
 -Sentence 3
 -Sentence 4

The above outline is an example of how unstructured data like a webpage has a hidden structure, one that helps a machine understand it and label it, for instance with a Centerpiece Annotation.

Given that Google views content as a series of topics and subtopics that are organized in a “content structure” with headings (H1, H2) demarcating each block of content, doesn’t it make sense to also consider internal linking in the same way?
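As a simple illustration of reading that structure the way a machine might, the sketch below walks the H1/H2 headings of a page and groups the content beneath each one. It assumes the BeautifulSoup library; the function name and the dictionary layout are illustrative assumptions.

```python
# Minimal sketch: recover a page's taxonomy of topics from its heading structure.
# Assumes: `pip install beautifulsoup4`; works on well-formed HTML with H1/H2 headings.
from bs4 import BeautifulSoup

def extract_topic_taxonomy(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    taxonomy = {
        "main_topic": h1.get_text(strip=True) if h1 else None,
        "subtopics": [],
    }
    for h2 in soup.find_all("h2"):
        block = []
        # Everything between this H2 and the next H2 belongs to this subtopic block.
        for sibling in h2.find_next_siblings():
            name = getattr(sibling, "name", None)
            if name == "h2":
                break
            if name is not None:  # skip bare text nodes, keep tags
                block.append(sibling.get_text(" ", strip=True))
        taxonomy["subtopics"].append(
            {"subtopic": h2.get_text(strip=True), "content": block}
        )
    return taxonomy
```

An internal link placed inside one of those subtopic blocks inherits that block’s context, which is the point of linking at the sentence and paragraph level.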

For example, my links to the Taxonomy of Topics article and the source of the Martin Splitt quote are contextually relevant, and many readers of this article are likely to follow those links because they expand on the content in an interesting way. They are… contextually relevant.

And because they are contextually relevant, in my opinion it’s likely that Google will also find the topic matter of the linked pages to be relevant.

I didn’t link to them to get them crawled or for ranking purposes. I linked to them because they’re useful to readers and expand on the surrounding content in which those links are embedded.

Semantic Relevance And Contextual Internal Links

For more than ten years I’ve been encouraging the SEO industry to let go of their keywords and start thinking in terms of topics, and it’s great to finally see more of the industry get it and start thinking about content in terms of what it means at the semantic level.

Now take the next step and let go of that “keyword-targeted” mindset and apply that understanding to internal links. Doing so makes sense for SEO and also for readers. In my 25 years of hands-on experience with SEO, I can say with confidence that the most future-proofed SEO strategy is one that thinks about the impact on site visitors, because that’s how Google is looking at pages, too.

Featured Image by Shutterstock/Iconic Bestiary

Google SearchLiaison: 4 Reasons Why A Webpage Couldn’t Rank via @sejournal, @martinibuster

Someone asked on Twitter why their articles weren’t ranking well, and Google SearchLiaison surprised everyone with a mini site audit of things that needed to be fixed.

A person (@iambrandonsalt) tweeted on X (formerly Twitter) asking if anyone could offer an explanation of why some pages of their site were having problems ranking.

He tweeted:

“Does anyone have an explanation to why some of our articles are not showing up in the SERPs… at all?

I’ll update the article with new info, it pops back in and ranks well, then disappears again.

This is happening to lots of our great content, it’s very frustrating :(“

That person subsequently shared the URL of the site under discussion and that’s when SearchLiaison tweeted a response.

Google SearchLiaison Mini Site Audit

SearchLiaison’s mini audit spotted four problems that may be causing the site to underperform in the search engine results pages (SERPs).

Overview Of Why A Webpage Is Underperforming In SERPs

Below is an outline of four things that SearchLiaison called attention to. I wouldn’t take them as indicative of actual ranking factors.

But I would encourage taking his advice seriously. SearchLiaison, also known as Danny Sullivan, has been involved in search for almost 28 years and now works on the inside at Google.

So he understands what it’s like to be on the outside, which makes him a unique and valuable resource to listen to.

Four Highlighted Content Issues

1. Original Content Is Not Apparent

2. A Lack Of Content That Demonstrates Experience

3. Unsatisfying Content

4. Stale Content That Doesn’t Deliver

The above are the four main reasons why SearchLiaison felt the webpage was having trouble ranking in the SERPs.

1. Originality Front And Center

Here’s the part of his post where he called out the seeming lack of original content:

“Took a look. Will share a few things that maybe might be generally helpful. At first glance, it wasn’t clear to me that there was much original content here.

It looks and feels at first glance like a typical “here’s a bunch of product pages.”

I really had to go into it further to understand there’s original stuff going on.”

It’s great to have original content and it should be readily apparent. I think what SearchLiaison meant when he said that the page looked like “a bunch of product pages” is that the content was a list of features.

One can rewrite what the product features are but that doesn’t make it original. The words may be original and even unique but what they communicate is not original.

Going further from what SearchLiaison said, I would add that what’s lacking is any sign that the person writing the content has actually handled the product, which relates to experience.

2. Does Content Demonstrate Experience?

And yes! SearchLiaison also talked about experience.

He wrote:

“Deck 1, 4 and 9 have long video reviews, it looks like — so cool, you’ve used them, have experiences to share. That’s all great. Maybe make that a bit clearer to the reader? But … it could also be me.”

What SearchLiaison may mean is that, reading the content, there’s no mention of what the physical properties of the product are. Is it light? Does it fit well in the hand? Does it feel cheap? The content is largely a list of product features, a way of writing that doesn’t communicate experience.

3. Unsatisfying Content

An important thing about content is that it should satisfy the reader.

Reader satisfaction is so important that the Google Search Quality Raters Guidelines emphasize it for the main content (MC):

“Consider the extent to which the MC is satisfying and helps the page achieve its purpose.”

This is what SearchLiaison said:

“But most of the other devices … don’t exist yet.

You’re promising the reader that these are the best alternatives for 2024. And maybe some of these will be, but if they don’t exist yet, that’s potentially a bummer and unsatisifying to people coming to this page?

Maybe those upcoming devices belong a page about — upcoming devices?”

4. Stale Content That Doesn’t Deliver

An issue SearchLiaison picked up on is content that is out of date and, because of that, doesn’t deliver what it promises.

The problem with some of the content is exactly what SearchLiaison says: it’s out of date.

SearchLiaison observed:

“You also mentioned updating the page and … it feels out-of-date, so what’s being updated on it?

“As of today (April 22nd) the Rog Ally is not out yet, and it was just announced on April 1st” is on the article dated today, Jan 29, and you’d said on Jan 27 this page has also been updated, so what significant change is actually happening to warrant a new byline date?

“At the time of writing, the Lenovo Legion Go isn’t currently out, but all signs are pointing towards an October 2023 release date” — same thing, confusing to be out-of-date on a page claiming to be fresh as of today.

“The IndieGoGo pages goes live on September 5th, so bookmark it and get ready to make a very wise purchase!” — again, out-of-date.”

Advice Is Not Ranking Factors

SearchLiaison ended his critique by stating that none of what he said should be taken to be examples of ranking factors but rather things that tend to align with what Google is looking for.

“Clearly, you put work into some of the video reviews.

Maybe that needs to be more evident with some of the written write-ups. And mixing out-of-date info on a page that claims to be fresh isn’t a great experience.

It’s not that any or all of these things are direct ranking factors, and changing them won’t guarantee to move you up.

But the systems overall are designed to reward reliable helpful content meant for people, so the more this page aligns with that goal, the more you’re potentially going to be successful with it.”

Self-Assessment In Site Auditing

Sometimes it is difficult to critique one’s own site, so it’s helpful to seek an outside opinion. One doesn’t necessarily need a full-blown site audit; sometimes critiquing a single page can provide a wealth of helpful information.

Featured Image by Shutterstock/Mix and Match Studio

Why Google SGE Is Stuck In Google Labs And What’s Next via @sejournal, @martinibuster

Google Search Generative Experience (SGE) was set to expire as a Google Labs experiment at the end of 2023, but its time as an experiment was quietly extended, making it clear that SGE is not coming to search in the near future. Surprisingly, letting Microsoft take the lead may have been the best, if perhaps unintended, approach for Google.

Google’s AI Strategy For Search

Google’s decision to keep SGE as a Google Labs project fits into Google’s history of preferring to integrate AI in the background.

The presence of AI isn’t always apparent but it has been a part of Google Search in the background for longer than most people realize.

The very first use of AI in search was as part of Google’s ranking algorithm, a system known as RankBrain. RankBrain helped the ranking algorithms understand how words in search queries relate to concepts in the real world.

According to Google:

“When we launched RankBrain in 2015, it was the first deep learning system deployed in Search. At the time, it was groundbreaking… RankBrain (as its name suggests) is used to help rank — or decide the best order for — top search results.”

The next implementation was Neural Matching which helped Google’s algorithms understand broader concepts in search queries and webpages.

And one of the most well-known AI systems that Google has rolled out is the Multitask Unified Model, also known as Google MUM. MUM is a multimodal AI system that understands both images and text and can place them within the context of a sentence or a search query.

SpamBrain, Google’s spam-fighting AI, is quite likely one of the most important implementations of AI within Google’s search algorithm because it helps weed out low-quality sites.

These are all examples of Google’s approach to using AI in the background to solve different problems within search as a part of the larger Core Algorithm.

It’s likely that Google would have continued using AI in the background until the transformer-based large language models (LLMs) were able to step into the foreground.

But Microsoft’s integration of ChatGPT into Bing forced Google to take steps to add AI in a more foregrounded way with their Search Generative Experience (SGE).

Why Keep SGE In Google Labs?

Considering that Microsoft has integrated ChatGPT into Bing, it might seem curious that Google hasn’t taken a similar step and is instead keeping SGE in Google Labs. There are good reasons for Google’s approach.

One of Google’s guiding principles for the use of AI is to only use it once the technology is proven to be successful and is implemented in a way that can be trusted to be responsible, and those are two things that generative AI is not capable of today.

There are at least three big problems that must be solved before AI can successfully be integrated in the foreground of search:

  1. LLMs cannot be used as an information retrieval system because they need to be completely retrained in order to add new data.
  2. Transformer architecture is inefficient and costly.
  3. Generative AI tends to create wrong facts, a phenomenon known as hallucinating.

Why AI Cannot Be Used As A Search Engine

One of the most important problems to solve before AI can be used as the backend and the frontend of a search engine is that LLMs are unable to function as a search index where new data is continuously added.

In simple terms, in a regular search engine, adding new webpages is a process where the search engine computes the semantic meaning of the words and phrases within the text (a process called “embedding”), which makes them searchable and ready to be integrated into the index.

Afterwards, the search engine has to update the entire index in order to understand (so to speak) where the new webpages fit in.

The addition of new webpages can change how the search engine understands and relates all the other webpages it knows about, so it goes through all the webpages in its index and updates their relations to each other if necessary. This is a simplification for the sake of communicating the general sense of what it means to add new webpages to a search index.
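To make that concrete, here is a minimal sketch of the embed-and-append step, under the usual caveat that it greatly simplifies real search infrastructure. It assumes the sentence-transformers package and numpy; the model name and the example documents are illustrative.

```python
# Minimal sketch: an embedding-based (dual-encoder style) index absorbs a new
# webpage by encoding it and appending the vector; nothing is retrained.
# Assumes: `pip install sentence-transformers numpy`; the model name is an assumption.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

index_texts = ["Guide to internal linking", "How search engines crawl the web"]
index_vectors = model.encode(index_texts, normalize_embeddings=True)

# Adding a new webpage is just another encode-and-append step.
new_page = "What the Centerpiece Annotation means for SEO"
index_vectors = np.vstack([index_vectors, model.encode([new_page], normalize_embeddings=True)])
index_texts.append(new_page)

# Searching compares the query vector against every document vector
# (dot product equals cosine similarity because the vectors are normalized).
query_vec = model.encode(["how does Google identify a page's main topic"], normalize_embeddings=True)
scores = (index_vectors @ query_vec.T).ravel()
best = int(np.argmax(scores))
print(index_texts[best], float(scores[best]))
```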

In contrast to current search technology, LLMs cannot add new webpages to an index because the act of adding new data requires a complete retraining of the entire LLM.

Google is researching how to solve this problem in order to create a transformer-based LLM search engine, but the problem is not solved, not even close.

To understand why this happens, it’s useful to take a quick look at a recent Google research paper that is co-authored by Marc Najork and Donald Metzler (and several other co-authors). I mention their names because both of those researchers are almost always associated with some of the most consequential research coming out of Google. So if it has either of their names on it, then the research is likely very important.

In the following explanation, the search index is referred to as memory because a search index is a memory of what has been indexed.

The research paper is titled: “DSI++: Updating Transformer Memory with New Documents” (PDF)

Using LLMs as search engines is a process that uses a technology called a Differentiable Search Index (DSI). The current search index technology is referred to as a dual-encoder.

The research paper explains:

“…index construction using a DSI involves training a Transformer model. Therefore, the model must be re-trained from scratch every time the underlying corpus is updated, thus incurring prohibitively high computational costs compared to dual-encoders.”

The paper goes on to explore ways to solve the problem of LLMs that “forget” but at the end of the study they state that they only made progress toward better understanding what needs to be solved in future research.

They conclude:

“In this study, we explore the phenomenon of forgetting in relation to the addition of new and distinct documents into the indexer. It is important to note that when a new document refutes or modifies a previously indexed document, the model’s behavior becomes unpredictable, requiring further analysis.

Additionally, we examine the effectiveness of our proposed method on a larger dataset, such as the full MS MARCO dataset. However, it is worth noting that with this larger dataset, the method exhibits significant forgetting. As a result, additional research is necessary to enhance the model’s performance, particularly when dealing with datasets of larger scales.”

LLMs Can’t Fact Check Themselves

Google and many others are also researching multiple ways to have AI fact check itself in order to keep from giving false information (referred to as hallucinations). But so far that research is not making significant headway.

Bing’s Experience Of AI In The Foreground

Bing took a different route by incorporating AI directly into its search interface in a hybrid approach that joined a traditional search engine with an AI frontend. This new kind of search engine revamped the search experience and differentiated Bing in the competition for search engine users.

Bing’s AI integration initially created significant buzz, drawing users intrigued by the novelty of an AI-driven search interface. This resulted in an increase in Bing’s user engagement.

But after nearly a year of buzz, Bing’s market share saw only a marginal increase. Recent reports, including one from the Boston Globe, indicate less than 1% growth in market share since the introduction of Bing Chat.

Google’s Strategy Is Validated In Hindsight

Bing’s experience suggests that AI in the foreground of a search engine may not be as effective as hoped. The modest increase in market share raises questions about the long-term viability of a chat-based search engine and validates Google’s cautionary approach of using AI in the background.

Google’s focus on keeping AI in the background of search is vindicated in light of Bing’s failure to lure users away from Google.

The strategy of keeping AI in the background, where at this point in time it works best, allowed Google to maintain users while AI search technology matures in Google Labs where it belongs.

Bing’s approach of using AI in the foreground now serves as almost a cautionary tale about the pitfalls of rushing out a technology before the benefits are fully understood, providing insights into the limitations of that approach.

Ironically, Microsoft is finding better ways to integrate AI as a background technology in the form of useful features added to their cloud-based office products.

Future Of AI In Search

The current state of AI technology suggests that it’s more effective as a tool that supports the functions of a search engine rather than serving as the entire back and front ends of a search engine or even as a hybrid approach which users have refused to adopt.

Google’s strategy of releasing new technologies only when they have been fully tested explains why Search Generative Experience belongs in Google Labs.

Certainly, AI will take a bolder role in search, but that day is definitely not today. Expect to see Google adding more AI-based features to more of its products, and it wouldn’t be surprising to see Microsoft continue along that path as well.

Featured Image by Shutterstock/ProStockStudio